If you did not already know

Klout Score
Klout is a website and mobile app that uses social media analytics to rank its users according to online social influence via the ‘Klout Score’, which is a numerical value between 1 and 100. In determining the user score, Klout measures the size of a user’s social media network and correlates the content created to measure how other users interact with that content.
Klout Score: Measuring Influence Across Multiple Social Networks

Neural Graph Collaborative Filtering (NGCF)
Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user’s (or an item’s) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that, the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions — more specifically the bipartite graph structure — into the embedding process. We develop a new recommendation framework Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://…/neural_graph_collaborative_filtering.

Unremarkable AI
Clinical decision support tools (DST) promise improved healthcare outcomes by offering data-driven insights. While effective in lab settings, almost all DSTs have failed in practice. Empirical research diagnosed poor contextual fit as the cause. This paper describes the design and field evaluation of a radically new form of DST. It automatically generates slides for clinicians’ decision meetings with subtly embedded machine prognostics. This design took inspiration from the notion of ‘Unremarkable Computing’, that by augmenting the users’ routines technology/AI can have significant importance for the users yet remain unobtrusive. Our field evaluation suggests clinicians are more likely to encounter and embrace such a DST. Drawing on their responses, we discuss the importance and intricacies of finding the right level of unremarkableness in DST design, and share lessons learned in prototyping critical AI systems as a situated experience. …

Temporal Term Histogram (TTH)
Temporal text, i.e., time-stamped text data are found abundantly in a variety of data sources like newspapers, blogs and social media posts. While today’s data management systems provide facilities for searching full-text data, they do not provide any simple primitives for performing analytical operations with text. This paper proposes the temporal term histograms (TTH) as an intermediate primitive that can be used for analytical tasks. We propose an algebra, with operators and equivalence rules for TTH and present a reference implementation on a relational database system. …

If you did not already know

Model Independent Neural Decoder (MIND)
Standard decoding approaches rely on model-based channel estimation methods to compensate for varying channel effects, which degrade in performance whenever there is a model mismatch. Recently proposed Deep learning based neural decoders address this problem by leveraging a model-free approach via gradient-based training. However, they require large amounts of data to retrain to achieve the desired adaptivity, which becomes intractable in practical systems. In this paper, we propose a new decoder: Model Independent Neural Decoder (MIND), which builds on the top of neural decoders and equips them with a fast adaptation capability to varying channels. This feature is achieved via the methodology of Model-Agnostic Meta-Learning (MAML). Here the decoder: (a) learns a ‘good’ parameter initialization in the meta-training stage where the model is exposed to a set of archetypal channels and (b) updates the parameter with respect to the observed channel in the meta-testing phase using minimal adaptation data and pilot bits. Building on top of existing state-of-the-art neural Convolutional and Turbo decoders, MIND outperforms the static benchmarks by a large margin and shows minimal performance gap when compared to the neural (Convolutional or Turbo) decoders designed for that particular channel. In addition, MIND also shows strong learning capability for channels not exposed during the meta training phase. …

Hierarchical Reinforcement Learning Algorithm via Multi-Goals Abstraction (HRL-MG)
The recommender system is an important form of intelligent application, which assists users to alleviate from information redundancy. Among the metrics used to evaluate a recommender system, the metric of conversion has become more and more important. The majority of existing recommender systems perform poorly on the metric of conversion due to its extremely sparse feedback signal. To tackle this challenge, we propose a deep hierarchical reinforcement learning based recommendation framework, which consists of two components, i.e., high-level agent and low-level agent. The high-level agent catches long-term sparse conversion signals, and automatically sets abstract goals for low-level agent, while the low-level agent follows the abstract goals and interacts with real-time environment. To solve the inherent problem in hierarchical reinforcement learning, we propose a novel deep hierarchical reinforcement learning algorithm via multi-goals abstraction (HRL-MG). Our proposed algorithm contains three characteristics: 1) the high-level agent generates multiple goals to guide the low-level agent in different stages, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state encoder parameters, which increases the update frequency of the high-level agent and thus accelerates the convergence of our proposed algorithm; 3) an appreciate benefit assignment function is designed to allocate rewards in each goal so as to coordinate different goals in a consistent direction. We evaluate our proposed algorithm based on a real-world e-commerce dataset and validate its effectiveness. …

Heterogeneous Simultaneous Multiscale Change Point Estimator (H-SMUCE)
We propose, a heterogeneous simultaneous multiscale change point estimator called ‘H-SMUCE’ for the detection of multiple change points of the signal in a heterogeneous Gaussian regression model. A piecewise constant function is estimated by minimizing the number of change points over the acceptance region of a multiscale test which locally adapts to changes in the variance. The multiscale test is a combination of local likelihood ratio tests which are properly calibrated by scale-dependent critical values to keep a global nominal level a, even for finite samples. We show that H-SMUCE controls the error of overestimation and underestimation of the number of change points. For this, new deviation bounds for F-type statistics are derived. Moreover, we obtain confidence sets for the whole signal. All results are non-asymptotic and uniform over a large class of heterogeneous change point models. H-SMUCE is fast to compute, achieves the optimal detection rate and estimates the number of change points at almost optimal accuracy for vanishing signals, while still being robust. We compare H-SMUCE with several state of the art methods in simulations and analyse current recordings of a transmembrane protein in the bacterial outer membrane with pronounced heterogeneity for its states. An R-package is available on line. …

Shogun
Shogun is and open-source machine learning library that offers a wide range of efficient and unified machine learning methods. …

If you did not already know

Route Constrained Optimization (RCO)
Distillation-based learning boosts the performance of the miniaturized neural network based on the hypothesis that the representation of a teacher model can be used as structured and relatively weak supervision, and thus would be easily learned by a miniaturized model. However, we find that the representation of a converged heavy model is still a strong constraint for training a small student model, which leads to a high lower bound of congruence loss. In this work, inspired by curriculum learning we consider the knowledge distillation from the perspective of curriculum learning by routing. Instead of supervising the student model with a converged teacher model, we supervised it with some anchor points selected from the route in parameter space that the teacher model passed by, as we called route constrained optimization (RCO). We experimentally demonstrate this simple operation greatly reduces the lower bound of congruence loss for knowledge distillation, hint and mimicking learning. On close-set classification tasks like CIFAR100 and ImageNet, RCO improves knowledge distillation by 2.14% and 1.5% respectively. For the sake of evaluating the generalization, we also test RCO on the open-set face recognition task MegaFace. …

Attentional Encoder Network (AEN)
Targeted sentiment classification aims at determining the sentimental tendency towards specific targets. Most of the previous approaches model context and target words using recurrent neural networks such as LSTM in conjunction with attention mechanisms. However, LSTM networks are difficult to parallelize because of their sequential nature. Moreover, since full backpropagation over the sequence requires large amounts of memory, essentially every implementation of backpropagation through time is the truncated version, which brings difficulty in remembering long-term patterns. To address these issues, this paper propose an Attentional Encoder Network (AEN) for targeted sentiment classification. Contrary to previous LSTM based works, AEN eschews complex recurrent neural networks and employs attention based encoders for the modeling between context and target, which can excavate the rich introspective and interactive semantic information from the word embeddings without considering the distance between words. This paper also raise the label unreliability issue and introduce label smoothing regularization term to the loss function for encouraging the model to be less confident with the training labels. Experimental results on three benchmark datasets demonstrate that our model achieves comparable or superior performances with a lightweight model size. …

Scattering Transforms (ScatterNets)
Scattering Transforms (or ScatterNets) introduced by Mallat are a promising start into creating a well-defined feature extractor to use for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a ‘DeScatterNet’. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in second and third layers of Convolutional Neural Networks (CNNs) – the current state of the art Image Classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets. …

Convolutional Cluster Pooling
We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset. To this end, we introduce a Convolutional Cluster Pooling layer exploiting a multi-scale clustering in order to highlight, at different resolutions, locally connected regions on the input graph. Our proposal generalises well-established neural models such as Convolutional Neural Networks (CNNs) on irregular and complex domains, by means of the exploitation of the weight sharing property in a graph-oriented architecture. In this work, such property is based on the centrality of each vertex within its soft-assigned cluster. Extensive experiments on NTU RGB+D, CIFAR-10 and 20NEWS demonstrate the effectiveness of the proposed technique in capturing both local and global patterns in graph-structured data out of different domains. …

If you did not already know

TensorNetwork
TensorNetwork is an open source library for implementing tensor network algorithms. Tensor networks are sparse data structures originally designed for simulating quantum many-body physics, but are currently also applied in a number of other research areas, including machine learning. We demonstrate the use of the API with applications both physics and machine learning, with details appearing in companion papers.
Introducing TensorNetwork, an Open Source Library for Efficient Tensor Calculations

Fuzzy bag-of-Words (FBoW)
Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks. Furthermore, when averaged word vectors are trained supervised on large corpora of paraphrases, they achieve state-of-the-art results on standard STS benchmarks. Inspired by these insights, we push the limits of word embeddings even further. We propose a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership, which are derived from similarities between word vectors. We show that max-pooled word vectors are only a special case of fuzzy BoW and should be compared via fuzzy Jaccard index rather than cosine similarity. Finally, we propose DynaMax, a completely unsupervised and non-parametric similarity measure that dynamically extracts and max-pools good features depending on the sentence pair. This method is both efficient and easy to implement, yet outperforms current baselines on STS tasks by a large margin and is even competitive with supervised word vectors trained to directly optimise cosine similarity. …

Online Soft Mining (OSM)
Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of the sample mining methods, and provide solutions for both of them. First, previous mining methods assign one binary score to each sample, i.e., dropping or keeping it, so they only selects a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intraclass variances by focusing on more similar positives. Second, the existing methods are easily influenced by outliers as they are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA) that assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art. …

Seq2Seq
Multi-label text classification (MLTC) aims to assign multiple labels to each sample in the dataset. The labels usually have internal correlations. However, traditional methods tend to ignore the correlations between labels. In order to capture the correlations between labels, the sequence-to-sequence (Seq2Seq) model views the MLTC task as a sequence generation problem, which achieves excellent performance on this task. However, the Seq2Seq model is not suitable for the MLTC task in essence. The reason is that it requires humans to predefine the order of the output labels, while some of the output labels in the MLTC task are essentially an unordered set rather than an ordered sequence. This conflicts with the strict requirement of the Seq2Seq model for the label order. In this paper, we propose a novel sequence-to-set framework utilizing deep reinforcement learning, which not only captures the correlations between labels, but also reduces the dependence on the label order. Extensive experimental results show that our proposed method outperforms the competitive baselines by a large margin. …

If you did not already know

Edge Source Coding
Data compression is an efficient technique to save data storage and transmission costs. However, traditional data compression methods always ignore the impact of user preferences on the statistical distributions of symbols transmitted over the links. Notice that the development of big data technologies and popularization of smart devices enable analyses on user preferences based on data collected from personal handsets. This paper presents a user preference aware lossless data compression method, termed edge source coding, to compress data at the network edge. An optimization problem is formulated to minimize the expected number of bits needed to represent a requested content item in edge source coding. For edge source coding under discrete user preferences, DCA (difference of convex functions programming algorithm) based and k-means++ based algorithms are proposed to give codebook designs. For edge source coding under continuous user preferences, a sampling method is applied to give codebook designs. In addition, edge source coding is extended to the two-user case and codebooks are elaborately designed to utilize multicasting opportunities. Both theoretical analysis and simulations demonstrate the optimal codebook design should take into account user preferences. …

Tensor-Train RNN (TT-RNN)
We present Tensor-Train RNN (TT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed tensor recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher order moments and high-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train (TT) decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation properties of Tensor-Train RNNs for general sequence inputs, and such guarantees are not available for usual RNNs. We also demonstrate significant long-term prediction improvements over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well on real-world climate and traffic data. …

Deep Frame Interpolation
This work presents a supervised learning based approach to the computer vision problem of frame interpolation. The presented technique could also be used in the cartoon animations since drawing each individual frame consumes a noticeable amount of time. The most existing solutions to this problem use unsupervised methods and focus only on real life videos with already high frame rate. However, the experiments show that such methods do not work as well when the frame rate becomes low and object displacements between frames becomes large. This is due to the fact that interpolation of the large displacement motion requires knowledge of the motion structure thus the simple techniques such as frame averaging start to fail. In this work the deep convolutional neural network is used to solve the frame interpolation problem. In addition, it is shown that incorporating the prior information such as optical flow improves the interpolation quality significantly. …

Deep Distribution Regression
Due to their flexibility and predictive performance, machine-learning based regression methods have become an important tool for predictive modeling and forecasting. However, most methods focus on estimating the conditional mean or specific quantiles of the target quantity and do not provide the full conditional distribution, which contains uncertainty information that might be crucial for decision making. In this article, we provide a general solution by transforming a conditional distribution estimation problem into a constrained multi-class classification problem, in which tools such as deep neural networks. We propose a novel joint binary cross-entropy loss function to accomplish this goal. We demonstrate its performance in various simulation studies comparing to state-of-the-art competing methods. Additionally, our method shows improved accuracy in a probabilistic solar energy forecasting problem. …

If you did not already know

Typed Graph Network
Recently, the deep learning community has given growing attention to neural architectures engineered to learn problems in relational domains. Convolutional Neural Networks employ parameter sharing over the image domain, tying the weights of neural connections on a grid topology and thus enforcing the learning of a number of convolutional kernels. By instantiating trainable neural modules and assembling them in varied configurations (apart from grids), one can enforce parameter sharing over graphs, yielding models which can effectively be fed with relational data. In this context, vertices in a graph can be projected into a hyperdimensional real space and iteratively refined over many message-passing iterations in an end-to-end differentiable architecture. Architectures of this family have been referred to with several definitions in the literature, such as Graph Neural Networks, Message-passing Neural Networks, Relational Networks and Graph Networks. In this paper, we revisit the original Graph Neural Network model and show that it generalises many of the recent models, which in turn benefit from the insight of thinking about vertex \textbf{types}. To illustrate the generality of the original model, we present a Graph Neural Network formalisation, which partitions the vertices of a graph into a number of types. Each type represents an entity in the ontology of the problem one wants to learn. This allows – for instance – one to assign embeddings to edges, hyperedges, and any number of global attributes of the graph. As a companion to this paper we provide a Python/Tensorflow library to facilitate the development of such architectures, with which we instantiate the formalisation to reproduce a number of models proposed in the current literature. …

Fenchel Lifted Network
Despite the recent successes of deep neural networks, the corresponding training problem remains highly non-convex and difficult to optimize. Classes of models have been proposed that introduce greater structure to the objective function at the cost of lifting the dimension of the problem. However, these lifted methods sometimes perform poorly compared to traditional neural networks. In this paper, we introduce a new class of lifted models, Fenchel lifted networks, that enjoy the same benefits as previous lifted models, without suffering a degradation in performance over classical networks. Our model represents activation functions as equivalent biconvex constraints and uses Lagrange Multipliers to arrive at a rigorous lower bound of the traditional neural network training problem. This model is efficiently trained using block-coordinate descent and is parallelizable across data points and/or layers. We compare our model against standard fully connected and convolutional networks and show that we are able to match or beat their performance. …

Deep Complex U-Net
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experiments were conducted on the mixed dataset showing that all three proposed approaches are empirically valid. Experimental results show that the proposed method achieves state-of-the-art performance in all metrics, outperforming previous approaches by a large margin. …

Random Labeled Point Process (RLPP)
Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goals are to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values, and then apply the clustering algorithm on the completed data. We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches. Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data, while at the same time possessing smaller clustering errors. …

If you did not already know

KTBoost
In this article, we introduce a novel boosting algorithm called `KTBoost’, which combines kernel boosting and tree boosting. In each boosting iteration, the algorithm adds either a regression tree or reproducing kernel Hilbert space (RKHS) regression function to the ensemble of base learners. Intuitively, the idea is that discontinuous trees and continuous RKHS regression functions complement each other, and that this combination allows for better learning of both continuous and discontinuous functions as well as functions that exhibit parts with varying degrees of regularity. We empirically show that KTBoost outperforms both tree and kernel boosting in terms of predictive accuracy on a wide array of data sets. …

Swapout
We describe Swapout, a new stochastic training method, that outperforms ResNets of identical network structure yielding impressive results on CIFAR-10 and CIFAR-100. Swapout samples from a rich set of architectures including dropout, stochastic depth and residual architectures as special cases. When viewed as a regularization method swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to exiting architectures and suggests a much richer set of architectures to be explored. We show that our formulation suggests an efficient training method and validate our conclusions on CIFAR-10 and CIFAR-100 matching state of the art accuracy. Remarkably, our 32 layer wider model performs similar to a 1001 layer ResNet model. …

Data Illustrator
Data Illustrator: Augmenting Vector Design Tools with Lazy Data Binding for Expressive Visualization Authoring. Building graphical user interfaces for visualization authoring is challenging as one must reconcile the tension between flexible graphics manipulation and procedural visualization generation based on a graphical grammar or declarative languages. To better support designers’ workflows and practices, we propose Data Illustrator, a novel visualization framework. In our approach, all visualizations are initially vector graphics; data binding is applied when necessary and only constrains interactive manipulation to that data bound property. The framework augments graphic design tools with new concepts and operators, and describes the structure and generation of a variety of visualizations. Based on the framework, we design and implement a visualization authoring system. The system extends interaction techniques in modern vector design tools for direct manipulation of visualization configurations and parameters. We demonstrate the expressive power of our approach through a variety of examples. A qualitative study shows that designers can use our framework to compose visualizations. …

First Story Detection (FSD)
Given a series of documents, first story is defined as the first document to discuss a specific event, which occurred at a particular time and place. First story detection (FSD) was firstly defined byAllan in 2002 in terms of topic detection and tracking.
“Novelty Detection”
“Topic Detection and Tracking”
http://…/DePaper.pdf
http://…/storm-first-story-detection

If you did not already know

Lanczos Latent Factor Recommender (LLFR)
The purpose if this master’s thesis is to study and develop a new algorithmic framework for Collaborative Filtering to produce recommendations in the top-N recommendation problem. Thus, we propose Lanczos Latent Factor Recommender (LLFR); a novel ‘big data friendly’ collaborative filtering algorithm for top-N recommendation. Using a computationally efficient Lanczos-based procedure, LLFR builds a low dimensional item similarity model, that can be readily exploited to produce personalized ranking vectors over the item space. A number of experiments on real datasets indicate that LLFR outperforms other state-of-the-art top-N recommendation methods from a computational as well as a qualitative perspective. Our experimental results also show that its relative performance gains, compared to competing methods, increase as the data get sparser, as in the Cold Start Problem. More specifically, this is true both when the sparsity is generalized – as in the New Community Problem, a very common problem faced by real recommender systems in their beginning stages, when there is not sufficient number of ratings for the collaborative filtering algorithms to uncover similarities between items or users – and in the very interesting case where the sparsity is localized in a small fraction of the dataset – as in the New Users Problem, where new users are introduced to the system, they have not rated many items and thus, the CF algorithm can not make reliable personalized recommendations yet. …

Set Aggregating Network (SAN)
We construct a general unified framework for learning representation of structured data, i.e. data which cannot be represented as the fixed-length vectors (e.g. sets, graphs, texts or images of varying sizes). The key factor is played by an intermediate network called SAN (Set Aggregating Network), which maps a structured object to a fixed length vector in a high dimensional latent space. Our main theoretical result shows that for sufficiently large dimension of the latent space, SAN is capable of learning a unique representation for every input example. Experiments demonstrate that replacing pooling operation by SAN in convolutional networks leads to better results in classifying images with different sizes. Moreover, its direct application to text and graph data allows to obtain results close to SOTA, by simpler networks with smaller number of parameters than competitive models. …

Dimensional Collapse
… When I stared at the plot, I ask myself, why not map the x-axis information of the points to the very first one according to the y-axis ‘connections’. When everything goes well and all done, all the grey points should be mapped along the red arrows to the first marks of the groups, and there should be only 4 marks leave on x-axis: a, b, d and g, instead of 9 marks in the first place. And the y-axis information, after contributing all the ‘connection rules’, can be put away now, since the left x-axis marks are exactly what I want: the final flags. It is why I like to call it ‘Dimensional Collapse’. … …

Morphed Learning
The concern of potential privacy violation has prevented efficient use of big data for improving deep learning based applications. In this paper, we propose Morphed Learning, a privacy-preserving technique for deep learning based on data morphing that, allows data owners to share their data without leaking sensitive privacy information. Morphed Learning allows the data owners to send securely morphed data and provides the server with an Augmented Convolutional layer to train the network on morphed data without performance loss. Morphed Learning has these three features: (1) Strong protection against reverse-engineering on the morphed data; (2) Acceptable computational and data transmission overhead with no correlation to the depth of the neural network; (3) No degradation of the neural network performance. Theoretical analyses on CIFAR-10 dataset and VGG-16 network show that our method is capable of providing 10^89 morphing possibilities with only 5% computational overhead and 10% transmission overhead under limited knowledge attack scenario. Further analyses also proved that our method can offer same resilience against full knowledge attack if more resources are provided. …

If you did not already know

Tree-Based Optimization (TBO)
Designing search algorithms for finding global optima is one of the most active research fields, recently. These algorithms consist of two main categories, i.e., classic mathematical and metaheuristic algorithms. This article proposes a meta-algorithm, Tree-Based Optimization (TBO), which uses other heuristic optimizers as its sub-algorithms in order to improve the performance of search. The proposed algorithm is based on mathematical tree subject and improves performance and speed of search by iteratively removing parts of the search space having low fitness, in order to minimize and purify the search space. The experimental results on several well-known benchmarks show the outperforming performance of TBO algorithm in finding the global solution. Experiments on high dimensional search spaces show significantly better performance when using the TBO algorithm. The proposed algorithm improves the search algorithms in both accuracy and speed aspects, especially for high dimensional searching such as in VLSI CAD tools for Integrated Circuit (IC) design. …

Subsampled Turbulence Removal Network
We present a deep-learning approach to restore a sequence of turbulence-distorted video frames from turbulent deformations and space-time varying blurs. Instead of requiring a massive training sample size in deep networks, we purpose a training strategy that is based on a new data augmentation method to model turbulence from a relatively small dataset. Then we introduce a subsampled method to enhance the restoration performance of the presented GAN model. The contributions of the paper is threefold: first, we introduce a simple but effective data augmentation algorithm to model the turbulence in real life for training in the deep network; Second, we firstly purpose the Wasserstein GAN combined with $\ell_1$ cost for successful restoration of turbulence-corrupted video sequence; Third, we combine the subsampling algorithm to filter out strongly corrupted frames to generate a video sequence with better quality. …

InstaNAS
Neural Architecture Search (NAS) aims at finding one ‘single’ architecture that achieves the best accuracy for a given task such as image recognition.In this paper, we study the instance-level variation,and demonstrate that instance-awareness is an important yet currently missing component of NAS. Based on this observation, we propose InstaNAS for searching toward instance-level architectures;the controller is trained to search and form a ‘distribution of architectures’ instead of a single final architecture. Then during the inference phase, the controller selects an architecture from the distribution, tailored for each unseen image to achieve both high accuracy and short latency. The experimental results show that InstaNAS reduces the inference latency without compromising classification accuracy. On average, InstaNAS achieves 48.9% latency reduction on CIFAR-10 and 40.2% latency reduction on CIFAR-100 with respect to MobileNetV2 architecture. …

Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions.~We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the performance of fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective. The datasets and code are available at https://…/.

If you did not already know

Knowledge Authoring Logic Machine (KALM)
Knowledge representation and reasoning (KRR) is one of the key areas in artificial intelligence (AI) field. It is intended to represent the world knowledge in formal languages (e.g., Prolog, SPARQL) and then enhance the expert systems to perform querying and inference tasks. Currently, constructing large scale knowledge bases (KBs) with high quality is prohibited by the fact that the construction process requires many qualified knowledge engineers who not only understand the domain-specific knowledge but also have sufficient skills in knowledge representation. Unfortunately, qualified knowledge engineers are in short supply. Therefore, it would be very useful to build a tool that allows the user to construct and query the KB simply via text. Although there is a number of systems developed for knowledge extraction and question answering, they mainly fail in that these system don’t achieve high enough accuracy whereas KRR is highly sensitive to erroneous data. In this thesis proposal, I will present Knowledge Authoring Logic Machine (KALM), a rule-based system which allows the user to author knowledge and query the KB in text. The experimental results show that KALM achieved superior accuracy in knowledge authoring and question answering as compared to the state-of-the-art systems. …

Data-Driven Design
The aim of this research is to introduce a novel structural design process that allows architects and engineers to extend their typical design space horizon and thereby promoting the idea of creativity in structural design. The theoretical base of this work builds on the combination of structural form-finding and state-of-the-art machine learning algorithms. In the first step of the process, Combinatorial Equilibrium Modelling (CEM) is used to generate a large variety of spatial networks in equilibrium for given input parameters. In the second step, these networks are clustered and represented in a form-map through the implementation of a Self Organizing Map (SOM) algorithm. In the third step, the solution space is interpreted with the help of a Uniform Manifold Approximation and Projection algorithm (UMAP). This allows gaining important insights in the structure of the solution space. A specific case study is used to illustrate how the infinite equilibrium states of a given topology can be defined and represented by clusters. Furthermore, three classes, related to the non-linear interaction between the input parameters and the form space, are verified and a statement about the entire manifold of the solution space of the case study is made. To conclude, this work presents an innovative approach on how the manifold of a solution space can be grasped with a minimum amount of data and how to operate within the manifold in order to increase the diversity of solutions. …

MG-WFBP
Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks on computer clusters. With the increase of computational power, network communications have become one limiting factor on system scalability. In this paper, we observe that many deep neural networks have a large number of layers with only a small amount of data to be communicated. Based on the fact that merging some short communication tasks into a single one may reduce the overall communication time, we formulate an optimization problem to minimize the training iteration time. We develop an optimal solution named merged-gradient WFBP (MG-WFBP) and implement it in our open-source deep learning platform B-Caffe. Our experimental results on an 8-node GPU cluster with 10GbE interconnect and trace-based simulation results on a 64-node cluster both show that the MG-WFBP algorithm can achieve much better scaling efficiency than existing methods WFBP and SyncEASGD. …

Uncertainty Prediction Problem
Machine learning algorithms have been effectively applied into various real world tasks. However, it is difficult to provide high-quality machine learning solutions to accommodate an unknown distribution of input datasets; this difficulty is called the uncertainty prediction problems. …