The ability to look multiple times through a series of pose-adjusted glimpses is fundamental to human vision. This critical faculty allows us to understand highly complex visual scenes. Short term memory plays an integral role in aggregating the information obtained from these glimpses and informing our interpretation of the scene. Computational models have attempted to address glimpsing and visual attention but have failed to incorporate the notion of memory. We introduce a novel, biologically inspired visual working memory architecture that we term the Hebb-Rosenblatt memory. We subsequently introduce a fully differentiable Short Term Attentive Working Memory model (STAWM) which uses transformational attention to learn a memory over each image it sees. The state of our Hebb-Rosenblatt memory is embedded in STAWM as the weights space of a layer. By projecting different queries through this layer we can obtain goal-oriented latent representations for tasks including classification and visual reconstruction. Our model obtains highly competitive classification performance on MNIST and CIFAR-10. As demonstrated through the CelebA dataset, to perform reconstruction the model learns to make a sequence of updates to a canvas which constitute a parts-based representation. Classification with the self supervised representation obtained from MNIST is shown to be in line with the state of the art models (none of which use a visual attention mechanism). Finally, we show that STAWM can be trained under the dual constraints of classification and reconstruction to provide an interpretable visual sketchpad which helps open the ‘black-box’ of deep learning.
State of the art deep generative networks are capable of producing images with such incredible realism that they can be suspected of memorizing training images. It is why it is not uncommon to include visualizations of training set nearest neighbors, to suggest generated images are not simply memorized. We demonstrate this is not sufficient and motivates the need to study memorization/overfitting of deep generators with more scrutiny. This paper addresses this question by i) showing how simple losses are highly effective at reconstructing images for deep generators ii) analyzing the statistics of reconstruction errors when reconstructing training and validation images, which is the standard way to analyze overfitting in machine learning. Using this methodology, this paper shows that overfitting is not detectable in the pure GAN models proposed in the literature, in contrast with those using hybrid adversarial losses, which are amongst the most widely applied generative methods. The paper also shows that standard GAN evaluation metrics fail to capture memorization for some deep generators. Finally, the paper also shows how off-the-shelf GAN generators can be successfully applied to face inpainting and face super-resolution using the proposed reconstruction method, without hybrid adversarial losses.
Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold, firstly we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection. Furthermore, we review the adoption of these methods for anomaly across various application domains and assess their effectiveness. We have grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted. Within each category we outline the basic anomaly detection technique, along with its variants and present key assumptions, to differentiate between normal and anomalous behavior. For each category, we present we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting these techniques.
We propose a principle for exploring context in machine learning models. Starting with a simple assumption that each observation may or may not depend on its context, a conditional probability distribution is decomposed into two parts: context-free and context-sensitive. Then by employing the log-linear word production model for relating random variables to their embedding space representation and making use of the convexity of natural exponential function, we show that the embedding of an observation can also be decomposed into a weighted sum of two vectors, representing its context-free and context-sensitive parts, respectively. This simple yet revolutionary treatment of context provides a unified view of almost all existing deep learning models, leading to revisions of these models able to achieve significant performance boost. Specifically, our upgraded version of a recent sentence embedding model not only outperforms the original one by a large margin, but also leads to a new, principled approach for compositing the embeddings of bag-of-words features, as well as a new architecture for modeling attention in deep neural networks. More surprisingly, our new principle provides a novel understanding of the gates and equations defined by the long short term memory model, which also leads to a new model that is able to converge significantly faster and achieve much lower prediction errors. Furthermore, our principle also inspires a new type of generic neural network layer that better resembles real biological neurons than the traditional linear mapping plus nonlinear activation based architecture. Its multi-layer extension provides a new principle for deep neural networks which subsumes residual network (ResNet) as its special case, and its extension to convolutional neutral network model accounts for irrelevant input (e.g., background in an image) in addition to filtering.
Single-subject health data are becoming increasingly available thanks to advances in self-tracking technology (e.g., mobile devices, apps, sensors, implants). Many users and health caregivers seek to use such observational time series data to recommend changing health practices in order to achieve desired health outcomes. However, there are few available causal inference approaches that are flexible enough to analyze such idiographic data. We develop a recently introduced framework, and implement a flexible random-forests g-formula approach to estimating a recurring individualized effect called the ‘average period treatment effect’. In the process, we argue that our approach essentially resembles that of a longitudinal study by partitioning a single time series into periods taking on binary treatment levels. We analyze six years of the author’s own self-tracked physical activity and weight data to demonstrate our approach, and compare the results of our analysis to one that does not properly account for confounding.
Studying networks to predict the emerging interactions is a common research problem for both fields of network science and machine learning. The problem of predicting future or missing relationships in networks is called link prediction. Machine learning studies have mostly approached to this problem as a clustering or a classification task. A few obstacles might be involved in approaching network datasets through machine learning models, including undefined euclidean distance, extracting proper features, unbalanced data classes due to the sparsity of real networks, or embedding graphs to a low dimensional vector space while preserving the structure to study networks. Extensive studies have examined these problems from different aspects and proposed methods some of which might work very well for a specific application but not as a global solution. In this survey, we review the general-purpose techniques at the heart of link prediction problem, which can be combined with domain-specific heuristic methods in practice. To the best of our knowledge, this survey is the first comprehensive study which considers all of the mentioned challenges about studying networks and approaching them through machine learning models. We provide a diverse study on feature extraction techniques for network datasets based on similarity metrics, maximum likelihood methods, probabilistic methods and graph representation learning. Our other contributions include proposing a taxonomy to classify link prediction methods and continue with introducing valuable network dataset collections to study the problem of link prediction. Our final contribution is discussing and proposing a few models, including a multi-stream feature learning model to exploit the benefits of local and quasi-local network extraction techniques combined with graph representation learning.
Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results.
The rise of Internet-of-Things (IoT) has brought many new sensing mechanisms. Among these mechanisms, acoustic sensing attracts much attention in recent years. Acoustic sensing exploits acoustic sensors beyond their primary uses, namely recording and playing, to enable interesting applications and new user experience. In this paper, we present the first survey of recent advances in acoustic sensing using commodity hardware. We propose a general framework that categorizes main building blocks of acoustic sensing systems. This framework consists of three layers, i.e., the physical layer, processing layer, and application layer. We highlight different sensing approaches in the processing layer and fundamental design considerations in the physical layer. Many existing and potential applications including context-aware applications, human-computer interface, and aerial acoustic communications are presented in depth. Challenges and future research trends are also discussed.
This paper introduces the Seventh Dialog System Technology Challenges (DSTC), which use shared datasets to explore the problem of building dialog systems. Recently, end-to-end dialog modeling approaches have been applied to various dialog tasks. The seventh DSTC (DSTC7) focuses on developing technologies related to end-to-end dialog systems for (1) sentence selection, (2) sentence generation and (3) audio visual scene aware dialog. This paper summarizes the overall setup and results of DSTC7, including detailed descriptions of the different tracks and provided datasets. We also describe overall trends in the submitted systems and the key results. Each track introduced new datasets and participants achieved impressive results using state-of-the-art end-to-end technologies.
Nowadays, Twitter has become a great source of user-generated information about events. Very often people report causal relationships between events in their tweets. Automatic detection of causality information in these events might play an important role in predictive event analytics. Existing approaches include both rule-based and data-driven supervised methods. However, it is challenging to correctly identify event causality using only linguistic rules due to the highly unstructured nature and grammatical incorrectness of social media short text such as tweets. Also, it is difficult to develop a data-driven supervised method for event causality detection in tweets due to insufficient contextual information. This paper proposes a novel event context word extension technique based on background knowledge. To demonstrate the effectiveness of our proposed event context word extension technique, we develop a feed-forward neural network based approach to detect event causality from tweets. Extensive experiments demonstrate the superiority of our approach.
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent’s effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.
This paper presents the Variation Network (VarNet), a generative model providing means to manipulate the high-level attributes of a given input. The originality of our approach is that VarNet is not only capable of handling pre-defined attributes but can also learn the relevant attributes of the dataset by itself. These two settings can be easily combined which makes VarNet applicable for a wide variety of tasks. Further, VarNet has a sound probabilistic interpretation which grants us with a novel way to navigate in the latent spaces as well as means to control how the attributes are learned. We demonstrate experimentally that this model is capable of performing interesting input manipulation and that the learned attributes are relevant and interpretable.
In this paper we present MLaut (Machine Learning AUtomation Toolbox) for the python data science ecosystem. MLaut automates large-scale evaluation and benchmarking of machine learning algorithms on a large number of datasets. MLaut provides a high-level workflow interface to machine algorithm algorithms, implements a local back-end to a database of dataset collections, trained algorithms, and experimental results, and provides easy-to-use interfaces to the scikit-learn and keras modelling libraries. Experiments are easy to set up with default settings in a few lines of code, while remaining fully customizable to the level of hyper-parameter tuning, pipeline composition, or deep learning architecture. As a principal test case for MLaut, we conducted a large-scale supervised classification study in order to benchmark the performance of a number of machine learning algorithms – to our knowledge also the first larger-scale study on standard supervised learning data sets to include deep learning algorithms. While corroborating a number of previous findings in literature, we found (within the limitations of our study) that deep neural networks do not perform well on basic supervised learning, i.e., outside the more specialized, image-, audio-, or text-based tasks.
Machine learning has emerged as the dominant tool for implementing complex cognitive tasks that require supervised, unsupervised, and reinforcement learning. While the resulting machines have demonstrated in some cases even super-human performance, their energy consumption has often proved to be prohibitive in the absence of costly super-computers. Most state-of-the-art machine learning solutions are based on memory-less models of neurons. This is unlike the neurons in the human brain, which encode and process information using temporal information in spike events. The different computing principles underlying biological neurons and how they combine together to efficiently process information is believed to be a key factor behind their superior efficiency compared to current machine learning systems. Inspired by the time-encoding mechanism used by the brain, third generation spiking neural networks (SNNs) are being studied for building a new class of information processing engines. Modern computing systems based on the von Neumann architecture, however, are ill-suited for efficiently implementing SNNs, since their performance is limited by the need to constantly shuttle data between physically separated logic and memory units. Hence, novel computational architectures that address the von Neumann bottleneck are necessary in order to build systems that can implement SNNs with low energy budgets. In this paper, we review some of the architectural and system level design aspects involved in developing a new class of brain-inspired information processing engines that mimic the time-based information encoding and processing aspects of the brain.