Mining Interpretable AOG Representations from Convolutional Networks via Active Question Answering

In this paper, we present a method to mine object-part patterns from conv-layers of a pre-trained convolutional neural network (CNN). The mined object-part patterns are organized by an And-Or graph (AOG). This interpretable AOG representation consists of a four-layer semantic hierarchy, i.e., semantic parts, part templates, latent patterns, and neural units. The AOG associates each object part with certain neural units in feature maps of conv-layers. The AOG is constructed in a weakly-supervised manner, i.e., very few annotations (e.g., 3-20) of object parts are used to guide the learning of AOGs. We develop a question-answering (QA) method that uses active human-computer communications to mine patterns from a pre-trained CNN, in order to incrementally explain more features in conv-layers. During the learning process, our QA method uses the current AOG for part localization. The QA method actively identifies objects, whose feature maps cannot be explained by the AOG. Then, our method asks people to annotate parts on the unexplained objects, and uses answers to discover CNN patterns corresponding to the newly labeled parts. In this way, our method gradually grows new branches and refines existing branches on the AOG to semanticize CNN representations. In experiments, our method exhibited a high learning efficiency. Our method used about 1/6-1/3 of the part annotations for training, but achieved similar or better part-localization performance than fast-RCNN methods.

Explanatory Graphs for CNNs

This paper introduces a graphical model, namely an explanatory graph, which reveals the knowledge hierarchy hidden inside conv-layers of a pre-trained CNN. Each filter in a conv-layer of a CNN for object classification usually represents a mixture of object parts. We develop a simple yet effective method to disentangle object-part pattern components from each filter. We construct an explanatory graph to organize the mined part patterns, where a node represents a part pattern, and each edge encodes co-activation relationships and spatial relationships between patterns. More crucially, given a pre-trained CNN, the explanatory graph is learned without a need of annotating object parts. Experiments show that each graph node consistently represented the same object part through different images, which boosted the transferability of CNN features. We transferred part patterns in the explanatory graph to the task of part localization, and our method significantly outperformed other approaches.

DeepLens: Towards a Visual Data Management System

Advances in deep learning have greatly widened the scope of automatic computer vision algorithms and enable users to ask questions directly about the content in images and video. This paper explores the necessary steps towards a future Visual Data Management System (VDMS), where the predictions of such deep learning models are stored, managed, queried, and indexed. We propose a query and data model that disentangles the neural network models used, the query workload, and the data source semantics from the query processing layer. Our system, DeepLens, is based on dataflow query processing systems and this research prototype presents initial experiments to elicit important open research questions in visual analytics systems. One of our main conclusions is that any future ‘declarative’ VDMS will have to revisit query optimization and automated physical design from a unified perspective of performance and accuracy tradeoffs. Physical design and query optimization choices can not only change performance by orders of magnitude, they can potentially affect the accuracy of results.

A General Theory for Large-Scale Curve Time Series via Functional Stability Measure

Modelling a large bundle of curves arises in a broad spectrum of real applications. However, existing literature relies primarily on the critical assumption of independent curve observations. In this paper, we provide a general theory for large-scale Gaussian curve time series, where the temporal and cross-sectional dependence across multiple curve observations exist and the number of functional variables, p, may be large relative to the number of observations, n. We propose a novel functional stability measure for multivariate stationary processes based on their spectral properties and use it to establish some useful concentration bounds on the sample covariance matrix function. These concentration bounds serve as a fundamental tool for further theoretical analysis, in particular, for deriving nonasymptotic upper bounds on the errors of the regularized estimates in high dimensional settings. As {\it functional principle component analysis} (FPCA) is one of the key techniques to handle functional data, we also investigate the concentration properties of the relevant estimated terms under a FPCA framework. To illustrate with an important application, we consider {\it vector functional autoregressive models} and develop a regularization approach to estimate autoregressive coefficient functions under the sparsity constraint. Using our derived nonasymptotic results, we investigate the theoretical properties of the regularized estimate in a ‘large p, small n‘ regime. The finite sample performance of the proposed method is examined through simulation studies.

wav2letter++: The Fastest Open-source Speech Recognition System

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++’s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.

Universal Successor Features Approximators

The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpolation power of a function approximator that is given the task description as input; one of its most common form are universal value function approximators (UVFAs). Another way to generalise to new tasks is to exploit structure in the RL problem itself. Generalised policy improvement (GPI) combines solutions of previous tasks into a policy for the unseen task; this relies on instantaneous policy evaluation of old policies under the new reward function, which is made possible through successor features (SFs). Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI. We discuss the challenges involved in training a USFA, its generalisation properties and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate in a first-person perspective three-dimensional environment.

Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions facilitates insights on the different use cases COREL variants can satisfy, where the Cosine-COREL variant makes a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE.

Deep Variational Sufficient Dimensionality Reduction

We consider the problem of sufficient dimensionality reduction (SDR), where the high-dimensional observation is transformed to a low-dimensional sub-space in which the information of the observations regarding the label variable is preserved. We propose DVSDR, a deep variational approach for sufficient dimensionality reduction. The deep structure in our model has a bottleneck that represent the low-dimensional embedding of the data. We explain the SDR problem using graphical models and use the framework of variational autoencoders to maximize the lower bound of the log-likelihood of the joint distribution of the observation and label. We show that such a maximization problem can be interpreted as solving the SDR problem. DVSDR can be easily adopted to semi-supervised learning setting. In our experiment we show that DVSDR performs competitively on classification tasks while being able to generate novel data samples.

Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Humans and animals can learn complex predictive models that allow them to accurately and reliably reason about real-world phenomena, and they can adapt such models extremely quickly in the face of unexpected changes. Deep neural network models allow us to represent very complex functions, but lack this capacity for rapid online adaptation. The goal in this paper is to develop a method for continual online learning from an incoming stream of data, using deep neural network models. We formulate an online learning procedure that uses stochastic gradient descent to update model parameters, and an expectation maximization algorithm with a Chinese restaurant process prior to develop and maintain a mixture of models to handle non-stationary task distributions. This allows for all models to be adapted as necessary, with new models instantiated for task changes and old models recalled when previously seen tasks are encountered again. Furthermore, we observe that meta-learning can be used to meta-train a model such that this direct online adaptation with SGD is effective, which is otherwise not the case for large function approximators. In this work, we apply our meta-learning for online learning (MOLe) approach to model-based reinforcement learning, where adapting the predictive model is critical for control; we demonstrate that MOLe outperforms alternative prior methods, and enables effective continuous adaptation in non-stationary task distributions such as varying terrains, motor failures, and unexpected disturbances.

Deep Gated Recurrent and Convolutional Network Hybrid Model for Univariate Time Series Classification

Hybrid LSTM-fully convolutional networks (LSTM-FCN) for time series classification have produced state-of-the-art classification results on univariate time series. We show that replacing the LSTM with a gated recurrent unit (GRU) to create a GRU-fully convolutional network hybrid model (GRU-FCN) can offer even better performance on many time series datasets. The proposed GRU-FCN model outperforms state-of-the-art classification performance in many univariate and multivariate time series datasets. In addition, since the GRU uses a simpler architecture than the LSTM, it has fewer training parameters, less training time, and a simpler hardware implementation, compared to the LSTM-based models.

A Comparison of LSTMs and Attention Mechanisms for Forecasting Financial Time Series

While LSTMs show increasingly promising results for forecasting Financial Time Series (FTS), this paper seeks to assess if attention mechanisms can further improve performance. The hypothesis is that attention can help prevent long-term dependencies experienced by LSTM models. To test this hypothesis, the main contribution of this paper is the implementation of an LSTM with attention. Both the benchmark LSTM and the LSTM with attention were compared and both achieved reasonable performances of up to 60% on five stocks from Kaggle’s Two Sigma dataset. This comparative analysis demonstrates that an LSTM with attention can indeed outperform standalone LSTMs but further investigation is required as issues do arise with such model architectures.

Generative One-Shot Learning (GOL): A Semi-Parametric Approach to One-Shot Learning in Autonomous Vision

Highly Autonomous Driving (HAD) systems rely on deep neural networks for the visual perception of the driving environment. Such networks are trained on large manually annotated databases. In this work, a semi-parametric approach to one-shot learning is proposed, with the aim of bypassing the manual annotation step required for training perceptions systems used in autonomous driving. The proposed generative framework, coined Generative One-Shot Learning (GOL), takes as input single one-shot objects, or generic patterns, and a small set of so-called regularization samples used to drive the generative process. New synthetic data is generated as Pareto optimal solutions from one-shot objects using a set of generalization functions built into a generalization generator. GOL has been evaluated on environment perception challenges encountered in autonomous vision.

Orchestrate: Infrastructure for Enabling Parallelism during Hyperparameter Optimization

Two key factors dominate the development of effective production grade machine learning models. First, it requires a local software implementation and iteration process. Second, it requires distributed infrastructure to efficiently conduct training and hyperparameter optimization. While modern machine learning frameworks are very effective at the former, practitioners are often left building ad hoc frameworks for the latter. We present SigOpt Orchestrate, a library for such simultaneous training in a cloud environment. We describe the motivating factors and resulting design of this library, feedback from initial testing, and future goals.

Machine Learning in Cyber-Security – Problems, Challenges and Data Sets

We present cyber-security problems of high importance. We show that in order to solve these cyber-security problems, one must cope with certain machine learning challenges. We provide novel data sets representing the problems in order to enable the academic community to investigate the problems and suggest methods to cope with the challenges. We also present a method to generate labels via pivoting, providing a solution to common problems of lack of labels in cyber-security.

PnP-AdaNet: Plug-and-Play Adversarial Domain Adaptation Network with a Benchmark at Cross-modality Cardiac Segmentation

Deep convolutional networks have demonstrated the state-of-the-art performance on various medical image computing tasks. Leveraging images from different modalities for the same analysis task holds clinical benefits. However, the generalization capability of deep models on test data with different distributions remain as a major challenge. In this paper, we propose the PnPAdaNet (plug-and-play adversarial domain adaptation network) for adapting segmentation networks between different modalities of medical images, e.g., MRI and CT. We propose to tackle the significant domain shift by aligning the feature spaces of source and target domains in an unsupervised manner. Specifically, a domain adaptation module flexibly replaces the early encoder layers of the source network, and the higher layers are shared between domains. With adversarial learning, we build two discriminators whose inputs are respectively multi-level features and predicted segmentation masks. We have validated our domain adaptation method on cardiac structure segmentation in unpaired MRI and CT. The experimental results with comprehensive ablation studies demonstrate the excellent efficacy of our proposed PnP-AdaNet. Moreover, we introduce a novel benchmark on the cardiac dataset for the task of unsupervised cross-modality domain adaptation. We will make our code and database publicly available, aiming to promote future studies on this challenging yet important research topic in medical imaging.

Progressive Data Science: Potential and Challenges

Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining, highly depend on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes in a progressive manner, returning a first approximation of results quickly and allow iterative refinements until converging to a final result. Enabling the user to interact with the intermediate results allows an early detection of erroneous or suboptimal choices, the guided definition of modifications to the pipeline and their quick assessment. In this paper, we discuss the progressiveness challenges arising in different steps of the data science pipeline. We describe how changes in each step of the pipeline impact the subsequent steps and outline why progressive data science will help to make the process more effective. Computing progressive approximations of outcomes resulting from changes creates numerous research challenges, especially if the changes are made in the early steps of the pipeline. We discuss these challenges and outline first steps towards progressiveness, which, we argue, will ultimately help to significantly speed-up the overall data science process.

When is sync globally stable in sparse networks of identical Kuramoto oscillators?

Synchronization in systems of coupled Kuramoto oscillators may depend on their natural frequencies, coupling, and underlying networks. In this paper, we reduce the alternatives to only one by considering identical oscillators where the only parameter that is allowed to change is the underlying network. While such a model was analyzed over the past few decades by studying the size of the basin of attraction of the synchronized state on restricted families of graphs, here we address a qualitative question on general graphs. In an analogy to resistive networks with current sources, we describe an algorithm that produces initial conditions that are often outside of the basin of attraction of the synchronized state. In particular, if a graph allows a cyclic graph clustering with a sufficient number of clusters or contains a sufficiently long induced subpath without cut vertices of the graph then there is a non-synchronous stable phase-locked solution. Thus, we provide a partial answer to when the synchronized state is not globally stable.