Learning Representations Using Complex-Valued Nets

Complex-valued neural networks (CVNNs) are an emerging field of research in neural networks due to their potential representational properties for audio, image, and physiological signals. It is common in signal processing to transform sequences of real values to the complex domain via a set of complex basis functions, such as the Fourier transform. We show how CVNNs can be used to learn complex representations of real-valued time-series data. We present methods and results using a framework that can compose holomorphic and non-holomorphic functions in a multi-layer network using a theoretical result called the Wirtinger derivative. We test our methods on a representation learning task for real-valued signals, comparing recurrent complex-valued networks with their real-valued counterparts. Our results show that recurrent complex-valued networks can perform as well as their real-valued counterparts while learning filters that are representative of the domain of the data.
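As a rough illustration of the Wirtinger derivative mentioned above, the sketch below estimates the pair of derivatives with respect to z and its conjugate by central finite differences. The helper name `wirtinger` and the step size `eps` are assumptions made for this example and are not taken from the paper; the definitions used are the standard ones, d/dz = (d/dx - i d/dy)/2 and d/dz̄ = (d/dx + i d/dy)/2.

```python
def wirtinger(f, z, eps=1e-6):
    """Estimate the Wirtinger derivatives (df/dz, df/dz_bar) of f at z.

    Hypothetical helper for illustration: uses central differences along
    the real axis (x) and the imaginary axis (y).
    """
    df_dx = (f(z + eps) - f(z - eps)) / (2 * eps)
    df_dy = (f(z + 1j * eps) - f(z - 1j * eps)) / (2 * eps)
    d_dz = 0.5 * (df_dx - 1j * df_dy)
    d_dzbar = 0.5 * (df_dx + 1j * df_dy)
    return d_dz, d_dzbar


z0 = 1 + 2j
# Holomorphic example: f(z) = z^2 has df/dz = 2z and df/dz_bar = 0.
holo = wirtinger(lambda z: z * z, z0)
# Non-holomorphic example: f(z) = |z|^2 has df/dz = conj(z) and df/dz_bar = z.
non_holo = wirtinger(lambda z: abs(z) ** 2, z0)
```

For a holomorphic function the conjugate derivative vanishes, while for a real-valued function such as |z|^2 it does not, which is why a framework that mixes both kinds of function needs to track both derivatives.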

Generating Sentences from a Continuous Space

The standard unsupervised recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global distributed sentence representation. In this work, we present an RNN-based variational autoencoder language model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate strong performance in the imputation of missing tokens, and explore many interesting properties of the latent sentence space.

Online Batch Selection for Faster Training of Neural Networks

Deep neural networks are commonly trained using stochastic non-convex optimization procedures, which are driven by gradient information estimated on fractions (batches) of the dataset. While it is commonly accepted that batch size is an important parameter for offline tuning, the benefits of online selection of batches remain poorly understood. We investigate online batch selection strategies for two state-of-the-art methods of stochastic gradient-based optimization, AdaDelta and Adam. As the loss function to be minimized for the whole dataset is an aggregation of loss functions of individual datapoints, intuitively, datapoints with the greatest loss should be considered (selected in a batch) more frequently. However, the limitations of this intuition and the proper control of the selection pressure over time are open questions. We propose a simple strategy where all datapoints are ranked w.r.t. their latest known loss value and the probability to be selected decays exponentially as a function of rank. Our experimental results on the MNIST dataset suggest that selecting batches speeds up both AdaDelta and Adam by a factor of about 5.
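The ranking strategy described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names and the `pressure` parameter (the approximate ratio between the selection probabilities of the highest-loss and lowest-loss datapoints) are assumptions introduced for the example.

```python
import numpy as np


def rank_selection_probs(latest_losses, pressure=100.0):
    """Selection probability per datapoint, decaying exponentially with loss rank."""
    losses = np.asarray(latest_losses, dtype=float)
    n = len(losses)
    # Rank 0 is the datapoint with the greatest latest known loss.
    order = np.argsort(-losses)
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)
    # Per-rank decay factor chosen so that the top-to-bottom ratio is about `pressure`.
    decay = np.exp(np.log(pressure) / n)
    weights = 1.0 / decay ** ranks
    return weights / weights.sum()


def sample_batch(latest_losses, batch_size, pressure=100.0, rng=None):
    """Draw a batch of distinct indices, favouring datapoints with high loss."""
    rng = np.random.default_rng() if rng is None else rng
    probs = rank_selection_probs(latest_losses, pressure)
    return rng.choice(len(probs), size=batch_size, replace=False, p=probs)
```

Setting `pressure` close to 1 approaches uniform sampling, so the same code can express a selection pressure that is varied over the course of training.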

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent. Towards this goal, we define a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains. This method, termed ‘Actor-Mimic’, exploits the use of deep reinforcement learning and model compression techniques to train a single policy network that learns how to act in a set of distinct tasks by using the guidance of several expert teachers. We then show that the representations learnt by the deep policy network are capable of generalizing to new tasks, speeding up learning in novel environments. Although our method can in general be applied to a wide range of problems, we use Atari games as a testing environment to demonstrate these methods.

Communicating Semantics: Reference by Description

Messages often refer to entities such as people, places and events. Correct identification of the intended reference is an essential part of communication. Lack of shared unique names often complicates entity reference. Shared knowledge can be used to construct uniquely identifying descriptive references for entities with ambiguous names. We introduce a mathematical model for ‘Reference by Description’ and provide results on the conditions under which, with high probability, programs can construct unambiguous references to most entities in the domain of discourse.

Robust Classification by Pre-conditioned LASSO and Transductive Diffusion Component Analysis

Modern machine learning-based recognition approaches require large-scale datasets with a large number of labelled training images. However, such datasets are inherently difficult and costly to collect and annotate. Hence there is a great and growing interest in automatic dataset collection methods that can leverage the web. Collecting datasets in this way, however, requires robust and efficient ways of detecting and excluding the outliers that are common and prevalent. So far, there has been limited effort in the machine learning community to directly detect outliers for robust classification. Inspired by the recent work on Pre-conditioned LASSO, this paper formulates the outlier detection task using Pre-conditioned LASSO and employs unsupervised transductive diffusion component analysis to both integrate the topological structure of the data manifold, from labeled and unlabeled instances, and reduce the feature dimensionality. Synthetic experiments as well as results on two real-world classification tasks show that our framework can robustly detect the outliers and improve classification.

Unsupervised Deep Embedding for Clustering Analysis

Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.

Network-based recommendation algorithms: A review

Recommender systems are a vital tool that helps us to overcome the information overload problem. They are being used by most e-commerce web sites and attract the interest of a broad scientific community. A recommender system uses data on users’ past preferences to choose new items that might be appreciated by a given individual user. While many approaches to recommendation exist, the approach based on a network representation of the input data has gained considerable attention in the past. We review here a broad range of network-based recommendation algorithms and for the first time compare their performance on three distinct real datasets. We present recommendation topics that go beyond the mere question of which algorithm to use – such as the possible influence of recommendation on the evolution of systems that use it – and finally discuss open research directions and challenges.

Predicting online user behaviour using deep learning algorithms

We propose a robust classifier to predict buying intentions based on user behaviour within a large e-commerce website. In this work we compare traditional machine learning techniques with the most advanced deep learning approaches. We show that both Deep Belief Networks and Stacked Denoising Auto-Encoders achieve a substantial improvement by extracting features from high-dimensional data during the pre-training phase. They also prove better suited to dealing with severe class imbalance.

Convolutional Clustering for Unsupervised Learning

The task of labeling data for training deep neural networks is daunting and tedious, requiring millions of labels to achieve the current state-of-the-art results. Such reliance on large amounts of labeled data can be relaxed by exploiting hierarchical features via unsupervised learning techniques. In this work, we propose to train a deep convolutional network based on an enhanced version of the k-means clustering algorithm, which reduces the number of correlated parameters in the form of similar filters, and thus increases test categorization accuracy. We call our algorithm convolutional k-means clustering. We further show that learning the connection between the layers of a deep convolutional neural network improves its ability to be trained on a smaller amount of labeled data. Our experiments show that the proposed algorithm outperforms other techniques that learn filters unsupervised. Specifically, we obtained a test accuracy of 74.1% on STL-10 and a test error of 1.4% on MNIST.

Knowledge Base Population using Semantic Label Propagation

A crucial aspect of a knowledge base population system that extracts new facts from text corpora is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by Distant Supervision, which is a method to construct training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both ‘Barack Obama’ and ‘US’ may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase in precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances to extend the training set in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort.
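The distant supervision step described above, and the noise it introduces, can be illustrated with a small sketch. The function name, the tuple layout of the knowledge base, and the plain substring matching are assumptions made for this example; a real system would use entity linking rather than substring tests.

```python
def distant_supervision(sentences, knowledge_base):
    """Label every sentence that mentions both arguments of a known fact.

    knowledge_base: iterable of (relation, subject, object) triples
    (a hypothetical layout chosen for this sketch).
    """
    examples = []
    for sentence in sentences:
        for relation, subj, obj in knowledge_base:
            # Naive alignment: both entity strings occur in the sentence.
            if subj in sentence and obj in sentence:
                examples.append((sentence, relation, subj, obj))
    return examples


kb = [("born_in", "Barack Obama", "US")]
corpus = [
    "Barack Obama was born in the US.",
    "Barack Obama returned to the US after the summit.",
    "Paris is the capital of France.",
]
training_set = distant_supervision(corpus, kb)
```

Both of the first two sentences are labelled born_in, although only the first one expresses the relation. The second is the kind of noisy instance that the feature labeling step is meant to filter out.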

Reducing Overfitting in Deep Networks by Decorrelating Representations

One major challenge in training Deep Neural Networks is preventing overfitting. Many techniques such as data augmentation and novel regularizers such as Dropout have been proposed to prevent overfitting without requiring a massive amount of training data. In this work, we propose a new regularizer called DeCov which leads to significantly reduced overfitting (as indicated by the difference between training and validation performance), and better generalization. Our regularizer encourages diverse or non-redundant representations in Deep Neural Networks by minimizing the cross-covariance of hidden activations. This simple intuition has been explored in a number of past works but surprisingly has never been applied as a regularizer in supervised learning. Experiments across a range of datasets and network architectures show that this loss always reduces overfitting while almost always maintaining or increasing generalization performance and often improving performance over Dropout.
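One way to read "minimizing the cross-covariance of hidden activations" is as a penalty on the off-diagonal entries of the batch covariance matrix of a hidden layer. The sketch below follows that reading; the function name, the 1/2 scaling, and the use of NumPy on a plain array are assumptions for illustration and may differ from the authors' formulation.

```python
import numpy as np


def decov_penalty(hidden):
    """Half the sum of squared off-diagonal covariances of hidden activations.

    hidden: array of shape (batch_size, num_units).
    """
    h = np.asarray(hidden, dtype=float)
    centered = h - h.mean(axis=0, keepdims=True)
    # Batch covariance between every pair of hidden units.
    cov = centered.T @ centered / h.shape[0]
    # Keep only the cross-covariance terms; variances on the diagonal are not penalised.
    off_diag = cov - np.diag(np.diag(cov))
    return 0.5 * np.sum(off_diag ** 2)


# Two uncorrelated units: no penalty.
uncorrelated = decov_penalty([[1, 1], [1, -1], [-1, 1], [-1, -1]])
# Two identical (fully redundant) units: positive penalty.
redundant = decov_penalty([[1, 1], [-1, -1]])
```

In training, a term like this would be added to the supervised loss with a weighting coefficient, so that redundant hidden units are discouraged without constraining each unit's own variance.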

SparkNet: Training Deep Networks in Spark

Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this reason, leveraging the resources of a cluster to speed up training is an important area of work. However, widely-popular batch-processing computational frameworks like MapReduce and Spark were not designed to support the asynchronous and communication-intensive workloads of existing distributed deep learning systems. We introduce SparkNet, a framework for training deep networks in Spark. Our implementation includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library. Using a simple parallelization scheme for stochastic gradient descent, SparkNet scales well with the cluster size and tolerates very high-latency communication. Furthermore, it is easy to deploy and use with no parameter tuning, and it is compatible with existing Caffe models. We quantify the dependence of the speedup obtained by SparkNet on the number of machines, the communication frequency, and the cluster’s communication overhead, and we benchmark our system’s performance on the ImageNet dataset.

EIGENREC: An Efficient and Scalable Latent Factor Family for Top-N Recommendation

Sparsity presents one of the major challenges of Collaborative Filtering. Graph-based methods are known to alleviate its effects; however, their use is often computationally prohibitive. Latent-Factor methods, on the other hand, present a reasonable and viable alternative. In this paper, we introduce EigenRec, a versatile and efficient Latent-Factor framework for Top-N Recommendations that generalizes the well-known PureSVD algorithm, (a) providing intuition about its inner structure, (b) paving the path towards improving its efficacy and, at the same time, (c) reducing its complexity. One of our central goals in this work is to ensure the applicability of our method in realistic big-data scenarios. To this end, we propose building our model using a computationally efficient Lanczos-based procedure, we discuss its Parallel Implementation in distributed computing environments, and we verify its favourable performance using real-world datasets. Furthermore, from a qualitative point of view, a comprehensive set of experiments on the MovieLens and the Yahoo!R2Music datasets, based on widely applied performance metrics, indicates that EigenRec outperforms several state-of-the-art algorithms in terms of Standard and Long-Tail recommendation accuracy, exhibiting low susceptibility to sparsity, even in its most extreme manifestations, the Cold-Start problems.

Segmental Recurrent Neural Networks

We introduce segmental recurrent neural networks (SRNNs) which define, given an input sequence, a joint probability distribution over segmentations of the input and labelings of the segments. Representations of the input segments (i.e., contiguous subsequences of the input) are computed by encoding their constituent tokens using bidirectional recurrent neural nets, and these ‘segment embeddings’ are used to define compatibility scores with output labels. These local compatibility scores are integrated using a global semi-Markov conditional random field. Both fully supervised training — in which segment boundaries and labels are observed — as well as partially supervised training — in which segment boundaries are latent — are straightforward. Experiments on handwriting recognition and joint Chinese word segmentation/POS tagging show that, compared to models that do not explicitly represent segments such as BIO tagging schemes and connectionist temporal classification (CTC), SRNNs obtain substantially higher accuracies.

Statistical Engineering: An Idea Whose Time Has Come?

Several authors, including the American Statistical Association (ASA), have noted the challenges facing statisticians when attacking large, complex, unstructured problems, as opposed to well-defined textbook problems. Clearly, the standard paradigm of selecting the one ‘correct’ statistical method for such problems is not sufficient; a new paradigm is needed. Statistical engineering has been proposed as a discipline that can provide a viable paradigm to attack such problems, used in conjunction with sound statistical science. Of course, in order to develop as a true discipline, statistical engineering needs a well-developed theory, not just a formal definition and successful case studies. This article documents and disseminates the current state of the underlying theory of statistical engineering. Our purpose is to provide a vehicle for applied statisticians to further enhance the practice of statistics, and for academics so interested to continue development of the underlying theory of statistical engineering.

Efficient inference in occlusion-aware generative models of images

Order-Embeddings of Images and Language

FRIST – Flipping and Rotation Invariant Sparsifying Transform Learning and Applications

Deep factorisation of the stable process II; potentials and applications

Structured Prediction Energy Networks

Medical Image Deep Learning with Hospital PACS Dataset

A Hopf algebraic approach to Schur function identities

Efficient Sum of Sparse Outer Products Dictionary Learning (SOUP-DIL)

Manifold Regularized Discriminative Neural Networks

Neural network-based clustering using pairwise constraints

Intragroup transfers, intragroup diversification and their risk assessment

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

Good, Better, Best: Choosing Word Embedding Context

Spatio-temporal video autoencoder with differentiable memory

Robust Convolutional Neural Networks under Adversarial Noise

Alternative structures for character-level RNNs

Conditional Computation in Neural Networks for faster models

Policy Distillation

Foveation-based Mechanisms Alleviate Adversarial Examples

Free monotone transport for infinite variables

The iterated auxiliary particle filter

Density Modeling of Images using a Generalized Normalization Transformation

Neural Programmer-Interpreters

Faster method for Deep Belief Network based Object classification using DWT

The relative effects of dimensionality and multiplicity of hypotheses on the F-test in linear regression

Multimodal Retrieval With Asymmetrically Weighted Truncated-SVD Canonical Correlation Analysis

PAC-Bayesian bounds for Principal Component Analysis in Hilbert spaces

Seasonal Linear Predictivity in National Football Championships

Robust dimension-free Gram operator estimates

Diffusing Private Data over Networks

Dynamics of Stochastic Gradient Algorithms

Discrete Bochner inequalities via the Bochner-Bakry-Emery approach for Markov chains

Critical Parameters in Particle Swarm Optimisation

Gaussian Mixture Embeddings for Multiple Word Prototypes

Multimodal sparse representation learning and applications

Towards Open Set Deep Networks

Increment stationarity of $L^2$-indexed stochastic processes: spectral representation and characterization

Empirical Research and Automatic Processing Method of Precision-specific Operation

Diffusion Representations

Adjustable Bounded Rectifiers: Towards Deep Binary Representations

Spherical Cap Packing Asymptotics and Rank-Extreme Detection

Importance Sampling: Computational Complexity and Intrinsic Dimension

Abstract Attribute Exploration with Partial Object Descriptions

Uniform Correlation Mixture of Bivariate Normal Distributions and Hypercubically-contoured Densities That Are Marginally Normal

Coreset-Based Adaptive Tracking

Counting sub-multisets of fixed cardinality

Bounds of distance Estrada index of graphs

CLT for biorthogonal ensembles and related combinatorial identities

The Kernel Two-Sample Test for Brain Networks

Hardness result for the total rainbow $k$-connection of graphs

Multifractal structure of Barkhausen noise: A signature of collective dynamics at hysteresis loop

Engineering slow light and mode crossover in a fractal-kagome waveguide network

Multi-task Sequence to Sequence Learning

Semi-supervised Learning for Convolutional Neural Networks via Online Graph Construction

Principled Parallel Mean-Field Inference for Discrete Random Fields

On Sketching Quadratic Forms

Local stability in a transient Markov chain

Variable Rate Image Compression with Recurrent Neural Networks

Learning Deep Structure-Preserving Image-Text Embeddings

Anderson transitions in disordered two-dimensional lattices

Mediated Experts for Deep Convolutional Networks

Convolutional neural networks with low-rank regularization

Transfer Learning for Speech and Language Processing

Deep Learning for Tactile Understanding From Visual and Haptic Data

A Novel Approach for Phase Identification in Smart Grids Using Graph Theory and Principal Component Analysis

Structural Dissection for Controlling Complex Networks

Cluster Variables and Perfect Matchings of Subgraphs of the $dP_3$ Lattice

Putting Things in Context: Community-specific Embedding Projections for Sentiment Analysis

A Family of Dense Mixed Graphs of Diameter $2$

What Objective Does Self-paced Learning Indeed Optimize?

Ramsey Orderly Algebras as a New Approach to Ramsey Algebras

Neural Variational Inference for Text Processing

Enumeration and Random Generation of Unlabeled Classes of Graphs: A Practical Study of Cycle Pointing and the Dissymmetry Theorem

Stochastic gradient method with accelerated stochastic dynamics

Malthusian Locks

Optimal measure transformation problems

BIRDNEST: Bayesian Inference for Ratings-Fraud Detection

A Tensor-Train accelerated solver for integral equations in complex geometries

Optimal inference in a class of regression models

On the Refracted-Reflected Spectrally Negative Lévy Processes

Packings of 3D stars: Stability and structure

Bounds, Approximation, and Hardness for the Burning Number

Simulating Branching Programs with Edit Distance and Friends or: A Polylog Shaved is a Lower Bound Made

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

A Roth type theorem for dense subsets of $\mathbb{R}^d$

Studying the control of non invasive prosthetic hands over large time spans

A pilot study on the daily control capability of s-EMG prosthetic hands by amputees

A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest

Theory of the Structural Glass Transition: A Pedagogical Review

Algorithms for Communication Problems for Mobile Agents Exchanging Energy

Infinite irredundant equational axiomatisability for a finite monoid

Parameter inference with estimated covariance matrices

Prioritized Experience Replay