Learning the Architecture of Deep Neural Networks

Deep neural networks with millions of parameters are at the heart of many state-of-the-art machine learning models today. However, recent works have shown that models with a much smaller number of parameters can perform just as well. In this work, we introduce the problem of architecture-learning, i.e., learning the architecture of a neural network along with its weights. We introduce a new trainable parameter called tri-state ReLU, which helps in eliminating unnecessary neurons. We also propose a smooth regularizer which encourages the total number of neurons after elimination to be small. The resulting objective is differentiable and simple to optimize. We experimentally validate our method on both small and large networks, and show that it can learn models with a considerably smaller number of parameters without affecting prediction accuracy.


Gated Graph Sequence Neural Networks

Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2009), which we modify to use gated recurrent units and modern optimization techniques and then extend to output sequences. The result is a flexible and broadly useful class of neural network models that has favorable inductive biases relative to purely sequence-based models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We then show it achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.


Sufficient dimension reduction for ordinal predictors

The dimension reduction approaches commonly found in applications involving ordinal predictors are either extensions of unsupervised techniques such as principal component analysis or variable selection while modeling the regression under consideration. We introduce here a supervised model-based sufficient dimension reduction method targeting ordered categorical predictors. It is developed under the inverse regression framework and assumes the existence of a latent Gaussian variable that generates the discrete data after thresholding. The reduction is chosen before modelling the response as a function of the predictors and does not impose any distributional assumption on the response or the response given the predictors. A maximum likelihood estimator of the reduction is derived and an iterative EM-type algorithm is proposed to alleviate the computational load and thus make the method more practical. A regularized estimator which achieves variable selection and dimension reduction simultaneously is also presented. We illustrate the good performance of the proposed method through simulations and real data examples, including movie recommendation and socioeconomic index construction.


Optimized Linear Imputation

Often in real-world datasets, especially in high-dimensional data, some feature values are missing. Since most data analysis and statistical methods do not handle missing values gracefully, the first step in the analysis requires the imputation of missing values. Indeed, there has been a long-standing interest in methods for the imputation of missing values as a pre-processing step. One recent and effective approach, the IRMI stepwise regression imputation method, uses a linear regression model for each real-valued feature on the basis of all other features in the dataset. However, the proposed iterative formulation lacks a convergence guarantee. Here we propose a closely related method, stated as a single optimization problem, together with a block coordinate-descent solution which is guaranteed to converge to a local minimum. Experiments on both synthetic and benchmark datasets show results that are comparable to those of the IRMI method whenever it converges. However, in the set of experiments described here IRMI often does not converge, while the performance of our methods is shown to be markedly superior in comparison with other methods.
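To make the block coordinate-descent idea concrete, here is a minimal sketch. It is not the method of the paper: it is a hypothetical toy with a single predictor column and a single target column with missing entries, and the names fit_line and impute_by_alternation are illustrative assumptions. One block updates the regression coefficients with the filled-in values held fixed; the other updates the filled-in values with the coefficients held fixed.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_by_alternation(xs, ys, num_iters=200):
    """Alternate between fitting the line and refilling missing targets.

    Missing targets are marked with None. Returns the filled column and the
    squared error recorded after each coefficient update.
    """
    missing = [i for i, y in enumerate(ys) if y is None]
    observed = [y for y in ys if y is not None]
    start = sum(observed) / len(observed)  # mean imputation as a starting point
    filled = [start if y is None else y for y in ys]
    losses = []
    for _ in range(num_iters):
        # Block 1: best coefficients for the current filled-in values.
        slope, intercept = fit_line(xs, filled)
        losses.append(
            sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, filled))
        )
        # Block 2: best filled-in values for the current coefficients.
        for i in missing:
            filled[i] = slope * xs[i] + intercept
    return filled, losses


# Toy data generated from y = 2x + 1, with the last two targets missing.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, None, None]
filled, losses = impute_by_alternation(xs, ys)
```

In this toy, each block update can only lower or keep the squared error, which is the property that makes a single-objective formulation attractive compared with an iteration that has no shared objective.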


Structural-RNN: Deep Learning on Spatio-Temporal Graphs

Deep Recurrent Neural Network architectures, though remarkably capable at modeling sequences, lack an intuitive high-level spatio-temporal structure, even though many problems in computer vision inherently have an underlying high-level structure and can benefit from it. Spatio-temporal graphs are a popular, flexible tool for imposing such high-level intuitions in the formulation of real-world problems. In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs with the sequence learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled, as it can be used to transform any spatio-temporal graph by employing a certain set of well-defined steps. The evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, show improvement over the state-of-the-art by a large margin. We expect this method to empower a new convenient approach to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks, and to be of broad interest to the community.


Semi-supervised Collaborative Ranking with Push at Top

Existing collaborative-ranking-based recommender systems tend to perform best when there are enough observed ratings for each user and the observations are made completely at random. Under this setting, recommender systems can properly suggest a list of recommendations according to the user's interests. However, when the observed ratings are extremely sparse (e.g., in the case of cold-start users where no rating data is available) and are not sampled uniformly at random, existing ranking methods fail to effectively leverage side information to transduct the knowledge from existing ratings to unobserved ones. We propose a semi-supervised collaborative ranking model, dubbed S^2COR, to improve the quality of cold-start recommendation. S^2COR mitigates the sparsity issue by leveraging side information about both observed and missing ratings while collaboratively learning the ranking model. This enables it not only to deal with the case of data missing not at random, but also to effectively incorporate the available side information in transduction. We experimentally evaluated our proposed algorithm on a number of challenging real-world datasets and compared it against state-of-the-art models for cold-start recommendation. We report significantly higher quality recommendations with our algorithm compared to the state-of-the-art.


The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review

Recommender systems use algorithms to provide users with product recommendations. Recently, these systems started using machine learning algorithms because of the progress and popularity of the artificial intelligence research field. However, choosing a suitable machine learning algorithm is difficult because of the sheer number of algorithms available in the literature. Researchers and practitioners are left with little information about the best approaches or the trends in algorithm usage. Moreover, the development of a recommender system featuring a machine learning algorithm has problems and open questions that must be evaluated, so that software engineers know where to focus research efforts. This work presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for the software engineering research field. The study concluded that Bayesian and decision tree algorithms are widely used in recommender systems because of their low complexity, and that the requirements and design phases of recommender system development must be investigated for research opportunities.


Requirements Engineering for General Recommender Systems

In requirements engineering for recommender systems, software engineers must identify the data that recommendations will be based on. This is a manual and labor-intensive task, which is error-prone and expensive. One solution to this problem is the adoption of automatic recommender system development based on a general framework. One step towards the creation of such a framework is to determine what data is used in recommender systems. In this paper, a systematic review has been done to identify what data from the user and from a recommendation item a general recommender system needs. A user model and an item model are proposed and described, and some considerations about algorithm-specific parameters are explained. A further goal is to study the impact of the fields of big data and the Internet of Things on recommendation systems.


Controlling Bias in Adaptive Data Analysis Using Information Theory

Modern data is messy and high-dimensional, and it is often not clear a priori what are the right questions to ask. Instead, the analyst typically needs to use the data to search for interesting analyses to perform and hypotheses to test. This is an adaptive process, where the choice of analysis to be performed next depends on the results of the previous analyses on the same data. It’s widely recognized that this process, even if well-intentioned, can lead to biases and false discoveries, contributing to the crisis of reproducibility in science. But while adaptivity renders standard statistical theory invalid, folklore and experience suggest that not all types of adaptive analysis are equally at risk for false discoveries. In this paper, we propose a general information-theoretic framework to quantify and provably bound the bias and other statistics of an arbitrary adaptive analysis process. We prove that our mutual information based bound is tight in natural models, and then use it to give rigorous insights into when commonly used procedures do or do not lead to substantially biased estimation. We first consider several popular feature selection protocols, like rank selection or variance-based selection. We then consider the practice of adding random noise to the observations or to the reported statistics, which is advocated by related ideas from differential privacy and blinded data analysis. We discuss the connections between these techniques and our framework, and supplement our results with illustrative simulations.
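The bias introduced by rank selection can be illustrated with a small simulation. This is only a sketch under assumed settings (20 features whose true means are all zero, 25 standard Gaussian samples per feature, 500 repetitions); the function name average_reported_mean and all parameter values are illustrative choices, not taken from the paper. The adaptive analyst reports the largest empirical mean, while the non-adaptive analyst always reports the first feature.

```python
import random


def average_reported_mean(adaptive, num_features=20, num_samples=25,
                          num_trials=500, seed=0):
    """Average value an analyst reports when every true mean is zero.

    adaptive=True  : report the feature with the largest empirical mean.
    adaptive=False : always report feature 0, chosen before seeing the data.
    """
    rng = random.Random(seed)
    reported = []
    for _ in range(num_trials):
        means = []
        for _ in range(num_features):
            draws = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
            means.append(sum(draws) / num_samples)
        reported.append(max(means) if adaptive else means[0])
    return sum(reported) / num_trials


# Since every true mean is zero, the average reported value is the bias.
adaptive_bias = average_reported_mean(adaptive=True)
fixed_bias = average_reported_mean(adaptive=False)
```

Under these assumptions the adaptive report is systematically above zero, while the fixed report averages out close to zero. The gap comes purely from letting the data choose which statistic to report.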


MuProp: Unbiased Backpropagation for Stochastic Neural Networks

Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm. Stochastic neural networks combine the power of large parametric functions with that of graphical models, which makes it possible to learn very complex distributions. However, as backpropagation is not directly applicable to stochastic networks that include discrete sampling operations within their computational graph, training such networks remains difficult. We present MuProp, an unbiased gradient estimator for stochastic networks, designed to make this task easier. MuProp improves on the likelihood-ratio estimator by reducing its variance using a control variate based on the first-order Taylor expansion of a mean-field network. Crucially, unlike prior attempts at using backpropagation for training stochastic networks, the resulting estimator is unbiased and well behaved. Our experiments on structured output prediction and discrete latent variable modeling demonstrate that MuProp yields consistently good performance across a range of difficult tasks.
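The idea of combining a likelihood-ratio estimator with a first-order Taylor control variate can be sketched on a single Bernoulli unit. This is a toy under stated assumptions, not the authors' implementation: the quadratic function f, its target t = 3, and the function names are hypothetical, and the mean-field value of the unit is taken to be its probability theta. Because the unit has only two outcomes, the mean and variance of each estimator are computed exactly by enumeration rather than by sampling.

```python
def f(x, t=3.0):
    """Toy downstream loss as a function of the sampled unit x."""
    return (x - t) ** 2


def f_prime(x, t=3.0):
    """Derivative of f with respect to x."""
    return 2.0 * (x - t)


def score(x, theta):
    """d/dtheta of log p(x; theta) for x ~ Bernoulli(theta)."""
    return (x - theta) / (theta * (1.0 - theta))


def lr_estimate(x, theta):
    """Plain likelihood-ratio gradient estimate."""
    return f(x) * score(x, theta)


def taylor_cv_estimate(x, theta):
    """Likelihood-ratio estimate with a first-order Taylor control variate."""
    mu = theta  # assumed mean-field value of the stochastic unit
    baseline = f(mu) + f_prime(mu) * (x - mu)
    # Subtract the baseline, then add back its exact expected contribution,
    # which equals f_prime(mu) because d(mu)/d(theta) = 1 here.
    return (f(x) - baseline) * score(x, theta) + f_prime(mu)


def exact_mean_and_variance(estimator, theta):
    """Enumerate both outcomes of the Bernoulli unit."""
    probs = {0: 1.0 - theta, 1: theta}
    mean = sum(p * estimator(x, theta) for x, p in probs.items())
    var = sum(p * (estimator(x, theta) - mean) ** 2 for x, p in probs.items())
    return mean, var


theta = 0.5
true_grad = f(1) - f(0)  # d/dtheta of theta * f(1) + (1 - theta) * f(0)
lr_mean, lr_var = exact_mean_and_variance(lr_estimate, theta)
cv_mean, cv_var = exact_mean_and_variance(taylor_cv_estimate, theta)
```

Both estimators have the true gradient as their mean in this toy, and the control-variate version has a much smaller variance here. How much the variance drops, or whether it drops at all, depends on f and theta, so this example should be read as an illustration rather than a general guarantee.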


Return of Frustratingly Easy Domain Adaptation

Complexity and Approximability of Parameterized MAX-CSPs

Solutions to the T-systems with Principal Coefficients

On the density of the odd values of the partition function

Articulated Motion Learning via Visual and Lingual Signals

A note on Ising random currents, Ising-FK, loop-soups and the Gaussian free field

Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks

Better $s$-$t$-Tours by Gao Trees

Neurocontrol methods review

Active exploration of sensor networks from a robotics perspective

Accelerating pseudo-marginal Metropolis-Hastings by correlating auxiliary variables

A fast method to estimate speciation parameters in a model of isolation with an initial period of gene flow and to test alternative evolutionary scenarios

Concurrent enhancement of percolation and synchronization in adaptive networks

Predictive Entropy Search for Multi-objective Bayesian Optimization

The Föllmer-Schweizer decomposition under incomplete information

Extending Gossip Algorithms to Distributed Estimation of U-Statistics

On the restricted invertibility problem with an additional orthogonality constraint for random matrices

Zubieta’s Conjecture on the Enumeration of Corners in Symmetric Tree-like Tableaux

Tight Running Time Lower Bounds for Vertex Deletion Problems

Deep multi-scale video prediction beyond mean square error

Reaching consensus on a connected graph

Zubieta’s Conjecture on the Enumeration of Corners in Tree-like Tableaux

Quantile universal threshold for model selection

Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization

On the number of touching pairs in a set of planar curves

Simple, Fast and Accurate Photometric Estimation of Specific Star Formation Rate

Tail generating functions for the extendable branching processes

On the Existence of Tree Backbones that Realize the Chromatic Number on a Backbone Coloring

Identification of homophily and preferential recruitment in respondent-driven sampling

Synchronization of phase oscillators with frequency-weighted coupling

Infinite Dimensional Word Embeddings

Learning to retrieve out-of-vocabulary words in speech recognition

Crossing probability for directed polymers in random media: exact tail of the distribution

Bayesian Optimization with Dimension Scheduling: Application to Biological Systems

On path sequences of graphs

The Gibbs-plaid biclustering model

Bayesian analysis of ambulatory blood pressure dynamics with application to irregularly spaced sparse data

Constant Time EXPected Similarity Estimation using Stochastic Optimization

Small Deviations for Dependent Sequences

Using somatic mutation data to test tumors for clonal relatedness

Combining nonexchangeable functional or survival data sources in oncology using generalized mixture commensurate priors

Accelerating Random Kaczmarz Algorithm Based on Clustering Information

Ladder epochs and ladder chain of a Markov random walk with discrete driving chain

Inferring constructs of effective teaching from classroom observations: An application of Bayesian exploratory factor analysis without restrictions

A new set of asymmetric filters for tracking the short-term trend in real-time

A note on the computation of Wasserstein barycenters

Wide Consensus for Parallelized Inference

Coupling methods for multistage sampling

Uniform change point tests in high dimension

An invitation to coupling and copulas: with applications to multisensory modeling

Decomposition of a cube into nearly equal smaller cubes

On the interplay of network structure and gradient convergence in deep learning

A note on the colorful fractional Helly theorem

Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

Tight Logarithmic Asymptotic for the Probability of $n\times n$ Random Matrix with Uniform Distributed $\pm 1$ Entries to be Singular

Free Functional Inequalities on the Circle

Enhanced detectability of community structure in multilayer networks through layer aggregation

Spectral analysis and clustering of large stochastic networks. Application to the Lennard-Jones-75 cluster

Protein sequence labelling by AUC-maximized Deep Convolutional Neural Fields

Robust PCA via Nonconvex Rank Approximation

The Hidden Subgraph Problem

An extension of McDiarmid’s inequality

Light tails and the Hermitian dual polar graphs

Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

Fixed points and stability in the two-network frustrated Kuramoto model

Binary embeddings with structured hashed projections

HOID: Higher Order Interpolatory Decomposition for tensors based on Tucker representation

Efficient AUC Optimization for Information Ranking Applications

The capacity of Bernoulli nonadaptive group testing

Sparse-promoting Full Waveform Inversion based on Online Orthonormal Dictionary Learning

Compound Poisson process with a Poisson subordinator

Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

Jet-Images — Deep Learning Edition

Mixture model of pottery distributions from Lake Chad Basin archaeological sites reveals ancient segregation patterns

Convolutional Models for Joint Object Categorization and Pose Estimation

Cross-scale predictive dictionaries

Conditions for permanental processes to be unbounded

Extended slow dynamical regime prefiguring the many-body localization transition

Waves in a Spatial Queue: Stop-and-Go at Airport Security