• Learning the Architecture of Deep Neural Networks
Deep neural networks with millions of parameters are at the heart of many state-of-the-art machine learning models today. However, recent works have shown that models with a much smaller number of parameters can perform just as well. In this work, we introduce the problem of architecture-learning, i.e., learning the architecture of a neural network along with its weights. We introduce a new trainable parameter called tri-state ReLU, which helps in eliminating unnecessary neurons. We also propose a smooth regularizer which encourages the total number of neurons after elimination to be small. The resulting objective is differentiable and simple to optimize. We experimentally validate our method on both small and large networks, and show that it can learn models with a considerably smaller number of parameters without affecting prediction accuracy.
• Gated Graph Sequence Neural Networks
Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2009), which we modify to use gated recurrent units and modern optimization techniques and then extend to output sequences. The result is a flexible and broadly useful class of neural network models that has favorable inductive biases relative to purely sequence-based models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We then show it achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.
• Sufficient dimension reduction for ordinal predictors
The dimension reduction approaches commonly found in applications involving ordinal predictors are either extensions of unsupervised techniques such as principal component analysis or variable selection while modeling the regression under consideration. We introduce here a supervised model-based sufficient dimension reduction method targeting ordered categorical predictors. It is developed under the inverse regression framework and assumes the existence of a latent Gaussian variable that generates the discrete data after thresholding. The reduction is chosen before modelling the response as a function of the predictors and does not impose any distributional assumption on the response or the response given the predictors. A maximum likelihood estimator of the reduction is derived and an iterative EM-type algorithm is proposed to alleviate the computational load and thus make the method more practical. A regularized estimator which achieves variable selection and dimension reduction simultaneously is also presented. We illustrate the good performance of the proposed method through simulations and real data examples, including movie recommendation and socioeconomic index construction.
• Optimized Linear Imputation
Often in real-world datasets, especially in high dimensional data, some feature values are missing. Since most data analysis and statistical methods do not handle missing values gracefully, the first step in the analysis requires the imputation of missing values. Indeed, there has been a long-standing interest in methods for the imputation of missing values as a pre-processing step. One recent and effective approach, the IRMI stepwise regression imputation method, uses a linear regression model for each real-valued feature on the basis of all other features in the dataset. However, the proposed iterative formulation lacks a convergence guarantee. Here we propose a closely related method, stated as a single optimization problem, and a block coordinate-descent solution which is guaranteed to converge to a local minimum. Experiments on both synthetic and benchmark datasets show results comparable to those of the IRMI method whenever it converges. However, in the set of experiments described here IRMI often does not converge, while the performance of our methods is shown to be markedly superior in comparison with other methods.
• Structural-RNN: Deep Learning on Spatio-Temporal Graphs
Deep Recurrent Neural Network architectures, though remarkably capable at modeling sequences, lack an intuitive high-level spatio-temporal structure, even though many problems in computer vision inherently have an underlying high-level structure and can benefit from it. Spatio-temporal graphs are a popular flexible tool for imposing such high-level intuitions in the formulation of real world problems. In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs and the sequence learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled, as it can be used for transforming any spatio-temporal graph by employing a certain set of well-defined steps. The evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, show improvement over the state-of-the-art by a large margin. We expect this method to empower a new convenient approach to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks, and be of broad interest to the community.
• Semi-supervised Collaborative Ranking with Push at Top
Existing collaborative ranking based recommender systems tend to perform best when there are enough observed ratings for each user and the observations are made completely at random. Under this setting recommender systems can properly suggest a list of recommendations according to the user interests. However, when the observed ratings are extremely sparse (e.g. in the case of cold-start users where no rating data is available), and are not sampled uniformly at random, existing ranking methods fail to effectively leverage side information to transduct the knowledge from existing ratings to unobserved ones. We propose a semi-supervised collaborative ranking model, dubbed S$^2$COR, to improve the quality of cold-start recommendation. S$^2$COR mitigates the sparsity issue by leveraging side information about both observed and missing ratings by collaboratively learning the ranking model. This enables it not only to deal with the case of data missing not at random, but also to effectively incorporate the available side information in transduction. We experimentally evaluated our proposed algorithm on a number of challenging real-world datasets and compared against state-of-the-art models for cold-start recommendation. We report significantly higher quality recommendations with our algorithm compared to the state-of-the-art.
• The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review
Recommender systems use algorithms to provide users with product recommendations. Recently, these systems started using machine learning algorithms because of the progress and popularity of the artificial intelligence research field. However, choosing a suitable machine learning algorithm is difficult because of the sheer number of algorithms available in the literature. Researchers and practitioners are left with little information about the best approaches or the trends in algorithm usage. Moreover, the development of a recommender system featuring a machine learning algorithm has problems and open questions that must be evaluated, so software engineers know where to focus research efforts. This work presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for the software engineering research field. The study concluded that Bayesian and decision tree algorithms are widely used in recommender systems because of their low complexity, and that the requirements and design phases of recommender system development must be investigated for research opportunities.
• Requirements Engineering for General Recommender Systems
In requirements engineering for recommender systems, software engineers must identify the data that recommendations will be based on. This is a manual and labor-intensive task, which is error-prone and expensive. One solution to this problem is the adoption of automatic recommender system development based on a general framework. One step towards the creation of such a framework is to determine what data is used in recommender systems. In this paper, a systematic review has been done to identify what data from the user and from a recommendation item a general recommender system needs. A user model and an item model are proposed and described, and some considerations about algorithm-specific parameters are explained. A further goal is to study the impact of the fields of big data and the Internet of Things on recommendation systems.
• Controlling Bias in Adaptive Data Analysis Using Information Theory
Modern data is messy and high-dimensional, and it is often not clear a priori what are the right questions to ask. Instead, the analyst typically needs to use the data to search for interesting analyses to perform and hypotheses to test. This is an adaptive process, where the choice of analysis to be performed next depends on the results of the previous analyses on the same data. It’s widely recognized that this process, even if well-intentioned, can lead to biases and false discoveries, contributing to the crisis of reproducibility in science. But while adaptivity renders standard statistical theory invalid, folklore and experience suggest that not all types of adaptive analysis are equally at risk for false discoveries. In this paper, we propose a general information-theoretic framework to quantify and provably bound the bias and other statistics of an arbitrary adaptive analysis process. We prove that our mutual information based bound is tight in natural models, and then use it to give rigorous insights into when commonly used procedures do or do not lead to substantially biased estimation. We first consider several popular feature selection protocols, like rank selection or variance-based selection. We then consider the practice of adding random noise to the observations or to the reported statistics, which is advocated by related ideas from differential privacy and blinded data analysis. We discuss the connections between these techniques and our framework, and supplement our results with illustrative simulations.
• MuProp: Unbiased Backpropagation for Stochastic Neural Networks
Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm. Stochastic neural networks combine the power of large parametric functions with that of graphical models, which makes it possible to learn very complex distributions. However, as backpropagation is not directly applicable to stochastic networks that include discrete sampling operations within their computational graph, training such networks remains difficult. We present MuProp, an unbiased gradient estimator for stochastic networks, designed to make this task easier. MuProp improves on the likelihood-ratio estimator by reducing its variance using a control variate based on the first-order Taylor expansion of a mean-field network. Crucially, unlike prior attempts at using backpropagation for training stochastic networks, the resulting estimator is unbiased and well behaved. Our experiments on structured output prediction and discrete latent variable modeling demonstrate that MuProp yields consistently good performance across a range of difficult tasks.
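The variance-reduction idea in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a single Bernoulli unit with mean `p` and a hypothetical objective `f(x) = (x - 0.3)^2`, and compares the plain likelihood-ratio estimator of the gradient of E[f(x)] with respect to `p` against a variant that subtracts a first-order Taylor expansion of `f` around the mean and adds back the analytic correction `f'(p)`. All function names and constants here are illustrative assumptions.

```python
import random

def f(x):
    # Hypothetical toy objective; not taken from the paper.
    return (x - 0.3) ** 2

def f_prime(x):
    # Derivative of the toy objective.
    return 2.0 * (x - 0.3)

def score(x, p):
    # d/dp log P(x) for x ~ Bernoulli(p), with x in {0.0, 1.0}.
    return x / p - (1.0 - x) / (1.0 - p)

def lr_estimate(x, p):
    # Plain likelihood-ratio (score-function) gradient estimate.
    return f(x) * score(x, p)

def taylor_cv_estimate(x, p):
    # Subtract the first-order Taylor expansion of f around the mean p
    # as a control variate, then add back its exact expected
    # contribution f'(p) so the estimator stays unbiased.
    residual = f(x) - f(p) - f_prime(p) * (x - p)
    return residual * score(x, p) + f_prime(p)

def mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

def compare(p=0.5, n=20000, seed=0):
    # Draw the same Bernoulli samples for both estimators.
    rng = random.Random(seed)
    xs = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
    lr = [lr_estimate(x, p) for x in xs]
    cv = [taylor_cv_estimate(x, p) for x in xs]
    return mean_and_variance(lr), mean_and_variance(cv)

if __name__ == "__main__":
    (lr_mean, lr_var), (cv_mean, cv_var) = compare()
    # The exact gradient for this toy problem is f(1) - f(0) = 0.4.
    print("likelihood-ratio:", lr_mean, lr_var)
    print("with Taylor control variate:", cv_mean, cv_var)
```

Both estimators target the same gradient, since the score has zero mean and the Taylor term's expected contribution is added back in closed form. In this toy setting the control-variate version has the lower sample variance; how much it helps in a real stochastic network depends on how well the linearization tracks the objective.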
• Return of Frustratingly Easy Domain Adaptation
• Complexity and Approximability of Parameterized MAX-CSPs
• Solutions to the T-systems with Principal Coefficients
• On the density of the odd values of the partition function
• Articulated Motion Learning via Visual and Lingual Signals
• A note on Ising random currents, Ising-FK, loop-soups and the Gaussian free field
• Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks
• Better $s$-$t$-Tours by Gao Trees
• Neurocontrol methods review
• Active exploration of sensor networks from a robotics perspective
• Accelerating pseudo-marginal Metropolis-Hastings by correlating auxiliary variables
• A fast method to estimate speciation parameters in a model of isolation with an initial period of gene flow and to test alternative evolutionary scenarios
• Concurrent enhancement of percolation and synchronization in adaptive networks
• Predictive Entropy Search for Multi-objective Bayesian Optimization
• The Föllmer-Schweizer decomposition under incomplete information
• Extending Gossip Algorithms to Distributed Estimation of U-Statistics
• On the restricted invertibility problem with an additional orthogonality constraint for random matrices
• Zubieta’s Conjecture on the Enumeration of Corners in Symmetric Tree-like Tableaux
• Tight Running Time Lower Bounds for Vertex Deletion Problems
• Deep multi-scale video prediction beyond mean square error
• Reaching consensus on a connected graph
• Zubieta’s Conjecture on the Enumeration of Corners in Tree-like Tableaux
• Quantile universal threshold for model selection
• Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization
• On the number of touching pairs in a set of planar curves
• Simple, Fast and Accurate Photometric Estimation of Specific Star Formation Rate
• Tail generating functions for the extendable branching processes
• On the Existence of Tree Backbones that Realize the Chromatic Number on a Backbone Coloring
• Identification of homophily and preferential recruitment in respondent-driven sampling
• Synchronization of phase oscillators with frequency-weighted coupling
• Infinite Dimensional Word Embeddings
• Learning to retrieve out-of-vocabulary words in speech recognition
• Crossing probability for directed polymers in random media: exact tail of the distribution
• Bayesian Optimization with Dimension Scheduling: Application to Biological Systems
• On path sequences of graphs
• The Gibbs-plaid biclustering model
• Bayesian analysis of ambulatory blood pressure dynamics with application to irregularly spaced sparse data
• Constant Time EXPected Similarity Estimation using Stochastic Optimization
• Small Deviations for Dependent Sequences
• Using somatic mutation data to test tumors for clonal relatedness
• Combining nonexchangeable functional or survival data sources in oncology using generalized mixture commensurate priors
• Accelerating Random Kaczmarz Algorithm Based on Clustering Information
• Ladder epochs and ladder chain of a Markov random walk with discrete driving chain
• Inferring constructs of effective teaching from classroom observations: An application of Bayesian exploratory factor analysis without restrictions
• A new set of asymmetric filters for tracking the short-term trend in real-time
• A note on the computation of Wasserstein barycenters
• Wide Consensus for Parallelized Inference
• Coupling methods for multistage sampling
• Uniform change point tests in high dimension
• An invitation to coupling and copulas: with applications to multisensory modeling
• Decomposition of a cube into nearly equal smaller cubes
• On the interplay of network structure and gradient convergence in deep learning
• A note on the colorful fractional Helly theorem
• Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning
• Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
• Tight Logarithmic Asymptotic for the Probability of $n\times n$ Random Matrix with Uniform Distributed $\pm 1$ Entries to be Singular
• Free Functional Inequalities on the Circle
• Enhanced detectability of community structure in multilayer networks through layer aggregation
• Spectral analysis and clustering of large stochastic networks. Application to the Lennard-Jones-75 cluster
• Protein sequence labelling by AUC-maximized Deep Convolutional Neural Fields
• Robust PCA via Nonconvex Rank Approximation
• The Hidden Subgraph Problem
• An extension of McDiarmid’s inequality
• Light tails and the Hermitian dual polar graphs
• Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets
• Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
• Fixed points and stability in the two-network frustrated Kuramoto model
• Binary embeddings with structured hashed projections
• HOID: Higher Order Interpolatory Decomposition for tensors based on Tucker representation
• Efficient AUC Optimization for Information Ranking Applications
• The capacity of Bernoulli nonadaptive group testing
• Sparse-promoting Full Waveform Inversion based on Online Orthonormal Dictionary Learning
• Compound Poisson process with a Poisson subordinator
• Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models
• Jet-Images — Deep Learning Edition
• Mixture model of pottery distributions from Lake Chad Basin archaeological sites reveals ancient segregation patterns
• Convolutional Models for Joint Object Categorization and Pose Estimation
• Cross-scale predictive dictionaries
• Conditions for permanental processes to be unbounded
• Extended slow dynamical regime prefiguring the many-body localization transition
• Waves in a Spatial Queue: Stop-and-Go at Airport Security