Poor starting points in machine learning

Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points — indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods not designed for stochastic approximation could hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.


BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

We introduce BinaryNet, a method which trains DNNs with binary weights and activations when computing parameters’ gradient. We show that it is possible to train a Multi Layer Perceptron (MLP) on MNIST and ConvNets on CIFAR-10 and SVHN with BinaryNet and achieve nearly state-of-the-art results. At run-time, BinaryNet drastically reduces memory usage and replaces most multiplications by 1-bit exclusive-not-or (XNOR) operations, which might have a big impact on both general-purpose and dedicated Deep Learning hardware. We wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST MLP 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for BinaryNet is available.


Value Iteration Networks

We introduce the value iteration network: a fully differentiable neural network with a `planning module’ embedded within. Value iteration networks are suitable for making predictions about outcomes that involve planning-based reasoning, such as predicting a desired trajectory from an observation of a map. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate our value iteration networks on the task of predicting optimal obstacle-avoiding trajectories from an image of a landscape, both on synthetic data, and on challenging raw images of the Mars terrain.


Turbocharging Mini-Batch K-Means

We propose an accelerated Mini-Batch k-means algorithm which combines three key improvements. The first is a modified center update which results in convergence to a local minimum in fewer iterations. The second is an adaptive increase of batchsize to meet an increasing requirement for centroid accuracy. The third is the inclusion of distance bounds based on the triangle inequality, which are used to eliminate distance calculations along the same lines as Elkan’s algorithm. The combination of the two latter constitutes a very powerful scheme to reuse computation already done over samples until statistical accuracy requires the use of additional data points.


Big Graph Mining: Frameworks and Techniques

Big graph mining is an important research area and it has attracted considerable attention. It allows to process, analyze, and extract meaningful information from large amounts of graph data. Big graph mining has been highly motivated not only by the tremendously increasing size of graphs but also by its huge number of applications. Such applications include bioinformatics, chemoinformatics and social networks. One of the most challenging tasks in big graph mining is pattern mining in big graphs. This task consists on using data mining algorithms to discover interesting, unexpected and useful patterns in large amounts of graph data. It aims also to provide deeper understanding of graph data. In this context, several graph processing frameworks and scaling data mining/pattern mining techniques have been proposed to deal with very big graphs. This paper gives an overview of existing data mining and graph processing frameworks that deal with very big graphs. Then it presents a survey of current researches in the field of data mining / pattern mining in big graphs and discusses the main research issues related to this field. It also gives a categorization of both distributed data mining and machine learning techniques, graph processing frameworks and large scale pattern mining approaches.


Large scale multi-objective optimization: Theoretical and practical challenges

Multi-objective optimization (MOO) is a well-studied problem for several important recommendation problems. While multiple approaches have been proposed, in this work, we focus on using constrained optimization formulations (e.g., quadratic and linear programs) to formulate and solve MOO problems. This approach can be used to pick desired operating points on the trade-off curve between multiple objectives. It also works well for internet applications which serve large volumes of online traffic, by working with Lagrangian duality formulation to connect dual solutions (computed offline) with the primal solutions (computed online). We identify some key limitations of this approach – namely the inability to handle user and item level constraints, scalability considerations and variance of dual estimates introduced by sampling processes. We propose solutions for each of the problems and demonstrate how through these solutions we significantly advance the state-of-the-art in this realm. Our proposed methods can exactly handle user and item (and other such local) constraints, achieve a 100\times scalability boost over existing packages in R and reduce variance of dual estimates by two orders of magnitude.


DCM Bandits: Learning to Rank with Multiple Clicks

Search engines recommend a list of web pages. The user examines this list, from the first page to the last, and may click on multiple attractive pages. This type of user behavior can be modeled by the \emph{dependent click model (DCM)}. In this work, we propose \emph{DCM bandits}, an online learning variant of the DCM model where the objective is to maximize the probability of recommending a satisfactory item. The main challenge of our problem is that the learning agent does not observe the reward. It only observes the clicks. This imbalance between the feedback and rewards makes our setting challenging. We propose a computationally-efficient learning algorithm for our problem, which we call dcmKL-UCB; derive gap-dependent upper bounds on its regret under reasonable assumptions; and prove a matching lower bound up to logarithmic factors. We experiment with dcmKL-UCB on both synthetic and real-world problems. Our algorithm outperforms a range of baselines and performs well even when our modeling assumptions are violated. To the best of our knowledge, this is the first regret-optimal online learning algorithm for learning to rank with multiple clicks in a cascade-like model.


Spanning Trees and Mahler Measure

Large deviations for infectious diseases models

Asymptotic *-moments of some random Vandermonde matrices

Combinatorial Scoring of Phylogenetic Networks

Collaborative filtering via sparse Markov random fields

Online Active Linear Regression via Thresholding

Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Supplementary difference sets related to a certain class of complex spherical 2-codes

Compliance-Aware Bandits

On Stepwise Control of Directional Errors under Independence and Some Dependence

A model for risk assessment of a large earthquake with application to Chilean data

A Feature-Based Prediction Model of Algorithm Selection for Constrained Continuous Optimisation

Semi-External Memory Sparse Matrix Multiplication on Billion-node Graphs in a Multicore Architecture

The Role of Typicality in Object Classification: Improving The Generalization Capacity of Convolutional Neural Networks

High order steady-state diffusion approximation of the Erlang-C system

On the smallest eigenvalues of covariance matrices of multivariate spatial processes

Optilization of the gyroaverage operator based on hermite interpolation

Classification with Boosting of Extreme Learning Machine Over Arbitrarily Partitioned Data

Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM

Ergodicity of Markov chain Monte Carlo with reversible proposal

On the analysis of tuberculosis studies with intermittent missing sputum data

Multilevel branching splitting algorithm for estimating rare event probabilities

Identifying heterogeneous transgenerational DNA methylation sites via clustering in beta regression

Modeling competition between two pharmaceutical drugs using innovation diffusion models

Secure Multi-Party Computation Based Privacy Preserving Extreme Learning Machine Algorithm Over Vertically Distributed Data

‘Virus hunting’ using radial distance weighted discrimination

A stochastic space-time model for intermittent precipitation occurrences

Long time behavior of stochastic hard ball systems

Simulation of volatility modulated Volterra processes using hyperbolic stochastic partial differential equations

Statistical development and assessment of summary measures to account for isotopic clustering of Fourier transform mass spectrometry data in clinical diagnostic studies

Searching PubMed for articles relevant to clinical interpretation of rare human genetic variants

The Degree Distribution of Random Birth-and-Death Network with Network Size Decline

Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods

Stable sets in {ISK4,wheel}-free graphs

Fluctuation of matrix entries and application to outliers of elliptic matrices

Mixing Times of Markov Chains on Degree Constrained Orientations of Planar Graphs

The connected brain: Causality, models and intrinsic dynamics

Weak oddness as an approximation of oddness and resistance in cubic graphs

Spoofing detection under noisy conditions: a preliminary investigation and an initial database

Asymptotic arbitrage in fractional mixed markets

$q$-Bernstein functions and applications

A Kernel Test of Goodness of Fit

Thompson’s group $F$ is not Liouville

Any Finite Group is the Group of Some Binary, Convex Polytope

The world as its own best controller: a case study with anthropomimetic robots

A local constant factor approximation for the minimum dominating set problem on bounded genus graphs

A Convolutional Attention Network for Extreme Summarization of Source Code

Herding as a Learning System with Edge-of-Chaos Dynamics

Minimax Lower Bounds for Realizable Transductive Classification

Associative Long Short-Term Memory

The Structured Weighted Violations Perceptron Algorithm

Bayesian nonparametric image segmentation using a generalized Swendsen-Wang algorithm

Tutte’s invariant approach for Brownian motion reflected in the quadrant

Flow towards diagonalization for Many-Body-Localization models : adaptation of the Toda matrix differential flow to random quantum spin chains

On the Local Semicircular Law for Wigner Ensembles

Point Sets with Small Integer Coordinates and with Small Convex Polygons

Hole probability for zeroes of Gaussian Taylor series with finite radii of convergence

Local Codes with Cooperative Repair in Distributed Storage System

RECKONER: Read Error Corrector Based on KMC

On the Minimum Number of Edges in Triangle-Free 5-Critical Graphs

Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank

Cycle-based Cluster Variational Method for Direct and Inverse Inference

Graphical Model Sketch

Optimal Entry to an Irreversible Investment Plan with Non Convex Costs

Range of (1,2) random walk in random environment

Number of fixed points and disjoint cycles in monotone Boolean networks

Characterizing the parent Hamiltonian for a complete set of orthogonal wave functions: An inverse quantum problem

The facial weak order and its lattice quotients