Bayesian Tensor Regression

This article proposes a Bayesian approach to regression with a scalar response against vector and tensor covariates. Tensor covariates are commonly vectorized prior to analysis, failing to exploit the structure of the tensor, and resulting in poor estimation and predictive performance. We develop a novel class of multiway shrinkage priors for the coefficients in tensor regression models. Properties are described, including posterior consistency under mild conditions, and an efficient Markov chain Monte Carlo algorithm is developed for posterior computation. Simulation studies illustrate substantial gains over vectorizing or using existing tensor regression methods in terms of estimation and parameter inference. The approach is further illustrated in a neuroimaging application. Keywords: Dimension reduction; multiway shrinkage prior; magnetic resonance imaging (MRI); parafac decomposition; posterior consistency; tensor regression

Deep Boltzmann Machines in Estimation of Distribution Algorithms for Combinatorial Optimization

Estimation of Distribution Algorithms (EDAs) require flexible probability models that can be efficiently learned and sampled. Deep Boltzmann Machines (DBMs) are generative neural networks with these desired properties. We integrate a DBM into an EDA and evaluate the performance of this system in solving combinatorial optimization problems with a single objective. We compare the results to the Bayesian Optimization Algorithm. The performance of DBM-EDA was superior to BOA for difficult additively decomposable functions, i.e., concatenated deceptive traps of higher order. For most other benchmark problems, DBM-EDA cannot clearly outperform BOA, or other neural network-based EDAs. In particular, it often yields optimal solutions for a subset of the runs (with fewer evaluations than BOA), but is unable to provide reliable convergence to the global optimum competitively. At the same time, the model building process is computationally more expensive than that of other EDAs using probabilistic models from the neural network family, such as DAE-EDA.

Deep Reinforcement Learning with Double Q-learning

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether this harms performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.

Diverse Yet Efficient Retrieval using Hash Functions

Typical retrieval systems have three requirements: a) Accurate retrieval i.e., the method should have high precision, b) Diverse retrieval, i.e., the obtained set of points should be diverse, c) Retrieval time should be small. However, most of the existing methods address only one or two of the above mentioned requirements. In this work, we present a method based on randomized locality sensitive hashing which tries to address all of the above requirements simultaneously. While earlier hashing approaches considered approximate retrieval to be acceptable only for the sake of efficiency, we argue that one can further exploit approximate retrieval to provide impressive trade-offs between accuracy and diversity. We extend our method to the problem of multi-label prediction, where the goal is to output a diverse and accurate set of labels for a given document in real-time. Moreover, we introduce a new notion to simultaneously evaluate a method’s performance for both the precision and diversity measures. Finally, we present empirical results on several different retrieval tasks and show that our method retrieves diverse and accurate images/labels while ensuring 100x-speed-up over the existing diverse retrieval approaches.

Efficient Neighborhood Selection for Gaussian Graphical Models

This paper addresses the problem of neighborhood selection for Gaussian graphical models. We present two heuristic algorithms: a forward-backward greedy algorithm for general Gaussian graphical models based on mutual information test, and a threshold-based algorithm for walk summable Gaussian graphical models. Both algorithms are shown to be structurally consistent, and efficient. Numerical results show that both algorithms work very well.

Mode Identification and Data-Driven Sciences

Bump-hunting or mode identification is a fundamental problem that arises in almost every scientific field of data-driven discovery. Surprisingly, very few data modeling tools are available for automatic (not requiring manual case-by-base investigation), objective (not subjective), and nonparametric (not based on restrictive parametric model assumptions) mode discovery, which can scale to large data sets. This article introduces LPMode–an algorithm based on a new theory for detecting multimodality of a probability density. We apply LPMode to answer important research questions arising in various fields from environmental science, ecology, econometrics, analytical chemistry to astronomy and cancer genomics.

Stochastic gradient descent methods for estimation with large data sets

We develop methods for parameter estimation in settings with large-scale data sets, where traditional methods are no longer tenable. Our methods rely on stochastic approximations, which are computationally efficient as they maintain one iterate as a parameter estimate, and successively update that iterate based on a single data point. When the update is based on a noisy gradient, the stochastic approximation is known as standard stochastic gradient descent, which has been fundamental in modern applications with large data sets. Additionally, our methods are numerically stable because they employ implicit updates of the iterates. Intuitively, an implicit update is a shrinked version of a standard one, where the shrinkage factor depends on the observed Fisher information at the corresponding data point. This shrinkage prevents numerical divergence of the iterates, which can be caused either by excess noise or outliers. Our sgd package in R offers the most extensive and robust implementation of stochastic gradient descent methods. We demonstrate that sgd dominates alternative software in runtime for several estimation problems with massive data sets. Our applications include the wide class of generalized linear models as well as M-estimation for robust regression.

Tensorizing Neural Networks

Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required by commonly used fully-connected layers, making it hard to use the models on low-end devices and stopping the further increase of the model size. In this paper we convert the dense weight matrices of the fully-connected layers to the Tensor Train format such that the number of parameters is reduced by a huge factor and at the same time the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks we report the compression factor of the dense weight matrix of a fully-connected layer up to 200000 times leading to the compression factor of the whole network up to 7 times.

A Compositional Explanation of the Pet Fish Phenomenon

A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing

A new approach for analyzing panel AR(1) series with application to the unit root test

A New Robust Regression Method Based on Minimization of Geodesic Distances on a Probabilistic Manifold: Application to Power Laws

A Review of Features for the Discrimination of Twitter Users: Application to the Prediction of Offline Influence

A stochastic method of solution of the Parker transport equation

Another dual of MacMahon’s theorem on plane partitions

Bin Packing with Linear Usage Costs

Bloch functions and asymptotic tail variance

Classification error in multiclass discrimination from Markov data

Closed orders and closed graphs

CLTs for general branching processes related to splitting trees

Complex ordering in spin networks: Critical role of adaptation rate for dynamically evolving interactions

Consistency bands for the mean excess function and application to graphical goodness of fit test for financial data

Cosmic Web Reconstruction through Density Ridges: Catalogue

Decomposing highly edge-connected graphs into paths of any given length

Designed Sampling from Large Databases for Controlled Trials

Detecting Effects of Filaments on Galaxy Properties in the Sloan Digital Sky Survey III

Dynamic graph connectivity with improved worst case update time and sublinear space

Equidistribution and $β$ ensembles

Exchangeable Markov processes on graphs: Feller case

Excursion theory for Brownian motion indexed by the Brownian tree

Flexible Extreme Value Inference And Hill Plots For A Small, Mid And Large Samples

Generating maps on surfaces

Graph Kernels exploiting Weisfeiler-Lehman Graph Isomorphism Test Extensions

Graph-Based Lossless Markov Lumpings

Harmonic Extension

Identifying collusion groups using spectral clustering

Improved bounds and parallel algorithms for the Lovasz Local Lemma

Incremental Similarity and Turbulence

Induced subgraphs of graphs with large chromatic number. IV. Consecutive holes

Inertial Sensor Arrays, Maximum Likelihood, and Cramér-Rao Bound

Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence

Mean-field interaction of Brownian occupation measures, I: uniform tube property of the Coulomb functional

Modifying iterated Laplace approximations

Moments of the frequency spectrum of a splitting tree with neutral Poissonian mutations

Motility-driven glass and jamming transitions in biological tissues

On $t$-core towers and $t$-defects of partitions

On a conjecture of Neumann

On the Euler-Maruyama approximation for one-dimensional stochastic differential equations with irregular coefficients

Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games

Quasi-MLE for quadratic ARCH model with long memory

Random walks systems with finite lifetime on $ \bbZ $

Reasoning about Entailment with Neural Attention

Relatively exchangeable structures

Set-valued Brownian motion

Spectral analysis of stable processes on the positive half-line

Stochastic approach to the numerical solution of the non-stationary Parker’s transport equation

The coefficient of reduced Bartholdi zeta function

The Distribution of Permutation Matrix Entries Under Randomized Basis