• Application of Quantum Annealing to Training of Deep Neural Networks
In Deep Learning, a well-known approach for training a Deep Neural Network starts by training a generative Deep Belief Network model, typically using Contrastive Divergence (CD), then fine-tuning the weights using backpropagation or other discriminative techniques. However, the generative training can be time-consuming due to the slow mixing of Gibbs sampling. We investigated an alternative approach that estimates model expectations of Restricted Boltzmann Machines using samples from a D-Wave quantum annealing machine. We tested this method on a coarse-grained version of the MNIST data set. In our tests we found that the quantum sampling-based training approach achieves comparable or better accuracy with significantly fewer iterations of generative training than conventional CD-based training. Further investigation is needed to determine whether similar improvements can be achieved for other data sets, and to what extent these improvements can be attributed to quantum effects.
• Submodularity in Statistics: Comparing the Success of Model Selection Methods
We demonstrate the usefulness of submodularity in statistics. Greedy algorithms such as forward stepwise regression and the lasso perform well in situations that can be characterized by submodularity. In particular, submodularity of the coefficient of determination, R
, provides a natural way to analyze the effects of collinearity on model selection. In model selection, we encounter the search problem of identifying a subset of k covariates with predictive loss close to that of the best model of k covariates. Submodularity arises naturally in this setting due to its deep roots within combinatorial optimization. It provides structural results for discrete convexity as well as guarantees for the success of greedy algorithms. In statistics, submodularity isolates cases in which collinearity makes the choice of model features difficult from those in which this task is routine. Submodularity of R
is closely related to other statistical assumptions used in variable screening and proving the performance of the Lasso. This has important implications for the use of model selection procedures: the situations in which forward stepwise and Lasso are successful are closely related.
• Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network
Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for tagging sequential data, e.g. speech utterances or handwritten documents. While word embedding has been demoed as a powerful representation for characterizing the statistical properties of natural language. In this study, we propose to use BLSTM-RNN with word embedding for part-of-speech (POS) tagging task. When tested on Penn Treebank WSJ test set, a state-of-the-art performance of 97.40 tagging accuracy is achieved. Without using morphological features, this approach can also achieve a good performance comparable with the Stanford POS tagger.
• High Performance Latent Variable Models
Latent variable models have accumulated a considerable amount of interest from the industry and academia for their versatility in a wide range of applications. A large amount of effort has been made to develop systems that is able to extend the systems to a large scale, in the hope to make use of them on industry scale data. In this paper, we describe a system that operates at a scale orders of magnitude higher than previous works, and an order of magnitude faster than state-of-the-art system at the same scale, at the same time showing more robustness and more accurate results. Our system uses a number of advances in distributed inference: high performance in synchronization of sufficient statistics with relaxed consistency model; fast sampling, using the Metropolis-Hastings-Walker method to overcome dense generative models; statistical modeling, moving beyond Latent Dirichlet Allocation (LDA) to Pitman-Yor distributions (PDP) and Hierarchical Dirichlet Process (HDP) models; sophisticated parameter projection schemes, to resolve the conflicts within the constraint between parameters arising from the relaxed consistency model. This work significantly extends the domain of applicability of what is commonly known as the Parameter Server. We obtain results with up to hundreds billion oftokens, thousands of topics, and a vocabulary of a few million token-types, using up to 60,000 processor cores operating on a production cluster of a large Internet company. This demonstrates the feasibility to scale to problems orders of magnitude larger than any previously published work.
• Dimensionality Reduction for Binary Data through the Projection of Natural Parameters
Principal component analysis (PCA) for binary data, known as logistic PCA, has become a popular alternative to dimensionality reduction of binary data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the Bernoulli log-likelihood. We propose a new formulation of logistic PCA which extends Pearson’s formulation of a low dimensional data representation with minimum error to binary data. Our formulation does not require a matrix factorization, as previous methods do, but instead looks for projections of the natural parameters from the saturated model. Due to this difference, the number of parameters does not grow with the number of observations and the principal component scores on new data can be computed with simple matrix multiplication. We derive explicit solutions for data matrices of special structure and provide computationally efficient algorithms for solving for the principal component loadings. Through simulation experiments and an analysis of medical diagnoses data, we compare our formulation of logistic PCA to the previous formulation as well as ordinary PCA to demonstrate its benefits.
• Spectral statistics of sparse Erdős-Rényi graph Laplacians
• Noncrossing partitions, toggles, and homomesies
• Localization of Quantum States and Landscape Functions
• Scaling limits for the critical Fortuin-Kastelyn model on a random planar map III: finite volume case
• Prevalence and recoverability of syntactic parameters in sparse distributed memories
• Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems
• The permutation class Av(4213,2143)
• Revisiting Alpha-Investing: Conditionally Valid Stepwise Regression
• Efficient measurement of point-to-set correlations and overlap fluctuations in glass-forming liquids
• A Risk Ratio Comparison of $l_0$ and $l_1$ Penalized Regression
• Approximation of delay differential equations at the verge of instability by equations without delay
• Bayesian Nonparametric Density Estimation under Length Bias
• GLASSES: Relieving The Myopia Of Bayesian Optimisation
• Universality in marginally relevant disordered systems
• Monotone and additive Markov process duality
• Propagation of chaos for the Vlasov-Poisson-Fokker-Planck system in 1D
• Computing LZ77 in Run-Compressed Space
• On the Erdős-Szekeres Conjecture
• Likelihood Ratio Tests for a Dose-Response Effect using Multiple Nonlinear Regression Models
• A generalization of Onsager’s reciprocity relations to gradient flows with nonlinear mobility
• On the discontinuity of the specific heat of the Ising model on a scale-free network
• On the number of latin hypercubes, pairs of orthogonal latin squares and MDS codes
• A preparation theorem for the kashiwara $b(\infty)$ crystal
• Functional delta-method for the bootstrap of quasi-Hadamard differentiable functionals
• Quenched localisation in the Bouchaud trap model with slowly varying traps
• Learning-based Compressive Subsampling
• There is exactly one Z2Z4-cyclic 1-perfect code
• An investigation of equilibration in small quantum systems
• The Saturation Time of Graph Bootstrap Percolation
• Creating Scalable and Interactive Web Applications Using High Performance Latent Variable Models
• Homogenization of periodic diffusion with small jumps
• Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions
• Automated Synchronization of Driving Data Using Vibration and Steering Events
• Improved Lower Bounds on the Classical Ramsey Numbers R(4,22) and R(4,25)
• When Are Nonconvex Problems Not Scary?
• The Effect of Data Swapping on Analyses of American Community Survey Data
• Quantile Versions of the Lorenz Curve
• The parabolic Taylor formula of the implied volatility
• Regularization vs. Relaxation: A conic optimization perspective of statistical variable selection
• Coagulation-fragmentation model for animal group-size statistics
• Input Sparsity and Hardness for Robust Subspace Approximation
• Staircase diagrams and enumeration of smooth Schubert varieties
• Refined Turán numbers and Ramsey numbers for the loose 3-uniform path of length three
• Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction
• The Complexity of Pattern Matching for $321$-Avoiding and Skew-Merged Permutations
• Nonlinear stochastic heat equation driven by spatially colored noise: moments and intermittency
Like this:
Like Loading...