A Statistical Theory of Deep Learning via Proximal Splitting

In this paper we develop a statistical theory and an implementation of deep learning models. We show that an elegant variable splitting scheme for the alternating direction method of multipliers optimises a deep learning objective. We allow for non-smooth non-convex regularisation penalties to induce sparsity in parameter weights. We provide a link between traditional shallow layer statistical models such as principal component and sliced inverse regression and deep layer models. We also define the degrees of freedom of a deep learning predictor and a predictive MSE criteria to perform model selection for comparing architecture designs. We focus on deep multiclass logistic learning although our methods apply more generally. Our results suggest an interesting and previously under-exploited relationship between deep learning and proximal splitting techniques. To illustrate our methodology, we provide a multi-class logit classification analysis of Fisher’s Iris data where we illustrate the convergence of our algorithm. Finally, we conclude with directions for future research.

Aberration in qualitative multilevel designs

Generalized Word Length Pattern (GWLP) is an important and widely-used tool for comparing fractional factorial designs. We consider qualitative factors, and we code their levels using the roots of the unity. We write the GWLP of a fraction {\mathcal F} using the polynomial indicator function, whose coefficients encode many properties of the fraction. We show that the coefficient of a simple or interaction term can be written using the counts of its levels. This apparently simple remark leads to major consequence, including a convolution formula for the counts. We also show that the mean aberration of a term over the permutation of its levels provides a connection with the variance of the level counts. Moreover, using mean aberrations for symmetric s^m designs with s prime, we derive a new formula for computing the GWLP of {\mathcal F}. It is computationally easy, does not use complex numbers and also provides a clear way to interpret the GWLP. As case studies, we consider non-isomorphic orthogonal arrays that have the same GWLP. The different distributions of the mean aberrations suggest that they could be used as a further tool to discriminate between fractions.

BLC: Private Matrix Factorization Recommenders via Automatic Group Learning

We propose a privacy-enhanced matrix factorization recommender that exploits the fact that users can often be grouped together by interest. This allows a form of ‘hiding in the crowd’ privacy. We introduce a novel matrix factorization approach suited to making recommendations in a shared group (or nym) setting and the BLC algorithm for carrying out this matrix factorization in a privacy-enhanced manner. We demonstrate that the increased privacy does not come at the cost of reduced recommendation accuracy.

Constrained Conditional Moment Restriction Models

This paper examines a general class of inferential problems in semiparametric and nonparametric models defined by conditional moment restrictions. We construct tests for the hypothesis that at least one element of the identified set satisfies a conjectured (Banach space) ‘equality’ and/or (a Banach lattice) ‘inequality’ constraint. Our procedure is applicable to identified and partially identified models, and is shown to control the level, and under some conditions the size, asymptotically uniformly in an appropriate class of distributions. The critical values are obtained by building a strong approximation to the statistic and then bootstrapping a (conservatively) relaxed form of the statistic. Sufficient conditions are provided, including strong approximations using Koltchinskii’s coupling. Leading important special cases encompassed by the framework we study include: (i) Tests of shape restrictions for infinite dimensional parameters; (ii) Confidence regions for functionals that impose shape restrictions on the underlying parameter; (iii) Inference for functionals in semiparametric and nonparametric models defined by conditional moment (in)equalities; and (iv) Uniform inference in possibly nonlinear and severely ill-posed problems.

Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability

Many studies record replicated time series epochs from different groups with the goal of using frequency domain properties to discriminate between the groups. In many applications, there exists variation in cyclical patterns from time series in the same group. Although a number of frequency domain methods for the discriminant analysis of time series have been explored, there is a dearth of models and methods that account for within-group spectral variability. This article proposes a model for groups of time series in which transfer functions are modeled as stochastic variables that can account for both between-group and within-group differences in spectra that are identified from individual replicates. An ensuing discriminant analysis of stochastic cepstra under this model is developed to obtain parsimonious measures of relative power that optimally separate groups in the presence of within-group spectral variability. The approach possess favorable properties in classifying new observations and can be consistently estimated through a simple discriminant analysis of a finite number of estimated cepstral coefficients. Benefits in accounting for within-group spectral variability are empirically illustrated in a simulation study and through an analysis of gait variability.

Early text classification: a Naive solution

Text classification is a widely studied problem, and it can be considered solved for some domains and under certain circumstances. There are scenarios, however, that have received little or no attention at all, despite its relevance and applicability. One of such scenarios is early text classification, where one needs to know the category of a document by using partial information only. A document is processed as a sequence of terms, and the goal is to devise a method that can make predictions as fast as possible. The importance of this variant of the text classification problem is evident in domains like sexual predator detection, where one wants to identify an offender as early as possible. This paper analyzes the suitability of the standard naive Bayes classifier for approaching this problem. Specifically, we assess its performance when classifying documents after seeing an increasingly number of terms. A simple modification to the standard naive Bayes implementation allows us to make predictions with partial information. To the best of our knowledge naive Bayes has not been used for this purpose before. Throughout an extensive experimental evaluation we show the effectiveness of the classifier for early text classification. What is more, we show that this simple solution is very competitive when compared with state of the art methodologies that are more elaborated. We foresee our work will pave the way for the development of more effective early text classification techniques based in the naive Bayes formulation.

Significance Analysis of High-Dimensional, Low-Sample Size Partially Labeled Data

Classification and clustering are both important topics in statistical learning. A natural question herein is whether predefined classes are really different from one another, or whether clusters are really there. Specifically, we may be interested in knowing whether the two classes defined by some class labels (when they are provided), or the two clusters tagged by a clustering algorithm (where class labels are not provided), are from the same underlying distribution. Although both are challenging questions for the high-dimensional, low-sample size data, there has been some recent development for both. However, when it is costly to manually place labels on observations, it is often that only a small portion of the class labels is available. In this article, we propose a significance analysis approach for such type of data, namely partially labeled data. Our method makes use of the whole data and tries to test the class difference as if all the labels were observed. Compared to a testing method that ignores the label information, our method provides a greater power, meanwhile, maintaining the size, illustrated by a comprehensive simulation study. Theoretical properties of the proposed method are studied with emphasis on the high-dimensional, low-sample size setting. Our simulated examples help to understand when and how the information extracted from the labeled data can be effective. A real data example further illustrates the usefulness of the proposed method.

The Utility of Clustering in Prediction Tasks

We explore the utility of clustering in reducing error in various prediction tasks. Previous work has hinted at the improvement in prediction accuracy attributed to clustering algorithms if used to pre-process the data. In this work we more deeply investigate the direct utility of using clustering to improve prediction accuracy and provide explanations for why this may be so. We look at a number of datasets, run k-means at different scales and for each scale we train predictors. This produces k sets of predictions. These predictions are then combined by a na\’ive ensemble. We observed that this use of a predictor in conjunction with clustering improved the prediction accuracy in most datasets. We believe this indicates the predictive utility of exploiting structure in the data and the data compression handed over by clustering. We also found that using this method improves upon the prediction of even a Random Forests predictor which suggests this method is providing a novel, and useful source of variance in the prediction process.

A Bayesian Compressed Sensing Kalman Filter for Direction of Arrival Estimation

A certifying and dynamic algorithm for the recognition of proper circular-arc graphs

A CLT concerning critical points of random functions on a Euclidean space

A correction to the hydrodynamic limit of boundary driven exclusion processes in a super-diffusive time scale

A greedy algorithm for the minimization of a ratio of same-index element sums from two positive arrays

A kinetic equation for repulsive coalescing random jumps in continuum

A linear approximation algorithm for the BPP with the best possible absolute approximation ratio

A New Class of Probability Distributions for Describing the Spatial Statistics of Area-averaged Rainfall

A note on enumerating colored integer partitions

A note on linear fractional set packing problem

A powerful allele based test for case-control association studies

A reverse entropy power inequality for log-concave random vectors

A sequential approach to calibrate ecosystem models with multiple time series data

A Simple Near-Optimal Subdivision Algorithm for Complex Root Isolation based on the Pellet Test and Newton Iteration

A smooth component of the fractional Brownian motion and optimal portfolio selection

An objective function for STDP

Bernoulli line percolation

Bridging Mechanistic and Phenomenological Models of Complex Biological Systems

Chocolate Numbers

Cliques in C_4-free graphs of large minimum degree

CloudSimNFV: Modeling and Simulation of Energy-Efficient NFV in Cloud Data Centers

Combinatorial Intricacies of Labeled Fano Planes

Communication Complexity (for Algorithm Designers)

Denoising without access to clean data using a partitioned autoencoder

Dependable Structural Helath Monitoring Using Wireless Sensor Networks

Diffusive scaling in energy ginzburg-Landau dynamics

Efficient Algorithms for Morphisms over Omega-Regular Languages

Estimating standard errors for importance sampling estimators with multiple Markov chains

Exact Real Arithmetic with Perturbation Analysis and Proof of Correctness

Exploiting Reduction Rules and Data Structures: Local Search for Minimum Vertex Cover in Massive Graphs

Forward Backward Doubly Stochastic Differential Equations and the Optimal Filtering of Diffusion Processes

General Cheeger inequalities for p-Laplacians on graphs

Gray coding planar maps

Hybrid Optimization Algorithm for Large-Scale QoS-Aware Service Composition

Hyperbolicity impedes emergence of chimera states in networks of nonlocally coupled chaotic oscillators

Impact of noise on a dynamical system: prediction and uncertainties from a swarm-optimized neural network

Implementing Conservative Signal Processing Systems for Asynchronous, Distributed Optimization as a Web Service

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Learning Visual Feature Spaces for Robotic Manipulation with Deep Spatial Autoencoders

Lower bounds for approximation schemes for Closest String

Many-Body Localization : construction of the emergent local conserved operators via block real-space renormalization

Mean-Reverting Portfolios: Tradeoffs Between Sparsity and Volatility

Monte Carlo estimation of the number of tatami tilings

Mott transition in a metallic liquid – Gutzwiller molecular dynamics simulations

Multi-Eulerian tours of directed graphs

Multilayer bootstrap network for unsupervised speaker recognition

New bounds on curve tangencies and orthogonalities

Noise Robust IOA/CAS Speech Separation and Recognition System For The Third ‘CHIME’ Challenge

Numerical Sets, Core Partitions, and Integer Points in Polytopes

On some limit theorems following from Smith’s Theorem

On space efficiency of algorithms working on structural decompositions of graphs

On the Consistency of the Crossmatch Test

On the non-existence of a $(75,32,10,16)$ strongly regular graph

On the number of lambda terms with prescribed size of their De Bruijn representation

On the roots of hypergraph chromatic polynomials

Parallel Query in the Suffix Tree

Parameter Estimation for Misspecified Regression Models with Heteroskedastic Errors

Positive definite Hankel matrix completions and Hamburger moment completions

Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

Self-normalized moderate deviation and laws of the iterated logarithm under G-expectation

Spectral analysis of the Moore-Penrose inverse of a large dimensional sample covariance matrix

Strong converses for group testing in the finite blocklength regime

Sub-Gaussian mean estimators

Telugu OCR Framework using Deep Learning

The pricing of contingent claims and optimal positions in asymptotically complete markets

There is no strongly regular graph with parameters (460,153,32,60)

Topological stability of continuous functions with respect to averagings

Trading Accuracy for Numerical Stability: Orthogonalization, Biorthogonalization and Regularization

Using Contracted Solution Graphs for Solving Reconfiguration Problems

When can splits be drawn in the plane?

Wilson Loop diagrams and Positroids

Word, graph and manifold embedding from Markov processes