A Different Approach to the Problem of Missing Data

There is a long history of methodological development for dealing with missing data in statistical analysis. Today, the most popular methods fall into two classes: Complete Cases (CC) and Multiple Imputation (MI). A third approach, Available Cases (AC), has occasionally been mentioned in the research literature in the context of linear regression analysis, but has generally been ignored. In this paper, we revisit the AC method, show that it can perform better than CC and MI, and extend its breadth of application.
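To make the AC idea concrete, here is a minimal Python sketch. It assumes that AC regression means estimating each variance and covariance from the rows where that pair of variables is jointly observed, then solving the normal equations. This is a common textbook reading of available-case analysis, not necessarily the estimator developed in the paper. The helper names (`pairwise_cov`, `ac_slopes`, `complete_cases`) are hypothetical, and only slopes (no intercept) are estimated.

```python
from statistics import mean


def pairwise_cov(rows, i, j):
    """Sample covariance of columns i and j over rows where BOTH are observed (not None)."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two jointly observed rows")
    mi = mean(a for a, _ in pairs)
    mj = mean(b for _, b in pairs)
    return sum((a - mi) * (b - mj) for a, b in pairs) / (len(pairs) - 1)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ac_slopes(rows, xcols, ycol):
    """Hypothetical AC slopes: each covariance uses every row where its two columns are observed."""
    s_xx = [[pairwise_cov(rows, a, b) for b in xcols] for a in xcols]
    s_xy = [pairwise_cov(rows, a, ycol) for a in xcols]
    return solve(s_xx, s_xy)


def complete_cases(rows):
    """Complete-case filter: keep only rows with no missing entries."""
    return [r for r in rows if all(v is not None for v in r)]
```

With no missing values, `ac_slopes` reduces to ordinary least-squares slopes. With missing values, it uses rows that CC would discard: a row `(2.0, None, 1.0)` still contributes to the variance of column 0 and to its covariance with column 2. Because each entry comes from a different subset of rows, the assembled covariance matrix is not guaranteed to be positive definite, which is one known pitfall of pairwise estimation.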

Dirichlet Fragmentation Processes

Tree structures are ubiquitous in data across many domains, and many datasets are naturally modelled by unobserved tree structures. In this paper, we first review the theory of random fragmentation processes [Bertoin, 2006] and a number of existing methods for modelling trees, including the popular nested Chinese restaurant process (nCRP). We then define a general class of probability distributions over trees, the Dirichlet fragmentation process (DFP), through a novel combination of the theory of Dirichlet processes and random fragmentation processes. The DFP admits a stick-breaking construction and relates to the nCRP in the same way that the Dirichlet process relates to the Chinese restaurant process. Furthermore, we develop a novel hierarchical mixture model based on the DFP and empirically compare it with similar models in machine learning. Experiments show the DFP mixture model to be convincingly better than existing state-of-the-art approaches for hierarchical clustering and density modelling.
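The stick-breaking idea can be illustrated with the standard truncated stick-breaking (GEM) weights of a Dirichlet process: at each step a fraction b_k ~ Beta(1, α) of the remaining stick is broken off. The sketch below is not the DFP itself. The recursive `fragment` helper is a hypothetical toy that re-applies stick-breaking to every piece to produce a tree of masses, and the truncation level `k` and the depth are assumptions made so the example terminates.

```python
import random


def stick_breaking(alpha, k, rng):
    """Truncated GEM(alpha): w_i = b_i * prod_{j<i} (1 - b_j), with b_i ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(k):
        b = rng.betavariate(1.0, alpha)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    return weights, remaining  # `remaining` is the unbroken mass beyond the truncation


def fragment(mass, alpha, k, depth, rng):
    """Hypothetical toy fragmentation: split `mass` by stick-breaking, then recurse on each piece."""
    if depth == 0:
        return mass
    weights, _ = stick_breaking(alpha, k, rng)
    return [fragment(mass * w, alpha, k, depth - 1, rng) for w in weights]


def leaves(tree):
    """Collect the leaf masses of a nested-list tree."""
    if isinstance(tree, float):
        return [tree]
    return [m for child in tree for m in leaves(child)]
```

Smaller `alpha` concentrates mass on the first few pieces, while larger `alpha` spreads it more evenly. In the truncated toy, the total leaf mass is slightly below the starting mass because each split discards its unbroken remainder.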

Group Membership Prediction

The group membership prediction (GMP) problem involves predicting whether or not a collection of instances shares a certain semantic property. For instance, in kinship verification, given a collection of images, the goal is to predict whether or not they share a familial relationship. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and the cross-view similarities among data instances. Our model posits that data from each view are independent conditioned on the shared variables. This postulate leads to a parametric probability model that decomposes the group membership likelihood into a tensor product of data-independent parameters and data-dependent factors. We propose learning the data-independent parameters discriminatively with bilinear classifiers, and we test our prediction algorithm on challenging visual recognition tasks such as multi-camera person re-identification and kinship verification. On most benchmark datasets, our method can significantly outperform the current state of the art.
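For two views, a bilinear classifier scores a pair of feature vectors x and z as s = xᵀWz, which is linear in the outer product x zᵀ. The sketch below learns W from labelled pairs. It assumes a logistic link and plain stochastic gradient ascent on the log-likelihood, neither of which is stated in the abstract. The function names and the one-hot toy features are hypothetical, and the paper's latent-variable model and multi-view tensor product are not reproduced.

```python
import math


def bilinear_score(x, W, z):
    """s = x^T W z, i.e. a weighted sum of all cross-view feature products x[i] * z[j]."""
    return sum(x[i] * W[i][j] * z[j] for i in range(len(x)) for j in range(len(z)))


def same_group_prob(x, W, z, b):
    """Assumed logistic link: P(same group | x, z) = sigmoid(x^T W z + b)."""
    return 1.0 / (1.0 + math.exp(-(bilinear_score(x, W, z) + b)))


def train_bilinear(pairs, dim_x, dim_z, lr=0.5, epochs=200):
    """Stochastic gradient ascent on the log-likelihood of labelled pairs (x, z, y), y in {0, 1}."""
    W = [[0.0] * dim_z for _ in range(dim_x)]
    b = 0.0
    for _ in range(epochs):
        for x, z, y in pairs:
            g = y - same_group_prob(x, W, z, b)  # derivative of the log-likelihood w.r.t. the score
            for i in range(dim_x):
                for j in range(dim_z):
                    W[i][j] += lr * g * x[i] * z[j]  # gradient w.r.t. W is g * x z^T
            b += lr * g
    return W, b
```

On toy data where a pair is "same group" exactly when the two one-hot vectors match, the diagonal of W becomes positive and the off-diagonal entries negative. W thus encodes the cross-view similarity without any hand-designed distance.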

Markov modeling of Twitter tweet inter-arrival times

We introduce an improved model for human communication patterns, in particular for the arrival times of Twitter tweets. We introduce a concept that allows one to capture the dependence between subsequent waiting times of such events. The presence of such dependence not only matches intuition; the data also show a significantly better fit for the new model, thus confirming the proposition.
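One simple way to test for dependence between successive waiting times is to label each gap as short or long and compare an i.i.d. model of the labels against a first-order Markov chain. This is offered only as an illustration, not as the paper's model. The threshold and the two-state discretization are assumptions. Both log-likelihoods below are evaluated on the same labels (all but the first), so the Markov model, which nests the i.i.d. one, can never score lower, and twice the difference is the usual likelihood-ratio statistic.

```python
import math
from collections import Counter


def label_gaps(times, threshold):
    """Turn sorted event times into 'S'/'L' labels for short/long inter-arrival gaps."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return ["S" if g < threshold else "L" for g in gaps]


def iid_loglik(states):
    """Maximised log-likelihood of states[1:] when labels are independent and identically distributed."""
    counts = Counter(states[1:])
    n = len(states) - 1
    return sum(c * math.log(c / n) for c in counts.values())


def markov_loglik(states):
    """Maximised log-likelihood of states[1:] under a first-order Markov chain on the labels."""
    pairs = Counter(zip(states, states[1:]))
    origins = Counter(states[:-1])
    return sum(c * math.log(c / origins[a]) for (a, _), c in pairs.items())
```

For gaps that arrive in runs (several short gaps, then several long ones), `2 * (markov_loglik(s) - iid_loglik(s))` is clearly positive. For two states, this statistic is compared against a chi-squared distribution with one degree of freedom.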

On the Expressive Power of Deep Learning: A Tensor Analysis

It has long been conjectured that hypothesis spaces suitable for data that are compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical architectures than with shallow ones. Despite vast empirical evidence, formal arguments to date are limited and do not capture the kind of networks used in practice. Using tensor factorization, we derive a universal hypothesis space implemented by an arithmetic circuit over functions applied to local data structures (e.g., image patches). The resulting networks first pass the input through a representation layer and then proceed with a sequence of layers, each comprising a sum operation followed by product pooling, where the sum corresponds to the widely used convolution operator. The hierarchical structure of the networks arises from factorizations of tensors based on the linear weights of the arithmetic circuits. We show that a shallow network corresponds to a rank-1 decomposition, whereas a deep network corresponds to a Hierarchical Tucker (HT) decomposition. Log-space computation for numerical stability transforms the networks into SimNets. In its basic form, our main theoretical result shows that the set of polynomially sized rank-1 decomposable tensors has measure zero in the parameter space of polynomially sized HT decomposable tensors. In deep learning terminology, this amounts to saying that, apart from a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size if one wishes to implement (or approximate) them with a shallow network. Our construction and theory shed new light on various practices and ideas employed by the deep learning community, and in that sense make a paradigmatic contribution as well.
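The correspondence between a shallow network and a rank-1 (CP) decomposition can be checked numerically on a tiny example. The sketch assumes scalar patches and hypothetical monomial representation functions f_d(x) = x^d. Under those assumptions, the shallow network applies a 1×1 linear "convolution" to each patch's representation, multiplies the results across all patches (global product pooling), and sums over Z hidden channels. Expanding that computation gives a coefficient tensor that is a sum of Z rank-1 tensors, and the network output equals its inner product with the tensor product of the patch representations. The deep (HT) case, which interleaves the same operations hierarchically, is not shown.

```python
import itertools
import math


def representation(x, m):
    """Hypothetical representation layer: m monomial features (1, x, x^2, ...) of a scalar patch."""
    return [x ** d for d in range(m)]


def shallow_net(patches, a, A):
    """out = sum_z a[z] * prod_i <A[z][i], f(x_i)>: 1x1 conv, global product pooling, linear output."""
    m = len(A[0][0])
    feats = [representation(x, m) for x in patches]
    total = 0.0
    for z, a_z in enumerate(a):
        conv = [sum(w * f for w, f in zip(A[z][i], feats[i])) for i in range(len(patches))]
        total += a_z * math.prod(conv)
    return total


def cp_tensor(a, A, n, m):
    """Coefficient tensor T[d1..dn] = sum_z a[z] * prod_i A[z][i][d_i], a sum of len(a) rank-1 terms."""
    return {
        idx: sum(a_z * math.prod(A[z][i][d] for i, d in enumerate(idx)) for z, a_z in enumerate(a))
        for idx in itertools.product(range(m), repeat=n)
    }


def tensor_score(patches, T, m):
    """<T, f(x_1) (x) ... (x) f(x_n)>: the same function written through its coefficient tensor."""
    feats = [representation(x, m) for x in patches]
    return sum(c * math.prod(feats[i][d] for i, d in enumerate(idx)) for idx, c in T.items())
```

`shallow_net` and `tensor_score(patches, cp_tensor(a, A, n, m), m)` agree up to rounding for any weights, which is the multilinear identity behind the rank-1 view. The explicit tensor has m^n entries, so this check is only practical for tiny n.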

A note on the polynomial moments of the partition function in the SK model

A one-sided symbol for Itô-Lévy processes

A Schauder estimate for stochastic PDEs

A Survey on the Eigenvalues Local Behavior of Large Complex Correlated Wishart Matrices

A two-state mixed hidden Markov model for risky teenage driving behavior

Adapting the Number of Particles in Sequential Monte Carlo Methods through an Online Scheme for Convergence Assessment

Algebraic stability of non-homogeneous diffusion processes with and without regime-switching

amLite: Amharic Transliteration Using Key Map Dictionary

An application of the Local C(G,T) Theorem to a conjecture of Weiss

An FPT 2-Approximation for Tree-Cut Decomposition

Auslander-Reiten quiver and representation theories related to KLR-type Schur-Weyl duality

Bayesian detection of embryonic gene expression onset in C. elegans

Bayesian inference for spatio-temporal spike and slab priors

Biased sampling designs to improve research efficiency: Factors influencing pulmonary function over time in children with asthma

Causal Model Analysis using Collider v-structure with Negative Percentage Mapping

Combinatorial Auslander-Reiten quivers and reduced expressions

Comment on Article by Ferreira and Gamerman

Comment on Article by Ferreira and Gamerman

Comment on Article by Ferreira and Gamerman

Comments on ‘Detecting Outliers in Gamma Distribution’ by M. Jabbari Nooghabi et al. (2010)

Cycles in enhanced hypercubes

Decomposition of Schramm-Loewner evolution along its curve

Estimating heterogeneous graphical models for discrete data with an application to roll call voting

Examining socioeconomic health disparities using a rank-dependent Rényi index

Extending partial isometries of generalized metric spaces

Fast Template Matching by Subsampled Circulant Matrix

hmmSeq: A hidden Markov model for detecting differentially expressed genes from RNA-seq data

Jump detection in generalized error-in-variables regression with an application to Australian health tax policies

Linear Embedding of Large-Scale Brain Networks for Twin fMRI

Maximum Matching in General Graphs Without Explicit Consideration of Blossoms Revisited

Melodic Contour and Mid-Level Global Features Applied to the Analysis of Flamenco Cantes

Modeling sequences and temporal networks with dynamic community structures

Multi-species distribution modeling using penalized mixture of regressions

Narrow arithmetic progressions in the primes

New self-dual additive $\mathbb{F}_4$-codes constructed from circulant graphs

On modeling of variability in mixture experiments with noise variables

On the spectrum of the normalized Laplacian of iterated triangulations of graphs

On the Stanley depth of powers of edge ideals

Parabolic Harnack inequality on fractal-like metric measure Dirichlet spaces

Quantum extensions of dynamical systems and of Markov semigroups

Rates of Convergence for Degree Distributions in a Dynamic Network Model

Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture

Sample size determination for training cancer classifiers from microarray and RNA-seq data

Sign-Perturbed Sums (SPS) with Instrumental Variables for the Identification of ARX Systems – Extended Version

Study of multiband disordered systems using the typical medium dynamical cluster approximation

Tracking rapid intracellular movements: A Bayesian random set approach

Uniform Inference in Nonparametric Predictive Regression, and a Unified Limit Theory for Spatial Density Estimation

Wavelet-domain regression and predictive inference in psychiatric neuroimaging

Zero-Shot Learning via Semantic Similarity Embedding