Boosted Sparse Non-linear Distance Metric Learning

This paper proposes a boosting-based solution addressing metric learning problems for high-dimensional data. Distance measures have been used as natural measures of (dis)similarity and served as the foundation of various learning methods. The efficiency of distance-based learning methods heavily depends on the chosen distance metric. With increasing dimensionality and complexity of data, however, traditional metric learning methods suffer from poor scalability and the limitation due to linearity as the true signals are usually embedded within a low-dimensional nonlinear subspace. In this paper, we propose a nonlinear sparse metric learning algorithm via boosting. We restructure a global optimization problem into a forward stage-wise learning of weak learners based on a rank-one decomposition of the weight matrix in the Mahalanobis distance metric. A gradient boosting algorithm is devised to obtain a sparse rank-one update of the weight matrix at each step. Nonlinear features are learned by a hierarchical expansion of interactions incorporated within the boosting algorithm. Meanwhile, an early stopping rule is imposed to control the overall complexity of the learned metric. As a result, our approach guarantees three desirable properties of the final metric: positive semi-definiteness, low rank and element-wise sparsity. Numerical experiments show that our learning model compares favorably with the state-of-the-art methods in the current literature of metric learning.


Variable Selection for Latent Class Analysis with Application to Low Back Pain Diagnosis

The identification of most relevant clinical criteria related to low back pain disorders is a crucial task for a quick and correct diagnosis of the nature of pain and its treatment. Data concerning low back pain can be of categorical nature, in form of check-list in which each item denotes presence or absence of a clinical condition. Latent class analysis is a model-based clustering method for multivariate categorical responses which can be applied to such data for a preliminary diagnosis of the type of pain. In this work we propose a variable selection method for latent class analysis applied to the selection of the most useful variables in detecting the group structure in the data. The method is based on the comparison of two different models and allows the discarding of those variables with no group information and those variables carrying the same information as the already selected ones. We consider a swap-stepwise algorithm where at each step the models are compared through and approximation to their Bayes factor. The method is applied to the selection of the clinical criteria most useful for the clustering of patients in different classes of pain. It is shown to perform a parsimonious variable selection and to give a good clustering performance. The quality of the approach is also assessed on simulated data.


Guaranteed algorithms for inference in topic models

One of the core problems in statistical models is the estimation of a posterior distribution. For topic models, the problem of posterior inference for individual texts is particularly important, especially when dealing with data streams, but is often intractable in the worst case \citep{SontagR11}. As a consequence, existing methods for posterior inference are approximate and do not have any guarantee on neither quality nor convergence rate. In this paper, we introduce a provably fast algorithm, namely \textit{Online Maximum a Posterior Estimation (OPE)}, for posterior inference in topic models. OPE has more attractive properties than existing inference approaches, including theoretical guarantees on quality and fast convergence rate. The discussions about OPE are very general and hence can be easily employed in a wide class of probabilistic models. Finally, we employ OPE to design three novel methods for learning Latent Dirichlet allocation from text streams or large corpora. Extensive experiments demonstrate some superior behaviors of OPE and of our new learning methods.


Inference in topic models: sparsity and trade-off

Topic models are popular for modeling discrete data (e.g., texts, images, videos, links), and provide an efficient way to discover hidden structures/semantics in massive data. One of the core problems in this field is the posterior inference for individual data instances. This problem is particularly important in streaming environments, but is often intractable. In this paper, we investigate the use of the Frank-Wolfe algorithm (FW) for recovering sparse solutions to posterior inference. From detailed elucidation of both theoretical and practical aspects, FW exhibits many interesting properties which are beneficial to topic modeling. We then employ FW to design fast methods, including ML-FW, for learning latent Dirichlet allocation (LDA) at large scales. Extensive experiments show that to reach the same predictiveness level, ML-FW can perform tens to thousand times faster than existing state-of-the-art methods for learning LDA from massive/streaming data.


Extremal Dependence Concepts

The probabilistic characterization of the relationship between two or more random variables calls for a notion of dependence. Dependence modeling leads to mathematical and statistical challenges, and recent developments in extremal dependence concepts have drawn a lot of attention to probability and its applications in several disciplines. The aim of this paper is to review various concepts of extremal positive and negative dependence, including several recently established results, reconstruct their history, link them to probabilistic optimization problems, and provide a list of open questions in this area. While the concept of extremal positive dependence is agreed upon for random vectors of arbitrary dimensions, various notions of extremal negative dependence arise when more than two random variables are involved. We review existing popular concepts of extremal negative dependence given in literature and introduce a novel notion, which in a general sense includes the existing ones as particular cases. Even if much of the literature on dependence is focused on positive dependence, we show that negative dependence plays an equally important role in the solution of many optimization problems. While the most popular tool used nowadays to model dependence is that of a copula function, in this paper we use the equivalent concept of a set of rearrangements. This is not only for historical reasons. Rearrangement functions describe the relationship between random variables in a completely deterministic way, allow a deeper understanding of dependence itself, and have several advantages on the approximation of solutions in a broad class of optimization problems.


Worst-case Optimal Probability Updating

This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty Hall problem is the simplest scenario where neither naive conditioning nor the CAR assumption suffice to determine an updated probability distribution. This paper thus addresses a generalization of that problem to arbitrary distributions on finite outcome spaces, arbitrary sets of `messages’, and (almost) arbitrary loss functions, and provides existence and characterization theorems for worst-case optimal probability updating strategies. We find that for logarithmic loss, optimality is characterized by an elegant condition, which we call RCAR (reverse coarsening at random). Under certain conditions, the same condition also characterizes optimality for a much larger class of loss functions, and we obtain an objective and general answer to how one should update probabilities in the light of new information.


The extended 1-perfect trades in small hypercubes

The p-filter: multi-layer FDR control for grouped hypotheses

Convolutional Monte Carlo Rollouts in Go

MDS codes in the Doob graphs

Construction of ODE systems from time series data by a highly flexible modelling approach

A New Class of Skewed Bimodal Distributions

Detecting monopole charge in Weyl semimetals via quantum interference transport

Adaptive multi-stage integrators for optimal energy conservation in molecular simulations

AcSel: selecting variables with accuracy in correlated datasets

Toward more localized local algorithms: removing assumptions concerning global knowledge

A simple explicit bijection between (n,2) Gog and Magog trapezoids

Tail distribution of arbitrary random vectors in high dimension

Generalized Weyl modules, alcove paths and Macdonald polynomials

Histogram Arithmetic under Uncertainty of Probability Density Function

On the extention of propelinear structures of Nordstrom-Robinson code to Hamming code

On maximum components of a class of perfect codes

Ramsey-type theorems for lines in 3-space

Simultaneous Avoidance of a Vincular and a Covincular Pattern of Length 3

Spectral Compressed Sensing via CANDECOMP/PARAFAC Decomposition of Incomplete Tensors

Parameterized Tractability of the Maximum-Duo Preservation String Mapping Problem

Norm-Free Radon-Nikodym Approach to Machine Learning

Functional Data Analysis of Amplitude and Phase Variation

Gated networks: an inventory

The Strong Arnold Property for 4-connected flat graphs

Graph-theoretic autofill

Unified treatment of the asymptotics of asymmetric kernel density estimators

On tail behaviour of $k$-th upper order statistics under fixed and random sample sizes via tail equivalence

Monotonicity of the collateralized debt obligations term structure model

Semantic Boolean Arabic Information Retrieval

Semantic Arabic Information Retrieval Framework

Quantum ergodicity of Wigner induced spherical harmonics

Flexible constraint satisfiability and a problem in semigroup theory

Accurate Reaction-Diffusion Operator Splitting on Tetrahedral Meshes for Parallel Stochastic Molecular Simulations

Local law for the product of independent non-Hermitian matrices with independent entries

Restarted Stochastic subGradient Descent for Non-smooth Non-strongly Convex Optimization

The Construction and Properties of Assortative Configuration Graphs

Gamma Belief Networks

Q-polynomial invariant of rooted trees