Informative Data Projections: A Framework and Two Examples

Methods for Projection Pursuit aim to facilitate the visual exploration of high-dimensional data by identifying interesting low-dimensional projections. A major challenge is the design of a suitable quality metric of projections, commonly referred to as the projection index, to be maximized by the Projection Pursuit algorithm. In this paper, we introduce a new information-theoretic strategy for tackling this problem, based on quantifying the amount of information the projection conveys to a user given their prior beliefs about the data. The resulting projection index is a subjective quantity, explicitly dependent on the intended user. As a useful illustration, we developed this idea for two particular kinds of prior beliefs. The first kind leads to PCA (Principal Component Analysis), shining new light on when PCA is (not) appropriate. The second kind leads to a novel projection index, the maximization of which can be regarded as a robust variant of PCA. We show how this projection index, though non-convex, can be effectively maximized using a modified power method as well as using a semidefinite programming relaxation. The usefulness of this new projection index is demonstrated in comparative empirical experiments against PCA and a popular Projection Pursuit method.

Convergence in Navigational Reinforcement Learning

Reinforcement learning is a formal framework for modeling agents that learn to solve tasks. For example, one important task for animals is to navigate in an environment to find food or to return to their nest. In such tasks, the agent has to learn a path through the environment from start states to goal states, by visiting a sequence of intermediate states. The agent receives reward on a goal state. Concretely, we need to learn a policy that maps each encountered state to an immediate action that leads to a next state, eventually leading to a goal state. We say that a learning process has converged if eventually the policy will no longer change, i.e., the policy stabilizes. The intuition of paths and navigation policies can be applied generally. Indeed, in this article, we study navigation tasks formalized as a graph structure that, for each application of an action to a state, describes the possible successor states that could result from that application. In contrast to standard reinforcement learning, we essentially simplify numeric reward signals to boolean flags on the transitions in the graph. The resulting framework enables a clear theoretical study of how properties of the graph structure can cause convergence of the learning process. In particular, we formally study a learning process that detects revisits to states in the graph, i.e., we detect cycles, and the process keeps adjusting the policy until no more cycles are made. So, eventually, the agent goes straight to reward from each start state. We identify reducibility of the task graph as a sufficient condition for this learning process to converge. We also syntactically characterize the form of the final policy, which can be used to detect convergence in a simulation.

A C-LSTM Neural Network for Text Classification

Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural languages. In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification. C-LSTM utilizes CNN to extract a sequence of higher-level phrase representations, and are fed into a long short-term memory recurrent neural network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases as well as global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that the C-LSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks.

A Stochastic Process Model of Classical Search

Among classical search algorithms with the same heuristic information, with sufficient memory A* is essentially as fast as possible in finding a proven optimal solution. However, in many situations optimal solutions are simply infeasible, and thus search algorithms that trade solution quality for speed are desirable. In this paper, we formalize the process of classical search as a metalevel decision problem, the Abstract Search MDP. For any given optimization criterion, this establishes a well-defined notion of the best possible behaviour for a search algorithm and offers a theoretical approach to the design of algorithms for that criterion. We proceed to approximately solve a version of the Abstract Search MDP for anytime algorithms and thus derive a novel search algorithm, Search by Maximizing the Incremental Rate of Improvement (SMIRI). SMIRI is shown to outperform current state-of-the-art anytime search algorithms on a parametrized stochastic tree model for most of the tested parameter values.

Regularized EM Algorithms: A Unified Framework and Provable Statistical Guarantees

Latent variable models are a fundamental modeling tool in machine learning applications, but they present significant computational and analytical challenges. The popular EM algorithm and its variants, is a much used algorithmic tool; yet our rigorous understanding of its performance is highly incomplete. Recently, work in Balakrishnan et al. (2014) has demonstrated that for an important class of problems, EM exhibits linear local convergence. In the high-dimensional setting, however, the M-step may not be well defined. We address precisely this setting through a unified treatment using regularization. While regularization for high-dimensional problems is by now well understood, the iterative EM algorithm requires a careful balancing of making progress towards the solution while identifying the right structure (e.g., sparsity or low-rank). In particular, regularizing the M-step using the state-of-the-art high-dimensional prescriptions (e.g., Wainwright (2014)) is not guaranteed to provide this balance. Our algorithm and analysis are linked in a way that reveals the balance between optimization and statistical errors. We specialize our general framework to sparse gaussian mixture models, high-dimensional mixed regression, and regression with missing variables, obtaining statistical guarantees for each of these examples.

Variable Selection in Causal Inference using a Simultaneous Penalization Method

In the causal adjustment setting, variable selection techniques based on one of either the outcome or treatment allocation model can result in the omission of confounders, which leads to bias, or the inclusion of spurious variables, which leads to variance inflation, in the propensity score. We propose a variable selection method based on a penalized objective function which considers the outcome and treatment assignment models simultaneously. The proposed method facilitates confounder selection in high-dimensional settings. We show that under regularity conditions our method attains the oracle property. The selected variables are used to form a doubly robust regression estimator of the treatment effect. We show that under some conditions our method attains the oracle property. Simulation results are presented and economic growth data are analyzed. Specifically, we study the effect of life expectancy as a measure of population health on the average growth rate of gross domestic product per capita.

Distributed Machine Learning via Sufficient Factor Broadcasting

Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parameterized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e., the outer product of two ‘sufficient factors’ (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency — communication costs are linear in the parameter matrix’s dimensions, rather than quadratic — without affecting computational correctness. We present a theoretical convergence analysis of SFB, and empirically corroborate its efficiency on four different matrix-parametrized ML models.

An Introduction to Convolutional Neural Networks

This document provides a brief introduction to Convolutional Neural Networks (CNNs), discussing recently published papers and newly form techniques in developing these brilliantly fantastic image recognition models. This introduction assumes you are familiar with the workings of neural networks and have some background in Artificial Intelligence.

OntoSeg: a Novel Approach to Text Segmentation using Ontological Similarity

Text segmentation (TS) aims at dividing long text into coherent segments which reflect the subtopic structure of the text. It is beneficial to many natural language processing tasks, such as Information Retrieval (IR) and document summarisation. Current approaches to text segmentation are similar in that they all use word-frequency metrics to measure the similarity between two regions of text, so that a document is segmented based on the lexical cohesion between its words. Various NLP tasks are now moving towards the semantic web and ontologies, such as ontology-based IR systems, to capture the conceptualizations associated with user needs and contents. Text segmentation based on lexical cohesion between words is hence not sufficient anymore for such tasks. This paper proposes OntoSeg, a novel approach to text segmentation based on the ontological similarity between text blocks. The proposed method uses ontological similarity to explore conceptual relations between text segments and a Hierarchical Agglomerative Clustering (HAC) algorithm to represent the text as a tree-like hierarchy that is conceptually structured. The rich structure of the created tree further allows the segmentation of text in a linear fashion at various levels of granularity. The proposed method was evaluated on a wellknown dataset, and the results show that using ontological similarity in text segmentation is very promising. Also we enhance the proposed method by combining ontological similarity with lexical similarity and the results show an enhancement of the segmentation quality.

The Automatic Statistician: A Relational Perspective

Gaussian processes (GPs) provide a general and analytically tractable way of capturing complex time-varying, nonparametric functions. The time varying parameters of GPs can be explained as a composition of base kernels such as linear, smoothness or periodicity in that covariance kernels are closed under addition and multiplication. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GPs. Unfortunately, learning a composite covariance kernel with a single time-series dataset often results in less informative kernels instead of finding qualitative distinct descriptions. We address this issue by proposing a relational kernel learning which can model relationship between sets of data and find shared structure among the time series datasets. We show the shared structure can help learning more accurate models for sets of regression problems with some synthetic data, US top market capitalization stock data and US house sales index data.

Random Forests for Big Data

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include data streams and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method allowing to consider in a single and versatile framework regression problems, as well as two-class and multi-class classification problems. Focusing on classification problems, this paper reviews available proposals about random forests in parallel environments as well as about online random forests. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment three variants involving subsampling, Big Data-bootstrap and MapReduce respectively, on two massive datasets (15 and 120 millions of observations), a simulated one as well as real world data.

Recurrent Instance Segmentation

Instance segmentation is the problem of detecting and delineating each object of interest appearing in an image. Current instance segmentation approaches consist of ensembles of modules that are trained independently of each other, thus missing learning opportunities. Here we propose a new instance segmentation paradigm consisting in an end-to-end method that learns how to segment instances sequentially. The model is based on a recurrent neural network that sequentially finds objects and their segmentations one at a time. This net is provided with a spatial memory that keeps track of what pixels have been explained and allows handling occlusion. In order to train the model we designed a new principled loss function that accurately represents the properties of the instance segmentation problem. In the experiments carried out we found that our method outperforms all 5 state of the art approaches on the Plant Phenotyping dataset.

Neural GPUs Learn Algorithms

Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal NTMs have a weakness that is caused by their sequential nature: they cannot be parallelized and are are hard to train due to their large depth when unfolded. We present a neural network architecture to address this problem: the Neural GPU. It is based on a type of convolutional gated recurrent unit and, like the NTM, is computationally universal. Unlike the NTM, the Neural GPU is highly parallel which makes it easier to train and efficient to run. An essential property of algorithms is their ability to handle inputs of arbitrary size. We show that the Neural GPU can be trained on short instances of an algorithmic task and successfully generalize to long instances. We verified it on a number of tasks including long addition and long multiplication of numbers represented in binary. We train the Neural GPU on numbers with upto 20 bits and observe no errors whatsoever while testing it, even on much longer numbers. To achieve these results we introduce a technique for training deep recurrent networks: parameter sharing relaxation. We also found a small amount of dropout and gradient noise to have a large positive effect on learning and generalization.

On the length of fully commutative elements

Multiagent Cooperation and Competition with Deep Reinforcement Learning

Planar transitive graphs

Adjusted Priors for Bayes Factors Involving Reparameterized Order Constraints

Cliques in the union of $C_4$-free graphs

Gradient Estimation with Simultaneous Perturbation and Compressive Sensing

Universality of the mean number of real zeros of random trigonometric polynomials under a weak Cramer condition

A class of cyclic $(v;k_1,k_2,k_3;λ)$ difference families with $v \equiv 3 \pmod{4}$ a prime

Kolmogorov type and general extension results for nonlinear expectations

Dominating Sets inducing Large Components in Maximal Outerplanar Graphs

A GA based approach for task scheduling in multi-cloud environment

Concentration behavior of the penalized least squares estimator

Low-degree Boolean functions on $S_n$, with an application to isoperimetry

The complement of the figure-eight knot geometrically bounds

Classical and Quantum Parts of the Quantum Dynamics: the Discrete-Time Case

Algorithms for Differentially Private Multi-Armed Bandits

Domains of weak continuity of statistical functionals with a view toward robust statistics

On the spectral characterization of pinapple graphs

On oriented cliques with respect to push operation

A Bayesian binary classification approach to pure tone audiometry

Cascading failures in coupled networks with both inner-dependency and interdependency links

Longest increasing subsequences and log concavity

Tight Bounds for Gomory-Hu-like Cut Counting

A Lasserre Lower Bound for the Min-Sum Single Machine Scheduling Problem

Arm Exponents for SLE$(κ)$

Category Enhanced Word Embedding

On Asymptotic Properties of the Separating Hill Estimator

Non Parametric Hidden Markov Models with Finite State Space: Posterior Concentration Rates

Universal behavior of crystalline membranes: crumpling transition and Poisson ratio of the flat phase

Predecessor problem on smooth distributions

On Game-Theoretic Risk Management (Part Two) – Algorithms to Compute Nash-Equilibria in Games with Distributions as Payoffs

Shaping Proto-Value Functions via Rewards

Study of almost everywhere convergence of series by means of martingale methods

Sparse Signal Recovery Using gOMP Assisted mOLS

Elementary proof of convergence to the mean-field model for the SIR process

On stochastic perturbations of slowly changing dynamical systems

Simultaneous Private Learning of Multiple Concepts

Wilkinson’s Inertia-Revealing Factorization and Its Application to Sparse Matrices

A Cooperative Scheduling Scheme of Local Cloud and Internet Cloud for Delay-Aware Mobile Cloud Computing

Gaussian Elimination with Randomized Complete Pivoting

Singular rational curves in P^n via semigroups

Some Epistemological Problems with the Knowledge Level in Cognitive Architectures

Closability, regularity, and approximation by graphs for separable bilinear forms

Iterative Instance Segmentation

Incremental Truncated LSTD

Bayesian Network Models for Adaptive Testing

Mean width of regular polytopes and expected maxima of correlated Gaussian variables

Mixing and large deviations for nonlinear wave equation with white noise

A Symbolic SAT-based Algorithm for Almost-sure Reachability with Small Strategies in POMDPs

Decidability and Complexity for Quiescent Consistency and its Variations

Characterization of DNA methylation as a function of biological complexity via dinucleotide inter-distances

Existence of Continuous and Càdlàg Versions for Cylindrical Processes in the Dual of a Nuclear Space

Typical sumsets of linear codes

On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

TGSum: Build Tweet Guided Multi-Document Summarization Dataset

Beyond OWL 2 QL in OBDA: Rewritings and Approximations (Extended Version)

The Mechanism of Additive Composition

Gains and Losses are Fundamentally Different in Regret Minimization: The Sparse Case

Improved Efficiency in the Analysis of Randomized Trials with Survival Outcomes

On bounding the difference between the maximum degree and the chromatic number by a constant

Regularizing RNNs by Stabilizing Activations

Bridging the gap between rooted and unrooted phylogenetic networks

Second-Order Inference for the Mean of a Variable Missing at Random

On randomization of neural networks as a form of post-learning strategy

Performance bounds for the mean-field limit of constrained dynamics

A global Constraint for mining Sequential Patterns with GAP constraint

Complete monotonicity and bernstein properties of functions are characterized by their restriction on N

Theoretical study on density of microscopic states in configuration space via Random Matrix

Named Entity Recognition with Bidirectional LSTM-CNNs

Engineering Oracles for Time-Dependent Road Networks

Hierarchical classification of e-commerce related social media

A spectral sequence for polyhedral products

Conditional assessment of the impact of a Hausman pretest on confidence intervals

A tight lower bound for Vertex Planarization on graphs of bounded treewidth

Welfare of Sequential Allocation Mechanisms for Indivisible Goods

A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

Feature selection for longitudinal microarray data by adapting a pathway analysis method

Kernel Density Estimation on Embedded Manifolds with Boundary

On the hardness of learning sparse parities

The maximal order of hyper-($b$-ary)-expansions

Quantum Algorithms and Circuits for Scientific Computing

The poset on connected graphs is Sperner

Shattered Sets and the Hilbert Function

Information metrics for long-time errors in splitting schemes for stochastic dynamics and parallel KMC

Beyond One Third Byzantine Failures

Neighbors, Generic Sets and Scarf-Buchberger Hypersurfaces