On non-iterative training of a neural classifier

Recently an algorithm, was discovered, which separates points in n-dimension by planes in such a manner that no two points are left unseparated by at least one plane{[}1-3{]}. By using this new algorithm we show that there are two ways of classification by a neural network, for a large dimension feature space, both of which are non-iterative and deterministic and we apply both these methods to a classical pattern recognition problem and present the results. It is expected these methods will now be widely used for the training of neural networks for Deep Learning not only because of their non-iterative and deterministic nature but also because of their efficiency and speed and will supercede other classification methods which are iterative in nature and rely on error minimization.


Semisupervised Autoencoder for Sentiment Analysis

In this paper, we investigate the usage of autoencoders in modeling textual data. Traditional autoencoders suffer from at least two aspects: scalability with the high dimensionality of vocabulary size and dealing with task-irrelevant words. We address this problem by introducing supervision via the loss function of autoencoders. In particular, we first train a linear classifier on the labeled data, then define a loss for the autoencoder with the weights learned from the linear classifier. To reduce the bias brought by one single classifier, we define a posterior probability distribution on the weights of the classifier, and derive the marginalized loss of the autoencoder with Laplace approximation. We show that our choice of loss function can be rationalized from the perspective of Bregman Divergence, which justifies the soundness of our model. We evaluate the effectiveness of our model on six sentiment analysis datasets, and show that our model significantly outperforms all the competing methods with respect to classification accuracy. We also show that our model is able to take advantage of unlabeled dataset and get improved performance. We further show that our model successfully learns highly discriminative feature maps, which explains its superior performance.


Sentence Entailment in Compositional Distributional Semantics

Distributional semantic models provide vector representations for words by gathering co-occurrence frequencies from corpora of text. Compositional distributional models extend these representations from words to phrases and sentences. In categorical compositional distributional semantics these representations are built in such a manner that meanings of phrases and sentences are functions of their grammatical structure and the meanings of the words therein. These models have been applied to reasoning about phrase and sentence level similarity. In this paper, we argue for and prove that these models can also be used to reason about phrase and sentence level entailment. We provide preliminary experimental results on a toy entailment dataset.


We Are Humor Beings: Understanding and Predicting Visual Humor

Humor is an integral part of human lives. Despite being tremendously impactful, it is perhaps surprising that we do not yet have a detailed understanding of humor. In this work, we explore the novel problem of studying visual humor and designing computational models of humor using abstract scenes. We collect and make publicly available two datasets of abstract scenes: one that enables the study of humor at the scene-level and the other at the object-level. We study the funny scenes in our dataset and explore different types of humor depicted in these scenes. We model two tasks that we believe demonstrate an understanding of visual humor — predicting the funniness of a scene and altering the funniness of a scene. We show that our models perform well using automatic evaluation as well as human studies.


Data-driven Sequential Monte Carlo in Probabilistic Programming

Most of Markov Chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) algorithms in existing probabilistic programming systems suboptimally use only model priors as proposal distributions. In this work, we describe an approach for training a discriminative model, namely a neural network, in order to approximate the optimal proposal by using posterior estimates from previous runs of inference. We show an example that incorporates a data-driven proposal for use in a non-parametric model in the Anglican probabilistic programming system. Our results show that data-driven proposals can significantly improve inference performance so that considerably fewer particles are necessary to perform a good posterior estimation.


Preconditioned Stochastic Gradient Descent

Stochastic gradient descent (SGD) still is the workhorse for many practical problems. However, it converges slow, and can be difficult to tune. It is possible to precondition SGD to accelerate its convergence remarkably. But many attempts in this direction either aim at solving specialized problems, or result in significantly more complicated methods than SGD. This paper proposes a new way to estimate a preconditioner by equalizing the amplitudes of parameter changes and the amplitudes of associated gradient changes. Unlike the Hessian inverse like preconditioners based on secant equation fitting as done in deterministic quasi-Newton methods, which work the best when the Hessian is positive definite, the new preconditioner works equally well for both convex and non-convex optimizations. When stochastic gradient is used, it can naturally damp the gradient noise to stabilize SGD. Efficient preconditioner estimation methods are developed, and with reasonable simplifications, they are applicable to large scaled problems. Experimental results demonstrate that equipped with the new preconditioner, without any tuning effort, preconditioned SGD can efficiently solve many challenging problems like the training of a deep neural network or a recurrent neural network requiring extremely long term memories.


Multiple Change-point Detection: a Selective Overview

Very long and noisy sequence data arise from biological sciences to social science including high throughput data in genomics and stock prices in econometrics. Often such data are collected in order to identify and understand shifts in trend, e.g., from a bull market to a bear market in finance or from a normal number of chromosome copies to an excessive number of chromosome copies in genetics. Thus, identifying multiple change points in a long, possibly very long, sequence is an important problem. In this article, we review both classical and new multiple change-point detection strategies. Considering the long history and the extensive literature on the change-point detection, we provide an in-depth discussion on a normal mean change-point model from aspects of regression analysis, hypothesis testing, consistency and inference. In particular, we present a strategy to gather and aggregate local information for change-point detection that has become the cornerstone of several emerging methods because of its attractiveness in both computational and theoretical properties.


True Online Temporal-Difference Learning

The temporal-difference methods TD(\lambda) and Sarsa(\lambda) form a core part of modern reinforcement learning. Their appeal comes from their good performance, low computational cost, and their simple interpretation, given by their forward view. Recently, new versions of these methods were introduced, called true online TD(\lambda) and true online Sarsa(\lambda), respectively (van Seijen and Sutton, 2014). Algorithmically, these true online methods only make two small changes to the update rules of the regular methods, and the extra computational cost is negligible in most cases. However, they follow the ideas underlying the forward view much more closely. In particular, they maintain an exact equivalence with the forward view at all times, whereas the traditional versions only approximate it for small step-sizes. We hypothesize that these true online methods not only have better theoretical properties, but also dominate the regular methods empirically. In this article, we put this hypothesis to the test by performing an extensive empirical comparison. Specifically, we compare the performance of true online TD(\lambda)/Sarsa(\lambda) with regular TD(\lambda)/Sarsa(\lambda) on random MRPs, a real-world myoelectric prosthetic arm, and a domain from the Arcade Learning Environment. We use linear function approximation with tabular, binary, and non-binary features. Our results suggest that the true online methods indeed dominate the regular methods. Across all domains/representations the learning speed of the true online methods are often better, but never worse than that of the regular methods. An additional advantage is that no choice between traces has to be made for the true online methods. We show that new true online temporal-difference methods can be derived by making changes to the real-time forward view and then rewriting the update equations.


Online Visual Analytics of Text Streams

We present an online visual analytics approach to helping users explore and understand hierarchical topic evolution in high-volume text streams. The key idea behind this approach is to identify representative topics in incoming documents and align them with the existing representative topics that they immediately follow (in time). To this end, we learn a set of streaming tree cuts from topic trees based on user-selected focus nodes. A dynamic Bayesian network model has been developed to derive the tree cuts in the incoming topic trees to balance the fitness of each tree cut and the smoothness between adjacent tree cuts. By connecting the corresponding topics at different times, we are able to provide an overview of the evolving hierarchical topics. A sedimentation-based visualization has been designed to enable the interactive analysis of streaming text data from global patterns to local details. We evaluated our method on real-world datasets and the results are generally favorable.


The Power of Depth for Feedforward Neural Networks

We show that there are simple functions expressible by small 3-layer feedforward neural networks, which cannot be approximated by a 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension. The result holds for most continuous activation functions, such as rectified linear units and sigmoids, and formally demonstrates that depth — even if increased by 1 — can be exponentially more valuable than width for standard feedforward neural networks.


Active Distance-Based Clustering using K-medoids

k-medoids algorithm is a partitional, centroid-based clustering algorithm which uses pairwise distances of data points and tries to directly decompose the dataset with n points into a set of k disjoint clusters. However, k-medoids itself requires all distances between data points that are not so easy to get in many applications. In this paper, we introduce a new method which requires only a small proportion of the whole set of distances and makes an effort to estimate an upper-bound for unknown distances using the inquired ones. This algorithm makes use of the triangle inequality to calculate an upper-bound estimation of the unknown distances. Our method is built upon a recursive approach to cluster objects and to choose some points actively from each bunch of data and acquire the distances between these prominent points from oracle. Experimental results show that the proposed method using only a small subset of the distances can find proper clustering on many real-world and synthetic datasets.


Active Sampler: Light-weight Accelerator for Complex Data Analytics at Scale

Recent years have witnessed amazing outcomes from ‘Big Models’ trained by ‘Big Data’. Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more ‘learning value’ to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a light-weight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality by 1.6-2.2x.


Breaking the Variance: Approximating the Hamming Distance in $\tilde O(1/ε)$ Time Per Alignment

The Dehn-Sommerville Relations and the Catalan Matroid

False discovery rate control for identifying simultaneous signals

Optimal Surviving Strategy for Drifted Brownian Motions with Absorption

Dropout Training of Matrix Factorization and Autoencoder for Link Prediction in Sparse Graphs

Multimodal, high-dimensional, model-based, Bayesian inverse problems with applications in biomechanics

Patterns of Negative Shifts and Beta-Shifts

Population Dynamics of Self-Replicating Cell-like Structures Emerging from Chaos

On the replica symmetry phase of the independent set problem

Universality Classes of Generalized Epidemic Process on Random Networks

Memory-based control with recurrent neural networks

Free energy in the mixed p-spin models with vector spins

Small noise and long time phase diffusion in stochastic limit cycle oscillators

Near-Optimal Bounds for Binary Embeddings of Arbitrary Sets

Chimera states in coupled Kuramoto oscillators with inertia

Effects of time-delay in a model of intra- and inter-personal motor coordination

Simple cellular automata to mimic foraging ants submitted to abduction

Local Half-Region Depth for Functional Data

Extremal C4-free/C5-free planar graphs

Simplicial moves on balanced complexes

Random sparse sampling in a Gibbs weighted tree

An Event Calculus Production Rule System for Reasoning in Dynamic and Uncertain Domains

A Distributed Multi-agent Market Place for HPC Compute Cycle Resource Trading

On well-posedness of semilinear stochastic evolution equations on $L_p$ spaces

Rational Shi tableaux and the skew length statistic

Improved upper bounds for partial spreads

Origami: A Convolutional Network Accelerator

Small-footprint Deep Neural Networks with Highway Connections for Speech Recognition

A Tweezer for Chimeras in Small Networks

Decoding index finger position from EEG using random forests

Preprocessing Solar Images while Preserving their Latent Structure

Berry-Esseen bounds for weighted averages of Poisson avoidance functionals

On the measure of Voronoi cells

Multistability of Phase-Locking and Topological Winding Numbers in Locally Coupled Kuramoto Models on Single-Loop Networks

Non-homogeneous random walks on a half strip with generalized Lamperti drifts

Finite-temperature effects on interacting bosonic 1D systems in disordered lattices

Amortized Averaging Algorithms for Approximate Consensus

State-crossings in multidimensional random walks

Parameterized Algorithms on Perfect Graphs for deletion to $(r,\ell)$-graphs

Effects of Disorder on a 1-D Floquet Symmetry Protected Topological Phase

Hypergraph Two-Coloring in the Streaming Model

Embedding approximately low-dimensional $\ell_2^2$ metrics into $\ell_1$

Polynomiality of shifted Plancherel averages and content evaluations

Uninformed sacrifice: evidence against long-range alarm transmission in foraging ants exposed to a localized perturbation

Hypercubes are determined by their distance spectra

Fighting Bandits with a New Kind of Smoothness

Return Probabilities of Random Walks

Search-to-Decision Reductions for Lattice Problems with Approximation Factors (Slightly) Greater Than One

Evaluation of Pose Tracking Accuracy in the First and Second Generations of Microsoft Kinect

A Person Re-Identification System For Mobile Devices

Critical exponents at the unconventional disorder-driven transition in a Weyl semimetal

On Unique Ergodicity in Nonlinear Stochastic Partial Differential Equations

Discrepancy of High-Dimensional Permutations

Building and Measuring Privacy-Preserving Predictive Blacklists

Policy Gradient Methods for Off-policy Control

Coherence vs. incoherence for a mean-field XY model in random field

Using Linear Constraints for Logic Program Termination Analysis

Stack Exchange Tagger

A Probabilistic Characterization of the Dominance Order on Partitions

Censored and shifted gamma distribution based EMOS model for probabilistic quantitative precipitation forecasting

Non-iterative Joint and Individual Variation Explained

Big Data Scaling through Metric Mapping: Exploiting the Remarkable Simplicity of Very High Dimensional Spaces using Correspondence Analysis

Parameterizing edge modification problems above lower bounds

Statistical properties of 1D spin glasses from first principles of classical mechanics

Distributed Optimization with Arbitrary Local Solvers

An Uncertainty-Aware Approach for Exploratory Microblog Retrieval

Tracking Idea Flows between Social Groups

Improved bounds on the Hadwiger-Debrunner numbers

The Rationale behind the Concept of Goal

L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework

Finding HeavyPaths in Weighted Graphs and a Case-Study on Community Detection

Predictable representation property for progressive enlargements of a Poisson filtration

Cloud-based Electronic Health Records for Real-time, Region-specific Influenza Surveillance

On the Finite-Sample Analysis of $Θ$-estimators

An iterative importance sampler for Bayesian parameter estimation in stochastic models of multicellular clocks

Coexistence in the face of uncertainty

Incompleteness of the bond market with Lévy noise under the physical measure

Survival analysis, the infinite Gaussian mixture model, FDG-PET and non-imaging data in the prediction of progression from mild cognitive impairment

A Hidden Markov Model Based System for Entity Extraction from Social Media English Text at FIRE 2015

Quantum assisted Gaussian process regression

Query Answering over Contextualized RDF/OWL Knowledge with Forall-Existential Bridge Rules: Decidable Finite Extension Classes (Post Print)

Sparse Generalized Principal Component Analysis for Large-scale Applications beyond Gaussianity

On the Second Fundamental Theorem of Asset Pricing

Minimal Perceptrons for Memorizing Complex Patterns

Efficient Deep Feature Learning and Extraction via StochasticNets