Structured Transforms for Small-Footprint Deep Learning

We consider the task of building compact deep learning pipelines suitable for deployment on storage- and power-constrained mobile devices. We propose a unified framework to learn a broad family of structured parameter matrices that are characterized by the notion of low displacement rank. Our structured transforms admit fast function and gradient evaluation, and span a rich range of parameter sharing configurations whose statistical modeling capacity can be explicitly tuned along a continuum from structured to unstructured. Experimental results show that these transforms can significantly accelerate inference and forward/backward passes during training, and offer superior accuracy-compactness-speed tradeoffs in comparison to a number of existing techniques. In keyword spotting applications in mobile speech recognition, our methods are much more effective than standard linear low-rank bottleneck layers and nearly retain the performance of state-of-the-art models, while providing more than 3.5-fold compression.
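
As a small illustration of what "low displacement rank" can mean, the sketch below builds a Toeplitz matrix and applies one common shift-based displacement operator, D = M - Z M Z^T, where Z is the down-shift matrix. The choice of Toeplitz matrices, this particular operator, and the helper names `toeplitz` and `shift_displacement` are assumptions made for illustration; the abstract does not specify them. For a Toeplitz matrix the displacement is nonzero only in the first row and first column, so its rank is at most 2.

```python
def toeplitz(first_col, first_row):
    """Build an n x n Toeplitz matrix from its first column and first row.

    Entry (i, j) depends only on i - j, so 2n - 1 numbers define the matrix.
    first_col[0] and first_row[0] are expected to agree.
    """
    n = len(first_col)
    return [[first_col[i - j] if i >= j else first_row[j - i]
             for j in range(n)]
            for i in range(n)]


def shift_displacement(matrix):
    """Return D = M - Z M Z^T for the down-shift matrix Z.

    (Z M Z^T)[i][j] equals M[i-1][j-1] for i, j >= 1 and 0 otherwise,
    so the product is formed by index arithmetic rather than by
    multiplying matrices.
    """
    n = len(matrix)
    return [[matrix[i][j] - (matrix[i - 1][j - 1] if i > 0 and j > 0 else 0)
             for j in range(n)]
            for i in range(n)]


# Illustrative values only.
T = toeplitz([1, 2, 3, 4], [1, 5, 6, 7])
D = shift_displacement(T)
for row in D:
    print(row)
```

Everything outside the first row and first column of `D` is zero, which is the sense in which the 16 entries of `T` are governed by far fewer free parameters.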

Language Segmentation

Language segmentation consists in finding the boundaries where one language ends and another language begins in a text written in more than one language. This is important for all natural language processing tasks. The problem can be solved by training language models on language data. However, in the case of low- or no-resource languages, this is problematic. I therefore investigate whether unsupervised methods perform better than supervised methods when it is difficult or impossible to train supervised approaches. A special focus is given to difficult texts, i.e. texts that are rather short (one sentence) or that contain abbreviations, low-resource languages and non-standard language. I compare three approaches: supervised n-gram language models, unsupervised clustering and weakly supervised n-gram language model induction. I devised the weakly supervised approach specifically to deal with difficult texts. In order to test the approach, I compiled a small corpus of different text types, ranging from one-sentence texts to texts of about 300 words. The weakly supervised language model induction approach works well on short and difficult texts, outperforming the clustering algorithm and reaching scores in the vicinity of the supervised approach. The results look promising, but there is room for improvement and a more thorough investigation should be undertaken.

Parameterized Neural Network Language Models for Information Retrieval

Information Retrieval (IR) models need to deal with two difficult issues, vocabulary mismatch and term dependencies. Vocabulary mismatch corresponds to the difficulty of retrieving relevant documents that do not contain exact query terms but semantically related terms. Term dependencies refer to the need to consider the relationship between the words of the query when estimating the relevance of a document. A multitude of solutions has been proposed to solve each of these two problems, but no principled model solves both. In parallel, in the last few years, language models based on neural networks have been used to cope with complex natural language processing tasks like emotion and paraphrase detection. Although they present good abilities to cope with both term dependencies and vocabulary mismatch problems, thanks to the distributed representation of words they are based upon, such models could not be used readily in IR, where the estimation of one language model per document (or query) is required. This is both computationally infeasible and prone to over-fitting. Based on recent work that proposed to learn a generic language model that can be modified through a set of document-specific parameters, we explore the use of new neural network models that are adapted to ad-hoc IR tasks. Within the language model IR framework, we propose and study the use of a generic language model as well as a document-specific language model. Both can be used as a smoothing component, but the latter is more adapted to the document at hand and has the potential of being used as a full document language model. We experiment with such models and analyze their results on TREC-1 to 8 datasets.

Change-point detection using the conditional entropy of ordinal patterns

This paper is devoted to change-point detection using only the ordinal structure of a time series. A statistic based on the conditional entropy of ordinal patterns, which characterize the local ups and downs in a time series, is introduced and investigated. The statistic requires only minimal a priori information on given data and shows good performance in numerical experiments.
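
To make the underlying quantity concrete, here is a minimal sketch of an empirical conditional entropy of ordinal patterns, computed as the entropy of successive pattern pairs minus the entropy of single patterns. It is not the paper's change-point statistic, whose exact form the abstract does not give. The function names, the `order` parameter, and the tie-breaking rule (by position) are assumptions made for illustration.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Return the index permutation that sorts the window.

    Ties are broken by position, an assumption made for this sketch.
    """
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))


def conditional_entropy_of_ordinal_patterns(series, order=2):
    """Empirical H(next pattern | current pattern) in nats.

    Patterns are taken over order + 1 consecutive points; the result is
    H(pattern pairs) - H(current patterns).
    """
    width = order + 1
    patterns = [ordinal_pattern(series[i:i + width])
                for i in range(len(series) - width + 1)]
    if len(patterns) < 2:
        raise ValueError("series too short for the chosen order")
    n = len(patterns) - 1  # number of (current, next) pattern pairs
    pairs = Counter(zip(patterns[:-1], patterns[1:]))
    currents = Counter(patterns[:-1])
    h_pairs = -sum(c / n * math.log(c / n) for c in pairs.values())
    h_currents = -sum(c / n * math.log(c / n) for c in currents.values())
    return h_pairs - h_currents


# A monotone series has a single pattern, so nothing is uncertain.
print(conditional_entropy_of_ordinal_patterns(list(range(20))))
# A short series with mixed ups and downs, using order 1 (up/down patterns).
print(conditional_entropy_of_ordinal_patterns([0, 1, 2, 1, 2, 1, 0], order=1))
```

One plausible way to use such a quantity for change-point detection is to evaluate it on windows before and after a candidate time and compare the values; that usage is an assumption here, not a description of the paper's procedure.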

SentiCap: Generating Image Descriptions with Sentiments

The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from the current systems. One such style is description with emotions, which is commonplace in everyday communication, and influences decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions; of these positive captions, 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.

The Problem with Assessing Statistical Methods

In this paper, we investigate the problem of assessing statistical methods and effectively summarizing results from simulations. Specifically, we consider problems of the type where multiple methods are compared on a reasonably large test set of problems. These simulation studies are typically used to provide advice on an effective method for analyzing future untested problems. Most of these simulation studies never apply statistical methods to find which method(s) are expected to perform best. Instead, conclusions are based on a qualitative assessment of poorly chosen graphical and numerical summaries of the results. We illustrate that the Empirical Cumulative Distribution Function, when used appropriately, is an extremely effective tool for assessing what matters in large scale statistical simulations.
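
As a rough illustration of the kind of summary the abstract has in mind, the sketch below builds an empirical cumulative distribution function from per-problem errors and compares two methods at a few error thresholds. The error values and the names `ecdf`, `errors_a` and `errors_b` are hypothetical, and the abstract does not say which performance measure or thresholds the authors use.

```python
from bisect import bisect_right


def ecdf(values):
    """Return the empirical CDF of `values` as a function of x.

    cdf(x) is the fraction of observations less than or equal to x.
    """
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical errors of two methods on the same four test problems.
errors_a = [0.10, 0.20, 0.20, 0.90]
errors_b = [0.30, 0.35, 0.40, 0.45]
cdf_a, cdf_b = ecdf(errors_a), ecdf(errors_b)

# Fraction of problems each method solves to within a given error.
for threshold in (0.25, 0.50):
    print(threshold, cdf_a(threshold), cdf_b(threshold))
```

In this made-up data, method A is better on most problems but fails badly on one, a contrast that a single mean error would hide and that the two curves show directly.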

Learning-based Reduced Order Model Stabilization for Partial Differential Equations: Application to the Coupled Burgers Equation

Avalanches and perturbation theory in the random-field Ising model

Large $\{0, 1, \ldots, t\}$-Cliques in Dual Polar Graphs

Four-Point, 2D, Free-Ranging, IMSPE-Optimal, Twin-Point Designs

Improving Ice Sheet Model Calibration Using Paleoclimate and Modern Data

What’s in a ball? Constructing and characterizing uncertainty sets

Sketching for Simultaneously Sparse and Low-Rank Covariance Matrices

DKP-AOM: results for OAEI 2015

A new generalization of Hermite’s reciprocity law

Rayleigh surface waves and phonon mode conversion in nanostructures

Large-scale subspace clustering using sketching and validation

Population-Contrastive-Divergence: Does Consistency help with RBM training?

Maximum moments of sum of independent random matrices

Stability for a class of semilinear fractional stochastic integral equations

Automata, reduced words, and Garside shadows in Coxeter groups

Disjunctive Answer Set Solvers via Templates

Analyzer and generator for Pali

Simultaneous Feedback Vertex Set: A Parameterized Perspective

Geodesic Forests in the Last-Passage Percolation

Tensors Masquerading as Matchgates: Relaxing Planarity Restrictions on Pfaffian Circuits

DC Decomposition of Nonconvex Polynomials with Algebraic Techniques

$Z_4$-codes and their Gray map images as orthogonal arrays

Local microscopic behavior for 2D Coulomb gases

Quantifying Emergent Behavior of Autonomous Robots

Bayesian Markov Blanket Estimation

Equivariant maps related to the topological Tverberg conjecture

Local Rademacher Complexity Bounds based on Covering Numbers

Hölderian weak invariance principle under Maxwell and Woodroofe condition

A Framework for Estimating Stream Expression Cardinalities

Strong spatial mixing in homomorphism spaces

A Stochastic Gradient Method with Linear Convergence Rate for a Class of Non-smooth Non-strongly Convex Optimizations

Distance-2 MDS codes and latin colorings in the Doob graphs

Improved Estimation of Class Prior Probabilities through Unlabeled Data

Symmetric Graphs with respect to Graph Entropy

Torsion in the homology of Milnor fibers of hyperplane arrangements

Randomized Alternating Least Squares for Canonical Tensor Decompositions: Application to a PDE with Random Data

Accurate calculations of stationary distributions and mean first passage times in Markov renewal processes and Markov chains

Depinning of disordered bosonic chains

Batch Normalized Recurrent Neural Networks

Inverse Problems for a Class of Conditional Probability Measure-Dependent Evolution Equations

Revisiting Kneser’s Theorem for Field Extensions

The parametric Frobenius problem and parametric exclusion

Within-Brain Classification for Brain Tumor Segmentation

Parametrizing an integer linear program by an integer

Rank-frequency relations of phonemes uncover an author-dependency of their usage

Time-dependent community structure in legislation cosponsorship networks in the Congress of the Republic of Peru

Noncommutative Schur functions, switchboards, and Schur positivity

Automatic Taxonomy Extraction from Query Logs with no Additional Sources of Information

Exposing the Probabilistic Causal Structure of Discrimination

Minimax Binary Classifier Aggregation with General Losses