• Structured Transforms for Small-Footprint Deep Learning
We consider the task of building compact deep learning pipelines suitable for deployment on storage- and power-constrained mobile devices. We propose a unified framework to learn a broad family of structured parameter matrices that are characterized by the notion of low displacement rank. Our structured transforms admit fast function and gradient evaluation, and span a rich range of parameter sharing configurations whose statistical modeling capacity can be explicitly tuned along a continuum from structured to unstructured. Experimental results show that these transforms can significantly accelerate inference and forward/backward passes during training, and offer superior accuracy-compactness-speed tradeoffs in comparison to a number of existing techniques. In keyword spotting applications in mobile speech recognition, our methods are much more effective than standard linear low-rank bottleneck layers and nearly retain the performance of state-of-the-art models, while providing more than 3.5-fold compression.
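To make the idea of parameter sharing concrete, here is a minimal sketch using a Toeplitz matrix, a standard example of a matrix with low displacement rank. It is not the paper's framework: the function name `toeplitz_matvec` and its arguments are hypothetical, and the product is computed naively in O(n^2) time rather than with the fast transforms the abstract refers to. The point is only that an n-by-n layer is described by 2n - 1 numbers instead of n^2.

```python
def toeplitz_matvec(first_col, first_row, x):
    """Multiply an n-by-n Toeplitz matrix by x (naive O(n^2) illustration).

    The matrix is defined by its first column and first row, so it has only
    2n - 1 free parameters: T[i][j] depends on i - j alone.
    """
    n = len(x)
    if len(first_col) != n or len(first_row) != n:
        raise ValueError("first_col, first_row and x must have the same length")
    if first_col[0] != first_row[0]:
        raise ValueError("first_col[0] and first_row[0] must agree")
    result = []
    for i in range(n):
        total = 0
        for j in range(n):
            # Entries on or below the diagonal come from the first column,
            # entries above the diagonal come from the first row.
            entry = first_col[i - j] if i >= j else first_row[j - i]
            total += entry * x[j]
        result.append(total)
    return result


# Hypothetical 3-by-3 example: 5 parameters instead of 9.
y = toeplitz_matvec([1, 2, 3], [1, 4, 5], [1, 1, 1])
```

Here the matrix rows are `[1, 4, 5]`, `[2, 1, 4]` and `[3, 2, 1]`, so `y` holds the three row sums.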
• Language Segmentation
Language segmentation consists in finding the boundaries where one language ends and another language begins in a text written in more than one language. This is important for all natural language processing tasks. The problem can be solved by training language models on language data. However, in the case of low- or no-resource languages, this is problematic. I therefore investigate whether unsupervised methods perform better than supervised methods when it is difficult or impossible to train supervised approaches. A special focus is given to difficult texts, i.e. texts that are rather short (one sentence), containing abbreviations, low-resource languages and non-standard language. I compare three approaches: supervised n-gram language models, unsupervised clustering and weakly supervised n-gram language model induction. I devised the weakly supervised approach in order to deal with difficult text specifically. In order to test the approach, I compiled a small corpus of different text types, ranging from one-sentence texts to texts of about 300 words. The weakly supervised language model induction approach works well on short and difficult texts, outperforming the clustering algorithm and reaching scores in the vicinity of the supervised approach. The results look promising, but there is room for improvement and a more thorough investigation should be undertaken.
• Parameterized Neural Network Language Models for Information Retrieval
Information Retrieval (IR) models need to deal with two difficult issues: vocabulary mismatch and term dependencies. Vocabulary mismatch corresponds to the difficulty of retrieving relevant documents that do not contain the exact query terms but contain semantically related terms. Term dependencies refer to the need to consider the relationships between the words of the query when estimating the relevance of a document. A multitude of solutions has been proposed to solve each of these two problems, but no principled model solves both. In parallel, in the last few years, language models based on neural networks have been used to cope with complex natural language processing tasks like emotion and paraphrase detection. Although they cope well with both term dependencies and vocabulary mismatch, thanks to the distributed representation of words they are based upon, such models could not be used readily in IR, where the estimation of one language model per document (or query) is required. This is both computationally infeasible and prone to over-fitting. Building on recent work that proposed learning a generic language model that can be modified through a set of document-specific parameters, we explore the use of new neural network models that are adapted to ad-hoc IR tasks. Within the language model IR framework, we propose and study the use of a generic language model as well as a document-specific language model. Both can be used as a smoothing component, but the latter is more adapted to the document at hand and has the potential of being used as a full document language model. We experiment with such models and analyze their results on the TREC-1 to TREC-8 datasets.
• Change-point detection using the conditional entropy of ordinal patterns
This paper is devoted to change-point detection using only the ordinal structure of a time series. A statistic based on the conditional entropy of ordinal patterns, which characterize the local ups and downs in a time series, is introduced and investigated. The statistic requires only minimal a priori information about the given data and shows good performance in numerical experiments.
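As a rough illustration of the quantity named in the abstract, the sketch below estimates the conditional entropy of successive ordinal patterns from empirical counts. It is an assumption-laden sketch, not the paper's change-point statistic: the function names, the `order` parameter (window length), and the tie-breaking rule are all choices made here for illustration.

```python
from collections import Counter
from math import log


def ordinal_pattern(window):
    """Return the index permutation that sorts the window (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))


def conditional_entropy_of_ordinal_patterns(series, order=3):
    """Estimate H(next pattern | current pattern) in nats from empirical frequencies."""
    patterns = [
        ordinal_pattern(series[t:t + order])
        for t in range(len(series) - order + 1)
    ]
    pairs = list(zip(patterns, patterns[1:]))
    if not pairs:
        raise ValueError("series is too short for the chosen order")
    pair_counts = Counter(pairs)
    first_counts = Counter(current for current, _ in pairs)
    n = len(pairs)
    entropy = 0.0
    for (current, _), count in pair_counts.items():
        # p(current, next) * log p(next | current), summed over observed pairs.
        entropy -= (count / n) * log(count / first_counts[current])
    return entropy


# A strictly increasing series has a single pattern, so nothing is uncertain.
h_monotone = conditional_entropy_of_ordinal_patterns(list(range(10)), order=3)
# A hypothetical series mixing rises and falls gives a positive value.
h_mixed = conditional_entropy_of_ordinal_patterns([0, 1, 2, 1, 2, 3, 2], order=2)
```

One plausible use for change-point detection is to compute this value on the segments before and after a candidate point and compare them; how the paper combines such values into its statistic is not described in the abstract.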
• SentiCap: Generating Image Descriptions with Sentiments
Recent progress in image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from current systems. One such style is description with emotion, which is commonplace in everyday communication and influences decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions; of these positive captions, 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.
• The Problem with Assessing Statistical Methods
In this paper, we investigate the problem of assessing statistical methods and effectively summarizing results from simulations. Specifically, we consider problems of the type where multiple methods are compared on a reasonably large test set of problems. These simulation studies are typically used to provide advice on an effective method for analyzing future untested problems. Most of these simulation studies never apply statistical methods to find which method(s) are expected to perform best. Instead, conclusions are based on a qualitative assessment of poorly chosen graphical and numerical summaries of the results. We illustrate that the Empirical Cumulative Distribution Function, when used appropriately, is an extremely effective tool for assessing what matters in large-scale statistical simulations.
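For readers unfamiliar with the tool the abstract recommends, here is a minimal sketch of an empirical cumulative distribution function (ECDF). The helper name `ecdf` and the sample values are made up for illustration; the abstract does not say how the authors apply the ECDF to their simulation results.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values less than or equal to x."""
    sorted_values = sorted(sample)
    n = len(sorted_values)
    if n == 0:
        raise ValueError("sample must not be empty")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(sorted_values, x) / n

    return F


# Hypothetical per-problem errors for one method across four test problems.
F = ecdf([3, 1, 2, 2])
share_at_most_2 = F(2)
```

Evaluating the ECDFs of two methods at the same threshold shows, for each method, what share of test problems met that error level, which summarizes the whole distribution of results rather than a single average.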
• Learning-based Reduced Order Model Stabilization for Partial Differential Equations: Application to the Coupled Burgers Equation
• Avalanches and perturbation theory in the random-field Ising model
• Large $\{0, 1, \ldots, t\}$-Cliques in Dual Polar Graphs
• Four-Point, 2D, Free-Ranging, IMSPE-Optimal, Twin-Point Designs
• Improving Ice Sheet Model Calibration Using Paleoclimate and Modern Data
• What’s in a ball? Constructing and characterizing uncertainty sets
• Sketching for Simultaneously Sparse and Low-Rank Covariance Matrices
• DKP-AOM: results for OAEI 2015
• A new generalization of Hermite’s reciprocity law
• Rayleigh surface waves and phonon mode conversion in nanostructures
• Large-scale subspace clustering using sketching and validation
• Population-Contrastive-Divergence: Does Consistency help with RBM training?
• Maximum moments of sum of independent random matrices
• Stability for a class of semilinear fractional stochastic integral equations
• Automata, reduced words, and Garside shadows in Coxeter groups
• Disjunctive Answer Set Solvers via Templates
• Analyzer and generator for Pali
• Simultaneous Feedback Vertex Set: A Parameterized Perspective
• Geodesic Forests in the Last-Passage Percolation
• Tensors Masquerading as Matchgates: Relaxing Planarity Restrictions on Pfaffian Circuits
• DC Decomposition of Nonconvex Polynomials with Algebraic Techniques
• $Z_4$-codes and their Gray map images as orthogonal arrays
• Local microscopic behavior for 2D Coulomb gases
• Quantifying Emergent Behavior of Autonomous Robots
• Bayesian Markov Blanket Estimation
• Equivariant maps related to the topological Tverberg conjecture
• Local Rademacher Complexity Bounds based on Covering Numbers
• Hölderian weak invariance principle under Maxwell and Woodroofe condition
• A Framework for Estimating Stream Expression Cardinalities
• Strong spatial mixing in homomorphism spaces
• A Stochastic Gradient Method with Linear Convergence Rate for a Class of Non-smooth Non-strongly Convex Optimizations
• Distance-2 MDS codes and latin colorings in the Doob graphs
• Improved Estimation of Class Prior Probabilities through Unlabeled Data
• Symmetric Graphs with respect to Graph Entropy
• Torsion in the homology of Milnor fibers of hyperplane arrangements
• Randomized Alternating Least Squares for Canonical Tensor Decompositions: Application to a PDE with Random Data
• Accurate calculations of stationary distributions and mean first passage times in Markov renewal processes and Markov chains
• Depinning of disordered bosonic chains
• Batch Normalized Recurrent Neural Networks
• Inverse Problems for a Class of Conditional Probability Measure-Dependent Evolution Equations
• Revisiting Kneser’s Theorem for Field Extensions
• The parametric Frobenius problem and parametric exclusion
• Within-Brain Classification for Brain Tumor Segmentation
• Parametrizing an integer linear program by an integer
• Rank-frequency relations of phonemes uncover an author-dependency of their usage
• Time-dependent community structure in legislation cosponsorship networks in the Congress of the Republic of Peru
• Noncommutative Schur functions, switchboards, and Schur positivity
• Automatic Taxonomy Extraction from Query Logs with no Additional Sources of Information
• Exposing the Probabilistic Causal Structure of Discrimination
• Minimax Binary Classifier Aggregation with General Losses