Adaptive Learning with Robust Generalization Guarantees

The traditional notion of generalization — i.e., learning a hypothesis whose empirical error is close to its true error — is surprisingly brittle. As has recently been noted [DFH+15b], even if several algorithms have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three notions of generalization —increasing in strength— that are robust to post-processing and amenable to adaptive composition, and examine the relationships between them. We call the weakest such notion Robust Generalization. A second, intermediate, notion is the stability guarantee known as differential privacy. The strongest guarantee we consider we call Perfect Generalization. We prove that every hypothesis class that is PAC learnable is also PAC learnable in a robustly generalizing fashion, albeit with an exponential blowup in sample complexity. We conjecture that a stronger version of this theorem also holds that avoids any blowup in sample complexity (and, in fact, it would, subject to a longstanding conjecture [LW86, War03]). It was previously known that differentially private algorithms satisfy robust generalization. In this paper, we show that robust generalization is a strictly weaker concept, and that there is a learning task that can be carried out subject to robust generalization guarantees, yet cannot be carried out subject to differential privacy, answering an open question of [DFH+15a]. We also show that perfect generalization is a strictly stronger guarantee than differential privacy, but that, nevertheless, many learning tasks can be carried out subject to the guarantees of perfect generalization.


Top-N Recommendation with Novel Rank Approximation

The importance of accurate recommender systems has been widely recognized by academia and industry. However, the recommendation quality is still rather low. Recently, a linear sparse and low-rank representation of the user-item matrix has been applied to produce Top-N recommendations. This approach uses the nuclear norm as a convex relaxation for the rank function and has achieved better recommendation accuracy than the state-of-the-art methods. In the past several years, solving rank minimization problems by leveraging nonconvex relaxations has received increasing attention. Some empirical results demonstrate that it can provide a better approximation to original problems than convex relaxation. In this paper, we propose a novel rank approximation to enhance the performance of Top-N recommendation systems, where the approximation error is controllable. Experimental results on real data show that the proposed rank approximation improves the Top-N recommendation accuracy substantially.


Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly Detection

Many important forms of data are stored digitally in XML format. Errors can occur in the textual content of the data in the fields of the XML. Fixing these errors manually is time-consuming and expensive, especially for large amounts of data. There is increasing interest in the research, development, and use of automated techniques for assisting with data cleaning. Electronic dictionaries are an important form of data frequently stored in XML format that frequently have errors introduced through a mixture of manual typographical entry errors and optical character recognition errors. In this paper we describe methods for flagging statistical anomalies as likely errors in electronic dictionaries stored in XML format. We describe six systems based on different sources of information. The systems detect errors using various signals in the data including uncommon characters, text length, character-based language models, word-based language models, tied-field length ratios, and tied-field transliteration models. Four of the systems detect errors based on expectations automatically inferred from content within elements of a single field type. We call these single-field systems. Two of the systems detect errors based on correspondence expectations automatically inferred from content within elements of multiple related field types. We call these tied-field systems. For each system, we provide an intuitive analysis of the type of error that it is successful at detecting. Finally, we describe two larger-scale evaluations using crowdsourcing with Amazon’s Mechanical Turk platform and using the annotations of a domain expert. The evaluations consistently show that the systems are useful for improving the efficiency with which errors in XML electronic dictionaries can be detected.


Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time. We demonstrate the usefulness of our method on applications in supervised image recognition, generative modelling, and deep reinforcement learning.


Firefly Algorithm for optimization problems with non-continuous variables: A Review and Analysis

Firefly algorithm is a swarm based metaheuristic algorithm inspired by the flashing behavior of fireflies. It is an effective and an easy to implement algorithm. It has been tested on different problems from different disciplines and found to be effective. Even though the algorithm is proposed for optimization problems with continuous variables, it has been modified and used for problems with non-continuous variables, including binary and integer valued problems. In this paper a detailed review of this modifications of firefly algorithm for problems with non-continuous variables will be discussed. The strength and weakness of the modifications along with possible future works will be presented.


PCA/LDA Approach for Text-Independent Speaker Recognition

Various algorithms for text-independent speaker recognition have been developed through the decades, aiming to improve both accuracy and e?ciency. This paper presents a novel PCA/LDA-based approach that is faster than traditional statistical model-based methods and achieves competitive results. First, the performance based on only PCA and only LDA is measured; then a mixed model, taking advantages of both methods, is introduced. A subset of the TIMIT corpus composed of 200 male speakers, is used for enrollment, validation and testing. The best results achieve 100%; 96% and 95% classi?cation rate at population level 50; 100 and 200, using 39-dimensional MFCC features with delta and double delta. These results are based on 12-second text-independent speech for training and 4-second data for test. These are comparable to the conventional MFCC-GMM methods, but require signi?cantly less time to train and operate.


Learning functions across many orders of magnitudes

Closed and asymptotic formulas for energy of some circulant graphs

On the distances between Latin squares and the smallest defining set size

Phonon Monte Carlo: Generating Random Variates for Thermal Transport Simulation

Toward Mention Detection Robustness with Recurrent Neural Networks

A Note on Modeling Self-Suspending Time as Blocking Time in Real-Time Systems

Envelopes of conditional probabilities extending a strategy and a prior probability

A Compressed Sensing Based Decomposition of Electro-Dermal Activity Signals

Reinforcement Learning of POMDP’s using Spectral Methods

Recurrent Neural Network Grammars

Classification of directed regular Ricci-flat graphs

Expectation Consistent Approximate Inference: Generalizations and Convergence

A Study on the usage of Data Structures in Information Retrieval

Monomial Gamma Monte Carlo Sampling

Automated Word Prediction in Bangla Language Using Stochastic Language Models

Minimum Cost Homomorphisms with Constrained Costs

A Bayesian baseline for belief in uncommon events

Tight bounds on discrete quantitative Helly numbers

Fast Nonsmooth Regularized Risk Minimization with Continuation

Elementary observations on Rogers-Szegö polynomials

Modeling cumulative biological phenomena with Suppes-Bayes causal networks

Probably Approximately Correct Greedy Maximization

Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood

Projected Estimators for Robust Semi-supervised Classification

Exact simulation of the jump times of a class of Piecewise Deterministic Markov Processes

On Satisfiability Problems with a Linear Structure

Thompson Sampling is Asymptotically Optimal in General Environments

Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service

Delocalization in One-Dimensional Tight-Binding Models with Fractal Disorder II: Existence of Mobility Edge

Bootstrap Inference when Using Multiple Imputation

Kempf-Laksov Schubert classes for even infinitesimal cohomology theories

Causal Discovery from Subsampled Time Series Data by Constraint Optimization

How effective can simple ordinal peer grading be?

A multidimensional analogue of the Rademacher-Gaussian tail comparison

Essential dimension and the flats spanned by a point set

Adaptive estimation of High-Dimensional Signal-to-Noise Ratios

Practical Riemannian Neural Networks

Meta-learning within Projective Simulation

Time-Space Trade-offs in Population Protocols

Zero-Suppressed Computation: A New Computation Inspired by ZDDs

The enhanced Sanov theorem and propagation of chaos

Denoising Flows on Trees

How to quantify coherence: distinguishing speakable and unspeakable notions

Probabilistic community detection with unknown number of communities

Semi-parametric inference for the means of heavy-tailed distributions

Perfect snake-in-the-box codes for rank modulation