Prediction Advantage (PA)
We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function, applicable to both classification and regression. The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including the 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely used in statistics. Using the PA ensures a meaningful quantification of prediction performance, which is not guaranteed, for example, when dealing with noisy imbalanced classification problems. We argue that among several known alternative performance measures, the PA is the best (and only) quantity that ensures meaningfulness at all noise and imbalance levels. …
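To make the definition concrete, here is a minimal Python sketch that computes the PA as 1 - R(f)/R(f0), where f0 is the best constant predictor given only the label distribution; the ratio form and the function names are assumptions made for illustration, not the paper's exact notation:

```python
import numpy as np

def pa_zero_one(y_true, y_pred):
    """PA under 0/1 loss: 1 - R(f) / R(f0), where f0 always predicts
    the majority class (the best constant predictor given only the
    label distribution)."""
    risk_f = np.mean(y_true != y_pred)
    _, counts = np.unique(y_true, return_counts=True)
    risk_f0 = 1.0 - counts.max() / counts.sum()
    return 1.0 - risk_f / risk_f0

def pa_squared(y_true, y_pred):
    """PA under squared loss; f0 predicts the label mean, so this
    coincides with the familiar R-squared."""
    risk_f = np.mean((y_true - y_pred) ** 2)
    risk_f0 = np.var(y_true)  # risk of always predicting the mean
    return 1.0 - risk_f / risk_f0
```

On a 95/5 imbalanced problem, a classifier that always predicts the majority class reaches 0.95 accuracy yet has a PA of 0, which is exactly the meaningfulness property the abstract refers to.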
Multi-Cell LSTM
Language models are at the heart of many NLP problems and are therefore of enduring interest to researchers. Neural language models offer the advantages of distributed representations and long-range context. With dynamics that allow information to cycle within the network, the recurrent neural network (RNN) is a natural paradigm for neural language modeling, and the Long Short-Term Memory (LSTM) architecture addresses the inadequacies of the standard RNN in modeling long-range context. In spite of a plethora of RNN variants, the possibility of adding multiple memory cells to LSTM nodes has seldom been explored. Here we propose a multi-cell node architecture for LSTMs and study its applicability to neural language modeling. The proposed multi-cell LSTM language models outperform state-of-the-art results on the well-known Penn Treebank (PTB) setup. …
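The abstract does not spell out the cell-update equations, so the following PyTorch sketch is only one plausible reading: standard LSTM gating where each node keeps k memory cells with their own candidate updates, combined here by simple averaging (the combination rule, class name, and hyperparameters are all assumptions):

```python
import torch
import torch.nn as nn

class MultiCellLSTMCell(nn.Module):
    """Sketch of an LSTM cell that keeps k memory cells per node.
    How the k cell states are combined is an assumption (averaging),
    not necessarily the paper's exact formulation."""
    def __init__(self, input_size, hidden_size, num_cells=2):
        super().__init__()
        self.k = num_cells
        # Input, forget, and output gates, as in a standard LSTM.
        self.gates = nn.Linear(input_size + hidden_size, 3 * hidden_size)
        # One candidate update per memory cell.
        self.candidates = nn.Linear(input_size + hidden_size,
                                    num_cells * hidden_size)

    def forward(self, x, state):
        h, cells = state                      # cells: (k, batch, hidden)
        z = torch.cat([x, h], dim=-1)
        i, f, o = torch.sigmoid(self.gates(z)).chunk(3, dim=-1)
        g = torch.tanh(self.candidates(z)).chunk(self.k, dim=-1)
        # Each memory cell gets the usual forget/input update.
        new_cells = torch.stack([f * c + i * gk
                                 for c, gk in zip(cells, g)])
        # Combine the k memory cells; averaging is an assumption.
        c_combined = new_cells.mean(dim=0)
        h_new = o * torch.tanh(c_combined)
        return h_new, (h_new, new_cells)
```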
Object Oriented Data Analysis (OODA)
Object oriented data analysis (OODA) is the statistical analysis of data sets of complex objects. The area is understood through consideration of the atom of the statistical analysis. In a first course in statistics, the atoms are numbers; in multivariate analysis, the atoms are vectors. An interesting special case of OODA is functional data analysis, where the atoms are curves; see Ramsay and Silverman for excellent overviews, as well as many interesting analyses, novel methodologies and detailed discussion. More general atoms have also been considered: Locantore et al. studied the case of images as atoms, and Pizer, Thall and Chen and Yushkevich et al. took the atoms to be shape objects in two- and three-dimensional space. A major goal of OODA is understanding the population structure of a data set. The usual first step is to find a centerpoint of the data set, for example a mean or median. The second step is to analyze the variation about the center. Principal component analysis (PCA) has been a workhorse method for this, especially when combined with new visualizations as done in functional data analysis. An important reason for this success to date is that the data naturally lie in Euclidean spaces, where standard vector space analyses have proven to be both insightful and effective. …
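The two-step workflow just described (find a centerpoint, then analyze variation about it with PCA) is easy to make concrete for the functional-data case, where atoms are curves sampled on a common grid. A minimal numpy sketch with synthetic data and illustrative variable names:

```python
import numpy as np

# Atoms are curves: 100 curves sampled at 50 common grid points.
curves = np.random.randn(100, 50)

# Step 1: find a centerpoint (here, the pointwise mean curve).
center = curves.mean(axis=0)

# Step 2: analyze variation about the center with PCA via SVD.
centered = curves - center
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
modes_of_variation = Vt[:3]          # leading eigen-curves
scores = centered @ Vt[:3].T         # each curve's coordinates
var_explained = s[:3] ** 2 / (s ** 2).sum()
```

Plotting `center` plus or minus multiples of each mode of variation gives the kind of visualization that has made PCA a workhorse in functional data analysis.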
GaterNet
The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model, conditioned on the sample being processed. In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). The problem is interesting because forcing different parts of the model to learn from different types of samples may help us acquire better filters in CNNs, improve generalization performance, and potentially increase the interpretability of model behavior. We propose a novel yet simple framework called GaterNet, which involves a backbone network and a gater network. The backbone network is a regular CNN that performs the major computation needed for making a prediction, while a global gater network generates binary gates that selectively activate filters in the backbone network based on each input. Extensive experiments on the CIFAR and ImageNet datasets show that our models consistently outperform the original models by a large margin. On CIFAR-10, our model also improves upon state-of-the-art results. …
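To make the backbone/gater split concrete, here is a minimal PyTorch sketch: the gater maps the input image to one binary gate per backbone filter, and the backbone multiplies each channel by its gate. The straight-through binarization and all module and parameter names are assumptions for illustration; the paper's exact architecture and discretization may differ:

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """One backbone conv block whose output channels are switched
    on or off by externally supplied binary gates."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x, gates):
        # gates: (batch, out_ch) in {0, 1}; broadcast over H and W.
        return torch.relu(self.bn(self.conv(x))) * gates[:, :, None, None]

class Gater(nn.Module):
    """Tiny gater network: maps the input image to one binary gate
    per backbone filter. Straight-through binarization is used here
    as an assumption about how gradients pass the hard threshold."""
    def __init__(self, num_filters):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_filters))

    def forward(self, x):
        soft = torch.sigmoid(self.features(x))
        hard = (soft > 0.5).float()
        # Forward pass uses hard gates; backward flows through soft.
        return hard + soft - soft.detach()
```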