Estimating the average treatment causal effect in clustered data often involves dealing with unmeasured cluster-specific confounding variables. Such variables may be correlated with the measured unit covariates and outcome. When the correlations are ignored, the causal effect estimation can be biased. By utilizing sufficient statistics, we propose an inverse conditional probability weighting (ICPW) method, which is robust to both (i) the correlation between the unmeasured cluster-specific confounding variable and the covariates and (ii) the correlation between the unmeasured cluster-specific confounding variable and the outcome. Assumptions and conditions for the ICPW method are presented. We establish the asymptotic properties of the proposed estimators. Simulation studies and a case study are presented for illustration.
Answer triggering is the task of selecting the best-suited answer for a given question from a set of candidate answers if exists. In this paper, we present a hybrid deep learning model for answer triggering, which combines several dependency graph based alignment features, namely graph edit distance, graph-based similarity and dependency graph coverage, with dense vector embeddings from a Convolutional Neural Network (CNN). Our experiments on the WikiQA dataset show that such a combination can more accurately trigger a candidate answer compared to the previous state-of-the-art models. Comparative study on WikiQA dataset shows 5.86% absolute F-score improvement at the question level.
The high-dimensional data setting, in which p >> n, is a challenging statistical paradigm that appears in many real-world problems. In this setting, learning a compact, low-dimensional representation of the data can substantially help distinguish signal from noise. One way to achieve this goal is to perform subspace learning to estimate a small set of latent features that capture the majority of the variance in the original data. Most existing subspace learning models, such as PCA, assume that the data can be fully represented by its embedding in one or more latent subspaces. However, in this work, we argue that this assumption is not suitable for many high-dimensional datasets; often only some variables can easily be projected to a low-dimensional space. We propose a hybrid dimensionality reduction technique in which some features are mapped to a low-dimensional subspace while others remain in the original space. Our model leads to more accurate estimation of the latent space and lower reconstruction error. We present a simple optimization procedure for the resulting biconvex problem and show synthetic data results that demonstrate the advantages of our approach over existing methods. Finally, we demonstrate the effectiveness of this method for extracting meaningful features from both gene expression and video background subtraction datasets.
The prediction accuracy has been the long-lasting and sole standard for comparing the performance of different image classification models, including the ImageNet competition. However, recent studies have highlighted the lack of robustness in well-trained deep neural networks to adversarial examples. Visually imperceptible perturbations to natural images can easily be crafted and mislead the image classifiers towards misclassification. To demystify the trade-offs between robustness and accuracy, in this paper we thoroughly benchmark 18 ImageNet models using multiple robustness metrics, including the distortion, success rate and transferability of adversarial examples between 306 pairs of models. Our extensive experimental results reveal several new insights: (1) linear scaling law – the empirical $\ell_2$ and $\ell_\infty$ distortion metrics scale linearly with the logarithm of classification error; (2) model architecture is a more critical factor to robustness than model size, and the disclosed accuracy-robustness Pareto frontier can be used as an evaluation criterion for ImageNet model designers; (3) for a similar network architecture, increasing network depth slightly improves robustness in $\ell_\infty$ distortion; (4) there exist models (in VGG family) that exhibit high adversarial transferability, while most adversarial examples crafted from one model can only be transferred within the same family. Experiment code is publicly available at \url{https://…/Adversarial_Survey}.
In the footsteps of the book \textit{Measure Theory and Integration By and For the Learner} of our series in Probability Theory and Statistics, we intended to devote a special volume of the very probabilistic aspects of the first cited theory. The book might have assigned the title : From Measure Theory and Integration to Probability Theory. The fundamental aspects of Probability Theory, as described by the keywords and phrases below, are presented, not from experiences as in the book \textit{A Course on Elementary Probability Theory}, but from a pure mathematical view based on Measure Theory. Such an approach places Probability Theory in its natural frame of Functional Analysis and constitutes a firm preparation to the study of Random Analysis and Stochastic processes. At the same time, it offers a solid basis towards Mathematical Statistics Theory. The book will be continuously updated and improved on a yearly basis.
We argue that logical semantics might have faltered due to its failure in distinguishing between two fundamentally very different types of concepts: ontological concepts, that should be types in a strongly-typed ontology, and logical concepts, that are predicates corresponding to properties of and relations between objects of various ontological types. We will then show that accounting for these differences amounts to the integration of lexical and compositional semantics in one coherent framework, and to an embedding in our logical semantics of a strongly-typed ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. We will show that in such a framework a number of challenges in natural language semantics can be adequately and systematically treated.
NIMFA is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring. It supports both dense and sparse matrix representation. NIMFA’s component-based implementation and hierarchical design should help the users to employ already implemented techniques or design and code new strategies for matrix factorization tasks.
In this paper, we propose a regularized mixture probabilistic model to cluster matrix data and apply it to brain signals. The approach is able to capture the sparsity (low rank, small/zero values) of the original signals by introducing regularization terms into the likelihood function. Through a modified EM algorithm, our method achieves the optimal solution with low computational cost. Theoretical results are also provided to establish the consistency of the proposed estimators. Simulations show the advantages of the proposed method over other existing methods. We also apply the approach to two real datasets from different experiments. Promising results imply that the proposed method successfully characterizes signals with different patterns while yielding insightful scientific interpretation.
Detecting abrupt changes in the mean of a time series, so-called changepoints, is important for many applications. However, many procedures rely on the estimation of nuisance parameters (like long-run variance). Under the alternative (a change in mean), estimators might be biased and data-adaptive rules for the choice of tuning parameters might not work as expected. If the data is not stationary, but heteroscedastic, this becomes more challenging. The aim of this paper is to present and investigate two changepoint tests, which involve neither nuisance nor tuning parameters. This is achieved by combing self-normalization and wild bootstrap. We study the asymptotic behavior and show the consistency of the bootstrap under the hypothesis as well as under the alternative, assuming mild conditions on the weak dependence of the time series and allowing the variance to change over time. As a by-product of the proposed tests, a changepoint estimator is introduced and its consistency is proved. The results are illustrated through a simulation study, which demonstrates computational efficiency of the developed methods. The new tests will also be applied to real data examples from finance and hydrology.
Training deep recurrent neural network (RNN) architectures is complicated due to the increased network complexity. This disrupts the learning of higher order abstracts using deep RNN. In case of feed-forward networks training deep structures is simple and faster while learning long-term temporal information is not possible. In this paper we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feed-forward layers having residual and time delayed connections. The residual connection paves way to construct deeper networks by enabling unhindered flow of gradients and the time delay units capture temporal information with shared weights. The number of layers in RMN signifies both the hierarchical processing depth and temporal depth. The computational complexity in training RMN is significantly less when compared to deep recurrent networks. RMN is further extended as bi-directional RMN (BRMN) to capture both past and future information. Experimental analysis is done on AMI corpus to substantiate the capability of RMN in learning long-term information and hierarchical information. Recognition performance of RMN trained with 300 hours of Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN gains 6 % and 3.8 % relative improvement over LSTM and BLSTM networks.
A number of applications benefit from continuously releasing streams of personal data statistics. The process, however, poses significant privacy risks. Motivated by an application in energy systems, this paper presents OptStream, a novel algorithm for releasing differential private data streams. OptStream is a 4-step procedure consisting of sampling, perturbation, reconstruction, and post-processing modules. The sampling module selects a small set of points to access privately in each period of interest, the perturbation module adds noise to the sampled data points to guarantee privacy, the reconstruction module re-assembles the non-sampling data points from the perturbed sampled points, and the post-processing module uses convex optimization over the private output of the previous modules, as well as the private answers of additional queries on the data stream, to ensure consistency of the data’s salient features. OptStream is used to release a real data stream from the largest transmission operator in Europe. Experimental results show that OptStream not only improves the accuracy of the state-of-the-art by at least one order of magnitude on this application domain, but it is also able to ensure accurate load forecasting based on the private data.
As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing the current researches of transfer learning by using deep neural network and its applications. We defined deep transfer learning, category and review the recent research works based on the techniques used in deep transfer learning.