We consider a sequential Bayesian changepoint detection problem for a general stochastic model, assuming that the observed data may be dependent and non-identically distributed and that the prior distribution of the change point is arbitrary, not necessarily geometric. Tartakovsky and Veeravalli (2004) developed a general asymptotic theory of changepoint detection in the non-iid case in discrete time, and Baron and Tartakovsky (2006) did so in continuous time, assuming a certain stability of the log-likelihood ratio process. This stability property was formulated in terms of the r-quick convergence of the normalized log-likelihood ratio process to a positive and finite number, which can be interpreted as the limiting Kullback-Leibler information between the ‘change’ and ‘no change’ hypotheses. In these papers, it was conjectured that r-quick convergence can be relaxed to r-complete convergence, which is typically much easier to verify in particular examples. In the present paper, we justify this conjecture by showing that the Shiryaev change detection procedure is nearly optimal, asymptotically minimizing (as the probability of false alarm vanishes) the moments of the delay to detection up to order r whenever r-complete convergence holds. We also study asymptotic properties of the Shiryaev-Roberts detection procedure in the Bayesian context.
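As a rough illustration of the Shiryaev procedure mentioned above, the following sketch covers only the simplest special case, which is much narrower than the general setting of the abstract: independent Gaussian observations whose mean shifts from `mu0` to `mu1`, and a geometric prior with parameter `rho`. The function name, the parameter names, and the Gaussian model are all assumptions made for illustration. The procedure tracks the posterior probability that the change has already occurred and raises an alarm once it reaches `1 - alpha`.

```python
import math


def shiryaev_stopping_time(xs, rho, alpha, mu0=0.0, mu1=1.0, sigma=1.0):
    """Shiryaev rule for an iid Gaussian mean shift with a geometric prior.

    Illustrative assumptions (not from the abstract): observations are
    independent N(mu0, sigma^2) before the change and N(mu1, sigma^2)
    from the change point on, and P(change point = k) = rho * (1 - rho)**(k - 1).
    Returns (n, p_n) at the first n with posterior p_n >= 1 - alpha,
    or (None, last posterior) if no alarm is raised.
    """
    p = 0.0  # posterior probability that the change has already occurred
    for n, x in enumerate(xs, start=1):
        # Prediction step: the change may occur at time n with probability rho.
        prior = p + (1.0 - p) * rho
        # Likelihood ratio of post-change to pre-change density at x.
        log_lr = (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2
        lr = math.exp(log_lr)
        # Bayes update of the posterior probability of a change.
        p = prior * lr / (prior * lr + (1.0 - prior))
        if p >= 1.0 - alpha:
            return n, p
    return None, p


# Deterministic toy sequence: 20 pre-change values, then 30 post-change values.
xs = [0.0] * 20 + [1.0] * 30
tau, p_tau = shiryaev_stopping_time(xs, rho=0.05, alpha=0.01)
print(tau, p_tau)
```

Thresholding the posterior at `1 - alpha` keeps the probability of false alarm at or below `alpha`. In the dependent, non-identically distributed setting of the abstract, the likelihood ratio at each step would be conditional on the past observations, and the constant `rho` would be replaced by the hazard rate of the chosen prior.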
Convolutional Neural Networks (CNNs) have recently achieved remarkably strong performance on sentence classification tasks (Kim, 2014; Kalchbrenner et al., 2014; Wang et al., 2015). However, these models require practitioners to specify the exact model architecture and accompanying hyper-parameters, e.g., the choice of filter region size, regularization parameters, and so on. It is currently unknown how sensitive model performance is to changes in these configurations for the task of sentence classification. We thus conduct an empirical sensitivity analysis of one-layer CNNs to explore the effect of each part of the architecture on performance; our aim is to assess the robustness of the model and to distinguish between important and comparatively inconsequential design decisions for sentence classification. We focus on one-layer CNNs (to the exclusion of more complex models) due to their comparative simplicity and strong empirical performance (Kim, 2014). We derive practical advice from our extensive empirical results for those interested in getting the most out of CNNs for sentence classification.
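To make the hyper-parameters named above concrete, here is a minimal NumPy sketch of the forward pass of a one-layer sentence CNN. It is not the authors' implementation. The function name, the ReLU activation, the max-over-time pooling, and all sizes are assumptions chosen for illustration. The keys of `filters` play the role of the filter region sizes, and the first dimension of each weight array is the number of feature maps per region size.

```python
import numpy as np


def one_layer_cnn_forward(embeddings, filters, W_out, b_out):
    """Forward pass of a one-layer sentence CNN (illustrative sketch).

    embeddings: (sentence_len, embed_dim) matrix of word vectors.
    filters: dict mapping region size h -> (W, b), where W has shape
             (num_filters, h, embed_dim) and b has shape (num_filters,).
    W_out, b_out: weights and bias of the final softmax layer.
    """
    sentence_len, _ = embeddings.shape
    pooled = []
    for h in sorted(filters):
        W, b = filters[h]
        # All windows of h consecutive words: (sentence_len - h + 1, h, embed_dim).
        windows = np.stack(
            [embeddings[i:i + h] for i in range(sentence_len - h + 1)]
        )
        # Apply every filter to every window: (num_windows, num_filters).
        feature_maps = np.tensordot(windows, W, axes=([1, 2], [1, 2])) + b
        feature_maps = np.maximum(feature_maps, 0.0)  # ReLU (assumed activation)
        # Max-over-time pooling keeps one value per filter.
        pooled.append(feature_maps.max(axis=0))
    features = np.concatenate(pooled)
    logits = W_out @ features + b_out
    logits = logits - logits.max()  # shift for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()


# Hypothetical sizes: 7 words, 5-dim embeddings, region sizes 2 and 3,
# 4 filters per region size, 2 output classes.
rng = np.random.default_rng(0)
sentence = rng.standard_normal((7, 5))
filters = {h: (0.1 * rng.standard_normal((4, h, 5)), np.zeros(4)) for h in (2, 3)}
W_out = 0.1 * rng.standard_normal((2, 8))
b_out = np.zeros(2)
probs = one_layer_cnn_forward(sentence, filters, W_out, b_out)
print(probs)
```

A sensitivity analysis of the kind described would vary one such setting at a time (for example the set of region sizes or the number of filters) while holding the others fixed.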
Statistical matching is a technique for integrating two or more data sets when information available for matching records for individual participants across data sets is incomplete. Statistical matching can be viewed as a missing data problem where a researcher wants to perform a joint analysis of variables that are never jointly observed. A conditional independence assumption is often used to create imputed data for statistical matching. We consider an alternative approach to statistical matching without using the conditional independence assumption. We apply parametric fractional imputation of Kim (2011) to create imputed data using an instrumental variable assumption to identify the joint distribution. We also present variance estimators appropriate for the imputation procedure. We explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models.
We develop a model-based empirical Bayes approach to variable selection problems in which the number of predictors is very large, possibly much larger than the number of responses (the so-called ‘large p, small n’ problem). We consider the multiple linear regression setting, where the response is assumed to be a continuous variable that is a linear function of the predictors plus error. The explanatory variables in the linear model can have a positive effect on the response, a negative effect, or no effect. We model the effects of the linear predictors as a three-component mixture in which a key assumption is that only a small (unknown) fraction of the candidate predictors have a non-zero effect on the response variable. By treating the coefficients as random effects, we develop an approach that is computationally efficient because the number of parameters that have to be estimated is small and remains constant regardless of the number of explanatory variables. The model parameters are estimated using the EM algorithm, which is scalable and converges significantly faster than simulation-based methods.
This article proposes a first analysis of kernel spectral clustering methods in the regime where the dimension of the data vectors to be clustered and their number grow large at the same rate. We demonstrate, under a k-class Gaussian mixture model, that the normalized Laplacian matrix associated with the kernel matrix asymptotically behaves similarly to a so-called spiked random matrix. Some of the isolated eigenvalue-eigenvector pairs in this model are shown to carry the clustering information, subject to a separability condition that is classical in spiked matrix models. We evaluate precisely the position of these eigenvalues and the content of the eigenvectors, which unveils important properties of spectral clustering, in particular in simple toy models. Our results are then compared to the practical clustering of images from the MNIST database, thereby revealing an important match between theory and practice.
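The object studied above can be sketched numerically for a two-class toy case. The code below builds a kernel matrix from pairwise squared distances scaled by the dimension, forms one common version of the normalized Laplacian, n D^{-1/2} K D^{-1/2}, and splits the data by the sign of the eigenvector of the second-largest eigenvalue. The function name, the Gaussian kernel function, the scaling by n, and the toy mixture parameters are assumptions for illustration and are not taken from the abstract.

```python
import numpy as np


def kernel_spectral_bipartition(X, f=lambda t: np.exp(-t / 2.0)):
    """Two-way kernel spectral clustering (illustrative sketch).

    X: (n, p) data matrix. The kernel is K_ij = f(||x_i - x_j||^2 / p).
    Returns 0/1 labels from the sign of the second leading eigenvector
    of L = n * D^{-1/2} K D^{-1/2}, together with the eigenvalues of L.
    """
    n, p = X.shape
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    K = f(sq_dists / p)
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    L = n * (d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :])
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # The leading eigenvector is proportional to D^{1/2} 1 and carries no
    # class information; the next one is used to separate the two classes.
    v2 = eigvecs[:, -2]
    return (v2 > 0).astype(int), eigvals


# Toy two-class Gaussian mixture with hypothetical parameters.
rng = np.random.default_rng(0)
n_per_class, p = 100, 20
mu = np.zeros(p)
mu[0] = 3.0
X = np.vstack([
    rng.standard_normal((n_per_class, p)) + mu,
    rng.standard_normal((n_per_class, p)) - mu,
])
y = np.repeat([0, 1], n_per_class)
labels, eigvals = kernel_spectral_bipartition(X)
agreement = np.mean(labels == y)
accuracy = max(agreement, 1.0 - agreement)  # labels are defined up to a swap
print(accuracy, eigvals[-3:])
```

With k classes, one would typically keep several leading eigenvectors and run a clustering step such as k-means on their rows instead of a single sign split.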
Heteroscedasticity testing is important in regression analysis. Existing local smoothing tests suffer severely from the curse of dimensionality, even when the number of covariates is moderate, because of their use of nonparametric estimation. In this paper, a dimension reduction-based model adaptive test is proposed that behaves like a local smoothing test as if the number of covariates were equal to the number of their linear combinations in the mean regression function; in particular, this number equals 1 when the mean function contains a single index. The test statistic is asymptotically normal under the null hypothesis, so that critical values are easily determined. The finite sample performance of the test is examined by simulations and a real data analysis.