This work introduces a tensor-based method to perform supervised classification on spatiotemporal data processed in an echo state network. Typically when performing supervised classification tasks on data processed in an echo state network, the entire collection of hidden layer node states from the training dataset is shaped into a matrix, allowing one to use standard linear algebra techniques to train the output layer. However, the collection of hidden layer states is multidimensional in nature, and representing it as a matrix may lead to undesirable numerical conditions or loss of spatial and temporal correlations in the data. This work proposes a tensor-based supervised classification method on echo state network data that preserves and exploits the multidimensional nature of the hidden layer states. The method, which is based on orthogonal Tucker decompositions of tensors, is compared with the standard linear output weight approach in several numerical experiments on both synthetic and natural data. The results show that the tensor-based approach tends to outperform the standard approach in terms of classification accuracy.
We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS PFBP partitions the data matrix both in terms of rows (samples, training examples) as well as columns (features). By employing the concepts of $p$-values of conditional independence tests and meta-analysis techniques PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
The volume and velocity of information that gets generated online limits current journalistic practices to fact-check claims at the same rate. Computational approaches for fact checking may be the key to help mitigate the risks of massive misinformation spread. Such approaches can be designed to not only be scalable and effective at assessing veracity of dubious claims, but also to boost a human fact checker’s productivity by surfacing relevant facts and patterns to aid their analysis. To this end, we present a novel, unsupervised network-flow based approach to determine the truthfulness of a statement of fact expressed in the form of a (subject, predicate, object) triple. We view a knowledge graph of background information about real-world entities as a flow network, and knowledge as a fluid, abstract commodity. We show that computational fact checking of such a triple then amounts to finding a ‘knowledge stream’ that emanates from the subject node and flows toward the object node through paths connecting them. Evaluation on a range of real-world and hand-crafted datasets of facts related to entertainment, business, sports, geography and more reveals that this network-flow model can be very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model is expressive in its ability to automatically discover several useful path patterns and surface relevant facts that may help a human fact checker corroborate or refute a claim.
An exhaustive study on neural network language modeling (NNLM) is performed in this paper. Different architectures of basic neural network language models are described and examined. A number of different improvements over basic neural network language models, including importance sampling, word classes, caching and bidirectional recurrent neural network (BiRNN), are studied separately, and the advantages and disadvantages of every technique are evaluated. Then, the limits of neural network language modeling are explored from the aspects of model architecture and knowledge representation. Part of the statistical information from a word sequence will loss when it is processed word by word in a certain order, and the mechanism of training neural network by updating weight matrixes and vectors imposes severe restrictions on any significant enhancement of NNLM. For knowledge representation, the knowledge represented by neural network language models is the approximate probabilistic distribution of word sequences from a certain training data set rather than the knowledge of a language itself or the information conveyed by word sequences in a natural language. Finally, some directions for improving neural network language modeling further is discussed.
Dynamic tensor data are becoming prevalent in numerous applications. Existing tensor clustering methods either fail to account for the dynamic nature of the data, or are inapplicable to a general-order tensor. Also there is often a gap between statistical guarantee and computational efficiency for existing tensor clustering solutions. In this article, we aim to bridge this gap by proposing a new dynamic tensor clustering method, which takes into account both sparsity and fusion structures, and enjoys strong statistical guarantees as well as high computational efficiency. Our proposal is based upon a new structured tensor factorization that encourages both sparsity and smoothness in parameters along the specified tensor modes. Computationally, we develop a highly efficient optimization algorithm that benefits from substantial dimension reduction. In theory, we first establish a non-asymptotic error bound for the estimator from the structured tensor factorization. Built upon this error bound, we then derive the rate of convergence of the estimated cluster centers, and show that the estimated clusters recover the true cluster structures with a high probability. Moreover, our proposed method can be naturally extended to co-clustering of multiple modes of the tensor data. The efficacy of our approach is illustrated via simulations and a brain dynamic functional connectivity analysis from an Autism spectrum disorder study.
With the arrival of the big data era, recommendation system has been a hot technology for enterprises to streamline their sales. Recommendation algorithms for individual users have been extensively studied over the past decade. Most existing recommendation systems also focus on individual user recommendations, however in many daily activities, items are recommended to the groups not one person. As an effective means to solve the problem of group recommendation problem,we extend the single user recommendation to group recommendation. Specifically we propose a novel approach for family-based shopping recommendation system. We use the dataset from the real shopping mall consisting of shopping records table, client-profile table and family relationship table. Our algorithm integrates user behavior similarity and user profile similarity to build the user based collaborative filtering model. We evaluate our approach on a real-world shopping mall dataset.
We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zero-information moment closure method.
In survival analysis, regression models are used to understand the effects of explanatory variables (e.g., age, sex, weight, etc.) to the survival probability. However, for sensitive survival data such as medical data, there are serious concerns about the privacy of individuals in the data set when medical data is used to fit the regression models. The closest work addressing such privacy concerns is the work on Cox regression which linearly projects the original data to a lower dimensional space. However, the weakness of this approach is that there is no formal privacy guarantee for such projection. In this work, we aim to propose solutions for the regression problem in survival analysis with the protection of differential privacy which is a golden standard of privacy protection in data privacy research. To this end, we extend the Output Perturbation and Objective Perturbation approaches which are originally proposed to protect differential privacy for the Empirical Risk Minimization (ERM) problems. In addition, we also propose a novel sampling approach based on the Markov Chain Monte Carlo (MCMC) method to practically guarantee differential privacy with better accuracy. We show that our proposed approaches achieve good accuracy as compared to the non-private results while guaranteeing differential privacy for individuals in the private data set.
Global sensitivity analysis (GSA) is frequently used to analyze the influence of uncertain parameters in mathematical models. In principle, tools from GSA may be extended to analyze the influence of parameters in statistical models. Such analyses may enable parsimonious modeling and greater predictive capability. However, difficulties such as parameter correlation, model stochasticity, and multivariate model output prohibit a direct extension of GSA tools to statistical models. We introduce a novel framework that addresses these difficulties and enables GSA for statistical model parameters. Theoretical and computational properties are considered and illustrated on a synthetic example. The framework is applied to a Gaussian process model that depends on 95 parameters from the literature. Non-influential parameters are discovered through GSA and a reduced model with equal or stronger predictive capability is constructed by using only 79 parameters.
Information theory is a mathematical theory of learning with deep connections with topics as diverse as artificial intelligence, statistical physics, and biological evolution. Many primers on the topic paint a broad picture with relatively little mathematical sophistication, while many others develop specific application areas in detail. In contrast, these informal notes aim to outline some elements of the information-theoretic ‘way of thinking,’ by cutting a rapid and interesting path through some of the theory’s foundational concepts and theorems. We take the Kullback-Leibler divergence as our foundational concept, and then proceed to develop the entropy and mutual information. We discuss some of the main foundational results, including the Chernoff bounds as a characterization of the divergence; Gibbs’ Theorem; and the Data Processing Inequality. A recurring theme is that the definitions of information theory support natural theorems that sound ‘obvious’ when translated into English. More pithily, ‘information theory makes common sense precise.’ Since the focus of the notes is not primarily on technical details, proofs are provided only where the relevant techniques are illustrative of broader themes. Otherwise, proofs and intriguing tangents are referenced in liberally-sprinkled footnotes. The notes close with a highly nonexhaustive list of references to resources and other perspectives on the field.
We present an efficient method to estimate a function of the state of a Markov chain at time-step d. Our method is based on a new unbiased algorithm that estimates the expected value of f(U) via Monte Carlo simulation, where U is a vector of d independent random variables, and f is a function that depends little on its last arguments. The algorithm samples an initial segment of components of U according to a certain distribution. For several Markov chains, our method improves upon standard Monte Carlo by a factor asymptotically proportional to d. We establish a relationship between the performance of our algorithm and the truncation dimension of f. The algorithm uses a new geometric method, of independent interest, that computes the optimal distribution in O(d) steps. Numerical experiments support our findings.