To make sense of the world our brains must analyze high-dimensional datasets streamed by our sensory organs. Because such analysis begins with dimensionality reduction, modelling early sensory processing requires biologically plausible online dimensionality reduction algorithms. Recently, we derived such an algorithm, termed similarity matching, from a Multidimensional Scaling (MDS) objective function. However, in the existing algorithm, the number of output dimensions is set a priori by the number of output neurons and cannot be changed. Because the number of informative dimensions in sensory inputs is variable there is a need for adaptive dimensionality reduction. Here, we derive biologically plausible dimensionality reduction algorithms which adapt the number of output dimensions to the eigenspectrum of the input covariance matrix. We formulate three objective functions which, in the offline setting, are optimized by the projections of the input dataset onto its principal subspace scaled by the eigenvalues of the output covariance matrix. In turn, the output eigenvalues are computed as i) soft-thresholded, ii) hard-thresholded, iii) equalized thresholded eigenvalues of the input covariance matrix. In the online setting, we derive the three corresponding adaptive algorithms and map them onto the dynamics of neuronal activity in networks with biologically plausible local learning rules. Remarkably, in the last two networks, neurons are divided into two classes which we identify with principal neurons and interneurons in biological circuits.
We present an information-theoretic framework for solving global black-box optimization problems that also have black-box constraints. Of particular interest to us is to efficiently solve problems with decoupled constraints, in which subsets of the objective and constraint functions may be evaluated independently. For example, when the objective is evaluated on a CPU and the constraints are evaluated independently on a GPU. These problems require an acquisition function that can be separated into the contributions of the individual function evaluations. We develop one such acquisition function and call it Predictive Entropy Search with Constraints (PESC). PESC is an approximation to the expected information gain criterion and it compares favorably to alternative approaches based on improvement in several synthetic and real-world problems. In addition to this, we consider problems with a mix of functions that are fast and slow to evaluate. These problems require balancing the amount of time spent in the meta-computation of PESC and in the actual evaluation of the target objective. We take a bounded rationality approach and develop a partial update for PESC which trades off accuracy against speed. We then propose a method for adaptively switching between the partial and full updates for PESC. This allows us to interpolate between versions of PESC that are efficient in terms of function evaluations and those that are efficient in terms of wall-clock time. Overall, we demonstrate that PESC is an effective algorithm that provides a promising direction towards a unified solution for constrained Bayesian optimization.
Feature selection is important in many big data applications. There are at least two critical challenges. Firstly, in many applications, the dimensionality is extremely high, in millions, and keeps growing. Secondly, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. With a theoretical analysis on bounds of the pairwise correlations between features, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time in an online manner. Furthermore, to tackle the dimensionality that arrives by groups, we extend our SAOLA algorithm, and then propose a novel group-SAOLA algorithm for online group feature selection. The group-SAOLA algorithm can online maintain a set of feature groups that is sparse at the level of both groups and individual features simultaneously. An empirical study using a series of benchmark real data sets shows that our two algorithms, SAOLA and group-SAOLA, are scalable on data sets of extremely high dimensionality, and have superior performance over the state-of-the-art feature selection methods.
This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by algorithmic information theory, we describe RNN-based AIs (RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending sequences of tasks, some of them provided by the user, others invented by the RNNAI itself in a curious, playful fashion, to improve its RNN-based world model. Unlike our previous model-building RNN-based RL machines dating back to 1990, the RNNAI learns to actively query its model for abstract reasoning and planning and decision making, essentially ‘learning to think.’ The basic ideas of this report can be applied to many other cases where one RNN-like system exploits the algorithmic information content of another. They are taken from a grant proposal submitted in Fall 2014, and also explain concepts such as ‘mirror neurons.’ Experimental results will be described in separate papers.
This paper introduces a novel type theory and logic for probabilistic reasoning. Its logic is quantitative, with fuzzy predicates. It includes normalisation and conditioning of states. This conditioning uses a key aspect that distinguishes our probabilistic type theory from quantum type theory, namely the bijective correspondence between predicates and side-effect free actions (called instrument, or assert, maps). The paper shows how suitable computation rules can be derived from this predicate-action correspondence, and uses these rules for calculating conditional probabilities in two well-known examples of Bayesian reasoning in (graphical) models. Our type theory may thus form the basis for a mechanisation of Bayesian inference.
Suicide is a global public health problem. Early detection of individual suicide risk plays a key role in suicide prevention. In this paper, we propose to look into individual suicide risk through time series analysis of personal linguistic expression on social media (Weibo). We examined temporal patterns of the linguistic expression of individuals on Chinese social media (Weibo). Then, we used such temporal patterns as predictor variables to build classification models for estimating levels of individual suicide risk. Characteristics of time sequence curves to linguistic features including parentheses, auxiliary verbs, personal pronouns and body words are reported to affect performance of suicide most, and the predicting model has a accuracy higher than 0.60, shown by the results. This paper confirms the efficiency of the social media data in detecting individual suicide risk. Results of this study may be insightful for improving the performance of suicide prevention programs.
Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user generated content on the Web, especially the fact that millions of users, on a daily basis, express their opinions on products and services to blogs, wikis, social networks, message boards, etc., render the reliable, automated export of sentiments and opinions from unstructured text crucial for several commercial applications. In this paper, we present a novel hybrid vectorization approach for textual resources that combines a weighted variant of the popular Word2Vec representation (based on Term Frequency-Inverse Document Frequency) representation and with a Bag- of-Words representation and a vector of lexicon-based sentiment values. The proposed text representation approach is assessed through the application of several machine learning classification algorithms on a dataset that is used extensively in literature for sentiment detection. The classification accuracy derived through the proposed hybrid vectorization approach is higher than when its individual components are used for text represenation, and comparable with state-of-the-art sentiment detection methodologies.
Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem sizes and types they can handle. We present an approach to mitigate this limitation for certain types of multiagent systems, exploiting a property that can be thought of as ‘anonymous influence’ in the factored MDP. Anonymous influence summarizes joint variable effects efficiently whenever the explicit representation of variable identity in the problem can be avoided. We show how representational benefits from anonymity translate into computational efficiencies, both for general variable elimination in a factor graph but in particular also for the approximate linear programming solution to factored MDPs. The latter allows to scale linear programming to factored MDPs that were previously unsolvable. Our results are shown for the control of a stochastic disease process over a densely connected graph with 50 nodes and 25 agents.
We consider the problem of estimating a directed acyclic graph (DAG) for a multivariate normal distribution from high-dimensional data with $p\gg n$. Our main results establish nonasymptotic deviation bounds on the estimation error, sparsity bounds, and model selection consistency for a penalized least squares estimator under concave regularization. The proofs rely on interpreting the graphical model as a recursive linear structural equation model, which reduces the estimation problem to a series of tractable neighbourhood regressions and allows us to avoid making any assumptions regarding faithfulness. In doing so, we provide some novel techniques for handling general nonidentifiable and nonconvex problems. These techniques are used to guarantee uniform control over a superexponential number of neighbourhood regression problems by exploiting various notions of monotonicity among them. Our results apply to a wide variety of practical situations that allow for arbitrary nondegenerate covariance structures as well as many popular regularizers including the MCP, SCAD, $\ell_{0}$ and $\ell_{1}$.
We consider the problem of efficiently computing the maximum likelihood estimator in Generalized Linear Models (GLMs) when the number of observations is much larger than the number of coefficients ($n \gg p \gg 1$). In this regime, optimization algorithms can immensely benefit from approximate second order information. We propose an alternative way of constructing the curvature information by formulating it as an estimation problem and applying a Stein-type lemma, which allows further improvements through sub-sampling and eigenvalue thresholding. Our algorithm enjoys fast convergence rates, resembling that of second order methods, with modest per-iteration cost. We provide its convergence analysis for the general case where the rows of the design matrix are samples from a sub-gaussian distribution. We show that the convergence has two phases, a quadratic phase followed by a linear phase. Finally, we empirically demonstrate that our algorithm achieves the highest performance compared to various algorithms on several datasets.
Human language is recognized as a very complex domain since decades. No computer system has been able to reach human levels of performance so far. The only known computational system capable of proper language processing is the human brain. While we gather more and more data about the brain, its fundamental computational processes still remain obscure. The lack of a sound computational brain theory also prevents the fundamental understanding of Natural Language Processing. As always when science lacks a theoretical foundation, statistical modeling is applied to accommodate as many sampled real-world data as possible. An unsolved fundamental issue is the actual representation of language (data) within the brain, denoted as the Representational Problem. Starting with Jeff Hawkins’ Hierarchical Temporal Memory (HTM) theory, a consistent computational theory of the human cortex, we have developed a corresponding theory of language data representation: The Semantic Folding Theory. The process of encoding words, by using a topographic semantic space as distributional reference frame into a sparse binary representational vector is called Semantic Folding and is the central topic of this document. Semantic Folding describes a method of converting language from its symbolic representation (text) into an explicit, semantically grounded representation that can be generically processed by Hawkins’ HTM networks. As it turned out, this change in representation, by itself, can solve many complex NLP problems by applying Boolean operators and a generic similarity function like the Euclidian Distance. Many practical problems of statistical NLP systems, like the high cost of computation, the fundamental incongruity of precision and recall , the complex tuning procedures etc., can be elegantly overcome by applying Semantic Folding.