A Normative Theory of Adaptive Dimensionality Reduction in Neural Networks

To make sense of the world our brains must analyze high-dimensional datasets streamed by our sensory organs. Because such analysis begins with dimensionality reduction, modelling early sensory processing requires biologically plausible online dimensionality reduction algorithms. Recently, we derived such an algorithm, termed similarity matching, from a Multidimensional Scaling (MDS) objective function. However, in the existing algorithm, the number of output dimensions is set a priori by the number of output neurons and cannot be changed. Because the number of informative dimensions in sensory inputs is variable there is a need for adaptive dimensionality reduction. Here, we derive biologically plausible dimensionality reduction algorithms which adapt the number of output dimensions to the eigenspectrum of the input covariance matrix. We formulate three objective functions which, in the offline setting, are optimized by the projections of the input dataset onto its principal subspace scaled by the eigenvalues of the output covariance matrix. In turn, the output eigenvalues are computed as i) soft-thresholded, ii) hard-thresholded, iii) equalized thresholded eigenvalues of the input covariance matrix. In the online setting, we derive the three corresponding adaptive algorithms and map them onto the dynamics of neuronal activity in networks with biologically plausible local learning rules. Remarkably, in the last two networks, neurons are divided into two classes which we identify with principal neurons and interneurons in biological circuits.


A General Framework for Constrained Bayesian Optimization using Information-based Search

We present an information-theoretic framework for solving global black-box optimization problems that also have black-box constraints. Of particular interest to us is to efficiently solve problems with decoupled constraints, in which subsets of the objective and constraint functions may be evaluated independently. For example, when the objective is evaluated on a CPU and the constraints are evaluated independently on a GPU. These problems require an acquisition function that can be separated into the contributions of the individual function evaluations. We develop one such acquisition function and call it Predictive Entropy Search with Constraints (PESC). PESC is an approximation to the expected information gain criterion and it compares favorably to alternative approaches based on improvement in several synthetic and real-world problems. In addition to this, we consider problems with a mix of functions that are fast and slow to evaluate. These problems require balancing the amount of time spent in the meta-computation of PESC and in the actual evaluation of the target objective. We take a bounded rationality approach and develop a partial update for PESC which trades off accuracy against speed. We then propose a method for adaptively switching between the partial and full updates for PESC. This allows us to interpolate between versions of PESC that are efficient in terms of function evaluations and those that are efficient in terms of wall-clock time. Overall, we demonstrate that PESC is an effective algorithm that provides a promising direction towards a unified solution for constrained Bayesian optimization.


Scalable and Accurate Online Feature Selection for Big Data

Feature selection is important in many big data applications. There are at least two critical challenges. Firstly, in many applications, the dimensionality is extremely high, in millions, and keeps growing. Secondly, feature selection has to be highly scalable, preferably in an online manner such that each feature can be processed in a sequential scan. In this paper, we develop SAOLA, a Scalable and Accurate OnLine Approach for feature selection. With a theoretical analysis on bounds of the pairwise correlations between features, SAOLA employs novel online pairwise comparison techniques to address the two challenges and maintain a parsimonious model over time in an online manner. Furthermore, to tackle the dimensionality that arrives by groups, we extend our SAOLA algorithm, and then propose a novel group-SAOLA algorithm for online group feature selection. The group-SAOLA algorithm can online maintain a set of feature groups that is sparse at the level of both groups and individual features simultaneously. An empirical study using a series of benchmark real data sets shows that our two algorithms, SAOLA and group-SAOLA, are scalable on data sets of extremely high dimensionality, and have superior performance over the state-of-the-art feature selection methods.


On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by algorithmic information theory, we describe RNN-based AIs (RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending sequences of tasks, some of them provided by the user, others invented by the RNNAI itself in a curious, playful fashion, to improve its RNN-based world model. Unlike our previous model-building RNN-based RL machines dating back to 1990, the RNNAI learns to actively query its model for abstract reasoning and planning and decision making, essentially ‘learning to think.’ The basic ideas of this report can be applied to many other cases where one RNN-like system exploits the algorithmic information content of another. They are taken from a grant proposal submitted in Fall 2014, and also explain concepts such as ‘mirror neurons.’ Experimental results will be described in separate papers.


A Type Theory for Probabilistic and Bayesian Reasoning

This paper introduces a novel type theory and logic for probabilistic reasoning. Its logic is quantitative, with fuzzy predicates. It includes normalisation and conditioning of states. This conditioning uses a key aspect that distinguishes our probabilistic type theory from quantum type theory, namely the bijective correspondence between predicates and side-effect free actions (called instrument, or assert, maps). The paper shows how suitable computation rules can be derived from this predicate-action correspondence, and uses these rules for calculating conditional probabilities in two well-known examples of Bayesian reasoning in (graphical) models. Our type theory may thus form the basis for a mechanisation of Bayesian inference.


Recognizing Temporal Linguistic Expression Pattern of Individual with Suicide Risk on Social Media

Suicide is a global public health problem. Early detection of individual suicide risk plays a key role in suicide prevention. In this paper, we propose to look into individual suicide risk through time series analysis of personal linguistic expression on social media (Weibo). We examined temporal patterns of the linguistic expression of individuals on Chinese social media (Weibo). Then, we used such temporal patterns as predictor variables to build classification models for estimating levels of individual suicide risk. Characteristics of time sequence curves to linguistic features including parentheses, auxiliary verbs, personal pronouns and body words are reported to affect performance of suicide most, and the predicting model has a accuracy higher than 0.60, shown by the results. This paper confirms the efficiency of the social media data in detecting individual suicide risk. Results of this study may be insightful for improving the performance of suicide prevention programs.


Machine Learning Sentiment Prediction based on Hybrid Document Representation

Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user generated content on the Web, especially the fact that millions of users, on a daily basis, express their opinions on products and services to blogs, wikis, social networks, message boards, etc., render the reliable, automated export of sentiments and opinions from unstructured text crucial for several commercial applications. In this paper, we present a novel hybrid vectorization approach for textual resources that combines a weighted variant of the popular Word2Vec representation (based on Term Frequency-Inverse Document Frequency) representation and with a Bag- of-Words representation and a vector of lexicon-based sentiment values. The proposed text representation approach is assessed through the application of several machine learning classification algorithms on a dataset that is used extensively in literature for sentiment detection. The classification accuracy derived through the proposed hybrid vectorization approach is higher than when its individual components are used for text represenation, and comparable with state-of-the-art sentiment detection methodologies.


Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)

Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem sizes and types they can handle. We present an approach to mitigate this limitation for certain types of multiagent systems, exploiting a property that can be thought of as ‘anonymous influence’ in the factored MDP. Anonymous influence summarizes joint variable effects efficiently whenever the explicit representation of variable identity in the problem can be avoided. We show how representational benefits from anonymity translate into computational efficiencies, both for general variable elimination in a factor graph but in particular also for the approximate linear programming solution to factored MDPs. The latter allows to scale linear programming to factored MDPs that were previously unsolvable. Our results are shown for the control of a stochastic disease process over a densely connected graph with 50 nodes and 25 agents.


Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression

We consider the problem of estimating a directed acyclic graph (DAG) for a multivariate normal distribution from high-dimensional data with p\gg n. Our main results establish nonasymptotic deviation bounds on the estimation error, sparsity bounds, and model selection consistency for a penalized least squares estimator under concave regularization. The proofs rely on interpreting the graphical model as a recursive linear structural equation model, which reduces the estimation problem to a series of tractable neighbourhood regressions and allows us to avoid making any assumptions regarding faithfulness. In doing so, we provide some novel techniques for handling general nonidentifiable and nonconvex problems. These techniques are used to guarantee uniform control over a superexponential number of neighbourhood regression problems by exploiting various notions of monotonicity among them. Our results apply to a wide variety of practical situations that allow for arbitrary nondegenerate covariance structures as well as many popular regularizers including the MCP, SCAD, \ell_{0} and \ell_{1}.


Newton-Stein Method: An optimization method for GLMs via Stein’s Lemma

We consider the problem of efficiently computing the maximum likelihood estimator in Generalized Linear Models (GLMs) when the number of observations is much larger than the number of coefficients (n \gg p \gg 1). In this regime, optimization algorithms can immensely benefit from approximate second order information. We propose an alternative way of constructing the curvature information by formulating it as an estimation problem and applying a Stein-type lemma, which allows further improvements through sub-sampling and eigenvalue thresholding. Our algorithm enjoys fast convergence rates, resembling that of second order methods, with modest per-iteration cost. We provide its convergence analysis for the general case where the rows of the design matrix are samples from a sub-gaussian distribution. We show that the convergence has two phases, a quadratic phase followed by a linear phase. Finally, we empirically demonstrate that our algorithm achieves the highest performance compared to various algorithms on several datasets.


Semantic Folding Theory And its Application in Semantic Fingerprinting

Human language is recognized as a very complex domain since decades. No computer system has been able to reach human levels of performance so far. The only known computational system capable of proper language processing is the human brain. While we gather more and more data about the brain, its fundamental computational processes still remain obscure. The lack of a sound computational brain theory also prevents the fundamental understanding of Natural Language Processing. As always when science lacks a theoretical foundation, statistical modeling is applied to accommodate as many sampled real-world data as possible. An unsolved fundamental issue is the actual representation of language (data) within the brain, denoted as the Representational Problem. Starting with Jeff Hawkins’ Hierarchical Temporal Memory (HTM) theory, a consistent computational theory of the human cortex, we have developed a corresponding theory of language data representation: The Semantic Folding Theory. The process of encoding words, by using a topographic semantic space as distributional reference frame into a sparse binary representational vector is called Semantic Folding and is the central topic of this document. Semantic Folding describes a method of converting language from its symbolic representation (text) into an explicit, semantically grounded representation that can be generically processed by Hawkins’ HTM networks. As it turned out, this change in representation, by itself, can solve many complex NLP problems by applying Boolean operators and a generic similarity function like the Euclidian Distance. Many practical problems of statistical NLP systems, like the high cost of computation, the fundamental incongruity of precision and recall , the complex tuning procedures etc., can be elegantly overcome by applying Semantic Folding.


The Writhe of Permutations and Random Framed Knots

Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening

On automorphisms and structural properties of generalized Cayley graphs

An algebraic approach to enumerating non-equivalent double traces in graphs

Ask, and shall you receive?: Understanding Desire Fulfillment in Natural Language Text

Non-ambiguous trees: new results and generalisation

Two Universality Properties Associated with the Monkey Model of Zipf’s Law

Distributionally robust inventory control when demand is a martingale

The characterization of the graphs of bilinear $(d\times d)$-forms over $\mathbb{F}_2$

Universality laws for randomized dimension reduction, with applications

Chromatic roots and limits of dense graphs

Stochastic simulation of predictive space-time scenarios of wind speed using observations and physical models

Randomized Hamiltonian Monte Carlo

Modeling Dynamic Relationships Between Characters in Literary Novels

On the Complexity of Multi-Parameterized Cluster Editing

High-dimensional Multivariate Mediation: with Application to Neuroimaging Data

Diameter of Ramanujan Graphs and Random Cayley Graphs with Numerics

A concrete approach to diagonal short time asymptotics of heat kernels associated with sub-Laplacian on CR manifolds

Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning

Scaling to 1024 software processes and hardware cores of the distributed simulation of a spiking neural network including up to 20G synapses

On connected degree sequences

The Hausdorff dimension of multivariate operator-self-similar Gaussian random fields

$m$-Cycle Packings of $(λ+μ)K_{v+u}-λK_v$: $m$ even

Influence diagrams for the optimization of a vehicle speed profile

Generalized Post-Widder inversion formula with application to statistics

An Improved Approximation Guarantee for the Maximum Budgeted Allocation Problem

‘Piaf’ vs ‘Adele’: classifying encyclopedic queries using automatically labeled training data

Antifactor of regular bipartite graphs

Randomization method and backward SDEs for optimal control of partially observed path-dependent stochastic systems

Discrete time McKean-Vlasov control problem: a dynamic programming approach

Simultaneous observation of small- and large-energy-transfer electron-electron scattering in three dimensional indium oxide thick films

On independent $[1,2]$-sets in trees

The Alternating Stock Size Problem and the Gasoline Puzzle

Ruin probabilities and passage times of $γ$-reflected Gaussian processes with stationary increments

Efficient Deterministic Single Round Document Exchange for Edit Distance

Algorithms as Mechanisms: The Price of Anarchy of Relax-and-Round

Non-Adaptive Group Testing on Graphs

Differential Network Analysis via the Lasso Penalized D-Trace Loss

Proximal gradient method for huberized support vector machine

Constant-approximation algorithms for highly connected multi-dominating sets in unit disk graphs

Alternating direction method of multipliers for regularized multiclass support vector machines

Fast Moving Sampling Designs in Temporal Networks

Scaling POMDPs For Selecting Sellers in E-markets-Extended Version

Coexistence of energy diffusion and spin sub-diffusion in the ergodic phase of a many body localizable spin chain

Hypertoric varieties and zonotopal tilings

Aspect-based Opinion Summarization with Convolutional Neural Networks

Independence number of triple systems with no tight cycle

Position paper: a general framework for applying machine learning techniques in operating room

The plaid model and outer billiards on kites

Group SLOPE – adaptive selection of groups of predictors

One more Turán number and Ramsey number for the loose 3-uniform path of length three

The effect of competition and horizontal trait inheritance on invasion, fixation and polymorphism

Multiple-Instance Learning: Radon-Nikodym Approach to Distribution Regression Problem

Retardation and flow at the glass transition

Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

Game options in an imperfect market with default

Combinatorially rigid simple polytopes with d+3 facets

Exchangeable optimal transportation and log-concavity

On the spectral radius of nonregular uniform hypergraphs

Long Concept Query on Conceptual Taxonomies

Entity Suggestion by Example using a Conceptual Taxonomy

$k$-Means for Streaming and Distributed Big Sparse Data

How do the naive Bayes classifier and the Support Vector Machine compare in their ability to forecast the Stock Exchange of Thailand?

Agri-Info: Cloud Based Autonomic System for Delivering Agriculture as a Service

Robotic Search & Rescue via Online Multi-task Reinforcement Learning

Bootstrapping Ternary Relation Extractors

MidRank: Learning to rank based on subsequences

Right Angles in (F_q)^d

Column-Oriented Datalog Materialization for Large Knowledge Graphs (Extended Technical Report)

4-coloring ($P_6$, bull)-free graphs

On exceptional sets in Erdős-Rényi limit theorem II

Applying deep learning to classify pornographic images and videos

Degradable channels, less noisy channels, and quantum statistical morphisms: an equivalence relation

Malliavin Calculus for regularity structures: the case of gPAM

Selective inference after cross-validation

A note on the ‘conditional triviality’ property for regular conditional distributions

Designing high-fidelity single-shot three-qubit gates: A machine learning approach

Multi-Cloud Resource Provisioning with Aneka: A Unified and Integrated Utilisation of Microsoft Azure and Amazon EC2 Instances

Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) – The $\ell_0$ Method

A Beta-splitting model for evolutionary trees

Colouring powers and girth