Learning Classifier Systems (LCS) are population-based reinforcement learners that were originally designed to model various cognitive phenomena. This paper presents an explicitly cognitive LCS by using spiking neural networks as classifiers, providing each classifier with a measure of temporal dynamism. We employ a constructivist model of growth of both neurons and synaptic connections, which permits a Genetic Algorithm (GA) to automatically evolve sufficiently-complex neural structures. The spiking classifiers are coupled with a temporally-sensitive reinforcement learning algorithm, which allows the system to perform temporal state decomposition by appropriately rewarding ‘macro-actions,’ created by chaining together multiple atomic actions. The combination of temporal reinforcement learning and neural information processing is shown to outperform benchmark neural classifier systems, and successfully solve a robotic navigation task.
In this paper, an event network is presented for exploring open information, where linguistic units about an event are organized for analysing. The process is divided into three steps: document event detection, event network construction and event network analysis. First, by implementing event detection or tracking, documents are retrospectively (or on-line) organized into document events. Secondly, for each of the document event, linguistic units are extracted and combined into event networks. Thirdly, various analytic methods are proposed for event network analysis. In our application methodologies are presented for exploring open information.
Bayesian networks, and especially their structures, are powerful tools for representing conditional independencies and dependencies between random variables. In applications where related variables form a priori known groups, chosen to represent different ‘views’ to or aspects of the same entities, one may be more interested in modeling dependencies between groups of variables rather than between individual variables. Motivated by this, we study prospects of representing relationships between variable groups using Bayesian network structures. We show that for dependency structures between groups to be learnable, the data have to satisfy the so-called groupwise faithfulness assumption. We also show that one cannot learn causal relations between groups using only groupwise conditional independencies, but also variable-wise relations are needed. Additionally, we present algorithms for finding the groupwise dependency structures.
A general approach for anomaly detection or novelty detection consists in estimating high density regions or Minimum Volume (MV) sets. The One-Class Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating such regions from high dimensional data. Yet it suffers from practical limitations. When applied to a limited number of samples it can lead to poor performance even when picking the best hyperparameters. Moreover the solution of OCSVM is very sensitive to the selection of hyperparameters which makes it hard to optimize in an unsupervised setting. We present a new approach to estimate MV sets using the OCSVM with a different choice of the parameter controlling the proportion of outliers. The solution function of the OCSVM is learnt on a training set and the desired probability mass is obtained by adjusting the offset on a test set to prevent overfitting. Models learnt on different train/test splits are then aggregated to reduce the variance induced by such random splits. Our approach makes it possible to tune the hyperparameters automatically and obtain nested set estimates. Experimental results show that our approach outperforms the standard OCSVM formulation while suffering less from the curse of dimensionality than kernel density estimates. Results on actual data sets are also presented.
Language is a social phenomenon and inherent to its social nature is that it is constantly changing. Recently, a surge of interest can be observed within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of ‘Computational Sociolinguistics’ that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.
Big data and the Internet of Things era continue to challenge computational systems. Several technology solutions such as NoSQL databases have been developed to deal with this challenge. In order to generate meaningful results from large datasets, analysts often use a graph representation which provides an intuitive way to work with the data. Graph vertices can represent users and events, and edges can represent the relationship between vertices. Graph algorithms are used to extract meaningful information from these very large graphs. At MIT, the Graphulo initiative is an effort to perform graph algorithms directly in NoSQL databases such as Apache Accumulo or SciDB, which have an inherently sparse data storage scheme. Sparse matrix operations have a history of efficient implementations and the Graph Basic Linear Algebra Subprogram (GraphBLAS) community has developed a set of key kernels that can be used to develop efficient linear algebra operations. However, in order to use the GraphBLAS kernels, it is important that common graph algorithms be recast using the linear algebra building blocks. In this article, we look at common classes of graph algorithms and recast them into linear algebra operations using the GraphBLAS building blocks.
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can chose out of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, ROOT offers packages for complex data modeling and fitting, as well as multivariate classification based on machine learning techniques. A central piece in these analysis tools are the histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks – e.g. data mining in HEP – by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.
The vector autoregression (VAR) has long proven to be an effective method for modeling the joint dynamics of macroe- conomic time series as well as forecasting. A major shortcomings of the VAR that has hindered its applicability is its heavy parameterization: the parameter space grows quadratically with the number of series included, quickly exhausting the available degrees of freedom. Consequently, forecasting using VARs is intractable for low-frequency, high-dimensional macroeconomic data. However, empirical evidence suggests that VARs that incorporate more component series tend to result in more accurate forecasts. Conventional methods that allow for the estimation of large VARs either tend to require ad hoc subjective specifications or are computationally infeasible. Moreover, as global economies become more intricately intertwined, there has been substantial interest in incorporating the impact of stochastic, unmodeled exogenous variables. Vector autoregression with exogenous variables (VARX) extends the VAR to allow for the inclusion of unmodeled variables, but it similarly faces dimensionality challenges. We introduce the VARX-L framework, a structured family of VARX models, and provide methodology which allows for both efficient estimation and accurate forecasting in high-dimensional analysis. VARX-L adapts several prominent scalar regression regularization techniques to a vector time series context to greatly reduce the parameter space of VAR and VARX models. We formulate convex optimization procedures that are amenable to efficient solutions for the time-ordered, high-dimensional problems we aim to solve. We also highlight a compelling extension that allows for shrinking toward reference models. We demonstrate the efficacy of VARX-L in both low- and high-dimensional macroeconomic applications and simulated data examples.
Word representations induced from models with discrete latent variables (e.g.\ HMMs) have been shown to be beneficial in many NLP applications. In this work, we exploit labeled syntactic dependency trees and formalize the induction problem as unsupervised learning of tree-structured hidden Markov models. Syntactic functions are used as additional observed variables in the model, influencing both transition and emission components. Such syntactic information can potentially lead to capturing more fine-grain and functional distinctions between words, which, in turn, may be desirable in many NLP applications. We evaluate the word representations on two tasks — named entity recognition and semantic frame identification. We observe improvements from exploiting syntactic function information in both cases, and the results rivaling those of state-of-the-art representation learning methods. Additionally, we revisit the relationship between sequential and unlabeled-tree models and find that the advantage of the latter is not self-evident.