A Gibbs Sampler for Multivariate Linear Regression

Kelly (2007, hereafter K07) described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modeled by a flexible mixture of Gaussians rather than assumed to be uniform. Here I extend the K07 algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Second, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamically relaxed galaxy clusters as a function of their mass and redshift. An implementation of the Gibbs sampler in the R language, called LRGS, is provided.
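
To make the Gibbs sampling idea concrete, the following is a minimal sketch for a much simpler model than the one in the abstract. It assumes a single covariate, a single response, no measurement errors, flat priors on the intercept and slope, and a 1/sigma2 prior on the scatter variance. It is not the K07 or LRGS algorithm, and the function name `gibbs_linear_regression` is hypothetical.

```python
import random


def gibbs_linear_regression(x, y, n_iter=3000, burn_in=1000, seed=0):
    """Gibbs sampler for y = a + b*x + noise, with noise ~ N(0, sigma2).

    Simplifying assumptions (not from K07): one covariate, one response,
    no measurement errors, flat priors on a and b, and p(sigma2) ~ 1/sigma2.
    """
    rng = random.Random(seed)
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    a, b, sigma2 = 0.0, 0.0, 1.0  # arbitrary starting values
    samples = []
    for it in range(n_iter):
        # a | b, sigma2 is Gaussian, centred on the mean of y - b*x.
        mean_a = sum(yi - b * xi for xi, yi in zip(x, y)) / n
        a = rng.gauss(mean_a, (sigma2 / n) ** 0.5)
        # b | a, sigma2 is Gaussian, centred on the no-intercept
        # least-squares slope of (y - a) on x.
        mean_b = sum(xi * (yi - a) for xi, yi in zip(x, y)) / sxx
        b = rng.gauss(mean_b, (sigma2 / sxx) ** 0.5)
        # sigma2 | a, b is inverse-gamma(n/2, ssr/2): draw the precision
        # from a gamma with shape n/2 and scale 2/ssr, then invert it.
        ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / ssr)
        if it >= burn_in:
            samples.append((a, b, sigma2))
    return samples


# Synthetic data with known intercept 1.0, slope 2.0, and noise sd 0.5.
data_rng = random.Random(42)
xs = [data_rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [1.0 + 2.0 * xi + data_rng.gauss(0.0, 0.5) for xi in xs]
draws = gibbs_linear_regression(xs, ys)
post_a = sum(d[0] for d in draws) / len(draws)
post_b = sum(d[1] for d in draws) / len(draws)
print(f"posterior mean intercept {post_a:.2f}, slope {post_b:.2f}")
```

Each parameter is drawn in turn from its conditional distribution given the current values of the others. The extensions described in the abstract would add further conditional draws (for example, for the true covariate values and the mixture components) to the same loop.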

Bayesian Masking: Sparse Bayesian Estimation with Weaker Shrinkage Bias

A common strategy for sparse linear regression is to introduce regularization, which eliminates irrelevant features by setting the corresponding weights to zero. Regularization, however, often shrinks the estimator for relevant features, which leads to incorrect feature selection. Motivated by this issue, we propose Bayesian masking (BM), a sparse estimation method that imposes no regularization on the weights. The key concept of BM is to introduce binary latent variables that randomly mask features. Estimating the masking rates determines the relevance of each feature automatically. We derive a variational Bayesian inference algorithm that maximizes a lower bound of the factorized information criterion (FIC), which is a recently developed asymptotic evaluation of the marginal log-likelihood. We also propose a reparametrization that accelerates convergence. We demonstrate that BM outperforms Lasso and automatic relevance determination (ARD) in terms of the sparsity-shrinkage trade-off.

Community Detection in Networks with Node Features

Many methods have been proposed for community detection in networks, but most of them do not take into account additional information on the nodes that is often available in practice. In this paper, we propose a new joint community detection criterion that uses both the network edge information and the node features to detect community structures. One advantage our method has over existing joint detection approaches is the flexibility of learning the impact of different features which may differ across communities. Another advantage is the flexibility of choosing the amount of influence the feature information has on communities. The method is asymptotically consistent under the block model with additional assumptions on the feature distributions, and performs well on simulated and real networks.

Encoding Prior Knowledge with Eigenword Embeddings

Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.

Extreme Value Theory for Time Series using Peak-Over-Threshold method

This brief paper summarizes the opportunities offered by the Peak-Over-Threshold method for the analysis of extremes. An appropriate Value at Risk can be identified by fitting the data with a Generalized Pareto Distribution. An estimate of the Expected Shortfall can also be useful. These few concepts apply to a wide range of risk analyses, from financial applications to operational risk assessment and the analysis of climate time series, and they address the problem of borderline data.
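
As an illustration of the Peak-Over-Threshold workflow, the sketch below fits a Generalized Pareto Distribution to the excesses over a high threshold and then evaluates the standard tail formulas for Value at Risk and Expected Shortfall. Several choices here are assumptions rather than details from the paper: the threshold is taken as the empirical 90% quantile, the parameters are estimated by the method of moments (a simple stand-in for maximum likelihood), and the function name `pot_var_es` is hypothetical.

```python
import math
import random


def pot_var_es(losses, q=0.99, threshold_quantile=0.90):
    """Estimate VaR and ES at level q with a GPD fitted to threshold excesses.

    Assumes q > threshold_quantile and a tail with finite variance, which the
    method-of-moments estimator requires (shape xi < 0.5).
    """
    ordered = sorted(losses)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]        # threshold
    excess = [v - u for v in ordered if v > u]      # peaks over threshold
    n_u = len(excess)
    m = sum(excess) / n_u
    var = sum((e - m) ** 2 for e in excess) / (n_u - 1)
    # Method of moments: mean = beta/(1-xi), var = mean^2/(1-2*xi).
    ratio = m * m / var
    xi = 0.5 * (1.0 - ratio)
    beta = 0.5 * m * (1.0 + ratio)
    tail = (n / n_u) * (1.0 - q)
    if abs(xi) < 1e-9:
        var_q = u - beta * math.log(tail)           # exponential-tail limit
    else:
        var_q = u + (beta / xi) * (tail ** (-xi) - 1.0)
    es_q = var_q / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)
    return {"threshold": u, "xi": xi, "beta": beta, "var": var_q, "es": es_q}


# Synthetic unit-rate exponential losses, whose exact 99% VaR is -ln(0.01).
rng = random.Random(7)
sample = [rng.expovariate(1.0) for _ in range(20000)]
result = pot_var_es(sample)
print({k: round(val, 3) for k, val in result.items()})
```

The threshold choice drives the result: too low and the GPD approximation is poor, too high and too few excesses remain to estimate the shape parameter reliably.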

Fast Clustering and Topic Modeling Based on Rank-2 Nonnegative Matrix Factorization

The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on a fast rank-2 nonnegative matrix factorization (NMF) that performs binary clustering, combined with an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than existing methods. We describe highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets. Substantial experimental tests are presented that illustrate significant improvements in both computational time and quality of solutions. We compare our methods to other clustering methods including K-means, standard NMF, and CLUTO, and also to topic modeling methods including latent Dirichlet allocation (LDA) and recently proposed algorithms for NMF with separability constraints. Overall, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains.

Generating Weather Forecast Texts with Case Based Reasoning

Several techniques have been used to generate weather forecast texts. In this paper, case based reasoning (CBR) is proposed for weather forecast text generation because similar weather conditions recur over time and should have similar forecast texts. CBR-METEO, a system for generating weather forecast texts, was developed using a generic framework (jCOLIBRI) that provides modules for the standard components of the CBR architecture. The advantage of a CBR approach is that systems can be built in minimal time with far less human effort after initial consultation with experts. The approach depends heavily on the quality of the retrieval and revision components of the CBR process. We evaluated CBR-METEO with NIST, an automated metric that has been shown to correlate well with human judgements for this domain. The system shows performance comparable to other NLG systems that perform the same task.

On-the-Fly Learning in a Perpetual Learning Machine

Despite the promise of brain-inspired machine learning, deep neural networks (DNN) have frustratingly failed to bridge the deceptively large gap between learning and memory. Here, we introduce a Perpetual Learning Machine, a new type of DNN that is capable of brain-like dynamic ‘on the fly’ learning because it exists in a self-supervised state of Perpetual Stochastic Gradient Descent. Thus, we provide the means to unify learning and memory within a machine learning framework.

Parallel Knowledge Embedding with MapReduce on a Multi-core Processor

This article makes a first attempt to explore parallel algorithms for learning distributed representations of both entities and relations in large-scale knowledge repositories, using the MapReduce programming model on a multi-core processor. We accelerate the training process of a canonical knowledge embedding method, the translating embedding (TransE) model, by dividing a whole knowledge repository into several balanced subsets and feeding each subset to an individual core, where local embeddings are updated concurrently during the Map phase. However, this approach usually suffers from inconsistent low-dimensional vector representations of the same key collected from different Map workers, which leads to conflicts when Reduce merges the various vectors associated with that key. We therefore try several strategies for obtaining merged embeddings that not only retain the performance on entity inference, relation prediction, and triplet classification achieved by single-thread TransE on several well-known knowledge bases such as Freebase and NELL, but also scale up the learning speed with the number of cores in a processor. So far, the empirical studies show that we can achieve results comparable to those of single-thread TransE trained with the stochastic gradient descent (SGD) algorithm, and increase the training speed several times over by adapting the batch gradient descent (BGD) algorithm to the MapReduce paradigm.
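
The merge conflict in the Reduce phase can be illustrated with a small sketch. The abstract does not say which merging strategies were tried, so the element-wise averaging used here is only an assumed example, and the function name `reduce_by_average` and the toy vectors are hypothetical.

```python
from collections import defaultdict


def reduce_by_average(worker_outputs):
    """Merge per-worker embeddings by averaging vectors that share a key.

    worker_outputs is a list of dicts, one per Map worker, each mapping an
    entity or relation key to the embedding vector that worker learned.
    Averaging is an assumed strategy, not necessarily one from the paper.
    """
    grouped = defaultdict(list)
    for output in worker_outputs:
        for key, vector in output.items():
            grouped[key].append(vector)
    merged = {}
    for key, vectors in grouped.items():
        dim = len(vectors[0])
        merged[key] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return merged


# Two workers disagree about "paris"; "capital_of" was seen by only one worker.
worker_a = {"paris": [1.0, 0.0], "capital_of": [0.2, 0.4]}
worker_b = {"paris": [0.0, 1.0], "france": [0.5, 0.5]}
merged = reduce_by_average([worker_a, worker_b])
print(merged)
```

Keys seen by a single worker pass through unchanged, while keys updated on several cores are reconciled into one vector before the next Map round.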

Training a Restricted Boltzmann Machine for Classification by Labeling Model Samples

We propose an alternative method for training a classification model. Using the MNIST set of handwritten digits and Restricted Boltzmann Machines, it is possible to reach a classification performance competitive with semi-supervised learning if we first train a model in an unsupervised fashion on unlabeled data only, and then, with the help of a GUI, manually add labels to model samples instead of training data samples. This approach can benefit from the fact that model samples can be presented to the human labeler in a video-like fashion, resulting in a higher number of labeled examples. Also, after some initial training, hard-to-classify examples can be distinguished from easy ones automatically, saving manual work.

$p$-Adic Random Walk in a Potential

A compact aVLSI conductance-based silicon neuron

A novel principal component analysis for spatially-misaligned multivariate air pollution data

A Reconfigurable Mixed-signal Implementation of a Neuromorphic ADC

A robust approach for estimating change-points in the mean of an AR(p) process

A scientific definition for fish school cohesiveness

A Theory of Solving TAP Equations for Ising Models with General Invariant Random Matrices

A tree-based kernel for graphs with continuous attributes

Abelian regular coverings of the quaternion hypermap

Active Learning for Adaptive Clinical Trials: a Stream-based Selective Sampling Strategy

Advanced Multilevel Node Separator Algorithms

An extension of the classification of high rank regular polytopes

Building a Truly Distributed Constraint Solver with JADE

Cointegrating Jumps: an Application to Energy Facilities

Comparing non-nested models in the search for new physics

Decomposition of Brownian loop-soup clusters

Differential Spatial Modulation with Gray Coded Antenna Activation Order

Distance labellings of Cayley graphs of semigroups

Energy Dependence and Scaling Property of Localization Length near a Gapped Flat Band

Factorization of Temperley–Lieb diagrams

Finding the Leftmost Critical Factorization on Unordered Alphabet

Fourier Phase Retrieval with a Single Mask by Douglas-Rachford Algorithm

Generalized Quantile Treatment Effect: A Flexible Bayesian Approach Using Quantile Ratio Smoothing

In-Band $α$-Duplex Scheme for Cellular Networks: A Stochastic Geometry Approach

Incidences and pairs of dot products

Large time asymptotics for the parabolic Anderson model driven by spatially correlated noise

Machine Learning Model of the Swift/BAT Trigger Algorithm for Long GRB Population Studies

Model Predictive Path Integral Control using Covariance Variable Importance Sampling

Necessary and Sufficient Conditions for High-Dimensional Posterior Consistency under $g$-Priors

On Record Cayley Graphs of Diameter Two

On Recurrence and Transience of Two-Dimensional Lévy and Lévy-Type Processes

On the distance spectra of graphs

On the Equivalence among Problems of Bounded Width

On the fully commutative elements of type $\tilde C$ and faithfulness of related towers

On TimeML-Compliant Temporal Expression Extraction in Turkish

On viscosity solution of HJB equations with state constraints and reflection control

Palm measures and rigidity phenomena in point processes

Pattern avoidances seen in multiplicities of maximal weights of affine Lie algebra representations

PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data

Rough path transport equation with discontinuous coefficient – regularization by fractional Brownian motion

Sampling-based Causal Inference in Cue Combination and its Neural Implementation

Scaling the Gibbs posterior credible regions

Self-organized criticality in a discrete model for Smoluchowski’s equation with limited aggregations

Semi-described and semi-supervised learning with Gaussian processes

Sequential Design for Ranking Response Surfaces

Soliton Mobility in Disordered Lattice

Spatio-temporal bivariate statistical models for atmospheric trace-gas inversion

Strong Pseudoprimes to Twelve Prime Bases

Testing Properties of Functions on Finite Groups

The classification of tensor categories of two-colored noncrossing partitions

The cut-tree of large trees with small heights

The extremal function for disconnected minors

Three-coloring triangle-free graphs on surfaces VI. 3-colorability of quadrangulations

Tiling sets and spectral sets over finite fields

Train faster, generalize better: Stability of stochastic gradient descent

Training of CC4 Neural Network with Spread Unary Coding

Two-point correlation function and Feynman-Kac formula for the stochastic heat equation