Optimal scaling of the independence sampler: Theory and Practice

The independence sampler is one of the most commonly used MCMC algorithms, usually as a component of a Metropolis-within-Gibbs algorithm. The common focus for the independence sampler is on choosing a proposal distribution that gives as high an acceptance rate as possible. In this paper we take a somewhat different focus, concentrating on the use of the independence sampler for updating augmented data in a Bayesian framework where a natural proposal distribution for the independence sampler exists. We therefore concentrate on the proportion of the augmented data to update in order to optimise the independence sampler. Generic guidelines for optimising the independence sampler are obtained for independent and identically distributed product densities, mirroring findings for the random walk Metropolis algorithm. The generic guidelines are shown to be informative beyond the narrow confines of idealised product densities in two epidemic examples.
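
The idea of updating only a proportion of the components can be illustrated with a minimal sketch. It is not the paper's implementation: the standard normal product target, the N(0, proposal_sd^2) proposal, and the names partial_independence_sampler, proportion and proposal_sd are all assumptions chosen for illustration. At each iteration a random subset of components is redrawn from the proposal, independently of the current values, and the whole subset is accepted or rejected with the usual Metropolis-Hastings ratio.

```python
import math
import random


def log_normal_pdf(x, sd):
    """Log density of a N(0, sd^2) distribution evaluated at x."""
    return -0.5 * (x / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def partial_independence_sampler(d, proportion, proposal_sd, n_iter, seed=0):
    """Independence sampler on a product of d standard normals (assumed target).

    Each iteration redraws round(proportion * d) randomly chosen components
    from N(0, proposal_sd^2) and accepts or rejects them jointly.
    Returns the acceptance rate and the trace of the first component.
    """
    rng = random.Random(seed)
    k = max(1, round(proportion * d))  # number of components proposed per step
    x = [0.0] * d
    accepted = 0
    first_coord = []
    for _ in range(n_iter):
        idx = rng.sample(range(d), k)
        proposal = {i: rng.gauss(0.0, proposal_sd) for i in idx}
        # log of [pi(y) q(x)] / [pi(x) q(y)], summed over the proposed components
        log_alpha = 0.0
        for i, y in proposal.items():
            log_alpha += log_normal_pdf(y, 1.0) - log_normal_pdf(x[i], 1.0)
            log_alpha += log_normal_pdf(x[i], proposal_sd) - log_normal_pdf(y, proposal_sd)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            for i, y in proposal.items():
                x[i] = y
            accepted += 1
        first_coord.append(x[0])
    return accepted / n_iter, first_coord


if __name__ == "__main__":
    for p in (0.1, 0.5, 1.0):
        rate, _ = partial_independence_sampler(d=50, proportion=p, proposal_sd=1.5, n_iter=2000)
        print(f"proportion={p:.1f} acceptance rate={rate:.3f}")
```

In this toy setting, proposing fewer components per step raises the acceptance rate but moves less of the state each time, which is the trade-off that the choice of proportion has to balance.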


Novel feature extraction, selection and fusion for effective malware family classification

Modern malware is designed with mutation characteristics, namely polymorphism and metamorphism, which cause an enormous growth in the number of variants of malware samples. Categorization of malware samples on the basis of their behaviors is essential for the computer security community in order to group samples belonging to the same family. Microsoft released a malware classification challenge in 2015 with a huge dataset of nearly 0.5 terabytes of data, containing more than 20K malware samples. The analysis of this dataset inspired the development of a novel paradigm that is effective in categorizing malware variants into their actual family groups. This paradigm is presented and discussed in the present paper, where emphasis has been given to the phases related to the extraction and selection of a set of novel features for the effective representation of malware samples. Features can be grouped according to different characteristics of malware behavior, and their fusion is performed according to a per-class weighting paradigm. The proposed method achieved a very high accuracy ($\approx 0.998$) on the Microsoft Malware Challenge dataset.


Deep Feature Learning for EEG Recordings

We introduce and compare several strategies for learning discriminative features from electroencephalography (EEG) recordings using deep learning techniques. EEG data are generally only available in small quantities; they are high-dimensional with a poor signal-to-noise ratio, and there is considerable variability between individual subjects and recording sessions. Our proposed techniques specifically address these challenges for feature learning. Similarity-constraint encoders learn features that make it possible to distinguish between classes by demanding that two trials from the same class are more similar to each other than to trials from other classes. This tuple-based training approach is especially suitable for small datasets. Hydra-nets allow for separate processing pathways adapting to subsets of a dataset and thus combine the advantages of individual feature learning (better adaptation of early, low-level processing) with group model training (better generalization of higher-level processing in deeper layers). This way, models can, for instance, adapt to each subject individually to compensate for differences in spatial patterns due to anatomical differences or variance in electrode positions. The different techniques are evaluated using the publicly available OpenMIIR dataset of EEG recordings taken while participants listened to and imagined music.
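
The similarity constraint described above can be written down as a simple check on tuples of feature vectors. The sketch below is only an illustration under stated assumptions: the abstract does not specify the similarity measure, so the dot product is assumed here, and the names constraint_satisfied and violation_rate are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length feature vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def constraint_satisfied(reference, same_class, other_class):
    """True if the reference is more similar to the same-class trial
    than to the trial from another class."""
    return dot(reference, same_class) > dot(reference, other_class)


def violation_rate(tuples):
    """Fraction of (reference, same_class, other_class) tuples that
    violate the similarity constraint."""
    violations = sum(1 for ref, same, other in tuples
                     if not constraint_satisfied(ref, same, other))
    return violations / len(tuples)


if __name__ == "__main__":
    # Hypothetical two-dimensional features for three tuples.
    tuples = [
        ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # constraint holds
        ([0.0, 1.0], [0.1, 0.8], [1.0, 0.0]),  # constraint holds
        ([1.0, 1.0], [0.2, 0.1], [0.9, 0.9]),  # constraint violated
    ]
    print(violation_rate(tuples))
```

Because every pairing of a reference trial with one same-class and one other-class trial yields a tuple, even a small dataset produces many such constraints.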


Active Contextual Entropy Search

Contextual policy search allows adapting robotic movement primitives to different situations. For instance, a locomotion primitive might be adapted to different terrain inclinations or desired walking speeds. Such an adaptation is often achievable by modifying a small number of hyperparameters. However, learning, when performed on real robotic systems, is typically restricted to a small number of trials. Bayesian optimization has recently been proposed as a sample-efficient means for contextual policy search that is well suited to these conditions. In this work, we extend entropy search, a variant of Bayesian optimization, such that it can be used for active contextual policy search where the agent selects those tasks during training in which it expects to learn the most. Empirical results in simulation suggest that this allows learning successful behavior with fewer trials.


On the Quality of the Initial Basin in Overspecified Neural Networks

Over the past few years, artificial neural networks have seen a dramatic resurgence in popularity as a tool for solving hard learning problems in AI applications. While it is widely known that neural networks are computationally hard to train in the worst case, in practice, neural networks are trained efficiently using SGD methods and a variety of techniques which accelerate the learning process. One mechanism which has been suggested to explain this is overspecification, which is the training of a network larger than what would be needed with unbounded computational power. Empirically, despite worst-case NP-hardness results, large networks tend to achieve a smaller error over the training set. In this work, we aspire to understand this phenomenon. In particular, we wish to better understand the behavior of the error over the sample as a function of the weights of the network. We focus mostly on neural nets composed of two layers, although we also consider single-neuron nets and nets of arbitrary depth. We investigate properties such as the number of local minima the function has and the probability of initializing from a basin with a given minimal value, with the goal of finding reasonable conditions under which efficient learning of the network is possible.


Adaptive Affinity Matrix for Unsupervised Metric Learning

Spectral clustering is one of the most popular clustering approaches, with the capability to handle some challenging clustering problems. Most spectral clustering methods provide a nonlinear map from the data manifold to a subspace. Little work has focused on an explicit linear map, which can be viewed as unsupervised distance metric learning. In practice, the selection of the affinity matrix has a tremendous impact on unsupervised learning. While affinity learning has achieved much success in recent years, some issues such as noise reduction remain to be addressed. In this paper, we propose a novel method, dubbed Adaptive Affinity Matrix (AdaAM), to learn an adaptive affinity matrix and derive a distance metric from the affinity. We assume the affinity matrix to be positive semidefinite and able to quantify pairwise dissimilarity. Our method is based on posing the optimization of the objective function as a spectral decomposition problem. We derive the affinity from both the original data distribution and the widely used heat kernel. The resulting matrix can be regarded as the optimal representation of pairwise relationships on the manifold. Extensive experiments on a number of real-world data sets show the effectiveness and efficiency of AdaAM.


Deep Mean Maps

The use of distributions and high-level features from deep architectures has become commonplace in modern computer vision. Both of these methodologies have separately achieved a great deal of success in many computer vision tasks. However, there has been little work attempting to leverage the power of these two methodologies jointly. To this end, this paper presents the Deep Mean Maps (DMMs) framework, a novel family of methods to non-parametrically represent distributions of features in convolutional neural network models. DMMs are able both to classify images using the distribution of top-level features and to tune the top-level features for performing this task. We show how to implement DMMs using a special mean map layer composed of typical CNN operations, making both forward and backward propagation simple. We illustrate the efficacy of DMMs at analyzing distributional patterns in image data in a synthetic data experiment. We also show that extending existing deep architectures with DMMs improves the performance of existing CNNs on several challenging real-world datasets.


An Efficient Assignment of Drainage Direction Over Flat Surfaces in Raster Digital Elevation Models

On the Asymptotic Bias of the Diffusion-Based Distributed Pareto Optimization

Universality in survivor distributions: Characterising the winners of competitive dynamics

Dynamic Sum Product Networks for Tractable Inference on Sequence Data

Cyclic $m$-cycle systems of complete graphs minus a 1-factor

Scalable Gaussian Processes for Characterizing Multidimensional Change Surfaces

Lass-0: sparse non-convex regression by local search

Symbol Grounding Association in Multimodal Sequences with Missing Elements

Similarity-based Text Recognition by Deeply Supervised Siamese Network

On a theorem of Halin

Combining Monte-Carlo and Hyper-heuristic methods for the Multi-mode Resource-constrained Multi-project Scheduling Problem

Handling Class Imbalance in Link Prediction using Learning to Rank Techniques

A refinement of theorems on vertex-disjoint chorded cycles

Introduzione all’Intelligenza Artificiale

Evaluating Statistical Diversity in the NBA Using Player Tracking Data

Large Scale Artificial Neural Network Training Using Multi-GPUs

The sensorimotor loop as a dynamical system: How regular motion primitives may emerge from self-organized limit cycles

Financial Models with Defaultable Numéraires

Experimental Evaluation of Distributed Node Coloring Algorithms for Wireless Networks

Counting quadrant walks via Tutte’s invariant method (extended abstract)

Searching for Disjoint Covering Systems with Precisely One Repeated Modulus

On the Instability of Matching Queues

First-passage percolation and local modifications of distances in random triangulations

The limiting shape of a full mailbox

Moments of the position of the maximum for GUE characteristic polynomials and for log-correlated Gaussian processes

Classical Adjoints for Ergodic Stochastic Control

A random cell splitting scheme on the sphere

Optimization techniques for multivariate least trimmed absolute deviation estimation

Equilibrium pricing under relative performance concerns

A Survey on Reproducibility in Parallel Computing

Acyclic colourings of graphs with bounded degree

Growing timescales and lengthscales characterizing vibrations of amorphous solids

On Choosing Committees Based on Approval Votes in the Presence of Outliers

Natural Language Object Retrieval

Heterogeneous Treatment Effects with Mismeasured Endogenous Treatment

The Voter Model and Jump Diffusion

$k$-means: Fighting against Degeneracy in Sequential Monte Carlo with an Application to Tracking

Neuroprosthetic decoder training as imitation learning

Partially magic labelings and the Antimagic Graph Conjecture

A Continuous-time Mutually-Exciting Point Process Framework for Prioritizing Events in Social Media

A General Decision Theory for Huber’s $ε$-Contamination Model

Deep Reinforcement Learning in Parameterized Action Space

Seeing the Unseen Network: Inferring Hidden Social Ties from Respondent-Driven Sampling

A Stochastic Reliability Model of a Server under a Random Workload

An Improper Complex Autoregressive Process of Order One

Symmetric matrices, Catalan paths, and correlations

A Kuramoto coupling of quasi-cycle oscillators

Action Recognition using Visual Attention

Embedded connectivity of recursive networks

Exact sampling of diffusions with a discontinuity in the drift

Going Deeper in Facial Expression Recognition using Deep Neural Networks

LSTM-based Deep Learning Models for non-factoid answer selection

Computing derangement probabilities of the symmetric group acting on k-sets