Deep Kernel Learning

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure-exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost O(n) for n training points, and predictions cost O(1) per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over both scalable Gaussian processes with flexible kernel learning models and stand-alone deep architectures.
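
The core construction, a base kernel applied to inputs that have first passed through a deep architecture, can be sketched as follows. This is only an illustration under stated assumptions: the tiny tanh network `mlp_features`, its random weights, and the RBF base kernel (standing in for the spectral mixture kernel) are hypothetical choices, and the exact O(n^3) Cholesky-based marginal likelihood shown here does not use the interpolation or Kronecker/Toeplitz structure that gives the O(n) cost described in the abstract.

```python
import numpy as np

def mlp_features(X, weights):
    """Hypothetical deep feature map g(x; w): a small tanh network."""
    H = X
    for W, b in weights:
        H = np.tanh(H @ W + b)
    return H

def rbf_kernel(A, B, lengthscale=1.0):
    """RBF base kernel (an assumed stand-in for the spectral mixture kernel)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def deep_kernel(X1, X2, weights, lengthscale=1.0):
    """k_deep(x, x') = k_base(g(x; w), g(x'; w))."""
    return rbf_kernel(mlp_features(X1, weights), mlp_features(X2, weights), lengthscale)

def gp_log_marginal_likelihood(K, y, noise=0.1):
    """Exact GP log marginal likelihood; a function of both the network
    weights and the kernel hyperparameters through K."""
    n = len(y)
    L = np.linalg.cholesky(K + noise**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2.0 * np.pi)

# Usage with illustrative random weights and data.
rng = np.random.default_rng(0)
weights = [(rng.normal(size=(3, 8)), np.zeros(8)),
           (rng.normal(size=(8, 2)), np.zeros(2))]
X = rng.normal(size=(5, 3))
y = rng.normal(size=5)
K = deep_kernel(X, X, weights)
print(gp_log_marginal_likelihood(K, y))
```

In a full treatment the network weights and kernel hyperparameters would be optimized jointly by ascending this objective with an automatic differentiation library; the sketch only evaluates it.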


The Poisson Gamma Belief Network

To infer a multilayer representation of high-dimensional count vectors, we propose the Poisson gamma belief network (PGBN) that factorizes each of its layers into the product of a connection weight matrix and the nonnegative real hidden units of the next layer. The PGBN’s hidden layers are jointly trained with an upward-downward Gibbs sampler, each iteration of which upward samples Dirichlet distributed connection weight vectors starting from the first layer (bottom data layer), and then downward samples gamma distributed hidden units starting from the top hidden layer. The gamma-negative binomial process combined with a layer-wise training strategy allows the PGBN to infer the width of each layer given a fixed budget on the width of the first layer. The PGBN with a single hidden layer reduces to Poisson factor analysis. Example results on text analysis illustrate interesting relationships between the width of the first layer and the inferred network structure, and demonstrate that the PGBN, whose hidden units are imposed with correlated gamma priors, can add more layers to increase its performance gains over Poisson factor analysis, given the same limit on the width of the first layer.
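
The layered factorization described above can be illustrated by drawing a single count vector top-down. This is a rough generative sketch, not the upward-downward Gibbs sampler: the function name `sample_pgbn`, the symmetric Dirichlet prior on the columns, the unit gamma scale, and the `top_shape` parameter are all assumptions made for illustration.

```python
import numpy as np

def sample_pgbn(widths, vocab_size, rng, top_shape=1.0):
    """Draw one count vector from a PGBN-style generative model.

    widths lists the hidden layer widths from the first (bottom) hidden
    layer to the top one. Gamma scales are fixed to 1 (an assumption).
    """
    dims = [vocab_size] + list(widths)
    # Phi[t] has shape (dims[t], dims[t + 1]); each column is a
    # Dirichlet-distributed connection weight vector.
    Phi = [rng.dirichlet(np.ones(dims[t]), size=dims[t + 1]).T
           for t in range(len(widths))]
    # Top hidden layer: gamma-distributed nonnegative hidden units.
    theta = rng.gamma(shape=top_shape, scale=1.0, size=dims[-1])
    # Each lower hidden layer has gamma units whose shape parameter is
    # the weight matrix times the hidden units of the layer above.
    for t in range(len(widths) - 1, 0, -1):
        theta = rng.gamma(shape=Phi[t] @ theta, scale=1.0)
    # Observed counts are Poisson with rate Phi[0] @ theta.
    x = rng.poisson(Phi[0] @ theta)
    return x, Phi

rng = np.random.default_rng(0)
x, Phi = sample_pgbn([8, 4, 2], vocab_size=20, rng=rng)
print(x)
```

With `widths` of length one the loop body never runs and the draw is a Poisson count vector with rate equal to a weight matrix times gamma factors, which is the Poisson factor analysis special case mentioned in the abstract.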


Stop or Continue Data Collection: A Nonignorable Missing Data Approach for Continuous Variables

We present an approach to inform decisions about nonresponse followup sampling. The basic idea is (i) to create completed samples by imputing nonrespondents’ data under various assumptions about the nonresponse mechanisms, (ii) to take hypothetical samples of varying sizes from the completed samples, and (iii) to compute and compare measures of accuracy and cost for different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for multivariate continuous data with nonignorable unit nonresponse. We fit mixtures of multivariate normal distributions to the respondents’ data, and adjust the probabilities of the mixture components to generate nonrespondents’ distributions with desired features. We illustrate the approaches using data from the 2007 U.S. Census of Manufactures.
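
The imputation idea, reweighting the components of a mixture fitted to respondents and then sampling nonrespondents from the adjusted mixture, might look roughly like this. The mixture parameters below are made up rather than fitted, and the multiplicative `tilt` factors are a hypothetical way of encoding an assumption about the nonresponse mechanism; the abstract does not say how the probabilities are adjusted.

```python
import numpy as np

def adjust_weights(weights, tilt):
    """Rescale component probabilities by assumed tilt factors and renormalize."""
    w = np.asarray(weights, dtype=float) * np.asarray(tilt, dtype=float)
    return w / w.sum()

def sample_mixture(weights, means, covs, n, rng):
    """Draw n rows from a mixture of multivariate normal distributions."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in comps])

# Illustrative two-component mixture standing in for a fit to respondents.
weights = [0.7, 0.3]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

# Assume nonrespondents are three times as likely to come from component 2.
nr_weights = adjust_weights(weights, tilt=[1.0, 3.0])
rng = np.random.default_rng(0)
imputed = sample_mixture(nr_weights, means, covs, n=100, rng=rng)
print(nr_weights, imputed.mean(axis=0))
```

Repeating the draw under several tilt settings would give the set of completed samples, one per assumed nonresponse mechanism, from which hypothetical followup samples can be taken.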


Streaming regularization parameter selection via stochastic gradient descent

We propose a framework to perform streaming covariance selection. Our approach employs regularization constraints where a time-varying sparsity parameter is iteratively estimated via stochastic gradient descent. This allows the regularization parameter to be learnt efficiently in an online manner. The proposed framework is developed for linear regression models and extended to graphical models via neighbourhood selection. We demonstrate the capabilities of such an approach using both synthetic and neuroimaging data.


Search-Convolutional Neural Networks

We present a new deterministic relational model derived from convolutional neural networks. Search-Convolutional Neural Networks (SCNNs) extend the notion of convolution to graph search to construct a rich latent representation that extracts local behavior from general graph-structured data. Unlike other neural network models that take graph-structured data as input, SCNNs have a parameterization that is independent of input size, a property that enables transfer learning between datasets. SCNNs can be applied to a wide variety of prediction tasks, including node classification, community detection, and link prediction. Our results indicate that SCNNs can offer considerable lift over off-the-shelf classifiers and simple multilayer perceptrons, and comparable performance to state-of-the-art probabilistic graphical models.


ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments

This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data-sets and framework to improve the design and deployment of Big Data applications.


ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments

This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and performance tuning; here we detail the approach, efficacy of the model and initial results. Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of the different hardware configurations, parameter tunings, and Cloud services. Despite early success within ALOJA from expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automation of modeling procedures to allow a systematic analysis of large and resource-constrained search spaces. ALOJA-ML provides such an automated system, allowing knowledge discovery by modeling Hadoop executions from observed benchmarks across a broad set of configuration parameters. The resulting performance models can be used to forecast execution behavior of various workloads; they allow ‘a-priori’ prediction of the execution times for new configurations and HW choices, and they offer a route to model-based anomaly detection. In addition, these models can guide the benchmarking exploration efficiently, by automatically prioritizing candidate future benchmark tests. Insights from ALOJA-ML’s models can be used to reduce the operational time on clusters, speed up the data acquisition and knowledge discovery process, and, importantly, reduce running costs. In addition to learning from the methodology presented in this work, the community can benefit in general from ALOJA data-sets, framework, and derived insights to improve the design and deployment of Big Data applications.


Towards a Better Understanding of Predict and Count Models

In a recent paper, Levy and Goldberg pointed out an interesting connection between prediction-based word embedding models and count models based on pointwise mutual information. Under certain conditions, they showed that both models end up optimizing equivalent objective functions. This paper explores this connection in more detail and lays out the factors leading to differences between these models. We find that the most relevant differences from an optimization perspective are (i) predict models work in a low dimensional space where embedding vectors can interact heavily; (ii) since predict models have fewer parameters, they are less prone to overfitting. Motivated by the insight of our analysis, we show how count models can be regularized in a principled manner and provide closed-form solutions for L1 and L2 regularization. Finally, we propose a new embedding model with a convex objective and the additional benefit of being intelligible.
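
For readers unfamiliar with the count side of this comparison, a minimal sketch of a PMI-based count model follows. It builds a positive PMI matrix from word-context co-occurrence counts and factorizes it with a truncated SVD. The toy counts, the clipping at zero, and the square-root scaling of singular values are common conventions assumed here; they are not the regularized closed-form solutions or the new convex model proposed in the paper.

```python
import numpy as np

def ppmi_matrix(counts):
    """Positive pointwise mutual information from a co-occurrence count matrix.

    PMI(w, c) = log( #(w, c) * N / (#(w) * #(c)) ); cells with zero counts
    are set to 0, and negative values are clipped to 0.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total
    pmi = np.zeros_like(counts)
    nz = counts > 0
    pmi[nz] = np.log(counts[nz] / expected[nz])
    return np.maximum(pmi, 0.0)

def svd_embeddings(M, dim):
    """Low-dimensional word vectors from a truncated SVD of M."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])

# Toy co-occurrence counts: rows are words, columns are contexts.
counts = np.array([[10, 0, 2],
                   [0, 8, 1],
                   [3, 1, 6]])
M = ppmi_matrix(counts)
vectors = svd_embeddings(M, dim=2)
print(M.round(3))
print(vectors.shape)
```

The full PPMI matrix has one parameter per word-context pair, whereas the truncated factorization keeps only `dim` numbers per word, which is the kind of difference in parameter count the abstract points to when contrasting count and predict models.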


An Action Language for Multi-Agent Domains: Foundations

In multi-agent domains (MADs), an agent’s action may change not just the world and the agent’s knowledge and beliefs about the world, but also other agents’ knowledge and beliefs about the world and their knowledge and beliefs about other agents’ knowledge and beliefs about the world. The goals of an agent in a multi-agent world may involve manipulating the knowledge and beliefs of other agents, and again, not just their knowledge and beliefs about the world, but also their knowledge about other agents’ knowledge about the world. Our goal is to present an action language (mA+) that has the necessary features to address the above aspects in representing and reasoning about actions and change (RAC) in MADs. mA+ allows the representation of and reasoning about different types of actions that an agent can perform in a domain where many other agents might be present, such as world-altering actions, sensing actions, and announcement/communication actions. It also allows the specification of agents’ dynamic awareness of action occurrences, which has future implications for what agents know about the world and about other agents’ knowledge of the world. mA+ considers three different types of awareness: full awareness, partial awareness, and complete oblivion of an action occurrence and its effects. This keeps the language simple, yet powerful enough to address a large variety of knowledge manipulation scenarios in MADs. The semantics of mA+ relies on the notion of state, which is described by a pointed Kripke model and is used to encode the agent’s knowledge and the real state of the world. It is defined by a transition function that maps pairs of actions and states into sets of states. We illustrate properties of the action theories, including properties that guarantee finiteness of the set of initial states and their practical implementability. Finally, we relate mA+ to other formalisms that contribute to RAC in MADs.


A sharp lower bound for choosing the maximum of an independent sequence

Learning Optimized Or’s of And’s

Entanglement scaling of excited states in large one-dimensional many-body localized systems

An Extended Frank-Wolfe Method with ‘In-Face’ Directions, and its Application to Low-Rank Matrix Completion

Evaluating Protein-protein Interaction Predictors with a Novel 3-Dimensional Metric

Accelerating Adaptive IDW Interpolation Algorithm on a Single GPU

Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice

Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

Meta-food-chains as a many-layer epidemic process on networks

Evaluation of the Intel Xeon Phi and NVIDIA K80 as accelerators for two-dimensional panel codes

Submodular Hamming Metrics

Traversing Grammar-Compressed Trees with Constant Delay

A novel model and estimation method for the individual random component of earthquake ground-motion relations

A quantitative performance analysis for Stokes solvers at the extreme scale

Unraveling the nature of carrier mediated ferromagnetism in diluted magnetic semiconductors

Modified vertex Folkman numbers

Barrier Frank-Wolfe for Marginal Inference

Thermodynamics for spatially inhomogeneous magnetization and Young-Gibbs measures

Introducing SKYSET – a Quintuple Approach for Improving Instructions

Optimal kernel selection for density estimation

Plane lattice walks avoiding a quadrant

Hierarchical Coupled Geometry Analysis for Neuronal Structure and Activity Pattern Discovery

Quantifying inhomogeneity in fractal sets

Online Balanced Repartitioning

Hyperbolic Pascal pyramid

ExpertSeer: a Keyphrase Based Expert Recommender for Digital Libraries

The core in random hypergraphs and local weak convergence

Performance Evaluation of Microservices Architectures using Containers

Hamiltonian Path in 2-Trees

Finding structure in data using multivariate tree boosting

Population size predicts lexical diversity, but so does the mean sea level – one problem in the analysis of temporal data

Enhancing speed of pinning synchronizability: low-degree nodes with high feedback gains

Probabilistic wind speed forecasting on a grid based on ensemble model output statistics

Next Generation Multicuts for Semi-Planar Graphs

Neutralized Empirical Risk Minimization with Generalization Neutrality Bound

Persistence of centrality in random growing trees

Multi-lingual Geoparsing based on Machine Translation

Improving Covariate Balance in 2^K Factorial Designs via Rerandomization

Enhanced Low-Rank Matrix Approximation

Quantum Walks on Generalized Quadrangles

False Discoveries Occur Early on the Lasso Path

Regularity theory and extension problem for fractional nonlocal parabolic equations and the master equation

Stop Wasting My Gradients: Practical SVRG

On the Matsumoto-Yor property in free probability

Graphs that are simultaneously efficient open domination and efficient closed domination graphs