We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost $O(n)$ for $n$ training points, and predictions cost $O(1)$ per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.
To infer a multilayer representation of high-dimensional count vectors, we propose the Poisson gamma belief network (PGBN) that factorizes each of its layers into the product of a connection weight matrix and the nonnegative real hidden units of the next layer. The PGBN’s hidden layers are jointly trained with an upward-downward Gibbs sampler, each iteration of which upward samples Dirichlet distributed connection weight vectors starting from the first layer (bottom data layer), and then downward samples gamma distributed hidden units starting from the top hidden layer. The gamma-negative binomial process combined with a layer-wise training strategy allows the PGBN to infer the width of each layer given a fixed budget on the width of the first layer. The PGBN with a single hidden layer reduces to Poisson factor analysis. Example results on text analysis illustrate interesting relationships between the width of the first layer and the inferred network structure, and demonstrate that the PGBN, whose hidden units are imposed with correlated gamma priors, can add more layers to increase its performance gains over Poisson factor analysis, given the same limit on the width of the first layer.
We present an approach to inform decisions about nonresponse followup sampling. The basic idea is (i) to create completed samples by imputing nonrespondents’ data under various assumptions about the nonresponse mechanisms, (ii) take hypothetical samples of varying sizes from the completed samples, and (iii) compute and compare measures of accuracy and cost for different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for multivariate continuous data with nonignorable unit nonresponse. We fit mixtures of multivariate normal distributions to the respondents’ data, and adjust the probabilities of the mixture components to generate nonrespondents’ distributions with desired features. We illustrate the approaches using data from the 2007 U. S. Census of Manufactures.
We propose a framework to perform streaming covariance selection. Our approach employs regularization constraints where a time-varying sparsity parameter is iteratively estimated via stochastic gradient descent. This allows for the regularization parameter to be efficiently learnt in an online manner. The proposed framework is developed for linear regression models and extended to graphical models via neighbourhood selection. We demonstrate the capabilities of such an approach using both synthetic data as well as neuroimaging data.
We present a new deterministic relational model derived from convolutional neural networks. Search-Convolutional Neural Networks (SCNNs) extend the notion of convolution to graph search to construct a rich latent representation that extracts local behavior from general graph-structured data. Unlike other neural network models that take graph-structured data as input, SCNNs have a parameterization that is independent of input size, a property that enables transfer learning between datasets. SCNNs can be applied to a wide variety of prediction tasks, including node classification, community detection, and link prediction. Our results indicate that SCNNs can offer considerable lift over off-the-shelf classifiers and simple multilayer perceptrons, and comparable performance to state-of-the-art probabilistic graphical models.
This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.
This article presents ALOJA-Machine Learning (ALOJA-ML) an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and performance tuning; here we detail the approach, efficacy of the model and initial results. Hadoop presents a complex execution environment, where costs and performance depends on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of the different hardware configurations, parameter tunings, and Cloud services. Despite early success within ALOJA from expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automation of modeling procedures to allow a systematic analysis of large and resource-constrained search spaces. ALOJA-ML provides such an automated system allowing knowledge discovery by modeling Hadoop executions from observed benchmarks across a broad set of configuration parameters. The resulting performance models can be used to forecast execution behavior of various workloads; they allow ‘a-priori’ prediction of the execution times for new configurations and HW choices and they offer a route to model-based anomaly detection. In addition, these models can guide the benchmarking exploration efficiently, by automatically prioritizing candidate future benchmark tests. Insights from ALOJA-ML’s models can be used to reduce the operational time on clusters, speed-up the data acquisition and knowledge discovery process, and importantly, reduce running costs. In addition to learning from the methodology presented in this work, the community can benefit in general from ALOJA data-sets, framework, and derived insights to improve the design and deployment of Big Data applications.
In a recent paper, Levy and Goldberg pointed out an interesting connection between prediction-based word embedding models and count models based on pointwise mutual information. Under certain conditions, they showed that both models end up optimizing equivalent objective functions. This paper explores this connection in more detail and lays out the factors leading to differences between these models. We find that the most relevant differences from an optimization perspective are (i) predict models work in a low dimensional space where embedding vectors can interact heavily; (ii) since predict models have fewer parameters, they are less prone to overfitting. Motivated by the insight of our analysis, we show how count models can be regularized in a principled manner and provide closed-form solutions for L1 and L2 regularization. Finally, we propose a new embedding model with a convex objective and the additional benefit of being intelligible.
In multi-agent domains (MADs), an agent’s action may not just change the world and the agent’s knowledge and beliefs about the world, but also may change other agents’ knowledge and beliefs about the world and their knowledge and beliefs about other agents’ knowledge and beliefs about the world. The goals of an agent in a multi-agent world may involve manipulating the knowledge and beliefs of other agents’ and again, not just their knowledge/belief about the world, but also their knowledge about other agents’ knowledge about the world. Our goal is to present an action language (mA+) that has the necessary features to address the above aspects in representing and RAC in MADs. mA+ allows the representation of and reasoning about different types of actions that an agent can perform in a domain where many other agents might be present—such as world-altering actions, sensing actions, and announcement/communication actions. It also allows the specification of agents’ dynamic awareness of action occurrences which has future implications on what agents’ know about the world and other agents’ knowledge about the world. mA+ considers three different types of awareness: full,- partial- awareness, and complete oblivion of an action occurrence and its effects. This keeps the language simple, yet powerful enough to address a large variety of knowledge manipulation scenarios in MADs. The semantics of mA+ relies on the notion of state, which is described by a pointed Kripke model and is used to encode the agent’s knowledge and the real state of the world. It is defined by a transition function that maps pairs of actions and states into sets of states. We illustrate properties of the action theories, including properties that guarantee finiteness of the set of initial states and their practical implementability. Finally, we relate mA+ to other related formalisms that contribute to RAC in MADs.