Deep Collective Matrix Factorization (dCMF)
Learning by integrating multiple heterogeneous data sources is a common requirement in many tasks. Collective Matrix Factorization (CMF) is a technique to learn shared latent representations from arbitrary collections of matrices. It can be used to simultaneously complete one or more matrices, for predicting the unknown entries. Classical CMF methods assume linearity in the interaction of latent factors which can be restrictive and fails to capture complex non-linear interactions. In this paper, we develop the first deep-learning based method, called dCMF, for unsupervised learning of multiple shared representations, that can model such non-linear interactions, from an arbitrary collection of matrices. We address optimization challenges that arise due to dependencies between shared representations through Multi-Task Bayesian Optimization and design an acquisition function adapted for collective learning of hyperparameters. Our experiments show that dCMF significantly outperforms previous CMF algorithms in integrating heterogeneous data for predictive modeling. Further, on two tasks – recommendation and prediction of gene-disease association – dCMF outperforms state-of-the-art matrix completion algorithms that can utilize auxiliary sources of information. …

ABCD-Strategy
Determining the causal structure of a set of variables is critical for both scientific inquiry and decision-making. However, this is often challenging in practice due to limited interventional data. Given that randomized experiments are usually expensive to perform, we propose a general framework and theory based on optimal Bayesian experimental design to select experiments for targeted causal discovery. That is, we assume the experimenter is interested in learning some function of the unknown graph (e.g., all descendants of a target node) subject to design constraints such as limits on the number of samples and rounds of experimentation. While it is in general computationally intractable to select an optimal experimental design strategy, we provide a tractable implementation with provable guarantees on both approximation and optimization quality based on submodularity. We evaluate the efficacy of our proposed method on both synthetic and real datasets, thereby demonstrating that our method realizes considerable performance gains over baseline strategies such as random sampling. …

TritanDB
The efficient management of data is an important prerequisite for realising the potential of the Internet of Things (IoT). Two issues given the large volume of structured time-series IoT data are, addressing the difficulties of data integration between heterogeneous Things and improving ingestion and query performance across databases on both resource-constrained Things and in the cloud. In this paper, we examine the structure of public IoT data and discover that the majority exhibit unique flat, wide and numerical characteristics with a mix of evenly and unevenly-spaced time-series. We investigate the advances in time-series databases for telemetry data and combine these findings with microbenchmarks to determine the best compression techniques and storage data structures to inform the design of a novel solution optimised for IoT data. A query translation method with low overhead even on resource-constrained Things allows us to utilise rich data models like the Resource Description Framework (RDF) for interoperability and data integration on top of the optimised storage. Our solution, TritanDB, shows an order of magnitude performance improvement across both Things and cloud hardware on many state-of-the-art databases within IoT scenarios. Finally, we describe how TritanDB supports various analyses of IoT time-series data like forecasting. …

Since the discovery of adversarial examples in machine learning, researchers have designed several techniques to train neural networks that are robust against different types of attacks (most notably $\ell_\infty$ and $\ell_2$ based attacks). However, it has been observed that the defense mechanisms designed to protect against one type of attack often offer poor performance against the other. In this paper, we introduce Randomized Adversarial Training (RAT), a technique that is efficient both against $\ell_2$ and $\ell_\infty$ attacks. To obtain this result, we build upon adversarial training, a technique that is efficient against $\ell_\infty$ attacks, and demonstrate that adding random noise at training and inference time further improves performance against \ltwo attacks. We then show that RAT is as efficient as adversarial training against $\ell_\infty$ attacks while being robust against strong $\ell_2$ attacks. Our final comparative experiments demonstrate that RAT outperforms all state-of-the-art approaches against $\ell_2$ and $\ell_\infty$ attacks. …