Understanding each other is the key to success in collaboration. For humans, attributing mental states to others, the theory of mind, provides the crucial advantage. We argue for formulating human–AI interaction as a multi-agent problem, endowing AI with a computational theory of mind to understand and anticipate the user. To differentiate the approach from previous work, we introduce a categorisation of user modelling approaches based on the level of agency learnt in the interaction. We describe our recent work in using nested multi-agent modelling to formulate user models for multi-armed bandit based interactive AI systems, including a proof-of-concept user study.
Formal verification of neural networks is essential for their deployment in safety-critical areas. Many available formal verification methods have been shown to be instances of a unified Branch and Bound (BaB) formulation. We propose a novel framework for designing an effective branching strategy for BaB. Specifically, we learn a graph neural network (GNN) to imitate the strong branching heuristic behaviour. Our framework differs from previous methods for learning to branch in two main aspects. Firstly, our framework directly treats the neural network we want to verify as a graph input for the GNN. Secondly, we develop an intuitive forward and backward embedding update schedule. Empirically, our framework achieves roughly $50\%$ reduction in both the number of branches and the time required for verification on various convolutional networks when compared to the best available hand-designed branching strategy. In addition, we show that our GNN model enjoys both horizontal and vertical transferability. Horizontally, the model trained on easy properties performs well on properties of increased difficulty levels. Vertically, the model trained on small neural networks achieves similar performance on large neural networks.
We present Smooth Grad-CAM++, a technique which combines two recent techniques: SMOOTHGRAD and Grad-CAM++. Smooth Grad-CAM++ has the capability of either visualizing a layer, subset of feature maps, or subset of neurons within a feature map at each instance. We experimented with few images, and we discovered that Smooth Grad-CAM++ produced more visually sharp maps with larger number of salient pixels highlighted in the given input images when compared with other methods. Smooth Grad-CAM++ will give insight into what our deep CNN models (including models trained on medical scan or imagery) learn. Hence informing decisions on creating a representative training set.
In this paper, we obtain fundamental $\mathcal{L}_{p}$ bounds in sequential prediction and recursive algorithms via an entropic analysis. Both classes of problems are examined by investigating the underlying entropic relationships of the data and/or noises involved, and the derived lower bounds may all be quantified in a conditional entropy characterization. We also study the conditions to achieve the generic bounds from an innovations’ viewpoint.
Deep neural networks (DNN) are versatile parametric models utilised successfully in a diverse number of tasks and domains. However, they have limitations—particularly from their lack of robustness and over-sensitivity to out of distribution samples. Bayesian Neural Networks, due to their formulation under the Bayesian framework, provide a principled approach to building neural networks that address these limitations. This paper describes a study that empirically evaluates and compares Bayesian Neural Networks to their equivalent point estimate Deep Neural Networks to quantify the predictive uncertainty induced by their parameters, as well as their performance in view of this uncertainty. In this study, we evaluated and compared three point estimate deep neural networks against comparable Bayesian neural network alternatives using two well-known benchmark image classification datasets (CIFAR-10 and SVHN).
In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
Machine learning has seen tremendous advances in the past few years which has lead to deep learning models being deployed in varied applications of day-to-day life. Attacks on such models using perturbations, particularly in real-life scenarios, pose a serious challenge to their applicability, pushing research into the direction which aims to enhance the robustness of these models. After the introduction of these perturbations by Szegedy et al., significant amount of research has focused on the reliability of such models, primarily in two aspects – white-box, where the adversary has access to the targeted model and related parameters; and the black-box, which resembles a real-life scenario with the adversary having almost no knowledge of the model to be attacked. We propose to attract attention on the latter scenario and thus, present a comprehensive comparative study among the different adversarial black-box attack approaches proposed till date. The second half of this literature survey focuses on the defense techniques. This is the first study, to the best of our knowledge, that specifically focuses on the black-box setting to motivate future work on the same.
Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional indexes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune and their performance is inconsistent across different datasets and queries. In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. Flood achieves up to three orders of magnitude faster performance for range scans with predicates than state-of-the-art multi-dimensional indexes or sort orders on real-world datasets and workloads. Our work serves as a building block towards an end-to-end learned database system.
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. Long composition rollouts with non-reversible state changes are among the phenomena we include to shrink the gap between research benchmarks and real-world applications. ALFRED consists of expert demonstrations in interactive visual environments for 25k natural language directives. These directives contain both high-level goals like ‘Rinse off a mug and place it in the coffee maker.’ and low-level language instructions like ‘Walk to the coffee maker on the right.’ ALFRED tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets. We show that a baseline model designed for recent embodied vision-and-language tasks performs poorly on ALFRED, suggesting that there is significant room for developing innovative grounded visual language understanding models with this benchmark.
Federated learning opens a number of research opportunities due to its high communication efficiency in distributed training problems within a star network. In this paper, we focus on improving the communication efficiency for fully decentralized federated learning over a graph, where the algorithm performs local updates for several iterations and then enables communications among the nodes. In such a way, the communication rounds of exchanging the common interest of parameters can be saved significantly without loss of optimality of the solutions. Multiple numerical simulations based on large, real-world electronic health record databases showcase the superiority of the decentralized federated learning compared with classic methods.
Training of deep neural networks heavily depends on the data distribution. In particular, the networks easily suffer from class imbalance. The trained networks would recognize the frequent classes better than the infrequent classes. To resolve this problem, existing approaches typically propose novel loss functions to obtain better feature embedding. In this paper, we argue that drawing a better decision boundary is as important as learning better features. Inspired by observations, we investigate how the class imbalance affects the decision boundary and deteriorates the performance. We also investigate the feature distributional discrepancy between training and test time. As a result, we propose a novel, yet simple method for class imbalanced learning. Despite its simplicity, our method shows outstanding performance. In particular, the experimental results show that we can significantly improve the network by scaling the weight vectors, even without additional training process.
The ubiquity of GPS enabled devices result into the generation of an enormous amount of user movement data consisting of a sequence of locations together with activities performed at those locations. Such data, commonly known as {\it activity-trajectory data}, can be analysed to find common user movements and preferred activities, which will have tremendous business value. However, various countries have now introduced data privacy regulations that make it mandatory to anonymize any data before releasing it. This makes it difficult to look for patterns as the existing mining techniques may not be directly applicable over anonymized data. User locations in an activity-trajectory dataset are anonymized to regions of different shapes and sizes making them uncomparable; therefore, it is unclear how to define suitable patterns over those regions. In this paper, we propose a top-k pattern mining technique called TopKMintra that employs a pre-processing step to transform anonymized activity-trajectory into an intermediate representation that address the above problem. Unlike usual sequential data, activity-trajectory data is 2-dimensional that can lead to generation of duplicate patterns. To handle this, TopKMintra restricts arbitrary extensions in the two dimensions by imposing an order over the search space; this also helps in addressing the common problem of updating the threshold in top-k pattern mining algorithms during various stages. We perform extensive experiments on real datasets to demonstrate the efficiency and the effectiveness of TopKMintra. Our results show that even after anonymization, certain patterns may remain in a dataset and those could be mined efficiently.
We propose a scalable Bayesian preference learning method for jointly predicting the preferences of individuals as well as the consensus of a crowd from pairwise labels. Peoples’ opinions often differ greatly, making it difficult to predict their preferences from small amounts of personal data. Individual biases also make it harder to infer the consensus of a crowd when there are few labels per item. We address these challenges by combining matrix factorisation with Gaussian processes, using a Bayesian approach to account for uncertainty arising from noisy and sparse data. Our method exploits input features, such as text embeddings and user metadata, to predict preferences for new items and users that are not in the training set. As previous solutions based on Gaussian processes do not scale to large numbers of users, items or pairwise labels, we propose a stochastic variational inference approach that limits computational and memory costs. Our experiments on a recommendation task show that our method is competitive with previous approaches despite our scalable inference approximation. We demonstrate the method’s scalability on a natural language processing task with thousands of users and items, and show improvements over the state of the art on this task. We make our software publicly available for future work.
Neural Architecture Search (NAS) that aims to automate the procedure of architecture design has achieved promising results in many computer vision fields. In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation. The proposed method leverages an adversarial searching mechanism to search for the architectures of generator and discriminator simultaneously in a differentiable manner. Therefore, the searching algorithm considers the relevance and balance between the two networks leading to search for a superior generative model. Besides, AdversarialNAS does not need any extra evaluation metric to evaluate the performance of the architecture in each searching iteration, which is very efficient and can take only 1 GPU day to search for an optimal network architecture in a large search space ($10^{38}$). Experiments demonstrate the effectiveness and superiority of our method. The discovered generative model sets a new state-of-the-art FID score of $10.87$ and highly competitive Inception Score of $8.74$ on CIFAR-10. Its transferability is also proven by setting new state-of-the-art FID score of $26.98$ and Inception score of $9.63$ on STL-10. Our code will be released to facilitate the related academic and industrial study.