Jet grooming through reinforcement learning

We introduce a novel implementation of a reinforcement learning (RL) algorithm designed to find an optimal jet grooming strategy, a critical tool for collider experiments. The RL agent is trained with a reward function constructed to optimize the resulting jet properties, using both signal and background samples in a simultaneous multi-level training. We show that the grooming algorithm derived from the deep RL agent can match state-of-the-art techniques used at the Large Hadron Collider, resulting in improved mass resolution for boosted objects. Given a suitable reward function, the agent learns a policy that optimally removes soft wide-angle radiation, allowing for a modular grooming technique that can be applied in a wide range of contexts. These results are accessible through the corresponding GroomRL framework.
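
To make the grooming decision concrete, here is a minimal sketch of the binary keep/remove choice applied along a jet's primary declustering sequence. The representation is an assumption for illustration: each step is reduced to a precomputed pair (z, ΔR), the momentum fraction of the softer branch and its angle, instead of a full Cambridge/Aachen tree. The standard soft-drop condition z > z_cut (ΔR/R0)^β serves as a baseline policy; an RL agent would replace `policy` with a learned function. The names `groom_primary` and `soft_drop_policy` are hypothetical and not part of GroomRL.

```python
from typing import Callable, List, Tuple

# Toy representation (assumption): one (z, delta_r) pair per primary declustering step,
# ordered from wide to narrow angles.
Emission = Tuple[float, float]
Policy = Callable[[float, float], bool]  # returns True if the softer branch is removed


def soft_drop_policy(z: float, delta_r: float, z_cut: float = 0.1,
                     beta: float = 0.0, r0: float = 1.0) -> bool:
    """Soft-drop condition: remove the softer branch if z < z_cut * (delta_r / r0)**beta."""
    return z < z_cut * (delta_r / r0) ** beta


def groom_primary(emissions: List[Emission], policy: Policy,
                  stop_at_first_kept: bool = True) -> List[Emission]:
    """Apply `policy` along the declustering sequence.

    stop_at_first_kept=True mimics soft drop: grooming stops at the first emission that
    passes, and everything at smaller angles is kept. With False, every step is decided
    independently, the kind of per-node decision a learned agent would make.
    """
    kept: List[Emission] = []
    for i, (z, dr) in enumerate(emissions):
        if policy(z, dr):
            continue  # groom away the softer branch
        if stop_at_first_kept:
            return list(emissions[i:])
        kept.append((z, dr))
    return kept


if __name__ == "__main__":
    jet = [(0.01, 0.8), (0.3, 0.2), (0.02, 0.05)]  # hypothetical declustering sequence
    print(groom_primary(jet, soft_drop_policy))
    print(groom_primary(jet, soft_drop_policy, stop_at_first_kept=False))
```

With β = 0 the soft-drop baseline reduces to a fixed momentum-fraction cut, which is why the soft wide-angle emission at ΔR = 0.8 is removed first.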

Capsule Networks with Max-Min Normalization

Capsule Networks (CapsNet) use the Softmax function to convert the logits of the routing coefficients into a set of normalized values that signify the assignment probabilities between capsules in adjacent layers. We show that the use of Softmax prevents capsule layers from forming optimal couplings between lower and higher-level capsules. Softmax constrains the dynamic range of the routing coefficients and leads to probabilities that remain mostly uniform after several routing iterations. Instead, we propose the use of Max-Min normalization. Max-Min performs a scale-invariant normalization of the logits that allows each lower-level capsule to take on an independent value, constrained only by the bounds of normalization. Max-Min provides consistent improvement in test accuracy across five datasets and allows more routing iterations without a decrease in network performance. A single CapsNet trained using Max-Min achieves an improved test error of 0.20% on the MNIST dataset. With a simple 3-model majority vote, we achieve a test error of 0.17% on MNIST.
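
The contrast between the two normalizations can be seen on a single row of routing logits. The sketch below assumes the normalization runs over the higher-level capsules for each lower-level capsule, as the usual dynamic-routing softmax does; the bounds `lower` and `upper` and the `eps` guard are illustrative parameters, not values from the paper. Softmax of closely spaced logits stays near uniform, while Max-Min stretches them to the full [lower, upper] range and is unchanged by any positive scaling or shift of the logits.

```python
import numpy as np


def softmax(b: np.ndarray, axis: int = -1) -> np.ndarray:
    """Standard routing normalization: entries along `axis` sum to 1."""
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def max_min(b: np.ndarray, axis: int = -1, lower: float = 0.0,
            upper: float = 1.0, eps: float = 1e-12) -> np.ndarray:
    """Rescale logits linearly so that, along `axis`, min -> lower and max -> upper.

    Entries are not forced to sum to 1, so each coefficient is free within the bounds.
    """
    lo = b.min(axis=axis, keepdims=True)
    hi = b.max(axis=axis, keepdims=True)
    return lower + (upper - lower) * (b - lo) / (hi - lo + eps)


if __name__ == "__main__":
    # Hypothetical logits: one lower-level capsule routed to three higher-level capsules.
    b = np.array([[1.0, 1.2, 0.9]])
    print(softmax(b).round(3))   # close to uniform
    print(max_min(b).round(3))   # spans the full [0, 1] range
```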

Scalable Data Augmentation for Deep Learning

Scalable Data Augmentation (SDA) provides a framework for training deep learning models using auxiliary hidden layers. Scalable MCMC is available for network training and inference. SDA provides a number of computational advantages over traditional algorithms: it avoids backtracking and local modes, and it can perform optimization with stochastic gradient descent (SGD) in TensorFlow. Standard deep neural networks with logit, ReLU and SVM activation functions are straightforward to implement. To illustrate our architectures and methodology, we use Pólya-Gamma logit data augmentation for a number of standard datasets. Finally, we conclude with directions for future research.
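
For readers unfamiliar with Pólya-Gamma augmentation, a minimal sketch of the classic logit case may help: introducing ω_i ~ PG(1, x_i·β) makes the conditional of β Gaussian, giving a two-step Gibbs sampler. This is the textbook Bayesian logistic regression sampler, not the SDA training procedure itself. PG draws here use a truncated version of the infinite sum-of-gammas representation (`n_terms` is an illustrative choice); exact samplers used in practice are different and faster. The zero-mean Gaussian prior with variance `prior_var` is an assumption.

```python
import numpy as np


def sample_pg(b, c, n_terms=200, rng=None):
    """Approximate PG(b, c) draws by truncating the sum-of-gammas representation:

    omega = 1 / (2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),  g_k ~ Gamma(b, 1).
    `c` may be an array; one draw is returned per entry.
    """
    rng = np.random.default_rng() if rng is None else rng
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4 * np.pi ** 2)
    g = rng.gamma(shape=b, scale=1.0, size=denom.shape)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)


def gibbs_logistic(X, y, n_iter=300, prior_var=10.0, rng=None):
    """Gibbs sampler for Bayesian logistic regression (y in {0, 1}) via PG augmentation."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    kappa = y - 0.5
    prior_prec = np.eye(d) / prior_var
    beta = np.zeros(d)
    draws = []
    for _ in range(n_iter):
        omega = sample_pg(1.0, X @ beta, rng=rng)           # omega_i | beta ~ PG(1, x_i . beta)
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + prior_prec)
        V = 0.5 * (V + V.T)                                  # guard against round-off asymmetry
        m = V @ (X.T @ kappa)                                # zero prior mean
        beta = rng.multivariate_normal(m, V)                 # beta | omega ~ N(m, V)
        draws.append(beta)
    return np.array(draws)
```

A quick sanity check is that the sample mean of PG(1, c) should approach tanh(c/2)/(2c).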

Physics-Aware Neural Networks for Distribution System State Estimation

The distribution system state estimation problem seeks to determine the network state from available measurements. Widely used Gauss-Newton approaches are very sensitive to the initialization and often not suitable for real-time estimation. Learning approaches are very promising for real-time estimation, as they shift the computational burden to an offline training stage. Prior machine learning approaches to power system state estimation have been electrical model-agnostic, in that they did not exploit the topology and physical laws governing the power grid to design the architecture of the learning model. In this paper, we propose a novel learning model that utilizes the structure of the power grid. The proposed neural network architecture reduces the number of coefficients needed to parameterize the mapping from the measurements to the network state by exploiting the separability of the estimation problem. This prevents overfitting and reduces the complexity of the training stage. We also propose a greedy algorithm for phasor measurement unit (PMU) placement that aims at minimizing the complexity of the neural network required for realizing the state estimation mapping. Simulation results show superior performance of the proposed method over the Gauss-Newton approach.
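
The Gauss-Newton baseline solves a weighted least-squares (WLS) problem by repeatedly linearizing the measurement function h(x) around the current iterate. The sketch below shows that iteration on a deliberately tiny, hypothetical measurement model (two direct readings and two nonlinear ones), not a power-flow model; `W` plays the role of the inverse measurement-noise covariance.

```python
import numpy as np


def gauss_newton_wls(h, jac, z, W, x0, n_iter=50, tol=1e-12):
    """Weighted least-squares estimate: minimize (z - h(x))^T W (z - h(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r = z - h(x)                              # measurement residuals
        J = jac(x)                                # measurement Jacobian at current iterate
        G = J.T @ W @ J                           # gain matrix
        dx = np.linalg.solve(G, J.T @ W @ r)      # normal-equation step
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x


# Hypothetical toy measurement model, for illustration only.
def h(x):
    return np.array([x[0], x[1], x[0] * x[1], x[0] ** 2])


def jac(x):
    return np.array([[1.0, 0.0], [0.0, 1.0], [x[1], x[0]], [2 * x[0], 0.0]])


if __name__ == "__main__":
    x_true = np.array([2.0, 3.0])
    print(gauss_newton_wls(h, jac, h(x_true), np.eye(4), x0=[1.0, 1.0]))
```

Each iteration requires a fresh Jacobian and a linear solve, and convergence depends on where `x0` starts; these are the costs the abstract's learning approach moves offline.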

Symbolic Regression Methods for Reinforcement Learning

Reinforcement learning algorithms can be used to optimally solve dynamic decision-making and control problems. With continuous-valued state and input variables, reinforcement learning algorithms must rely on function approximators to represent the value function and policy mappings. Commonly used numerical approximators, such as neural networks or basis function expansions, have two main drawbacks: they are black-box models offering no insight into the mappings learned, and they require significant trial-and-error tuning of their meta-parameters. In this paper, we propose a new approach to constructing smooth value functions by means of symbolic regression. We introduce three off-line methods for finding value functions based on a state transition model: symbolic value iteration, symbolic policy iteration, and a direct solution of the Bellman equation. The methods are illustrated on four nonlinear control problems: velocity control under friction, one-link and two-link pendulum swing-up, and magnetic manipulation. The results show that the value functions not only yield well-performing policies, but also are compact, human-readable and mathematically tractable. This makes them potentially suitable for further analysis of the closed-loop system. A comparison with alternative approaches using neural networks shows that our method constructs well-performing value functions with substantially fewer parameters.
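
The structure of symbolic value iteration (Bellman backup on sampled states, then refit an analytic expression) can be sketched with a simplified stand-in. Here the "symbolic regression" step is replaced by a least-squares fit over a small, fixed library of terms; genuine symbolic regression would search over expression structures. The 1-D system x' = x + 0.1u, the quadratic reward, the state and action grids, and the discount are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical candidate terms; real symbolic regression searches over expressions.
LIBRARY = {
    "1": lambda x: np.ones_like(x),
    "x": lambda x: x,
    "x^2": lambda x: x ** 2,
    "x^4": lambda x: x ** 4,
}
GAMMA, DT = 0.95, 0.1
XS = np.linspace(-1.0, 1.0, 41)   # sampled states
US = np.linspace(-1.0, 1.0, 21)   # discretized actions


def features(x):
    return np.stack([f(x) for f in LIBRARY.values()], axis=-1)


def q_values(theta, x):
    """Q(x, u) for every u in US, with x' = x + DT*u and reward -(x^2 + 0.1 u^2)."""
    x = np.asarray(x, dtype=float)[..., None]
    x_next = x + DT * US
    return -(x ** 2) - 0.1 * US ** 2 + GAMMA * (features(x_next) @ theta)


def fitted_value_iteration(n_iter=200):
    theta = np.zeros(len(LIBRARY))
    phi = features(XS)
    for _ in range(n_iter):
        targets = q_values(theta, XS).max(axis=-1)              # Bellman backup on the grid
        theta, *_ = np.linalg.lstsq(phi, targets, rcond=None)   # refit the analytic form
    return theta


def greedy_action(theta, x):
    return US[np.argmax(q_values(theta, x))]


if __name__ == "__main__":
    theta = fitted_value_iteration()
    print("V(x) ~ " + " + ".join(f"{c:.3f}*{name}" for name, c in zip(LIBRARY, theta)))
```

The printed expression is the kind of compact, human-readable value function the abstract refers to, although its exact form here is fixed in advance by the library.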

CUR Decompositions, Approximations, and Perturbations

This article discusses a useful tool in dimensionality reduction and low-rank matrix approximation called the CUR decomposition. Various viewpoints of this method in the literature are brought together, compared, and contrasted, including a new characterization of exact CUR decompositions. A novel perturbation analysis is performed on CUR approximations of noisy versions of low-rank matrices, comparing them with the putative CUR decomposition of the underlying low-rank part. Additionally, we give new column and row sampling results which allow one to conclude that a CUR decomposition of a low-rank matrix is attained with high probability. We then illustrate the stability of these sampling methods under the perturbations studied before, and provide numerical illustrations of the methods and bounds discussed.
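
A CUR decomposition approximates A by C U R, where C holds selected columns, R selected rows, and U is built from their intersection. One well-known exact form is A = C A[I,J]^+ R, which holds whenever the intersection A[I,J] has the same rank as A. The sketch below checks this on a random rank-3 matrix; uniform random sampling of rows and columns is an assumption made for illustration, and the truncated pseudoinverse tolerance `tol` is a hypothetical choice.

```python
import numpy as np


def truncated_pinv(M, tol=1e-10):
    """Moore-Penrose pseudoinverse that discards singular values below tol * s_max."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > tol * s[0]
    return (Vt[keep].T / s[keep]) @ U[:, keep].T


def cur(A, rows, cols):
    """Return C = A[:, cols], U = pinv(A[rows, cols]), R = A[rows, :]."""
    C = A[:, cols]
    R = A[rows, :]
    U = truncated_pinv(A[np.ix_(rows, cols)])
    return C, U, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))   # rank-3 matrix
    rows = rng.choice(50, size=6, replace=False)
    cols = rng.choice(40, size=6, replace=False)
    C, U, R = cur(A, rows, cols)
    print(np.abs(C @ U @ R - A).max())
```

Adding noise to A and repeating the experiment is a simple way to see the perturbation effects the article analyzes.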

Explaining Reinforcement Learning to Mere Mortals: An Empirical Study

We present a user study to investigate the impact of explanations on non-experts’ understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124-participant, four-treatment experiment to compare participants’ mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars was needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.

Expert-Augmented Machine Learning

Machine Learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We use a large dataset of intensive care patient data to predict mortality and show that we can extract expert knowledge using an online platform, help reveal hidden confounders, improve generalizability on a different population and learn using less data. EAML presents a novel framework for high performance and dependable machine learning in critical applications.

Time Series Imputation

Multivariate time series analysis is a very active topic in the research community, and many machine learning techniques are used to extract information from this type of data. However, in real-world problems data has missing values, which may hinder the application of machine learning techniques to extract information. In this paper we focus on the task of imputation of time series. Many imputation methods for time series are based on regression methods. Unfortunately, these methods perform poorly when the variables are categorical. To address this case, we propose a new imputation method based on Expectation Maximization over dynamic Bayesian networks. The approach is assessed with synthetic and real data, and it outperforms several state-of-the-art methods.
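
To show how Expectation Maximization handles missing categorical values, here is a minimal sketch for the simplest dynamic Bayesian network: a single categorical variable following a first-order Markov chain. This is an illustrative special case, not the authors' multivariate method. The E-step runs a scaled forward-backward pass in which observed steps pin the state and missing steps (marked -1) allow any state; the M-step re-estimates the transition matrix from expected transition counts. The `smoothing` pseudo-count is an assumption.

```python
import numpy as np


def em_impute_markov(x, n_states, n_iter=20, smoothing=1e-3):
    """EM imputation for one categorical variable modeled as a first-order Markov chain.

    x: integer array of states in [0, n_states), with -1 marking missing values.
    Returns the imputed sequence and the learned transition matrix.
    """
    x = np.asarray(x)
    T = len(x)
    obs = np.flatnonzero(x >= 0)
    # "Emission" mask: an observed step pins the state, a missing step allows any state.
    E = np.ones((T, n_states))
    E[obs] = 0.0
    E[obs, x[obs]] = 1.0
    pi = np.full(n_states, 1.0 / n_states)
    A = np.full((n_states, n_states), 1.0 / n_states)
    for _ in range(n_iter):
        # E-step: scaled forward-backward pass over the chain.
        alpha = np.zeros((T, n_states))
        c = np.zeros(T)
        alpha[0] = pi * E[0]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * E[t]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (E[t + 1] * beta[t + 1]) / c[t + 1]
        post = alpha * beta
        post /= post.sum(axis=1, keepdims=True)
        counts = np.full((n_states, n_states), smoothing)
        for t in range(T - 1):
            xi = alpha[t][:, None] * A * (E[t + 1] * beta[t + 1])[None, :]
            counts += xi / xi.sum()
        # M-step: re-estimate initial and transition probabilities.
        pi = post[0]
        A = counts / counts.sum(axis=1, keepdims=True)
    missing = x < 0
    imputed = x.copy()
    imputed[missing] = post[missing].argmax(axis=1)
    return imputed, A
```

Unlike a regression-based imputer, nothing here assumes the categories are ordered or numeric.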

Data-driven Prognostics with Predictive Uncertainty Estimation using Ensemble of Deep Ordinal Regression Models

Prognostics or Remaining Useful Life (RUL) Estimation from multi-sensor time series data is useful to enable condition-based maintenance and ensure high operational availability of equipment. We propose a novel deep-learning-based approach for Prognostics with Uncertainty Quantification that is useful in scenarios where: (i) access to labeled failure data is scarce due to the rarity of failures, (ii) future operational conditions are unobserved, and (iii) inherent noise is present in the sensor readings. All three scenarios are unavoidable sources of uncertainty in the RUL estimation process and often result in unreliable RUL estimates. To address (i), we formulate RUL estimation as an Ordinal Regression (OR) problem and propose LSTM-OR, a deep Long Short-Term Memory (LSTM) network-based approach to learn the OR function. We show that LSTM-OR naturally allows for the incorporation of censored operational instances in training along with the failed instances, leading to more robust learning. To address (ii), we propose a simple yet effective approach to quantify predictive uncertainty in the RUL estimation models by training an ensemble of LSTM-OR models. Through empirical evaluation on C-MAPSS turbofan engine benchmark datasets, we demonstrate that LSTM-OR is significantly better than the commonly used deep metric regression approaches for RUL estimation, especially when failed training instances are scarce. Further, our uncertainty quantification approach yields high-quality predictive uncertainty estimates while also leading to improved RUL estimates compared to the single best LSTM-OR model.
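
One way to see why an ordinal formulation accommodates censored instances is to look at the targets alone, leaving the LSTM out. In the sketch below, RUL is encoded as binary bits y_k = 1 if RUL exceeds threshold k. For a unit that has not failed, the observed value is only a lower bound on the true RUL, so only the bits equal to 1 are certain and the rest are masked out of the loss. The thresholds, the 0.5 decoding rule, and the use of the ensemble standard deviation as the uncertainty are assumptions for illustration, not necessarily the paper's exact choices.

```python
import numpy as np


def encode_ordinal(rul, thresholds, censored=False):
    """Ordinal targets y_k = 1 if RUL > threshold_k, plus a mask of which bits are known."""
    y = (rul > np.asarray(thresholds, dtype=float)).astype(float)
    # Censored: true RUL is at least `rul`, so only the 1-bits are certain.
    mask = y.copy() if censored else np.ones_like(y)
    return y, mask


def masked_bce(p, y, mask, eps=1e-7):
    """Binary cross-entropy averaged over the known (unmasked) ordinal bits only."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float((loss * mask).sum() / max(mask.sum(), 1.0))


def decode(p, thresholds):
    """RUL estimate: the largest threshold the model believes is exceeded (0 if none)."""
    k = int((np.asarray(p) > 0.5).sum())
    return float(thresholds[k - 1]) if k > 0 else 0.0


def ensemble_estimate(prob_list, thresholds):
    """Mean and spread of decoded RUL across ensemble members."""
    est = np.array([decode(p, thresholds) for p in prob_list])
    return est.mean(), est.std()
```

In a full model, `p` would be the per-threshold sigmoid outputs of each LSTM-OR ensemble member.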

Toward the Evaluation of Written Proficiency on a Collaborative Social Network for Learning Languages: Yask

Yask is an online social collaborative network for practicing languages in a framework that includes requests, answers, and votes. Since measuring linguistic competence using current approaches is difficult, expensive and in many cases imprecise, we present a new alternative approach based on social networks. Our method, called Proficiency Rank, extends the well-known PageRank algorithm to measure the reputation of users in a collaborative social graph. First, we extended PageRank so that it considers not only positive links (votes) but also negative links. Second, in addition to using explicit links, we also incorporate four other types of signals implicit in the social graph. These extensions allow Proficiency Rank to produce proficiency rankings for almost all users in the data set used, in which only a minority contributes by answering, while the majority contributes only by voting. This overcomes the intrinsic limitation of PageRank of only being able to rank the nodes that have incoming links. Our experimental validation showed that the reputation/importance of the users in Yask is significantly correlated with their language proficiency. In contrast, their written production was poorly correlated with the vocabulary profiles of the Common European Framework of Reference. In addition, we found that negative signals (votes) are considerably more informative than positive ones. We concluded that the use of this technology is a promising tool for measuring second language proficiency, even for relatively small groups of people.
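
As background for the extension, the sketch below implements standard PageRank by power iteration, with dangling nodes spreading their score uniformly. The `signed_score` function is a hypothetical way to fold in negative votes (reputation from up-votes minus reputation from down-votes); it is not Proficiency Rank, whose exact treatment of negative links and implicit signals is not given here. Edges are (voter, voted-on user) pairs, and the damping factor 0.85 is the conventional default.

```python
import numpy as np


def pagerank(n, edges, d=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; edges are (source, target) pairs."""
    M = np.zeros((n, n))
    for src, dst in edges:
        M[dst, src] += 1.0
    out_deg = M.sum(axis=0)
    dangling = out_deg == 0
    M[:, ~dangling] /= out_deg[~dangling]   # column-stochastic for non-dangling nodes
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Dangling nodes spread their score uniformly over all nodes.
        new = (1 - d) / n + d * (M @ r + r[dangling].sum() / n)
        if np.abs(new - r).sum() < tol:
            return new
        r = new
    return r


def signed_score(n, positive, negative, d=0.85):
    """Hypothetical signed variant: up-vote reputation minus down-vote reputation."""
    return pagerank(n, positive, d) - pagerank(n, negative, d)


if __name__ == "__main__":
    pos = [(1, 0), (2, 0), (3, 0), (0, 1)]   # hypothetical up-votes
    neg = [(1, 2), (3, 2)]                   # hypothetical down-votes
    print(pagerank(4, pos).round(3), signed_score(4, pos, neg).round(3))
```

Even this naive signed variant assigns user 2, who only receives down-votes, a clearly negative score, whereas plain PageRank on up-votes alone gives that user the same score as an inactive user.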

Data Poisoning against Differentially-Private Learners: Attacks and Defenses

Data poisoning attacks aim to manipulate the model produced by a learning algorithm by adversarially modifying the training set. We consider differential privacy as a defensive measure against this type of attack. We show that differentially-private learners are resistant to data poisoning attacks when the adversary is only able to poison a small number of items. However, this protection degrades as the adversary poisons more data. To illustrate, we design attack algorithms targeting objective and output perturbation learners, two standard approaches to differentially-private machine learning. Experiments show that our methods are effective when the attacker is allowed to poison sufficiently many training items.
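
The two ideas in the abstract, output perturbation as a private learner and protection that weakens with the number of poisoned items, can be sketched briefly. The learner below follows the classic output-perturbation recipe for L2-regularized logistic regression: assuming every ||x_i|| ≤ 1, the exact minimizer has L2 sensitivity 2/(nλ), so noise with density proportional to exp(-(nλε/2)||b||) gives ε-differential privacy. Gradient descent only approximates the exact minimizer, which the guarantee strictly assumes. The group-privacy factor e^{kε} bounds how much changing k training items can shift the probability of any output.

```python
import numpy as np


def train_logreg(X, y, lam, lr=1.0, n_iter=500):
    """L2-regularized logistic regression (labels in {-1, +1}) by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)
        # Gradient of mean log(1 + exp(-margin)) + (lam / 2) ||w||^2.
        grad = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    return w


def output_perturbation(X, y, lam, eps, rng):
    """Private weights: minimizer plus noise with density proportional to exp(-beta ||b||).

    Assumes ||x_i|| <= 1, so the L2 sensitivity is 2 / (n * lam) and beta = n * lam * eps / 2.
    Such noise has a Gamma(d, 1/beta)-distributed norm and a uniformly random direction.
    """
    n, d = X.shape
    w = train_logreg(X, y, lam)
    beta = n * lam * eps / 2.0
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return w + rng.gamma(shape=d, scale=1.0 / beta) * direction


def group_privacy_factor(eps, k):
    """Max ratio of output probabilities when k training items are altered."""
    return np.exp(k * eps)
```

Because the bound grows exponentially in k, a guarantee that is meaningful against one or two poisoned items becomes vacuous once the attacker controls many of them.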

Expanding the Text Classification Toolbox with Cross-Lingual Embeddings

Most work in text classification and Natural Language Processing (NLP) focuses on English or a handful of other languages that have text corpora of hundreds of millions of words. This is creating a new version of the digital divide: the artificial intelligence (AI) divide. Transfer-based approaches such as Cross-Lingual Text Classification (CLTC), the task of categorizing texts written in different languages into a common taxonomy, are a promising solution to the emerging AI divide. Recent work on CLTC has focused on demonstrating the benefits of using bilingual word embeddings as features, relegating the CLTC problem to a mere benchmark based on a simple averaged perceptron. In this paper, we explore more extensively and systematically two flavors of the CLTC problem: news topic classification and textual churn intent detection (TCID) in social media. In particular, we test the hypothesis that embeddings with context are more effective by multi-tasking the learning of multilingual word embeddings and text classification; we explore neural architectures for CLTC; and we move from bi- to multi-lingual word embeddings. For all architectures, types of word embeddings and datasets, we observe a consistent gain in favor of multilingual joint training, especially for low-resourced languages.
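
The averaged-perceptron baseline mentioned above is easy to sketch: documents are represented by the mean of their word vectors in a shared cross-lingual space, a classifier is trained on one language, and it is applied unchanged to another. The two-dimensional embedding table and the sports/politics labels below are hypothetical toy data, not real bilingual embeddings.

```python
import numpy as np

# Hypothetical toy cross-lingual embedding table: English and German words share one space.
EMB = {
    "goal": np.array([1.0, 0.0]), "tor": np.array([0.9, 0.1]),
    "vote": np.array([0.0, 1.0]), "wahl": np.array([0.1, 0.9]),
}


def doc_vector(tokens):
    """Average the shared-space word embeddings and append a bias feature."""
    v = np.mean([EMB[t] for t in tokens if t in EMB], axis=0)
    return np.append(v, 1.0)


def averaged_perceptron(X, y, epochs=5):
    """Binary averaged perceptron (labels in {-1, +1}); returns the averaged weights."""
    w = np.zeros(X.shape[1])
    total = np.zeros_like(w)
    steps = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # mistake-driven update
                w = w + yi * xi
            total += w               # accumulate after every example for averaging
            steps += 1
    return total / steps


if __name__ == "__main__":
    # Train on English only (sports = +1, politics = -1), then classify German documents.
    X_en = np.array([doc_vector(["goal", "goal"]), doc_vector(["vote"])])
    w = averaged_perceptron(X_en, np.array([1.0, -1.0]))
    for doc in (["tor"], ["wahl"]):
        print(doc, int(np.sign(w @ doc_vector(doc))))
```

The transfer works only because "tor" and "wahl" lie near their English counterparts; the quality of the shared embedding space is what the paper's joint multilingual training aims to improve.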

sharpDARTS: Faster and More Accurate Differentiable Architecture Search

Neural Architecture Search (NAS) has been a source of dramatic improvements in neural network design, with recent results meeting or exceeding the performance of hand-tuned architectures. However, our understanding of how to represent the search space for neural net architectures, and of how to search that space efficiently, is still in its infancy. We have performed an in-depth analysis to identify limitations in a widely used search space and a recent architecture search method, Differentiable Architecture Search (DARTS). These findings led us to introduce novel network blocks with a more general, balanced, and consistent design; a better-optimized Cosine Power Annealing learning rate schedule; and other improvements. Our resulting sharpDARTS (sharp Differentiable Architecture Search) search is 50% faster with a 20-30% relative improvement in final model error on CIFAR-10 when compared to DARTS. Our best single model run has 1.93% (1.98+/-0.07) validation error on CIFAR-10 and 5.5% error (5.8+/-0.3) on the recently released CIFAR-10.1 test set. To our knowledge, both are state of the art for models of similar size. This model also generalizes competitively to ImageNet at 25.1% top-1 (7.8% top-5) error. We found improvements for existing search spaces, but does DARTS generalize to new domains? We propose Differentiable Hyperparameter Grid Search and the HyperCuboid search space, which are representations designed to leverage DARTS for more general parameter optimization. Here we find that DARTS fails to generalize when compared against a human’s one-shot choice of models. We look back to the DARTS and sharpDARTS search spaces to understand why, and an ablation study reveals an unusual generalization gap. We finally propose Max-W regularization to solve this problem, which proves significantly better than the handmade design. Code will be made available.
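
Two ingredients named above can be sketched compactly. `mixed_op` is the standard DARTS continuous relaxation: each edge outputs a softmax(α)-weighted sum of candidate operations, which makes the architecture parameters α differentiable. `cosine_power_annealing` is an assumed form of the schedule, in which cosine annealing is passed through an exponential with base p > 1 so that the rate falls faster early on; the values of `lr_max`, `lr_min`, and `p` are illustrative, and the exact sharpDARTS definition may differ.

```python
import numpy as np


def mixed_op(x, alpha, ops):
    """DARTS relaxation: softmax(alpha)-weighted sum of candidate operations."""
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops)), w


def cosine_power_annealing(t, total, lr_max=0.1, lr_min=1e-4, p=10.0):
    """Assumed form: cosine annealing pushed through an exponential with base p > 1."""
    frac = 0.5 * (1.0 + np.cos(np.pi * t / total))   # 1 at t = 0, 0 at t = total
    return lr_min + (lr_max - lr_min) * (p ** frac - 1.0) / (p - 1.0)


if __name__ == "__main__":
    ops = [lambda x: x, lambda x: np.zeros_like(x)]   # hypothetical candidates: identity, zero
    out, w = mixed_op(np.array([2.0, 4.0]), np.array([0.0, 0.0]), ops)
    print(out, w)
    print([round(cosine_power_annealing(t, 10), 4) for t in range(0, 11, 5)])
```

As p approaches 1 this form tends to plain cosine annealing, so p controls how aggressively the schedule departs from it.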

Deep recommender engine based on efficient product embeddings neural pipeline

Predictive analytics systems are currently one of the most important areas of research and development within the Artificial Intelligence domain and particularly in Machine Learning. One of the ‘holy grails’ of predictive analytics is the research and development of the ‘perfect’ recommendation system. In our paper we propose an advanced pipeline model for the multi-task objective of determining product complementarity, similarity and sales prediction using deep neural models applied to big-data sequential transaction systems. Our highly parallelized hybrid pipeline consists of both unsupervised and supervised models, used for the objectives of generating semantic product embeddings and predicting sales, respectively. Our experimentation and benchmarking have been done using a very large Big Data stream from a pharma-industry retailer.

Resource Optimization of Product Development Projects with Time-Varying Dependency Structure

Project managers are continuously under pressure to shorten product development durations. One practical approach for reducing the project duration is lessening dependencies between different development components and teams. However, most of the resource allocation strategies for lessening dependencies make the implicit and simplistic assumption that the dependency structure between components is static (i.e., does not change over time). This assumption, however, does not necessarily hold true in all product development projects. In this paper, we present an analytical framework for optimally allocating resources to shorten the lead-time of product development projects having a time-varying dependency structure. We build our theoretical framework on a linear system model of product development processes, in which system integration and local development teams exchange information asynchronously and aperiodically. By utilizing a convexity result from matrix theory, we show that the optimal resource allocation can be efficiently found by solving a convex optimization problem. We provide illustrative examples to demonstrate the proposed framework. We also present boundary analyses based on major graph models to provide managerial guidelines for improving empirical product development (PD) processes.
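
The intuition that lessening dependencies shortens lead time can be checked on a toy linear model in which remaining work evolves as x_{k+1} = A_{σ(k)} x_k, with the active matrix switching over time. Off-diagonal entries represent rework that one team's progress creates for another. The two alternating phases, the matrices, the completion tolerance, and the effect of resources (scaling cross-team coupling by 1 - r) are hypothetical choices for illustration. The paper's convex optimization for allocating resources is not reproduced here.

```python
import numpy as np


def reduce_coupling(A, r):
    """Hypothetical resource effect: allocating fraction r scales cross-team terms by (1 - r)."""
    A = np.asarray(A, dtype=float)
    D = np.diag(np.diag(A))
    return D + (1.0 - r) * (A - D)


def lead_time(mats, schedule, x0, tol=1e-3, max_steps=10_000):
    """Steps until total remaining work falls below tol * initial work,
    with x_{k+1} = mats[schedule(k)] @ x_k (a time-varying linear model)."""
    x = np.asarray(x0, dtype=float)
    total0 = x.sum()
    for k in range(max_steps):
        if x.sum() < tol * total0:
            return k
        x = mats[schedule(k)] @ x
    return max_steps


if __name__ == "__main__":
    A_int = np.array([[0.6, 0.3], [0.3, 0.6]])   # integration phase: teams exchange rework
    A_loc = np.array([[0.6, 0.0], [0.0, 0.6]])   # local phase: no information exchange
    alternate = lambda k: k % 2
    for r in (0.0, 0.5):
        print(r, lead_time([reduce_coupling(A_int, r), A_loc], alternate, [1.0, 1.0]))
```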

Needle in a Haystack: A Framework for Seeking Small Objects in Big Datasets

Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity — adding new complexity to search tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their search algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the results of a search query should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. We propose a new framework for image retrieval that models object-level regions using image keypoints retrieved from an image index, which are then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method Needle-Haystack (NH) scoring, and it is optimized for fast matrix operations on CPUs. We show that this method not only performs comparably to state-of-the-art methods in classic CBIR problems, but also outperforms them in fine-grained object- and instance-level retrieval on the Oxford 5K, Paris 6K, Google-Landmarks, and NIST MFC2018 datasets, as well as meme-style imagery from Reddit.

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or through the data distribution. In the latter case, the main challenge is to estimate the label associated with new synthetic patterns. This paper studies the effect of generating synthetic data by convex combination of patterns and the use of these as unsupervised information in a semi-supervised learning framework with support vector machines, thus avoiding the need to label synthetic examples. We perform experiments on a total of 53 binary classification datasets. Our results show that this type of data over-sampling supports the well-known cluster assumption in semi-supervised learning, showing outstanding results for small high-dimensional datasets and imbalanced learning problems.
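
The generation step can be sketched in a few lines. Each synthetic pattern is λ x_i + (1 - λ) x_j for a random pair of existing patterns and λ ~ U(0, 1). The synthetic patterns are then appended without labels, marked with -1 as in the common semi-supervised convention, so no label has to be guessed. Drawing pairs from the minority class, and assuming class labels are non-negative so -1 is unambiguous, are choices made for illustration. The semi-supervised SVM that consumes the data is not shown.

```python
import numpy as np


def convex_combinations(X, n_new, rng):
    """Synthetic patterns lam * x_i + (1 - lam) * x_j, random pairs, lam ~ U(0, 1)."""
    i = rng.integers(len(X), size=n_new)
    j = rng.integers(len(X), size=n_new)
    lam = rng.random((n_new, 1))
    return lam * X[i] + (1.0 - lam) * X[j]


def add_unlabeled(X, y, synthetic):
    """Append synthetic patterns as unlabeled data (label -1; real labels assumed >= 0)."""
    X_all = np.vstack([X, synthetic])
    y_all = np.concatenate([y, -np.ones(len(synthetic), dtype=y.dtype)])
    return X_all, y_all


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # hypothetical minority class
    S = convex_combinations(X_min, 5, rng)
    print(add_unlabeled(X_min, np.array([1, 1, 1]), S)[1])
```

Every synthetic point lies inside the convex hull of the patterns it was drawn from, which is why such points tend to respect the cluster assumption.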

Generalization of k-means Related Algorithms

This article briefly introduces Arthur and Vassilvitskii’s work on the k-means++ algorithm and further generalizes the center initialization process. We find that choosing, as each new center, the sample point farthest from its nearest existing center has largely the same effect as the center initialization process of the k-means++ algorithm.
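
The two seeding rules being compared differ in a single step. k-means++ samples each new center with probability proportional to D(x)^2, the squared distance from x to its nearest chosen center. The farthest-point rule deterministically takes the argmax of D(x). Both rules are sketched below; the choice of the first center (random for k-means++, a fixed index for the farthest-point rule) is an assumption of this sketch.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: sample each new center with probability proportional to D(x)^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def farthest_point_init(X, k, first=0):
    """Deterministic variant: each new center is the point farthest from its nearest center."""
    idx = [first]
    d2 = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d2))
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))   # update nearest-center distances
    return X[idx]


if __name__ == "__main__":
    X = np.array([[0.0], [1.0], [10.0], [11.0]])
    print(farthest_point_init(X, 2).ravel(), kmeans_pp_init(X, 2, np.random.default_rng(0)).ravel())
```

On well-separated data both rules place one center in each group. The farthest-point rule, however, is more sensitive to outliers, since an isolated point always has the largest D(x).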

A Formalization of Robustness for Deep Neural Networks

Deep neural networks have been shown to lack robustness to small input perturbations. The process of generating the perturbations that expose the lack of robustness of neural networks is known as adversarial input generation. This process depends on the goals and capabilities of the adversary. In this paper, we propose a unifying formalization of the adversarial input generation process from a formal methods perspective. We provide a definition of robustness that is general enough to capture different formulations. The expressiveness of our formalization is shown by modeling and comparing a variety of adversarial attack techniques.
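
A common instance of such a definition is local robustness: a classifier f is robust at x for radius ε in an ℓ_p norm if f(x') = f(x) for every x' with ||x' - x||_p ≤ ε. For a linear classifier sign(w·x + b), this can be checked exactly: the classifier is robust exactly when |w·x + b| > ε ||w||_q, where q is the dual norm exponent (1/p + 1/q = 1). The sketch below implements that check together with an FGSM-style ℓ∞ attack as one example adversary. Restricting to linear models is a simplification for illustration, not the paper's formalization.

```python
import numpy as np


def is_robust_linear(w, b, x, eps, p=np.inf):
    """Exact check that sign(w.x + b) is constant on the l_p ball of radius eps around x."""
    q = 1.0 if p == np.inf else (np.inf if p == 1 else p / (p - 1))   # dual norm exponent
    margin = abs(w @ x + b)
    return margin > eps * np.linalg.norm(w, ord=q)


def fgsm(w, b, x, eps):
    """l_inf attack on a linear classifier: step against the current prediction."""
    s = np.sign(w @ x + b)
    return x - eps * s * np.sign(w)


if __name__ == "__main__":
    w, b, x = np.array([1.0, 1.0]), 0.0, np.array([1.0, 1.0])
    for eps in (0.5, 1.5):
        x_adv = fgsm(w, b, x, eps)
        print(eps, is_robust_linear(w, b, x, eps), np.sign(w @ x_adv + b))
```

When the exact check reports non-robustness, the attack finds a counterexample. For deep networks the exact check is generally intractable, which is where formal methods come in.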