Overoptimization Failures and Specification Gaming in Multi-agent Systems

Overoptimization failures in machine learning and AI can involve specification gaming, reward hacking, fragility to distributional shifts, and Goodhart’s or Campbell’s law. These failure modes are an important challenge in building safe AI systems, but multi-agent systems have additional related failure modes. These failure modes are more complex, more problematic, and less well understood in the multi-agent setting, at least partially because they are not yet observed in practice. This paper explains why this is the case, then lays out some of the classes of such failure, such as accidental steering, coordination failures, adversarial misalignment, input spoofing or filtering, and goal co-option or direct hacking.

Classifying and Visualizing Emotions with Emotional DAN

Classification of human emotions remains an important and challenging task for many computer vision algorithms, especially in the era of humanoid robots which coexist with humans in their everyday life. Currently proposed methods for emotion recognition solve this task using multi-layered convolutional networks that do not explicitly infer any facial features in the classification phase. In this work, we postulate a fundamentally different approach to solve emotion recognition task that relies on incorporating facial landmarks as a part of the classification loss function. To that end, we extend a recently proposed Deep Alignment Network (DAN) with a term related to facial features. Thanks to this simple modification, our model called EmotionalDAN is able to outperform state-of-the-art emotion classification methods on two challenging benchmark dataset by up to 5%. Furthermore, we visualize image regions analyzed by the network when making a decision and the results indicate that our EmotionalDAN model is able to correctly identify facial landmarks responsible for expressing the emotions.

Convolutional Set Matching for Graph Similarity

We introduce GSimCNN (Graph Similarity Computation via Convolutional Neural Networks) for predicting the similarity score between two graphs. As the core operation of graph similarity search, pairwise graph similarity computation is a challenging problem due to the NP-hard nature of computing many graph distance/similarity metrics. We demonstrate our model using the Graph Edit Distance (GED) as the example metric. Experiments on three real graph datasets demonstrate that our model achieves the state-of-the-art performance on graph similarity search.

A mathematical theory of semantic development in deep neural networks

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep learning dynamics to give rise to these regularities.

Inverse reinforcement learning for video games

Deep reinforcement learning achieves superhuman performance in a range of video game environments, but requires that a designer manually specify a reward function. It is often easier to provide demonstrations of a target behavior than to design a reward function describing that behavior. Inverse reinforcement learning (IRL) algorithms can infer a reward from demonstrations in low-dimensional continuous control environments, but there has been little work on applying IRL to high-dimensional video games. In our CNN-AIRL baseline, we modify the state-of-the-art adversarial IRL (AIRL) algorithm to use CNNs for the generator and discriminator. To stabilize training, we normalize the reward and increase the size of the discriminator training dataset. We additionally learn a low-dimensional state representation using a novel autoencoder architecture tuned for video game environments. This embedding is used as input to the reward network, improving the sample efficiency of expert demonstrations. Our method achieves high-level performance on the simple Catcher video game, substantially outperforming the CNN-AIRL baseline. We also score points on the Enduro Atari racing game, but do not match expert performance, highlighting the need for further work.

Exchangeable, Markov multi-state survival process

We consider exchangeable Markov multi-state survival processes — temporal processes taking values over a state-space\mathcal{S} with at least one absorbing failure state \flat \in \mathcal{S} that satisfy natural invariance properties of exchangeability and consistency under subsampling. The set of processes contains many well-known examples from health and epidemiology — survival, illness-death, competing risk, and comorbidity processes; an extension leads to recurrent event processes. We characterize exchangeable Markov multi-state survival processes in both discrete and continuous time. Statistical considerations impose natural constraints on the space of models appropriate for applied work. In particular, we describe constraints arising from the notion of composable systems. We end with an application of the developed models to irregularly sampled and potentially censored multi-state survival data, developing a Markov chain Monte Carlo algorithm for posterior computation.

Continual Classification Learning Using Generative Models

Continual learning is the ability to sequentially learn over time by accommodating knowledge while retaining previously learned experiences. Neural networks can learn multiple tasks when trained on them jointly, but cannot maintain performance on previously learned tasks when tasks are presented one at a time. This problem is called catastrophic forgetting. In this work, we propose a classification model that learns continuously from sequentially observed tasks, while preventing catastrophic forgetting. We build on the lifelong generative capabilities of [10] and extend it to the classification setting by deriving a new variational bound on the joint log likelihood, \log p(x; y).

Toward Robust Neural Networks via Sparsification

It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks. In this paper, we make the case that a systematic exploitation of sparsity is key to defending against such attacks, and that a ‘locally linear’ model for neural networks can be used to develop a theoretical foundation for crafting attacks and defenses. We consider two defenses. The first is a sparsifying front end, which attenuates the impact of the attack by a factor of roughly K/N where N is the data dimension and K is the sparsity level. The second is sparsification of network weights, which attenuates the worst-case growth of an attack as it flows up the network. We also devise attacks based on the locally linear model that outperform the well-known FGSM attack. We provide experimental results for the MNIST and Fashion-MNIST datasets, showing the efficacy of the proposed sparsity-based defenses.

Dynamic Graph Neural Networks

Graphs, which describe pairwise relations between objects, are essential representations of many real-world data such as social networks. In recent years, graph neural networks, which extend the neural network models to graph data, have attracted increasing attention. Graph neural networks have been applied to advance many different graph related tasks such as reasoning dynamics of the physical system, graph classification, and node classification. Most of the existing graph neural network models have been designed for static graphs, while many real-world graphs are inherently dynamic. For example, social networks are naturally evolving as new users joining and new relations being created. Current graph neural network models cannot utilize the dynamic information in dynamic graphs. However, the dynamic information has been proven to enhance the performance of many graph analytical tasks such as community detection and link prediction. Hence, it is necessary to design dedicated graph neural networks for dynamic graphs. In this paper, we propose DGNN, a new {\bf D}ynamic {\bf G}raph {\bf N}eural {\bf N}etwork model, which can model the dynamic information as the graph evolving. In particular, the proposed framework can keep updating node information by capturing the sequential information of edges, the time intervals between edges and information propagation coherently. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.

On the analysis of scheduling algorithms for structured parallel computations

Algorithms for scheduling structured parallel computations have been widely studied in the literature. For some time now, Work Stealing is one of the most popular for scheduling such computations, and its performance has been studied in both theory and practice. Although it delivers provably good performances, the effectiveness of its underlying load balancing strategy is known to be limited for certain classes of computations, particularly the ones exhibiting irregular parallelism (e.g. depth first searches). Many studies have addressed this limitation from a purely load balancing perspective, viewing computations as sets of independent tasks, and then analyzing the expected amount of work attached to each processor as the execution progresses. However, these studies make strong assumptions regarding work generation which, despite being standard from a queuing theory perspective — where work generation can be assumed to follow some random distribution — do not match the reality of structured parallel computations — where the work generation is not random, only depending on the structure of a computation. In this paper, we introduce a formal framework for studying the performance of structured computation schedulers, define a criterion that is appropriate for measuring their performance, and present a methodology for analyzing the performance of randomized schedulers. We demonstrate the convenience of this methodology by using it to prove that the performance of Work Stealing is limited, and to analyze the performance of a Work Stealing and Spreading algorithm, which overcomes Work Stealing’s limitation.

Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search

We present a learning-based approach to computing solutions for certain NP-hard problems. Our approach combines deep learning techniques with useful algorithmic elements from classic heuristics. The central component is a graph convolutional network that is trained to estimate the likelihood, for each vertex in a graph, of whether this vertex is part of the optimal solution. The network is designed and trained to synthesize a diverse set of solutions, which enables rapid exploration of the solution space via tree search. The presented approach is evaluated on four canonical NP-hard problems and five datasets, which include benchmark satisfiability problems and real social network graphs with up to a hundred thousand nodes. Experimental results demonstrate that the presented approach substantially outperforms recent deep learning work, and performs on par with highly optimized state-of-the-art heuristic solvers for some NP-hard problems. Experiments indicate that our approach generalizes across datasets, and scales to graphs that are orders of magnitude larger than those used during training.

K For The Price Of 1: Parameter Efficient Multi-task And Transfer Learning

We introduce a novel method that enables parameter-efficient transfer and multitask learning. The basic approach is to allow a model patch – a small set of parameters – to specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases allows a network to learn a completely different embedding that could be used for different tasks (such as converting an SSD detection model into a 1000-class classification model while reusing 98% of parameters of the feature extractor). Similarly, we show that re-learning the existing low-parameter layers (such as depth-wise convolutions) also improves accuracy significantly. Our approach allows both simultaneous (multi-task) learning as well as sequential transfer learning wherein we adapt pretrained networks to solve new problems. For multi-task learning, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task-based performance.

Learning with Interpretable Structure from RNN

In structure learning, the output is generally a structure that is used as supervision information to achieve good performance. Considering the interpretation of deep learning models has raised extended attention these years, it will be beneficial if we can learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs) whose inner mechanism is still not clearly understood. We find that Finite State Automaton (FSA) that processes sequential data has more interpretable inner mechanism and can be learned from RNNs as the interpretable structure. We propose two methods to learn FSA from RNN based on two different clustering methods. We first give the graphical illustration of FSA for human beings to follow, which shows the interpretability. From the FSA’s point of view, we then analyze how the performance of RNNs are affected by the number of gates, as well as the semantic meaning behind the transition of numerical hidden states. Our results suggest that RNNs with simple gated structure such as Minimal Gated Unit (MGU) is more desirable and the transitions in FSA leading to specific classification result are associated with corresponding words which are understandable by human beings.

Attack Graph Convolutional Networks by Adding Fake Nodes

Graph convolutional networks (GCNs) have been widely used for classifying graph nodes in the semi-supervised setting. Previous work have shown that GCNs are vulnerable to the perturbation on adjacency and feature matrices of existing nodes. However, it is unrealistic to change existing nodes in many applications, such as existing users in social networks. In this paper, we design algorithms to attack GCNs by adding fake nodes. A greedy algorithm is proposed to generate adjacency and feature matrices of fake nodes, aiming to minimize the classification accuracy on the existing nodes. In additional, we introduce a discriminator to classify fake nodes from real nodes, and propose a Greedy-GAN attack to simultaneously update the discriminator and the attacker, to make fake nodes indistinguishable to the real ones. Our non-targeted attack decreases the accuracy of GCN down to 0.10, and our targeted attack reaches a success rate of 99% on the whole datasets, and 94% on average for attacking a single target node.

Adversarially Robust Optimization with Gaussian Processes

In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. We show that standard GP optimization algorithms do not exhibit the desired robustness properties, and provide a novel confidence-bound based algorithm StableOpt for this purpose. We rigorously establish the required number of samples for StableOpt to find a near-optimal point, and we complement this guarantee with an algorithm-independent lower bound. We experimentally demonstrate several potential applications of interest using real-world data sets, and we show that StableOpt consistently succeeds in finding a stable maximizer where several baseline methods fail.

Perceptual Visual Interactive Learning

Supervised learning methods are widely used in machine learning. However, the lack of labels in existing data limits the application of these technologies. Visual interactive learning (VIL) compared with computers can avoid semantic gap, and solve the labeling problem of small label quantity (SLQ) samples in a groundbreaking way. In order to fully understand the importance of VIL to the interaction process, we re-summarize the interactive learning related algorithms (e.g. clustering, classification, retrieval etc.) from the perspective of VIL. Note that, perception and cognition are two main visual processes of VIL. On this basis, we propose a perceptual visual interactive learning (PVIL) framework, which adopts gestalt principle to design interaction strategy and multi-dimensionality reduction (MDR) to optimize the process of visualization. The advantage of PVIL framework is that it combines computer’s sensitivity of detailed features and human’s overall understanding of global tasks. Experimental results validate that the framework is superior to traditional computer labeling methods (such as label propagation) in both accuracy and efficiency, which achieves significant classification results on dense distribution and sparse classes dataset.

GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks

One of the biggest issues facing the use of machine learning in medical imaging is the lack of availability of large, labelled datasets. The annotation of medical images is not only expensive and time consuming but also highly dependent on the availability of expert observers. The limited amount of training data can inhibit the performance of supervised machine learning algorithms which often need very large quantities of data on which to train to avoid overfitting. So far, much effort has been directed at extracting as much information as possible from what data is available. Generative Adversarial Networks (GANs) offer a novel way to unlock additional information from a dataset by generating synthetic samples with the appearance of real images. This paper demonstrates the feasibility of introducing GAN derived synthetic data to the training datasets in two brain segmentation tasks, leading to improvements in Dice Similarity Coefficient (DSC) of between 1 and 5 percentage points under different conditions, with the strongest effects seen fewer than ten training image stacks are available.

Batch Normalization Sampling

Deep Neural Networks (DNNs) thrive in recent years in which Batch Normalization (BN) plays an indispensable role. However, it has been observed that BN is costly due to the reduction operations. In this paper, we propose alleviating this problem through sampling only a small fraction of data for normalization at each iteration. Specifically, we model it as a statistical sampling problem and identify that by sampling less correlated data, we can largely reduce the requirement of the number of data for statistics estimation in BN, which directly simplifies the reduction operations. Based on this conclusion, we propose two sampling strategies, ‘Batch Sampling’ (randomly select several samples from each batch) and ‘Feature Sampling’ (randomly select a small patch from each feature map of all samples), that take both computational efficiency and sample correlation into consideration. Furthermore, we introduce an extremely simple variant of BN, termed as Virtual Dataset Normalization (VDN), that can normalize the activations well with few synthetical random samples. All the proposed methods are evaluated on various datasets and networks, where an overall training speedup by up to 20% on GPU is practically achieved without the support of any specialized libraries, and the loss on accuracy and convergence rate are negligible. Finally, we extend our work to the ‘micro-batch normalization’ problem and yield comparable performance with existing approaches at the case of tiny batch size.

On a generalized form of subjective probability

This paper is motivated by the questions of how to give the concept of probability an adequate real-world meaning, and how to explain a certain type of phenomenon that can be found, for instance, in Ellsberg’s paradox. It attempts to answer these questions by constructing an alternative theory to one that was proposed in earlier papers on the basis of various important criticisms that were raised against this earlier theory. The conceptual principles of the corresponding definition of probability are laid out and explained in detail. In particular, what is required to fully specify a probability distribution under this definition is not just the distribution function of the variable concerned, but also an assessment of the internal and/or the external strength of this function relative to other distribution functions of interest. This way of defining probability is applied to various examples and problems including, perhaps most notably, to a long-running controversy concerning the distinction between Bayesian and fiducial inference. The characteristics of this definition of probability are carefully evaluated in terms of the issues that it sets out to address.

SilentPhone: Inferring User Unavailability based Opportune Moments to Minimize Call Interruptions
Critical Neuromorphic Computing based on Explosive Synchronization
Finding the best design parameters for optical nanostructures using reinforcement learning
Towards a compact representation of temporal rasters
Helix modelling through the Mardia-Holmes model framework and an extension of the Mardia-Holmes model
Active Decoupling of Transmit and Receive Coils for Full-Duplex MRI
Planification en temps réel avec agenda de buts et sauts
MGP: Un algorithme de planification temps réel prenant en compte l’évolution dynamique du but
Une architecture cognitive et affective orient{é}e interaction
Une approche totalement instanciée pour la planification HTN
Visual Rendering of Shapes on 2D Display Devices Guided by Hand Gestures
Learning from the Syndrome
A Weak Martingale Approach to Linear-Quadratic McKean-Vlasov Stochastic Control Problems
Complementary Lipschitz continuity results for the distribution of intersections or unions of independent random sets in finite spaces
Segmentation Analysis in Human Centric Cyber-Physical Systems using Graphical Lasso
Fast and accurate object detection in high resolution 4K and 8K video using GPUs
Leave-one-out cross-validation for non-factorizable normal models
Friezes satisfying higher SL$_k$-determinants
A Relaxed Optimization Approach for Cardinality-Constrained Portfolio Optimization
Multimodal Polynomial Fusion for Detecting Driver Distraction
Clinical Concept Extraction with Contextual Word Embedding
Patch-based Interferometric Phase Estimation via Mixture of Gaussian Density Modelling & Non-local Averaging in the Complex Domain
Local calibration of verbal autopsy algorithms
Cops and Robbers on Toroidal Chess Graphs
On the Real Stability Radius of Sparse Systems
Statistical mechanics of bipartite $z$-matchings
New insights on concentration inequalities for self-normalized martingales
Reasoning about Norms Revision
The speaker-independent lipreading play-off; a survey of lipreading machines
Loop corrections in spin models through density consistency
Many-body localization in Landau level subbands
Scheduling computations with provably low synchronization overheads
Using Personal Environmental Comfort Systems to Mitigate the Impact of Occupancy Prediction Errors on HVAC Performance
A Reliability Model for Dependent and Distributed MDS Disk Array Units
Minsum $k$-Sink Problem on Path Networks
Strong laws of large numbers for arrays of random variables and stable random fields
Lower Bounds for Oblivious Data Structures
Compositional Set Invariance in Network Systems with Assume-Guarantee Contracts
Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure
A Multilingual Study of Compressive Cross-Language Text Summarization
Predicting the Semantic Textual Similarity with Siamese CNN and LSTM
Graph isomorphism and Gaussian boson sampling
Multi-level Memory for Task Oriented Dialogs
Stationary distributions of the multi-type ASEP
Meta-modeling game for deriving theoretical-consistent, micro-structural-based traction-separation laws via deep reinforcement learning
Sample-Efficient Learning of Nonprehensile Manipulation Policies via Physics-Based Informed State Distributions
Understand, Compose and Respond – Answering Visual Questions by a Composition of Abstract Procedures
Sports Camera Calibration via Synthetic Data
Near-Optimal Online Knapsack Strategy for Real-Time Bidding in Internet Advertising
Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health
Engaging Image Captioning Via Personality
Coding for Computing Arbitrary Functions Unknown to the Encoder
Truncated Back-propagation for Bilevel Optimization
Structure Learning of Deep Networks via DNA Computing Algorithm
SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization
A New Class of Symmetric Distributions Including the Elliptically Symmetric Logistic
Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian
Alzheimer’s Disease Prediction Using Longitudinal and Heterogeneous Magnetic Resonance Imaging
Beamforming Optimization for Intelligent Reflecting Surface with Discrete Phase Shifts
Generalized Beamspace Modulation Using Multiplexing: A Breakthrough in mmWave MIMO
Convolutional Deblurring for Natural Imaging
Speaker Selective Beamformer with Keyword Mask Estimation
Wearable Affective Robot
Gaussian Message Passing for Overloaded Massive MIMO-NOMA
Word Embedding based Edit Distance
A novel bidding method for combined heat and power units in district heating systems
Low-Power Wide-Area Networks for Sustainable IoT
A two-phase stochastic programming approach to biomass supply planning for combined heat and power plants
Operational planning and bidding for district heating systems with uncertain renewable energy production
Efficient Learning of Restricted Boltzmann Machines Using Covariance estimates
Supervised Classification Methods for Flash X-ray single particle diffraction Imaging
Identifying multi-scale communities in networks by asymptotic surprise
Non-Coherent Sensor Fusion via Entropy Regularized Optimal Mass Transport
The Logoscope: a Semi-Automatic Tool for Detecting and Documenting French New Words
Spanning Tests for Markowitz Stochastic Dominance
Adaptive motor control and learning in a spiking neural network realised on a mixed-signal neuromorphic processor
Tackling Sequence to Sequence Mapping Problems with Neural Networks
Analyzing Visual Mappings of Traditional and Alternative Music Notation
Adaptive Online Learning in Dynamic Environments
HANDS18: Methods, Techniques and Applications for Hand Observation
Compressed Sensing Plus Motion (CS+M): A New Perspective for Improving Undersampled MR Image Reconstruction
Exactly Solving the Maximum Weight Independent Set Problem on Large Real-World Graphs
Use of Magnetoresistive Random-Access Memory as Approximate Memory for Training Neural Networks
Exploiting NOMA for Multi-Beam UAV Communication in Cellular Uplink
An Adversarial Learning Approach to Medical Image Synthesis for Lesion Removal
Training of a Skull-Stripping Neural Network with efficient data augmentation
Structure learning of undirected graphical models for count data
Approximation of trees by self-nested trees
Wireless Map-Reduce Distributed Computing with Full-Duplex Radios and Imperfect CSI
Dynamic Oracles for Top-Down and In-Order Shift-Reduce Constituent Parsing
Fast Exact Bayesian Inference for Sparse Signals in the Normal Sequence Model
Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings
Investigating the Automatic Classification of Algae Using Fusion of Spectral and Morphological Characteristics of Algae via Deep Residual Learning
Linearly Convergent Variable Sample-Size Schemes for Stochastic Nash Games: Best-Response Schemes and Distributed Gradient-Response Schemes
Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs
Adversarial Semantic Scene Completion from a Single Depth Image
A Markov model for inferring flows in directed contact networks
Bayesian Compression for Natural Language Processing
HAR-Net:Fusing Deep Representation and Hand-crafted Features for Human Activity Recognition
A Deamortization Approach for Dynamic Spanner and Dynamic Maximal Matching
Practical Shape Analysis and Segmentation Methods for Point Cloud Models
Evading classifiers in discrete domains with provable optimality guarantees
Maximum Independent Sets in Subcubic Graphs: New Results
Alzheimer’s Disease Diagnosis Based on Cognitive Methods in Virtual Environments and Emotions Analysis
Understanding the Role of Two-Sided Argumentation in Online Consumer Reviews: A Language-Based Perspective
Training Generative Adversarial Networks Via Turing Test
Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Reinforcement learning
A Preliminary Study on Hyperparameter Configuration for Human Activity Recognition
Art Gallery Problem with Rook Vision
Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features
Efficient Load Sampling for Worst-Case Structural Analysis Under Force Location Uncertainty
Improving the condition number of estimated covariance matrices
Nuclear Norm Regularized Estimation of Panel Regression Models
Heterogeneous Treatment Effect Estimation through Deep Learning