Learning a Prior over Intent via Meta-Inverse Reinforcement Learning

A significant challenge for the practical application of reinforcement learning in the real world is the need to specify an oracle reward function that correctly defines a task. Inverse reinforcement learning (IRL) seeks to avoid this challenge by instead inferring a reward function from expert behavior. While appealing, it can be impractically expensive to collect datasets of demonstrations that cover the variation common in the real world (e.g. opening any type of door). Thus in practice, IRL must commonly be performed with only a limited set of demonstrations where it can be exceedingly difficult to unambiguously recover a reward function. In this work, we exploit the insight that demonstrations from other tasks can be used to constrain the set of possible reward functions by learning a ‘prior’ that is specifically optimized for the ability to infer expressive reward functions from limited numbers of demonstrations. We demonstrate that our method can efficiently recover rewards from images for novel tasks and provide intuition as to how our approach is analogous to learning a prior.

Bayesian inference in decomposable graphical models using sequential Monte Carlo methods

In this study we present a sequential sampling methodology for Bayesian inference in decomposable graphical models. We recast the problem of graph estimation, which in general lacks a natural sequential interpretation, into a sequential setting. Specifically, we propose a recursive Feynman-Kac model which generates a flow of junction tree distributions over a space of increasing dimensions and develop an efficient sequential Monte Carlo sampler. As a key ingredient of the proposal kernel in our sampler we use the Christmas tree algorithm developed in the companion paper Olsson et al. [2018]. We focus on particle MCMC methods, in particular particle Gibbs (PG), as it allows for generating MCMC chains with global moves on an underlying space of decomposable graphs. To further improve the mixing properties of this PG algorithm, we incorporate a systematic refreshment step implemented through direct sampling from a backward kernel. The theoretical properties of the algorithm are investigated, showing in particular that the refreshment step improves the algorithm performance in terms of the asymptotic variance of the estimated distribution. The accuracy of the graph estimators is illustrated through a collection of numerical examples demonstrating the feasibility of the suggested approach in both discrete and continuous graphical models.

Fusion Graph Convolutional Networks

Semi-supervised node classification involves learning to classify unlabelled nodes given a partially labeled graph. In transductive learning, all unlabelled nodes to be classified are observed during training, and in inductive learning, predictions are to be made for nodes not seen at training. In this paper, we focus on both these settings for node classification in attributed graphs, i.e., graphs in which nodes have additional features. State-of-the-art models for node classification on such attributed graphs use differentiable recursive functions. These differentiable recursive functions enable aggregation and filtering of neighborhood information from multiple hops (depths). Despite being powerful, these variants are limited in their ability to combine information from different hops efficiently. In this work, we analyze this limitation of recursive graph functions in terms of their representation capacity to effectively capture multi-hop neighborhood information. Further, we provide a simple, mathematically motivated fusion component that addresses this limitation and improves the existing models by explicitly learning the importance of information from different hops. This proposed mechanism is shown to improve over existing methods across 8 popular datasets from different domains. Specifically, our model improves the Graph Convolutional Network (GCN) and a variant of GraphSAGE by a significant margin, providing highly competitive state-of-the-art results.

On GANs and GMMs

A longstanding problem in machine learning is to find unsupervised methods that can learn the statistical structure of high dimensional signals. In recent years, GANs have gained much attention as a possible solution to the problem, and in particular have shown the ability to generate remarkably realistic high resolution sampled images. At the same time, many authors have pointed out that GANs may fail to model the full distribution (‘mode collapse’) and that using the learned models for anything other than generating samples may be very difficult. In this paper, we examine the utility of GANs in learning statistical models of images by comparing them to perhaps the simplest statistical model, the Gaussian Mixture Model. First, we present a simple method to evaluate generative models based on relative proportions of samples that fall into predetermined bins. Unlike previous automatic methods for evaluating models, our method does not rely on an additional neural network nor does it require approximating intractable computations. Second, we compare the performance of GANs to GMMs trained on the same datasets. While GMMs have previously been shown to be successful in modeling small patches of images, we show how to train them on full sized images despite the high dimensionality. Our results show that GMMs can generate realistic samples (although less sharp than those of GANs) but also capture the full distribution, which GANs fail to do. Furthermore, GMMs allow efficient inference and explicit representation of the underlying statistical structure. Finally, we discuss how a pix2pix network can be used to add high-resolution details to GMM samples while maintaining the basic diversity.
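The bin-based evaluation described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the bins are chosen or which statistical test is applied, so the choices here are assumptions: bins are the nearest-center cells of a given set of centers, and a two-proportion z-test at a fixed threshold decides whether a bin differs. The function names `bin_proportions` and `count_different_bins` are hypothetical.

```python
import numpy as np

def bin_proportions(samples, centers):
    """Assign each sample to its nearest bin center; return per-bin proportions."""
    # Squared Euclidean distance from every sample to every center.
    d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    counts = np.bincount(labels, minlength=len(centers))
    return counts / len(samples)

def count_different_bins(ref, gen, centers, z_threshold=1.96):
    """Count bins whose proportions differ between reference and generated samples.

    Assumption: a two-proportion z-test with a fixed threshold is used per bin.
    """
    p_ref = bin_proportions(ref, centers)
    p_gen = bin_proportions(gen, centers)
    n_ref, n_gen = len(ref), len(gen)
    pooled = (p_ref * n_ref + p_gen * n_gen) / (n_ref + n_gen)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_ref + 1 / n_gen))
    # Guard against division by zero for bins that are empty in both sets.
    z = np.abs(p_ref - p_gen) / np.maximum(se, 1e-12)
    return int((z > z_threshold).sum())

# Toy data: the reference set covers two modes, the "collapsed" set only one.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
ref = np.vstack([np.zeros((50, 2)), np.full((50, 2), 10.0)])
collapsed = np.zeros((100, 2))

same_score = count_different_bins(ref, ref, centers)
collapsed_score = count_different_bins(ref, collapsed, centers)
```

In this toy setup a generator that reproduces the reference proportions scores zero differing bins, while one that drops a mode is flagged in both bins. No auxiliary neural network is involved, which matches the property the abstract emphasizes.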

Learning Tree Distributions by Hidden Markov Models

Hidden tree Markov models allow learning distributions for tree structured data while being interpretable as nondeterministic automata. We provide a concise summary of the main approaches in the literature, focusing in particular on the causality assumptions introduced by the choice of a specific tree visit direction. We then sketch a novel non-parametric generalization of the bottom-up hidden tree Markov model with its interpretation as a nondeterministic tree automaton with infinite states.

Reinforced Continual Learning

Most artificial intelligence models have limited ability to solve new tasks faster without forgetting previously acquired knowledge. The recently emerging paradigm of continual learning, in which the model learns various tasks in a sequential fashion, aims to solve this issue. In this work, a novel approach for continual learning is proposed, which searches for the best neural architecture for each incoming task via carefully designed reinforcement learning strategies. We name it Reinforced Continual Learning. Our method not only performs well at preventing catastrophic forgetting but also fits new tasks well. Experiments on sequential classification tasks for variants of the MNIST and CIFAR-100 datasets demonstrate that the proposed approach outperforms existing continual learning alternatives for deep networks.

Root-cause Analysis for Time-series Anomalies via Spatiotemporal Graphical Modeling in Distributed Complex Systems

Performance monitoring, anomaly detection, and root-cause analysis in complex cyber-physical systems (CPSs) are often highly intractable due to widely diverse operational modes, disparate data types, and complex fault propagation mechanisms. This paper presents a new data-driven framework for root-cause analysis, based on a spatiotemporal graphical modeling approach built on the concept of symbolic dynamics for discovering and representing causal interactions among sub-systems of complex CPSs. We formulate the root-cause analysis problem as a minimization problem via the proposed inference-based metric and present two approximate approaches for root-cause analysis, namely sequential state switching (S^3, based on the free-energy concept of a restricted Boltzmann machine, RBM) and artificial anomaly association (A^3, a classification framework using deep neural networks, DNN). Synthetic data from cases with failed pattern(s) and anomalous node(s) are simulated to validate the proposed approaches. A real dataset based on the Tennessee Eastman process (TEP) is also used for comparison with other approaches. The results show that: (1) the S^3 and A^3 approaches can obtain high accuracy in root-cause analysis under both pattern-based and node-based fault scenarios, in addition to successfully handling multiple nominal operating modes, (2) the proposed tool-chain is scalable while maintaining high accuracy, and (3) the proposed framework is robust and adaptive in different fault conditions and performs better than the state-of-the-art methods.

Collaborative Human-AI (CHAI): Evidence-Based Interpretable Melanoma Classification in Dermoscopic Images

Automated dermoscopic image analysis has witnessed rapid growth in diagnostic performance. Yet adoption faces resistance, in part because no evidence is provided to support decisions. In this work, an approach for evidence-based classification is presented. A feature embedding is learned with CNNs, triplet-loss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors and the localized image regions most relevant to measuring distance between query and neighbors. To ensure that results are relevant in terms of both label accuracy and human visual similarity for any skill level, a novel hierarchical triplet logic is implemented to jointly learn an embedding according to disease labels and non-expert similarity. Results are improved over baselines trained on disease labels alone, as well as standard multiclass loss. Quantitative relevance of results, according to non-expert similarity as well as localized image regions, is also significantly improved.
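As a rough illustration of the two ingredients named in the abstract above, a triplet loss and kNN classification in the embedding space, the following sketch uses plain NumPy on precomputed embedding vectors. It is not the authors' implementation: the squared Euclidean distance, the margin value, majority voting, and the function names `triplet_loss` and `knn_classify` are all assumptions, and the hierarchical triplet logic is not reproduced.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (assumed form)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    # Zero once the negative is farther than the positive by at least the margin.
    return max(0.0, d_pos - d_neg + margin)

def knn_classify(query, embeddings, labels, k=3):
    """Majority vote over the k nearest embeddings.

    Returns the predicted label and the neighbor indices, which can be shown
    to a user as supporting evidence for the decision.
    """
    dists = np.linalg.norm(embeddings - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = np.bincount(labels[nearest])
    return int(votes.argmax()), nearest

# Toy gallery of four embedded images with binary labels.
embeddings = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])

predicted, neighbors = knn_classify(np.array([0.2, 0.1]), embeddings, labels, k=3)
satisfied = triplet_loss(embeddings[0], embeddings[1], embeddings[2])
violated = triplet_loss(np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([1.0, 0.0]))
```

Returning the neighbor indices alongside the label reflects the evidence-based framing: the retrieved cases themselves are part of the output, not only the class decision.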

How Important Is a Neuron?

The problem of attributing a deep network’s prediction to its input/base features is well-studied. We introduce the notion of conductance to extend the notion of attribution to understanding the importance of hidden units. Informally, the conductance of a hidden unit of a deep network is the flow of attribution via this hidden unit. We use conductance to understand the importance of a hidden unit to the prediction for a specific input, or over a set of inputs. We evaluate the effectiveness of conductance in multiple ways, including theoretical properties, ablation studies, and a feature selection task. The empirical evaluations are done using the Inception network over ImageNet data, and a sentiment analysis network over reviews. In both cases, we demonstrate the effectiveness of conductance in identifying interesting insights about the internal workings of these networks.

Context Exploitation using Hierarchical Bayesian Models

We consider the problem of how to improve automatic target recognition by fusing the naive sensor-level classification decisions with ‘intuition,’ or context, in a mathematically principled way. This is a general approach that is compatible with many definitions of context, but for specificity, we consider context as co-occurrence in imagery. In particular, we consider images that contain multiple objects identified at various confidence levels. We learn the patterns of co-occurrence in each context, then use these patterns as hyper-parameters for a Hierarchical Bayesian Model. The result is that low-confidence sensor classification decisions can be dramatically improved by fusing those readings with context. We further use hyperpriors to address the case where multiple contexts may be appropriate. We also consider the Bayesian Network, an alternative to the Hierarchical Bayesian Model, which is computationally more efficient but assumes that context and sensor readings are uncorrelated.

U-statistical inference for hierarchical clustering

Clustering methods are a valuable tool for the identification of patterns in high dimensional data with applications in many scientific problems. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with High Dimension Low Sample Size (HDLSS) data. We develop here a U-statistics based clustering approach that assesses statistical significance in clustering and is specifically tailored to HDLSS scenarios. These non-parametric methods rely on very few assumptions about the data, and thus can be applied to a wide range of datasets for which the Euclidean distance captures relevant features. We propose two significance clustering algorithms, a hierarchical method and a non-nested version. In order to do so, we first propose an extension of a relevant U-statistic and develop its asymptotic theory. Our methods are tested through extensive simulations and found to be more powerful than competing alternatives. They are further showcased in two applications ranging from genetics to image recognition problems.

Why do deep convolutional networks generalize so poorly to small image transformations?

Deep convolutional network architectures are often assumed to guarantee generalization for small image translations and deformations. In this paper we show that modern CNNs (VGG16, ResNet50, and InceptionResNetV2) can drastically change their output when an image is translated in the image plane by a few pixels, and that this failure of generalization also happens with other realistic small image transformations. Furthermore, the deeper the network, the more we see these failures to generalize. We show that these failures are related to the fact that the architecture of modern CNNs ignores the classical sampling theorem, so that generalization is not guaranteed. We also show that biases in the statistics of commonly used image datasets make it unlikely that CNNs will learn to be invariant to these transformations. Taken together, our results suggest that the performance of CNNs in object recognition falls far short of the generalization capabilities of humans.
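The link to the sampling theorem mentioned in the abstract above can be seen in a one-dimensional toy example. This is only a sketch under stated assumptions, not the authors' experiment: a stride-2 subsampling step stands in for a strided layer, and the `[0.25, 0.5, 0.25]` blur kernel is an assumed low-pass filter chosen for illustration.

```python
import numpy as np

def subsample(signal, stride=2):
    """Keep every `stride`-th sample, with no low-pass filtering beforehand."""
    return signal[::stride]

def blur_then_subsample(signal, stride=2):
    """Low-pass filter with an assumed small kernel, then subsample."""
    kernel = np.array([0.25, 0.5, 0.25])
    return np.convolve(signal, kernel, mode="same")[::stride]

x = np.zeros(16)
x[4] = 1.0                  # single impulse at an even index
x_shifted = np.roll(x, 1)   # the same impulse translated by one sample

# Without filtering, the one-sample shift makes the impulse vanish entirely.
y = subsample(x)
y_shifted = subsample(x_shifted)

# With filtering first, the total response survives the shift.
z = blur_then_subsample(x)
z_shifted = blur_then_subsample(x_shifted)
```

Here the naive subsampling keeps the impulse in one case and loses it in the other, whereas blurring first gives the same total response for both inputs, even though the individual output values still differ.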

Channel Gating Neural Networks

Employing deep neural networks to obtain state-of-the-art performance on computer vision tasks can consume billions of floating point operations and several Joules of energy per evaluation. Network pruning, which statically removes unnecessary features and weights, has emerged as a promising way to reduce this computation cost. In this paper, we propose channel gating, a dynamic, fine-grained, training-based computation-cost-reduction scheme. Channel gating works by identifying the regions in the features which contribute less to the classification result and turning off a subset of the channels for computing the pixels within these uninteresting regions. Unlike static network pruning, channel gating optimizes computation by exploiting characteristics specific to each input at run-time. We show experimentally that applying channel gating in state-of-the-art networks can achieve 66% and 60% reduction in FLOPs with 0.22% and 0.29% accuracy loss on the CIFAR-10 and CIFAR-100 datasets, respectively.

On Acceleration with Noise-Corrupted Gradients
Diverse and Controllable Image Captioning with Part-of-Speech Guidance
Age of Information in G/G/1/1 Systems
Approximation complexity of homogeneous sums of random processes
A comparison of gerrymandering metrics
Approximate Knowledge Compilation by Online Collapsed Importance Sampling
Modeling 4D fMRI Data via Spatio-Temporal Convolutional Neural Networks (ST-CNN)
The Complexity of Splitting Necklaces and Bisecting Ham Sandwiches
Renormalization of Sparse Disorder in the Ising Model
The integer quantum Hall plateau transition is a current algebra after all
Long-time predictive modeling of nonlinear dynamical systems using neural networks
Decoding Algorithms for Hypergraph Subsystem Codes and Generalized Subsystem Surface Codes
A further study on the opioid epidemic dynamical model with random perturbation
Analysis of Fast Structured Dictionary Learning
Numerical Simulation of 2.5-Set of Multiple Ito Stochastic Integrals of Multiplicities 1 to 5
Fully Automated Organ Segmentation in Male Pelvic CT Images
Uncertainty Quantification and Propagation of Imprecise Probabilities with Copula Dependence Modeling
Whole Brain Susceptibility Mapping Using Harmonic Incompatibility Removal
Incremental Natural Language Processing: Challenges, Strategies, and Evaluation
Scaling provable adversarial defenses
On the Origins of Memes by Means of Fringe Web Communities
Cyberattack Detection using Deep Generative Models with Variational Inference
Accurate pedestrian localization in overhead depth images via Height-Augmented HOG
Asymptotic performance of regularized multi-task learning
Robust Gyroscope-Aided Camera Self-Calibration
The long-term impact of ranking algorithms in growing networks
Eshelby description of highly viscous flow — half model, half theory
Practical Study of Deterministic Regular Expressions from Large-scale XML and Schema Data
Improving the Results of Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective
Multi-Label Transfer Learning for Semantic Similarity
A faster hafnian formula for complex matrices and its benchmarking on the Titan supercomputer
Sequential Attacks on Agents for Long-Term Adversarial Goals
Density estimates for the solutions of backward stochastic differential equations driven by Gaussian processes
Lagrangian subspaces, delta-matroids and four-term relations
Fiber bundle model under heterogeneous loading
Distributed Estimation of Gaussian Correlations
Neural Network Acceptability Judgments
New lower bounds to the output entropy of multi-mode quantum Gaussian channels
A Method Based on Convex Cone Model for Image-Set Classification with CNN Features
Quasi-shuffle algebras and applications
Hitting probabilities for compound Poisson processes in a bipartite network
Grid-side Flexibility of Power Systems in Integrating Large-scale Renewable Generations: A Critical Review on Concepts, Formulations and Solution Approaches
Simulation of Random Variables under Rényi Divergence Measures of All Orders
The depth of a reflexive polytope
Optimizing Quantum Circuits for Arithmetic
New Feature Detection Mechanism for Extended Kalman Filter Based Monocular SLAM with 1-Point RANSAC
Central limit theorems for the $L_p$-error of smooth isotonic estimators
HOPF: Higher Order Propagation Framework for Deep Collective Classification
A Robust Iterative Scheme for Symmetric Indefinite Systems
One-shot domain adaptation in multiple sclerosis lesion segmentation using convolutional neural networks
‘Constant in gain Lead in phase’ element – Application in precision motion control
On Prefix Normal Words
Breaking-down the Ontology Alignment Task with a Lexical Index and Neural Embeddings
Tropical Foundations for Probability & Statistics on Phylogenetic Tree Space
Deep Learning with unsupervised data labeling for weeds detection on UAV images
KG^2: Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings
Sample Reuse via Importance Sampling in Information Geometric Optimization
Agents and Devices: A Relative Definition of Agency
SemEval 2019 Shared Task: Cross-lingual Semantic Parsing with UCCA – Call for Participation
Computing all Wardrop Equilibria parametrized by the Flow Demand
Superensemble classifier for learning from imbalanced business school data set
On decoupling in Banach spaces
Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update
A step beyond Freiman’s theorem for set addition modulo a prime
Lip Reading Using Convolutional Auto Encoders as Feature Extractor
Forgetting Memories and their Attractiveness
Physical Layer Security over Fluctuating Two-Ray Fading Channels
Light Field Denoising via Anisotropic Parallax Analysis in a CNN Framework
Deep Energy: Using Energy Functions for Unsupervised Training of DNNs
Return probability for the Anderson model on the random regular graph
Classification of volcanic ash particles using a convolutional neural network and probability
DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder
Assessing diversity in multiplex networks
Crowdsourcing for Reminiscence Chatbot Design
Optimal cyclic $(r,δ)$ locally repairable codes with unbounded length
An ADMM-Based Interior-Point Method for Large-Scale Linear Programming
Least-square based recursive optimization for distance-based source localization
Hallucinating robots: Inferring obstacle distances from partial laser measurements
False-Accept/False-Reject Trade-offs in Biometric Authentication Systems
On representation power of neural network-based graph embedding and beyond
Simultaneous Optical Flow and Segmentation (SOFAS) using Dynamic Vision Sensor
Metric on Nonlinear Dynamical Systems with Koopman Operators
DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation
Geometric Active Learning via Enclosing Ball Boundary
QuickIM: Efficient, Accurate and Robust Influence Maximization Algorithm on Billion-Scale Networks
Skyblocking: Learning Blocking Schemes on the Skyline
Multiaccuracy: Black-Box Post-Processing for Fairness in Classification
Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data
Radio Vortex Wireless Communications With Non-Coaxial UCA Transceiver
Conformation Clustering of Long MD Protein Dynamics with an Adversarial Autoencoder
Collaborative Multi-modal deep learning for the personalized product retrieval in Facebook Marketplace
Stackelberg Game Approaches for Anti-jamming Defence in Wireless Networks
Attention-Based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision
Start Late or Finish Early: A Distributed Graph Processing System with Redundancy Reduction
Adversarial Attacks on Face Detectors using Neural Net based Constrained Optimization
Rotation Equivariance and Invariance in Convolutional Neural Networks
Robust MIMO Radar Target Localization based on Lagrange Programming Neural Network
Evaluating Reinforcement Learning Algorithms in Observational Health Settings
Conormal Varieties on the Cominuscule Grassmannian – II
Image-Dependent Local Entropy Models for Learned Image Compression
Empirical Evaluation of Character-Based Model on Neural Named-Entity Recognition in Indonesian Conversational Texts
Efficient Traffic-Sign Recognition with Scale-aware CNN
On the Impact of Various Types of Noise on Neural Machine Translation
Millimeter-Wave NOMA Transmission in Cellular M2M Communications for Internet of Things
FPGA-based Acceleration of FT Convolution for Pulsar Search Using OpenCL
Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC
Optimization of the Energy-Efficient Relay-Based massive IoT Network
Learning Factorized Representations for Open-set Domain Adaptation
Optimized Participation of Multiple Fusion Functions in Consensus Creation: An Evolutionary Approach
Rehabilitating the Color Checker Dataset for Illuminant Estimation
Harmonic-summing Module of SKA on FPGA–Optimising the Irregular Memory Accesses
Bayesian forecasting of mortality rates using latent Gaussian models
Note on the robustification of the Student $t$-test statistic using the median and the median absolute deviation
Multi-Resolution 3D Convolutional Neural Networks for Object Recognition
Sequential Experimental Design for Optimal Structural Intervention in Gene Regulatory Networks Based on the Mean Objective Cost of Uncertainty
Lower bounds for Laplacian spread and relations with invariant parameters revisited
Optimal Sample Size Planning for the Wilcoxon-Mann-Whitney-Test
On the influence of the interaction graph on a finite dynamical system
Mining gold from implicit models to improve likelihood-free inference
Novel Video Prediction for Large-scale Scene using Optical Flow
Efficient Dispersion of Mobile Robots on Graphs
High-Quality Disjoint and Overlapping Community Structure in Large-Scale Complex Networks
A General Convergence Result for Mirror Descent with Armijo Line Search
New quantum codes constructed from some self-dual additive $\mathbb{F}_4$-codes
Dense labeling of large remote sensing imagery with convolutional neural networks: a simple and faster alternative to stitching output label maps
Recurrent Deep Embedding Networks for Genotype Clustering and Ethnicity Prediction
Introducing shrinkage in heavy-tailed state space models to predict equity excess returns
A Web-scale system for scientific knowledge exploration
Conjoining uncooperative societies facilitates evolution of cooperation
Mixing time and cutoff for the weakly asymmetric simple exclusion process
Monodromy Solver: Sequential and Parallel
Profiling presence patterns and segmenting user locations from cell phone data
Cascade Centrality with heterogeneous nodal influence in a noisy environment
Bayesian Nonparametric Higher Order Hidden Markov Models
Low Dimensional Dynamics of the Kuramoto Model with Rational Frequency Distributions
Plasticity as the $Γ$-Limit of a Two-Dimensional Dislocation Energy: the Critical Regime without the Assumption of Well-Separateness
Predicting the last zero of a spectrally negative Lévy process
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
Computing Small Unit-Distance Graphs with Chromatic Number 5
Deep Segment Hash Learning for Music Generation
Learning Video Summarization Using Unpaired Data
A Flexible Multi-Objective Bayesian Optimization Approach using Random Scalarizations
Supervised Mixed Norm Autoencoder for Kinship Verification in Unconstrained Videos
Quantum proof systems for iterated exponential time, and beyond
What the Vec? Towards Probabilistically Grounded Embeddings
There Is No Free Lunch In Adversarial Robustness (But There Are Unexpected Benefits)
Machine learning many-body localization: Search for the elusive nonergodic metal
Improved bounds for the regularity of powers of edge ideals of graphs
Data-driven Design: A Case for Maximalist Game Design
Structural Isomorphism in Mathematical Expressions: A Simple Coding Scheme
The Arbitrarily Varying Gaussian Relay Channel with Sender Frequency Division
Groups of automorphisms of p-adic integers and the problem of the existence of fully homomorphic ciphers
On Schottky Noise and Shot Noise
Opinion Forming in Binomial Random Graph and Expanders
An Improved Active Disturbance Rejection Control for a Differential Drive Mobile Robot with Mismatched Disturbances and Uncertainties