Learning Equations for Extrapolation and Control

We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.


Identifying Causal Effects with the R Package causaleffect

Do-calculus is concerned with estimating the interventional distribution of an action from the observed joint probability distribution of the variables in a given causal structure. All identifiable causal effects can be derived using the rules of do-calculus, but the rules themselves do not give any direct indication whether the effect in question is identifiable or not. Shpitser and Pearl constructed an algorithm for identifying joint interventional distributions in causal models, which contain unobserved variables and induce directed acyclic graphs. This algorithm can be seen as a repeated application of the rules of do-calculus and known properties of probabilities, and it ultimately either derives an expression for the causal distribution, or fails to identify the effect, in which case the effect is non-identifiable. In this paper, the R package causaleffect is presented, which provides an implementation of this algorithm. Functionality of causaleffect is also demonstrated through examples.


Simplifying Probabilistic Expressions in Causal Inference

Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the R package causaleffect.


Evaluating Ex Ante Counterfactual Predictions Using Ex Post Causal Inference

We derive a formal, decision-based method for comparing the performance of counterfactual treatment regime predictions using the results of experiments that give relevant information on the distribution of treated outcomes. Our approach allows us to quantify and assess the statistical significance of differential performance for optimal treatment regimes estimated from structural models, extrapolated treatment effects, expert opinion, and other methods. We apply our method to evaluate optimal treatment regimes for conditional cash transfer programs across countries where predictions are generated using data from experimental evaluations in other countries and pre-program data in the country of interest.


Neural Ordinary Differential Equations

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
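
To make the idea of a continuous-depth model concrete, here is a minimal NumPy sketch of the forward pass only. It rests on several assumptions that are not in the abstract: the derivative network is a single tanh layer, the function names `dynamics` and `odeint_rk4` are hypothetical, and a hand-written fixed-step Runge-Kutta integrator stands in for the black-box solver. It does not implement the paper's backpropagation scheme.

```python
import numpy as np

def dynamics(h, t, W, b):
    """Hypothetical derivative network: dh/dt = tanh(W h + b)."""
    return np.tanh(W @ h + b)

def odeint_rk4(f, h0, t0, t1, steps, *params):
    """Integrate dh/dt = f(h, t, *params) from t0 to t1 with fixed-step RK4.

    More steps give higher numerical precision at higher cost, which is
    the precision/speed trade-off in its simplest form.
    """
    h = np.asarray(h0, dtype=float)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(h, t, *params)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt, *params)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt, *params)
        k4 = f(h + dt * k3, t + dt, *params)
        h = h + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
    return h

# Usage: the "output of the network" is the hidden state at time t1.
rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 3))
b = np.zeros(3)
h1 = odeint_rk4(dynamics, np.ones(3), 0.0, 1.0, 20, W, b)
```

Replacing the RK4 update with a single Euler step `h + dt * f(h, t)` recovers the familiar residual-layer update, which is one way to read the link to residual networks.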


Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language. However, none of the existing methods are sufficient to comprehend program semantics robustly, due to structural features such as function calls, branching, and interchangeable order of statements. In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data- and control-flow of the program. We then analyze the embeddings qualitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that with a single RNN architecture and pre-trained fixed embeddings, inst2vec outperforms specialized approaches for performance prediction (compute device mapping, optimal thread coarsening); and algorithm classification from raw code (104 classes), where we set a new state-of-the-art.


Forest Packing: Fast, Parallel Decision Forests

Machine learning has an emerging critical role in high-performance computing to modulate simulations, extract knowledge from massive data, and replace numerical models with efficient approximations. Decision forests are a critical tool because they provide insight into model operation that is critical to interpreting learned results. While decision forests are trivially parallelizable, the traversals of tree data structures incur many random memory accesses and are very slow. We present memory packing techniques that reorganize learned forests to minimize cache misses during classification. The resulting layout is hierarchical. At low levels, we pack the nodes of multiple trees into contiguous memory blocks so that each memory access fetches data for multiple trees. At higher levels, we use leaf cardinality to identify the most popular paths through a tree and collocate those paths in cache lines. We extend this layout with out-of-order execution and cache-line prefetching to increase memory throughput. Together, these optimizations increase the performance of classification in ensembles by a factor of four over an optimized C++ implementation and a factor of 50 over a popular R language implementation.


Learning from Chunk-based Feedback in Neural Machine Translation

We empirically investigate learning from partial feedback in neural machine translation (NMT), when partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence-based feedback by up to 2.61% BLEU absolute.


SMarTplan: a Task Planner for Smart Factories

Smart factories are on the verge of becoming the new industrial paradigm, wherein optimization permeates all aspects of production, from concept generation to sales. To fully pursue this paradigm, flexibility in the production means as well as in their timely organization is of paramount importance. AI is playing a major role in this transition, but the scenarios encountered in practice might be challenging for current tools. Task planning is one example where AI enables more efficient and flexible operation through an online automated adaptation and rescheduling of the activities to cope with new operational constraints and demands. In this paper we present SMarTplan, a task planner specifically conceived to deal with real-world scenarios in the emerging smart factory paradigm. Including both special-purpose and general-purpose algorithms, SMarTplan is based on current automated reasoning technology and it is designed to tackle complex application domains. In particular, we show its effectiveness on a logistic scenario, by comparing its specialized version with the general-purpose one, and extending the comparison to other state-of-the-art task planners.


Instance-Level Explanations for Fraud Detection: A Case Study

Fraud detection is a difficult problem that can benefit from predictive modeling. However, the verification of a prediction is challenging; for a single insurance policy, the model only provides a prediction score. We present a case study where we reflect on different instance-level model explanation techniques to aid a fraud detection team in their work. To this end, we designed two novel dashboards combining various state-of-the-art explanation techniques. These enable the domain expert to analyze and understand predictions, dramatically speeding up the process of filtering potential fraud cases. Finally, we discuss the lessons learned and outline open research issues.


Restricted Boltzmann Machines: Introduction and Review

The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.


Deep Neural Decision Trees

Deep neural networks have proven powerful at processing perceptual data, such as images and audio. However, for tabular data, tree-based models are more popular. A nice property of tree-based models is their natural interpretability. In this work, we present Deep Neural Decision Trees (DNDT), tree models realised by neural networks. A DNDT is intrinsically interpretable, as it is a tree. Yet as it is also a neural network (NN), it can be easily implemented in NN toolkits, and trained with gradient descent rather than greedy splitting. We evaluate DNDT on several tabular datasets, verify its efficacy, and investigate similarities and differences between DNDT and vanilla decision trees. Interestingly, DNDT self-prunes at both the split and feature level.


A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress

Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem at hand. The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates how the current methods mitigate these challenges. We further discuss the extensions of traditional IRL methods: (i) inaccurate and incomplete perception, (ii) incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions.


Tensor-Tensor Product Toolbox

Tensors are higher-order extensions of matrices. In recent work [Kilmer and Martin, 2011], the authors introduced the notion of the t-product, a generalization of matrix multiplication for tensors of order three. The multiplication is based on a convolution-like operation, which can be implemented efficiently using the Fast Fourier Transform (FFT). Under the t-product, tensors have a linear algebraic structure similar to that of matrices. For example, there is a tensor SVD (t-SVD), which is computable. By using some properties of the FFT, we obtain a more efficient way of computing the t-product and t-SVD in [C. Lu, et al., 2018]. We develop a Matlab toolbox that implements several basic operations on tensors based on the t-product. The toolbox is available at https://…/tproduct.
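
As a rough illustration of how the convolution-like t-product can be computed with the FFT, the following NumPy sketch transforms both tensors along the third axis, multiplies matching frontal slices, and transforms back. It is written in Python rather than the toolbox's Matlab, and the function names `t_product` and `t_product_direct` are hypothetical; nothing here is taken from the toolbox itself.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3), computed via the FFT."""
    n1, n2, n3 = A.shape
    if B.shape[0] != n2 or B.shape[2] != n3:
        raise ValueError("A must be n1 x n2 x n3 and B must be n2 x n4 x n3")
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # In the Fourier domain the circular convolution of frontal slices
    # becomes an independent matrix product for each slice k.
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    # Real inputs give a real result; discard round-off imaginary parts.
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_product_direct(A, B):
    """Reference version: explicit circular convolution of frontal slices."""
    n1, _, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for k in range(n3):
        for j in range(n3):
            C[:, :, k] += A[:, :, (k - j) % n3] @ B[:, :, j]
    return C

# Usage: both routes should agree up to floating-point error.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))
C = t_product(A, B)
```

The direct version needs n3 squared slice products, while the FFT version needs only n3 of them plus the transforms, which is where the efficiency gain comes from.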


In situ TensorView: In situ Visualization of Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are complex systems. They are trained so they can adapt their internal connections to recognize images, texts and more. It is both interesting and helpful to visualize the dynamics within such deep artificial neural networks so that people can understand how these artificial networks are learning and making predictions. In the field of scientific simulations, visualization tools like Paraview have long been utilized to provide insights and understandings. We present in situ TensorView to visualize the training and functioning of CNNs as if they are systems of scientific simulations. In situ TensorView is a loosely coupled in situ visualization open framework that provides multiple viewers to help users to visualize and understand their networks. It leverages the capability of co-processing from Paraview to provide real-time visualization during training and predicting phases. This avoids heavy I/O overhead for visualizing large dynamic systems. Only a few lines of code are injected into the TensorFlow framework. The visualization can provide guidance to adjust the architecture of networks, or compress the pre-trained networks. We showcase visualizing the training of LeNet-5 and VGG16 using in situ TensorView.


Meta Continual Learning

Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks. One way to lessen the impact of the forgetting problem is to constrain parameters that are important to previous tasks to stay close to the optimal parameters. Recently, multiple competitive approaches for computing the importance of the parameters with respect to the previous tasks have been presented. In this paper, we propose a learning-to-optimize algorithm for mitigating catastrophic forgetting. Instead of trying to formulate a new constraint function ourselves, we propose to train another neural network to predict parameter update steps that respect the importance of parameters to the previous tasks. In the proposed meta-training scheme, the update predictor is trained to minimize loss on a combination of current and past tasks. We show experimentally that the proposed approach works in the continual learning setting.
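
The parameter-constraint idea mentioned in the abstract (the hand-designed approach the paper contrasts itself with, not its learned update predictor) can be sketched as follows. This is only an illustration under stated assumptions: the constraint is taken to be a quadratic penalty weighted by a per-parameter importance, the new task is a toy least-squares loss, and the names `penalized_gradient` and `fit_new_task` are hypothetical.

```python
import numpy as np

def penalized_gradient(grad_new, theta, theta_star, importance, lam):
    """Gradient of: new-task loss + (lam / 2) * sum(importance * (theta - theta_star)**2)."""
    return grad_new + lam * importance * (theta - theta_star)

def fit_new_task(theta_star, importance, target, lam=1.0, lr=0.005, steps=5000):
    """Gradient descent on a toy new-task loss 0.5 * ||theta - target||^2,
    while pulling important parameters back toward theta_star."""
    theta_star = np.asarray(theta_star, dtype=float)
    importance = np.asarray(importance, dtype=float)
    target = np.asarray(target, dtype=float)
    theta = theta_star.copy()
    for _ in range(steps):
        grad_new = theta - target  # gradient of the toy new-task loss
        theta -= lr * penalized_gradient(grad_new, theta, theta_star, importance, lam)
    return theta

# Usage: the first parameter is marked important for the old task, the second is not.
theta = fit_new_task(theta_star=[0.0, 0.0], importance=[100.0, 0.0], target=[1.0, 1.0])
```

In this toy setting the important parameter stays close to its old value while the unimportant one moves freely to the new-task optimum.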


A Graph-Theoretic Analysis of Distributed Replicator Dynamic
Relating the cut distance and the weak* topology for graphons
State equation from the spectral structure of human brain activity
GOE Statistics for Levy Matrices
Quadratic Approximation of Generalized Tribonacci Sequences
No Threshold graphs are cospectral
Records from partial comparisons and discrete approximations
Deterministic $O(1)$-Approximation Algorithms to 1-Center Clustering with Outliers
Faster SGD training by minibatch persistency
Opportunistic Scheduling in Underlay Cognitive Radio based Systems: User Selection Probability Analysis
Statistical Optimal Transport via Geodesic Hubs
Couplings for determinantal point processes and their reduced Palm distributions with a view to quantifying repulsiveness
Reducing Property Graph Queries to Relational Algebra for Incremental View Maintenance
Quantum Nash equilibrium in the thermodynamic limit
A Reputation System for Artificial Societies
Movement-efficient Sensor Deployment in Wireless Sensor Networks with Limited Communication Range
Rate-Memory Trade-Off for Caching and Delivery of Correlated Sources
Hybrid Coordination and Control for Multiagent Systems with Input Constraints
Simultaneous Signal Subspace Rank and Model Selection with an Application to Single-snapshot Source Localization
Cluster-robust Standard Errors for Linear Regression Models with Many Controls
Recommending Scientific Videos based on Metadata Enrichment using Linked Open Data
A Novel Mobile Data Contract Design with Time Flexibility
Estimation from Non-Linear Observations via Convex Programming with Application to Bilinear Regression
A variational approach to Data Assimilation in the Solar Wind
Dynamic Multi-Level Multi-Task Learning for Sentence Simplification
Canonical Tensor Decomposition for Knowledge Base Completion
End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings
Variance Reduced Three Operator Splitting
On pathwise quadratic variation for cadlag functions
A one-shot quantum joint typicality lemma
Inner bounds via simultaneous decoding in quantum network information theory
Efficient data augmentation for multivariate probit models with panel data: An application to general practitioner decision-making about contraceptives
Unsupervised Deep Multi-focus Image Fusion
COUNTDOWN – three, two, one, low power! A Run-time Library for Energy Saving in MPI Communication Primitives
vsgoftest: An R Package for Goodness-of-Fit Testing Based on Kullback-Leibler Divergence
Learning Conditioned Graph Structures for Interpretable Visual Question Answering
NISQ circuit compilers: search space structure and heuristics
PaMpeR: Proof Method Recommendation System for Isabelle/HOL
Magnetic Resonance Spectroscopy Quantification using Deep Learning
ASIC Implementation of Time-Domain Digital Backpropagation with Deep-Learned Chromatic Dispersion Filters
Self-adaptive Privacy Concern Detection for User-generated Content
Solving Fractional Polynomial Problems by Polynomial Optimization Theory
Painting and Correspondence Coloring of Squares of Planar Graphs with no 4-cycles
Unsupervised Imitation Learning
Agent-Mediated Social Choice
Independent graph of the finite group
Stable Gaussian Process based Tracking Control of Euler-Lagrange Systems
Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task
Mixed batches and symmetric discriminators for GAN training
LIL type behaviour of multivariate Levy processes at zero
Belousov-Zhabotinsky reaction in liquid marbles
When Is the Achievable Rate Region Convex in Two-User Massive MIMO Systems
Letter to the Editor
FRnet-DTI: Convolutional Neural Networks for Drug-Target Interaction
Surrogate Outcomes and Transportability
Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings
Markov chains with heavy-tailed increments and asymptotically zero drift
Approximation Strategies for Incomplete MaxSAT
The determinant of the second additive compound of a square matrix: a formula and applications
Semi-supervised Hashing for Semi-Paired Cross-View Retrieval
Automatic segmentation of prostate zones
Properization
Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models
Large-Scale Stochastic Sampling from the Probability Simplex
Feature learning based on visual similarity triplets in medical image analysis: A case study of emphysema in chest CT scans
FineTag: Multi-label Retrieval of Attributes at Fine-grained Level in Images
Cooperative Queuing Policies for Effective Human-Multi-Robot Interaction
Gradient flow approach to local mean-field spin systems
Infrared and Visible Image Fusion with ResNet and zero-phase component analysis
Positioning Data-Rate Trade-off in mm-Wave Small Cells and Service Differentiation for 5G Networks
ConFusion: Sensor Fusion for Complex Robotic Systems using Nonlinear Optimization
Facing Multiple Attacks in Adversarial Patrolling Games with Alarmed Targets
Modality Distillation with Multiple Stream Networks for Action Recognition
Diffeomorphic brain shape modelling using Gauss-Newton optimisation
Improving brain computer interface performance by data augmentation with conditional Deep Convolutional Generative Adversarial Networks
Nivat’s Conjecture and Pattern Complexity in Algebraic Subshifts
Online Linear Quadratic Control
End-to-End Speech Recognition From the Raw Waveform
Breaking the 6/5 threshold for sums and products modulo a prime
Enhancing Identification of Causal Effects by Pruning
Itemsets of interest for negative association rules
Distributed Optimization over Directed Graphs with Row Stochasticity and Constraint Regularity
Learning to Update for Object Tracking
Transfer Learning with Human Corneal Tissues: An Analysis of Optimal Cut-Off Layer
A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification
Optimizing Leader Influence in Networks through Selection of Direct Followers
A new distance-regular graph of diameter $3$ on $1024$ vertices
Cancer Metastasis Detection With Neural Conditional Random Field
A model-driven approach for a new generation of adaptive libraries
Effect of Hyper-Parameter Optimization on the Deep Learning Model Proposed for Distributed Attack Detection in Internet of Things Environment
Capacitor Based Activity Sensing for Kinetic Powered Wearable IoTs
Impact of Building-Level Motor Protection on Power System Transient Behaviors
MoE-SPNet: A Mixture-of-Experts Scene Parsing Network
Bayesian Sequential Inference in Dynamic Survival Models
Fast Mixing of Metropolis-Hastings with Unimodal Targets
Matrix valued inverse problems on graphs with application to elastodynamic networks
Response Generation by Context-aware Prototype Editing
Defective and Clustered Colouring of Sparse Graphs
EmotionX-DLC: Self-Attentive BiLSTM for Detecting Sequential Emotions in Dialogue
Translating MFM into FOL: towards plant operation planning
Deep neural network based sparse measurement matrix for image compressed sensing
On the Metric Distortion of Embedding Persistence Diagrams into Reproducing Kernel Hilbert Spaces
Complete regular dessins and skew-morphisms of cyclic groups
Maximum average degree and relaxed coloring
On the Cauchy problem for parabolic integro-differential equations in generalized Hölder spaces
The strong chromatic index of $(3,Δ)$-bipartite graphs
Covering 2-connected 3-regular graphs with disjoint paths
Strong chromatic index of graphs with maximum degree four
VirtualHome: Simulating Household Activities via Programs
Thermodynamics of the Minimum Description Length on Community Detection
Maximally Invariant Data Perturbation as Explanation
Theoretical Analysis of Image-to-Image Translation with Adversarial Learning
Emotional Conversation Generation Orientated Syntactically Constrained Bidirectional-asynchronous Framework
Private Text Classification
Optimization over Nonnegative and Convex Polynomials With and Without Semidefinite Programming
Smoothed SVD-based Beamforming for FBMC/OQAM Systems Based on Frequency Spreading
Fast Multiple Landmark Localisation Using a Patch-based Iterative Network
Soft Sampling for Robust Object Detection
Classification of remote sensing images using attribute profiles and feature profiles from different trees: a comparative study
Repetition Estimation
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
A Web of Blocks
Using Mode Connectivity for Loss Landscape Analysis
Towards Gene Expression Convolutions using Gene Interaction Graphs
Bayesian monotonic errors-in-variables models with applications to pathogen susceptibility testing
On the Bias of Reed-Muller Codes over Odd Prime Fields
Comparative Analysis of Neural QA models on SQuAD
Deconvolving convolution neural network for cell detection
Proportional Choosability: A New List Analogue of Equitable Coloring
High-frequency analysis of parabolic stochastic PDEs
A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation
Overlapping Clustering Models, and One (class) SVM to Bind Them All
Reconstruction methods for networks: the case of economic and financial systems
Bayesian Prediction of Future Street Scenes through Importance Sampling based Optimization
Delegated Search Approximates Efficient Search
The domination number of plane triangulations
Paths in ordered trees
Implementation of Peridynamics utilizing HPX — the C++ standard library for parallelism and concurrency
Designing Optimal Binary Rating Systems
Cyclic triangle factors in regular tournaments
Some remarks on the bias distribution analysis of discrete-time identification algorithms based on pseudo-linear regressions
Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images
Learning to Decode 7T-like MR Image Reconstruction from 3T MR Images
The Minimax Learning Rate of Normal and Ising Undirected Graphical Models
Manifold Learning & Stacked Sparse Autoencoder for Robust Breast Cancer Classification from Histopathological Images
Learning Distributed Representations from Reviews for Collaborative Filtering
Combining Word Feature Vector Method with the Convolutional Neural Network for Slot Filling in Spoken Language Understanding
Continuous-variable quantum neural networks
The Off-Topic Memento Toolkit
Strong coupling limit of the Polaron measure and the Pekar process
Age-Minimal Transmission for Energy Harvesting Sensors with Finite Batteries: Online Policies
Beyond Local Nash Equilibria for Adversarial Networks
Coupled Fluid Density and Motion from Single Views
The graphs with all but two eigenvalues equal to $2$ or $-1$
A Hybrid Fuzzy Regression Model for Optimal Loss Reserving in Insurance
Emergent Open-Endedness from Contagion of the Fittest
On the relation between Sion’s minimax theorem and existence of Nash equilibrium in asymmetric multi-players zero-sum game with only one alien
Two Stream Self-Supervised Learning for Action Recognition
G2D: from GTA to Data
A Proof of Delta Conjecture
A Scalable Machine Learning Approach for Inferring Probabilistic US-LI-RADS Categorization
Semantic Image Retrieval by Uniting Deep Neural Networks and Cognitive Architectures
Implicit Quantile Networks for Distributional Reinforcement Learning
Maximum a Posteriori Policy Optimisation
Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network
Deep Sequence Learning with Auxiliary Information for Traffic Prediction
Deep Learning based Estimation of Weaving Target Maneuvers
Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control
A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data
DeepTerramechanics: Terrain Classification and Slip Estimation for Ground Robots via Deep Learning
A Graph Transduction Game for Multi-target Tracking
Pressure Predictions of Turbine Blades with Deep Learning
Understanding Patch-Based Learning by Explaining Predictions
Task Driven Generative Modeling for Unsupervised Domain Adaptation: Application to X-ray Image Segmentation
DropBack: Continuous Pruning During Training
Multilingual Scene Character Recognition System using Sparse Auto-Encoder for Efficient Local Features Representation in Bag of Features
An optimized system to solve text-based CAPTCHA
DFNet: Semantic Segmentation on Panoramic Images with Dynamic Loss Weights and Residual Fusion Block
Auto-Meta: Automated Gradient Based Meta Learner Search
Distributional Advantage Actor-Critic
Localizing and Quantifying Damage in Social Media Images
A maximal energy pointset configuration problem