Learning and Testing Causal Models with Interventions

We consider testing and learning problems on causal Bayesian networks as defined by Pearl (Pearl, 2009). Given a causal Bayesian network \mathcal{M} on a graph with n discrete variables and bounded in-degree and bounded ‘confounded components’, we show that O(\log n) interventions on an unknown causal Bayesian network \mathcal{X} on the same graph, and \tilde{O}(n/\epsilon^2) samples per intervention, suffice to efficiently distinguish whether \mathcal{X}=\mathcal{M} or whether there exists some intervention under which \mathcal{X} and \mathcal{M} are farther than \epsilon in total variation distance. We also obtain sample/time/intervention efficient algorithms for: (i) testing the identity of two unknown causal Bayesian networks on the same graph; and (ii) learning a causal Bayesian network on a given graph. Although our algorithms are non-adaptive, we show that adaptivity does not help in general: \Omega(\log n) interventions are necessary for testing the identity of two unknown causal Bayesian networks on the same graph, even adaptively. Our algorithms are enabled by a new subadditivity inequality for the squared Hellinger distance between two causal Bayesian networks.
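The two distances named in this abstract can be made concrete with a small numerical sketch. It assumes discrete distributions given as NumPy arrays and the normalization H^2(P,Q) = 1 - \sum_i \sqrt{p_i q_i}; the function names are illustrative and not taken from the paper. The subadditivity check covers only the familiar special case of product distributions, not the inequality for causal Bayesian networks that the paper proves.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def squared_hellinger(p, q):
    """Squared Hellinger distance, normalized to lie in [0, 1]."""
    return 0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum()

# Two pairs of marginals (illustrative numbers, not from the paper).
p1, q1 = np.array([0.5, 0.5]), np.array([0.9, 0.1])
p2, q2 = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])

# Joint distributions of two independent variables.
p_joint = np.outer(p1, p2).ravel()
q_joint = np.outer(q1, q2).ravel()

h2_joint = squared_hellinger(p_joint, q_joint)
h2_sum = squared_hellinger(p1, q1) + squared_hellinger(p2, q2)
tv_joint = total_variation(p_joint, q_joint)

# Product case: the joint squared Hellinger distance is at most the sum
# of the per-coordinate squared Hellinger distances.
print("H^2 joint:", h2_joint, "<= sum of marginals:", h2_sum)
# Standard relation between the two distances: H^2 <= TV <= sqrt(2) * H.
print("TV joint:", tv_joint, "<= sqrt(2) * H:", np.sqrt(2 * h2_joint))
```

Bounds of this kind are useful because a joint distance over exponentially many outcomes can be controlled by distances over small local pieces, and a bound on the squared Hellinger distance then gives a bound on total variation.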

A Practical Algorithm for Distributed Clustering and Outlier Detection

We study the classic k-means/median clustering problems, which are fundamental in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by labeling them as outliers. We propose a simple approach based on constructing a small summary of the original dataset. The proposed method is time and communication efficient, has good approximation guarantees, and can identify the global outliers effectively. To the best of our knowledge, this is the first practical algorithm with theoretical guarantees for distributed clustering with outliers. Our experiments on both real and synthetic data demonstrate the clear superiority of our algorithm over all the baseline algorithms in almost all metrics.

A Generalized Active Learning Approach for Unsupervised Anomaly Detection

This work formalizes a new framework for anomaly detection, called active anomaly detection. This framework has, in practice, the same cost as unsupervised anomaly detection but with the possibility of much better results. We show that unsupervised anomaly detection is an undecidable problem and that a prior needs to be assumed for the anomalies' probability distribution in order to have performance guarantees. Finally, we also present a new layer that can be attached to any deep learning model designed for unsupervised anomaly detection to transform it into an active anomaly detection method, and we present results on both synthetic and real anomaly detection datasets.

Working Memory Networks: Augmenting Memory Networks with a Relational Reasoning Module

In recent years, there has been considerable interest in achieving complex reasoning with deep neural networks. To that end, models like Memory Networks (MemNNs) have combined external memory storage with attention mechanisms. These architectures, however, lack more complex reasoning mechanisms that could allow, for instance, relational reasoning. Relation Networks (RNs), on the other hand, have shown outstanding results in relational reasoning tasks. Unfortunately, their computational cost grows quadratically with the number of memories, which is prohibitive for larger problems. To solve these issues, we introduce the Working Memory Network, a MemNN architecture with a novel working memory storage and reasoning module. Our model retains the relational reasoning abilities of the RN while reducing its computational complexity from quadratic to linear. We tested our model on the text QA dataset bAbI and the visual QA dataset NLVR. In the jointly trained bAbI-10k, we set a new state-of-the-art, achieving a mean error of less than 0.5%. Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark.

Hyperbolic Attention Networks

We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.
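To illustrate what soft attention expressed through hyperboloid and Klein model operations might look like, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that attention scores are the negative hyperboloid distance scaled by a hypothetical temperature `beta`, and that values are aggregated with the Einstein midpoint in the Klein model. All function names are illustrative.

```python
import numpy as np

def lift(v):
    """Lift Euclidean vectors (..., n) onto the hyperboloid: x0 = sqrt(1 + |v|^2)."""
    x0 = np.sqrt(1.0 + np.sum(v * v, axis=-1, keepdims=True))
    return np.concatenate([x0, v], axis=-1)

def hyperboloid_distance(x, y):
    """d(x, y) = arccosh(-<x, y>_L), using the Lorentzian inner product."""
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    # Clip guards against rounding slightly below 1, where arccosh is undefined.
    return np.arccosh(np.clip(-inner, 1.0, None))

def einstein_midpoint(weights, klein_points):
    """Weighted aggregation in the Klein model using Lorentz factors."""
    gamma = 1.0 / np.sqrt(1.0 - np.sum(klein_points ** 2, axis=-1))
    w = weights * gamma
    return (w[:, None] * klein_points).sum(axis=0) / w.sum()

def hyperbolic_attention(query, keys, values, beta=1.0):
    """Attend from one query (n,) over m keys/values (m, n)."""
    q, k, v = lift(query), lift(keys), lift(values)
    scores = -beta * hyperboloid_distance(q, k)   # closer keys score higher
    weights = np.exp(scores - scores.max())       # numerically stable softmax
    weights /= weights.sum()
    klein_values = v[:, 1:] / v[:, :1]            # hyperboloid -> Klein ball
    return weights, einstein_midpoint(weights, klein_values)

keys = np.array([[0.5, 0.0], [2.0, 1.0], [-1.0, 3.0]])
weights, output = hyperbolic_attention(keys[0], keys, keys)
print("attention weights:", weights)
print("aggregated point (Klein coordinates):", output)
```

The aggregated point stays inside the unit ball because the Einstein midpoint is a convex combination of points that already lie in the ball.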

Entropy and mutual information in models of deep neural networks

We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experimental framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.

Log Gaussian Cox Process Networks

We generalize the log Gaussian Cox process (LGCP) framework to model multiple correlated point data jointly. The resulting log Gaussian Cox process network (LGCPN) considers the observations as realizations of multiple LGCPs, whose log intensities are given by linear combinations of latent functions drawn from Gaussian process priors. The coefficients of these linear combinations are also drawn from Gaussian processes and can incorporate additional dependencies a priori. We derive closed-form expressions for the moments of the intensity functions in our model and use them to develop an efficient variational inference algorithm that is orders of magnitude faster than competing deterministic and stochastic approximations of multivariate LGCP and coregionalization models. Our approach outperforms the state of the art in jointly estimating multiple bovine tuberculosis incidents in Cornwall, UK, and multiple crime type intensities across New York City.

Towards Robust Evaluations of Continual Learning

Continual learning experiments used in current deep learning papers do not faithfully assess fundamental challenges of learning continually, and instead mask weak points of the suggested approaches. We study gaps in such existing evaluations, propose essential experimental evaluations that are more representative of continual learning’s challenges, and suggest a re-prioritization of research efforts in the field. We show that current approaches fail with our new evaluations and, to analyse these failures, we propose a variational loss which unifies many existing solutions to continual learning under a Bayesian framing, as either ‘prior-focused’ or ‘likelihood-focused’. We show that while prior-focused approaches such as EWC and VCL perform well on existing evaluations, they perform dramatically worse when compared to likelihood-focused approaches on other simple tasks.

Hierarchical Clustering with Structural Constraints

Hierarchical clustering is a popular unsupervised data analysis method. For many real-world applications, we would like to exploit prior information about the data that imposes constraints on the clustering hierarchy, and is not captured by the set of features available to the algorithm. This gives rise to the problem of ‘hierarchical clustering with structural constraints’. Structural constraints pose major challenges for bottom-up approaches like average/single linkage, and even though they can be naturally incorporated into top-down divisive algorithms, no formal guarantees exist on the quality of their output. In this paper, we provide provable approximation guarantees for two simple top-down algorithms, using a recently introduced optimization viewpoint of hierarchical clustering with pairwise similarity information [Dasgupta, 2016]. We show how to find good solutions even in the presence of conflicting prior information, by formulating a ‘constraint-based regularization’ of the objective. Finally, we explore a variation of this objective for dissimilarity information [Cohen-Addad et al., 2018] and improve upon current techniques.

Deep Reinforcement Learning For Sequence to Sequence Models

In recent years, sequence-to-sequence (seq2seq) models have been used in a variety of tasks, from machine translation, headline generation, text summarization, and speech to text, to image caption generation. The underlying framework of all these models is usually a deep neural network which contains an encoder and a decoder. The encoder processes the input data, and the decoder receives the output of the encoder and generates the final output. Although simply using an encoder/decoder model would, most of the time, produce better results than traditional methods on the above-mentioned tasks, researchers have proposed additional improvements over these sequence-to-sequence models, like using an attention-based model over the input, pointer-generation models, and self-attention models. However, all these seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently, a completely fresh point of view emerged for solving these two problems in seq2seq models by using methods from Reinforcement Learning (RL). In this new line of research, we look at seq2seq problems from the RL point of view and try to come up with a formulation that combines the power of RL methods in decision-making with that of sequence-to-sequence models in retaining long-term memory. In this paper, we summarize some of the most recent frameworks that combine concepts from the RL world with the deep neural network area and explain how these two areas could benefit from each other in solving complex seq2seq tasks. In the end, we provide insights on some of the problems of the current existing models and how we can improve them with better RL models. We also provide the source code for implementing most of the models discussed in this paper on the complex task of abstractive text summarization.

Cautious Deep Learning

Most classifiers operate by selecting the maximum of an estimate of the conditional distribution p(y|x), where x stands for the features of the instance to be classified and y denotes its label. This often results in a hubristic bias: overconfidence in the assignment of a definite label. Usually, the observations are concentrated on a small volume but the classifier provides definite predictions for the entire space. We propose constructing conformal prediction sets [vovk2005algorithmic] which contain a set of labels rather than a single label. These conformal prediction sets contain the true label with probability 1-\alpha. Our construction is based on p(x|y) rather than p(y|x), which results in a classifier that is very cautious: it outputs the null set, meaning ‘I don’t know’, when the object does not resemble the training examples. An important property of our approach is that classes can be added or removed without having to retrain the classifier. We demonstrate the performance on the ImageNet ILSVRC dataset using high dimensional features obtained from state of the art convolutional neural networks.
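A prediction set built from p(x|y) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes one-dimensional features, a Gaussian density estimate per class, and a held-out calibration split per class. The names `fit_class` and `prediction_set`, and the toy labels, are hypothetical.

```python
import numpy as np

def fit_class(x_train, x_calib, alpha):
    """Estimate p(x|y) for one class and calibrate its acceptance threshold."""
    mu, sigma = x_train.mean(), x_train.std() + 1e-9

    def density(x):
        # Gaussian density estimate (an assumption of this sketch).
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    # Roughly a fraction alpha of held-out points of this class fall below
    # the threshold, so about 1 - alpha of them are accepted.
    threshold = np.quantile(density(x_calib), alpha)
    return density, threshold

def prediction_set(x, models):
    """Return every label whose class-conditional density at x clears its threshold."""
    return {label for label, (density, thr) in models.items() if density(x) >= thr}

rng = np.random.default_rng(0)
models = {}
for label, center in {"cat": -3.0, "dog": 3.0}.items():
    samples = rng.normal(center, 1.0, size=400)
    models[label] = fit_class(samples[:200], samples[200:], alpha=0.1)

print(prediction_set(-3.0, models))   # a point that resembles the "cat" samples
print(prediction_set(50.0, models))   # a point far from all training data

# Adding a class touches only its own entry; existing classes are not refit.
bird = rng.normal(10.0, 1.0, size=400)
models["bird"] = fit_class(bird[:200], bird[200:], alpha=0.1)
print(prediction_set(10.0, models))
```

Because each class has its own density and threshold, a point far from all training data clears no threshold and gets the empty set, and classes can be added or dropped by editing the dictionary.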

Probing entanglement in a many-body-localized system
Stereo Magnification: Learning View Synthesis using Multiplane Images
Quantum information measures of the one-dimensional Robin quantum well
Adversarial Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
Implicit Autoencoders
Thermodynamic properties of the one-dimensional Robin quantum well
Meta-Gradient Reinforcement Learning
Prediction of Autism Treatment Response from Baseline fMRI using Random Forests and Tree Bagging
New Insights into Bootstrapping for Bandits
Multi-Task Zipping via Layer-wise Neuron Sharing
How Many Directions Determine a Shape and other Sufficiency Results for Two Topological Transforms
Mining Procedures from Technical Support Documents
Tie-Line Characteristics based Partitioning for Distributed Optimization of Power Systems
Enumeration of border-strip decompositions
The parallel texts of books translations in the quality evaluation of basic models and algorithms for the similarity of symbol strings
Local SGD Converges Fast and Communicates Little
Modular bootstrap agrees with path integral in the large moduli limit
Automorphism groups of maps, hypermaps and dessins
Geographical Hidden Markov Tree for Flood Extent Mapping (With Proof Appendix)
Ultra-Reliable Communication over Arbitrarily Varying Channels under Block-Restricted Jamming
Random Walks on Dynamical Random Environments with Non-Uniform Mixing
Mobile Face Tracking: A Survey and Benchmark
Semi-Random Graphs with Planted Sparse Vertex Cuts: Algorithms for Exact and Approximate Recovery
Letting Emotions Flow: Success Prediction by Modeling the Flow of Emotions in Books
A Simple Proof of the DPRZ-Theorem for 2D Cover Times
Impact of delayed acceleration feedback on the classical car-following model
The detection of professional fraud in automobile insurance using social network analysis
On the spectral structure of Jordan-Kronecker products of symmetric and skew-symmetric matrices
Estimating Population Average Causal Effects in the Presence of Non-Overlap: A Bayesian Approach
Johnson-Mehl Cell-based Analysis of UL Cellular Network with Coupled User and BS Locations
Rainbow fractional matchings
Image-to-image translation for cross-domain disentanglement
One dimensional critical Kinetic Fokker-Planck equations, Bessel and stable processes
Phase Diagram of Quantum Hall Breakdown and Non-linear Phenomena for InGaAs/InP Quantum Wells
Computing the resolvent of the sum of maximally monotone operators with the averaged alternating modified reflections algorithm
Learning convex polytopes with margin
Rare slips in fluctuating synchronized oscillator networks
Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms
Minimum Information Exchange in Distributed Systems
Autonomously and Simultaneously Refining Deep Neural Network Parameters by Generative Adversarial Networks
Triangle-factors in pseudorandom graphs
Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
No More Differentiator in PID: Development of Nonlinear Lead for Precision Mechatronics
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
Convex method for selection of fixed effects in high-dimensional linear mixed models
Been There, Done That: Meta-Learning with Episodic Recall
Forming IDEAS Interactive Data Exploration & Analysis System
LF-Net: Learning Local Features from Images
On the sum of $k$-th largest distance eigenvalues of graphs
Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals
Uncertainty-Aware Attention for Reliable Interpretation and Prediction
Stochastic integration and differential equations for typical paths
Nonlinear Acceleration of Deep Neural Networks
Decentralized MPC based Obstacle Avoidance for Multi-Robot Target Tracking Scenarios
Reliable Dispatch of Renewable Generation via Charging of Dynamic PEV Populations
Eternal dominating sets on digraphs and orientations of graphs
SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels
Backpropagation with N-D Vector-Valued Neurons Using Arbitrary Bilinear Products
Optimal pricing for a peer-to-peer sharing platform under network externalities
A0C: Alpha Zero in Continuous Action Space
Non-Preemptive Flow-Time Minimization via Rejections
On interrelations between strongly, weakly and chord separated set-systems (a geometric approach)
Multi-Scale DenseNet-Based Electricity Theft Detection
Native Language Cognate Effects on Second Language Lexical Choice
Computing the Star Chromatic Index of Every Tree in Polynomial Time
Residual Networks as Geodesic Flows of Diffeomorphisms
Vehicular Communication Networks in Automated Driving Era
Model-based inference of conditional extreme value distributions with hydrological applications
Coarse-to-fine Seam Estimation for Image Stitching
Primal-Dual Wasserstein GAN
Hawkes Process Kernel Structure Parametric Search with Renormalization Factors
A Unified Probabilistic Model for Learning Latent Factors and Their Connectivities from High-Dimensional Data
WSD-algorithm based on new method of vector-word contexts proximity calculation via epsilon-filtration
A Hybrid Approach to Music Playlist Continuation Based on Playlist-Song Membership
Phase Retrieval via Polytope Optimization: Geometry, Phase Transitions, and New Algorithms
Hierarchical burst model for complex bursty dynamics
Martin boundaries of the duals of free unitary quantum groups
Finite Blocklength Communications in Smart Grids for Dynamic Spectrum Access and Locally Licensed Scenarios
Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
An Accurate Data Cleaning Procedure for Electron Cyclotron Emission Imaging on EAST Tokamak Based on Methodology of Machine Learning
Cameron-Liebler sets of k-spaces in PG(n,q)
An optimal bound on the solution sets of one-variable word equations and its consequences
Entropy Productions and Their Mathematical Representations: Clausius’ vs. Kelvin’s Views of the Second Law and Irreversibility
Stable specification search in structural equation model with latent variables
AVID: Adversarial Visual Irregularity Detection
Upper Bounds for Ordered Ramsey Numbers of Graphs on Four Vertices
Homfly polynomials for periodic knots via state model
Stable Super-Resolution of Images
You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery
Estimating Carotid Pulse and Breathing Rate from Near-infrared Video of the Neck
A small-world search for quantum speedup: How small-world interactions can lead to improved quantum annealer designs
Kernel-estimated Nonparametric Overlap-Based Syncytial Clustering
AutoAugment: Learning Augmentation Policies from Data
Effective intervals and regular Dirichlet subspaces
A network biology-based approach to evaluating the effect of environmental contaminants on human interactome and diseases
Intelligent Trainer for Model-Based Reinforcement Learning
A data-independent distance to infeasibility for linear conic systems
Multi-Level Deep Cascade Trees for Conversion Rate Prediction
Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization
VisualBackProp for learning using privileged information with CNNs
Deploy Large-Scale Deep Neural Networks in Resource Constrained IoT Devices with Local Quantization Region
Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning
Local structure of multi-dimensional martingale optimal transport
Bayesian predictive densities as an interpretation of a class of Skew–Student $t$ distributions with application to medical data
Log-Sobolev-type inequalities for solutions to stationary Fokker-Planck-Kolmogorov equations
Energy Efficient Delay Sensitive Optimization in SWIPT-MIMO
Simple and practical algorithms for $\ell_p$-norm low-rank approximation
On the SINR Distribution of SWIPT MU-MIMO with Antenna Selection
Complex Relations in a Deep Structured Prediction Model for Fine Image Segmentation
Evading the Adversary in Invariant Representation
Solving Large-Scale Optimization Problems with a Convergence Rate Independent of Grid Size
Large Data and Zero Noise Limits of Graph-Based Semi-Supervised Learning Algorithms
Euclidean Embedding of the Poisson Weighted Infinite Tree and Application to Mobility Models
Incomplete Nested Dissection
Implicit Language Model in LSTM for OCR
Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions
A Two-Stage Subspace Trust Region Approach for Deep Neural Network Training
Recursive functions on conditional Galton–Watson trees
Optimal Hashing in External Memory
Use of symmetric kernels for convolutional neural networks
Statistical properties of lambda terms
Diffractive electron-nucleus scattering and ancestry in branching random walks
Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time
Bayesian method for inferring the impact of geographical distance on intensity of communication
Robust one-bit compressed sensing with non-Gaussian measurements
Non-convex non-local flows for saliency detection
Scalable Bayesian Learning for State Space Models using Variational Inference with SMC Samplers
A Projection Approach to Equality Constrained Iterative Linear Quadratic Optimal Control
A hybrid approach of interpolations and CNN to obtain super-resolution
The 2d-directed spanning forest converges to the Brownian web
Identification in Nonparametric Models for Dynamic Treatment Effects
Douglas-Rachford splitting for a Lipschitz continuous and a strongly monotone operator
Coloring general Kneser graphs and hypergraphs via high-discrepancy hypergraphs
pMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity
Classifying cooking object’s state using a tuned VGG convolutional neural network
Embedding Syntax and Semantics of Prepositions via Tensor Decomposition
Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator
Predictive Local Smoothness for Stochastic Gradient Methods
Anonymizing k-Facial Attributes via Adversarial Perturbations
Convolutional Polar Codes on Channels with Memory
Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients
Cumulative subtraction games
Intriguing maximally monotone operators derived from nonsunny nonexpansive retractions
Semi-supervised classification by reaching consensus among modalities
Learning Contextual Bandits in a Non-stationary Environment
On a lower bound for the eccentric connectivity index of graphs
Deep Reinforcement Learning of Marked Temporal Point Processes
Network topology near criticality in adaptive epidemics
Scoring Lexical Entailment with a Supervised Directional Similarity Network
On the Skitovich-Darmois theorem for some locally compact Abelian groups
An infinite-server queueing model MMAPkGk in semi-Markov random environment with marked MAP arrival and subject to catastrophes
Infinite-server queueing model with MAPkGk Markov arrival streams, random volume of customers in random environment subject to catastrophe
The Thickness of K_1,n,n and K_2,n,n
Phocas: dimensional Byzantine-resilient stochastic gradient descent
A New Approach for 4DVar Data Assimilation
Duadic negacyclic codes over a finite non-chain ring and their Gray images
GraphChallenge.org: Raising the Bar on Graph Analytic Performance
A D-vine copula mixed model for joint meta-analysis and comparison of diagnostic tests
First-Hitting Times Under Additive Drift
Optimizing state change detection in functional temporal networks through dynamic community detection
Learning compositionally through attentive guidance
Global-Locally Self-Attentive Dialogue State Tracker
Partial Cartesian Graph Product
DINFRA: A One Stop Shop for Computing Multilingual Semantic Relatedness
Corpus Conversion Service: A machine learning platform to ingest documents at scale [Poster abstract]