How to Make Causal Inferences Using Texts

New text as data techniques offer a great promise: the ability to inductively discover measures that are useful for testing social science theories of interest from large collections of text. We introduce a conceptual framework for making causal inferences with discovered measures as a treatment or outcome. Our framework enables researchers to discover high-dimensional textual interventions and estimate the ways that observed treatments affect text-based outcomes. We argue that nearly all text-based causal inferences depend upon a latent representation of the text and we provide a framework to learn the latent representation. But estimating this latent representation, we show, creates new risks: we may introduce an identification problem or overfit. To address these risks we describe a split-sample framework and apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic response. Our work provides a rigorous foundation for text-based causal inferences.

Granger-causal Attentive Mixtures of Experts

Several methods have recently been proposed to detect salient input features for outputs of neural networks. Those methods offer a qualitative glimpse at feature importance, but they fall short of providing quantifiable attributions that can be compared across decisions and measures of the expected quality of their explanations. To address these shortcomings, we present an attentive mixture of experts (AME) that couples attentive gating with a Granger-causal objective to jointly produce accurate predictions as well as measures of feature importance. We demonstrate the utility of AMEs by determining factors driving demand for medical prescriptions, comparing predictive features for Parkinson’s disease and pinpointing discriminatory genes across cancer types.

MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster

In this paper, we present a co-designed petascale high-density GPU cluster to expedite distributed deep learning training with synchronous Stochastic Gradient Descent (SSGD). The architecture of our heterogeneous cluster is inspired by the Harvard architecture. Nodes are configured with different specifications according to their roles in the system. Based on the topology of the whole system’s network and the properties of the different node types, we develop and implement a novel job server parallel software framework, named ‘MiMatrix’, for distributed deep learning training. Compared to the parameter server framework, in which the parameter server is a bottleneck for data transfer in the AllReduce algorithm of SSGD, the job server undertakes all controlling, scheduling and monitoring tasks without model data transfer. In MiMatrix, we propose a novel GPUDirect Remote Direct Memory Access (RDMA)-aware parallel AllReduce algorithm executed by computing servers, in which both computation and handshake messages are O(1) at each epoch

Augmented Artificial Intelligence

All artificial intelligence (AI) systems make errors. These errors are unexpected and often differ from typical human mistakes (‘non-human’ errors). AI errors should be corrected without damaging existing skills and, ideally, without direct human expertise. This talk presents an initial summary report of a project taking a new and systematic approach to improving the intellectual effectiveness of an individual AI through communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative correction of errors in legacy AI systems.

Recent Advances in Neural Program Synthesis

In recent years, deep learning has made tremendous progress in a number of fields that were previously out of reach for artificial intelligence. The successes in these problems have led researchers to consider the possibilities for intelligent systems to tackle a problem that humans have only recently themselves considered: program synthesis. This challenge is unlike others such as object recognition and speech translation, since its abstract nature and demand for rigor make it difficult even for human minds to attempt. While it is still far from being solved or even competitive with most existing methods, neural program synthesis is a rapidly growing discipline which holds great promise if completely realized. In this paper, we start by exploring the problem statement and challenges of program synthesis. Then, we examine the fascinating evolution of program induction models, along with how they have succeeded, failed and been reimagined since. Finally, we conclude with a contrastive look at program synthesis and future research recommendations for the field.

Yes, but Did It Work?: Evaluating Variational Inference

While it’s always possible to compute a variational approximation to a posterior distribution, it can be difficult to discover problems with this approximation. We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness of fit measurement for joint distributions, while simultaneously improving the error in the estimate. The variational simulation-based calibration (VSBC) assesses the average performance of point estimates.

VISER: Visual Self-Regularization

In this work, we propose the use of a large set of unlabeled images as a source of regularization data for learning robust visual representations. Given a visual model trained on a labeled dataset in a supervised fashion, we augment our training samples by incorporating a large amount of unlabeled data and train a semi-supervised model. We demonstrate that our proposed learning approach leverages an abundance of unlabeled images and boosts visual recognition performance, which alleviates the need to rely on large labeled datasets for learning robust representations. To increase the number of image instances needed to learn robust visual models in our approach, each labeled image propagates its label to its nearest unlabeled image instances. These retrieved unlabeled images serve as local perturbations of each labeled image to perform Visual Self-Regularization (VISER). To retrieve such visual self-regularizers, we compute the cosine similarity in a semantic space defined by the penultimate layer of a fully convolutional neural network. We use the publicly available Yahoo Flickr Creative Commons 100M dataset as the source of our unlabeled image set and propose a distributed approximate nearest neighbor algorithm to make retrieval practical at that scale. Using the labeled instances and their regularizer samples, we show that we significantly improve object categorization and localization performance on the MS COCO and Visual Genome datasets, where objects appear in context.

Critical Percolation as a Framework to Analyze the Training of Deep Networks
Asymptotic Analysis of Normalized SNR-Based Scheduling in Uplink Cellular Networks with Truncated Channel Inversion Power Control
On the asymptotic of exit problems for controlled Markov diffusion processes with random jumps and vanishing diffusion terms
On the polynomial Szemerédi’s theorem in finite fields
On the Preliminary Investigation of Selfish Mining Strategy with Multiple Selfish Miners
Scalable Meta-Learning for Bayesian Optimization
Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries
De Finetti’s theorem: rate of convergence in Kolmogorov distance
Approximation Methods for Bilevel Programming
An Imputation-Consistency Algorithm for High-Dimensional Missing Data Problems and Beyond
Trajectory-driven Influential Billboard Placement
Second order backward SDE with random terminal time
Dynamic regulation of resource transport induces criticality in multilayer networks of excitable units
Erasure correction of scalar codes in the presence of stragglers
Learning interacting particle systems: diffusion parameter estimation for aggregation equations
Error correction in fast matrix multiplication and inverse
Universal Deep Neural Network Compression
A Critical Investigation of Deep Reinforcement Learning for Navigation
From Game-theoretic Multi-agent Log Linear Learning to Reinforcement Learning
Four-coloring $P_6$-free graphs. I. Extending an excellent precoloring
Four-coloring $P_6$-free graphs. II. Finding an excellent precoloring
An MCMC Algorithm for Estimating the Q-matrix in a Bayesian Framework
On the sum of projectors onto convex sets
Spatial Modulation Assisted Multi-Antenna Non-Orthogonal Multiple Access
Spectral Image Visualization Using Generative Adversarial Networks
A comprehensive review of 3D point cloud descriptors
Random taste heterogeneity in discrete choice models: Flexible nonparametric finite mixture distributions
Game Data Mining Competition on Churn Prediction and Survival Analysis using Commercial Game Log Data
An example showing that A-lower semi-continuity is essential for minimax continuity theorems
Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder
Impacts of Environmental Noises upon Asymptotic Behavior of SIRS Models
An Empirical Evaluation of Deep Learning for ICD-9 Code Assignment using MIMIC-III Clinical Notes
On The Hardness of Approximate and Exact (Bichromatic) Maximum Inner Product
Reduced basis approximation and a posteriori error bounds for 4D-Var data assimilation
A Schematic Definition of Quantum Polynomial Time Computability
Improved Incremental First-Order Oracle Complexity of Variance Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization
Outlier Detection for Robust Multi-dimensional Scaling
Energy-Efficient CMOS Memristive Synapses for Mixed-Signal Neuromorphic System-on-a-Chip
Multi-View Bayesian Correlated Component Analysis
On Synthesis of Reversible Circuits with Small Number of Additional Inputs Consisting of NOT, CNOT and 2-CNOT Gates
SlideRunner – A Tool for Massive Cell Annotations in Whole Slide Images
Large-deviation Properties of Linear-programming Computational Hardness of the Vertex Cover Problem
Wishart laws and variance function on homogeneous cones
Scaling limits of general population processes – Wright-Fisher and branching processes in random environment
Group kernels for Gaussian process metamodels with categorical inputs
Activity induced synchronization
ShakeDrop regularization
Optimal data structures for stochastic driven simulations
The $b$-branching problem in digraphs
New Cramer-Rao-Type Bound for Constrained Parameter Estimation
Definable Ellipsoid Method, Sums-of-Squares Proofs, and the Isomorphism Problem
Real zeros of random analytic functions associated with geometries of constant curvature
An improved upper bound for critical value of the contact process on $\mathbb{Z}^d$ with $d\geq 3$
Super-resolution of spatiotemporal event-based image
Unique Quasi-Stationary Distribution, with a possibly stabilizing extinction
Gundy-Varopoulos martingale transforms and their projection operators on manifolds and vector bundles
Analysis of stochastic bifurcations with phase portraits
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
A polynomial time algorithm for the linearization problem of the QSPP and its applications
Maintenance of diversity in a parasite population capable of persistence and reinfection
Evolutionary Computation plus Dynamic Programming for the Bi-Objective Travelling Thief Problem
Stochastic Deconvolutional Neural Network Ensemble Training on Generative Pseudo-Adversarial Networks
Pixel-Level Alignment of Facial Images for High Accuracy Recognition Using Ensemble of Patches
Tropicalized quartics and canonical embeddings for tropical curves of genus 3
Field extensions, Derivations, and Matroids over Skew Hyperfields
Mixtures of Factor Analyzers with Fundamental Skew Symmetric Distributions
Efficient Learning of Bounded-Treewidth Bayesian Networks from Complete and Incomplete Data Sets
On the gaps between consecutive primes
Out-of-Band Radiation from Antenna Arrays Clarified
On the Stability of Independence Polynomials
BROJA-2PID: A robust estimator for bivariate partial information decomposition
SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network
The law of a point process of Brownian excursions in a domain is determined by the law of its trace
Privacy preserving clustering with constraints
Spectral Learning of Binomial HMMs for DNA Methylation Data
The intrinsic geometry of coarse median spaces and their intervals
Cadre Modeling: Simultaneously Discovering Subpopulations and Predictive Models
Origin of Pseudo-Stability in Stress-Induced Damage Evolution Process
DeepHeart: Semi-Supervised Sequence Learning for Cardiovascular Risk Prediction
Large deviations of reaction fluxes
Disconnection by level sets of the discrete Gaussian free field and entropic repulsion
Combinatorial views on persistent characters in phylogenetics
Spin-charge separation and many-body localization
Classification of Things in DBpedia using Deep Neural Networks
Fair comparison of skin detection approaches on publicly available datasets
A Spatial Mapping Algorithm with Applications in Deep Learning-Based Structure Classification
Factors of generalised polynomials and automatic sequences
FixaTons: A collection of Human Fixations Datasets and Metrics for Scanpath Similarity
Directly and Efficiently Optimizing Prediction Error and AUC of Linear Classifiers
Self-stabilizing processes
A Dynamic Programming Approach to Evaluating Multivariate Gaussian Probabilities
Learning One Convolutional Layer with Overlapping Patches
Semi-Amortized Variational Autoencoders
Local Convergence Properties of SAGA/Prox-SVRG and Acceleration
Current Flow Group Closeness Centrality for Complex Networks
Sparse Linear Discriminant Analysis under the Neyman-Pearson Paradigm
Intentional control of type I error over unconscious data distortion: a Neyman-Pearson classification approach
Cubic Preferences and the Character Admissibility Problem
Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning
Fair-by-design algorithms: matching problems and beyond
A global linear and local superlinear/quadratic inexact non-interior continuation method for variational inequalities
Applying Cooperative Machine Learning to Speed Up the Annotation of Social Signals in Large Multi-modal Corpora
An improved multi-parametric programming algorithm for flux balance analysis of metabolic networks