Paper: Boosting Local Causal Discovery in High-Dimensional Expression Data

We study how well Local Causal Discovery (LCD), a simple and efficient constraint-based method, can predict causal effects in large-scale gene expression data. We construct practical estimators specific to the high-dimensional regime. Inspired by Invariant Causal Prediction (ICP), we use an optional preselection method and two different statistical tests. Empirically, the resulting LCD estimator closely approaches the accuracy of ICP, the state-of-the-art method, while being algorithmically simpler and computationally more efficient.
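To make the LCD pattern concrete, here is a minimal sketch following Cooper's (1997) formulation: given a context variable C assumed not to be caused by X or Y, LCD concludes X → Y when C and X are dependent, X and Y are dependent, and C is independent of Y given X. The Fisher z-test and significance level below are illustrative choices, not necessarily the specific tests used in the paper.

```python
# Illustrative sketch of the Local Causal Discovery (LCD) test pattern,
# using Fisher z-tests for (conditional) independence. The significance
# level alpha is an assumed default; the paper's preselection step and
# test choices are not reproduced here.
import numpy as np
from scipy import stats

def fisher_z_test(data, i, j, cond=()):
    """p-value for the partial correlation of columns i, j given cond."""
    cols = [i, j, *cond]
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.inv(corr)  # precision matrix of the selected columns
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z)))

def lcd_triple(data, c, x, y, alpha=0.05):
    """True if the LCD pattern supports the edge x -> y, assuming the
    context variable c is not caused by x or y."""
    dep_cx = fisher_z_test(data, c, x) < alpha            # C and X dependent
    dep_xy = fisher_z_test(data, x, y) < alpha            # X and Y dependent
    indep_cy_x = fisher_z_test(data, c, y, (x,)) >= alpha # C indep. Y given X
    return dep_cx and dep_xy and indep_cy_x
```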


Paper: Identifying causal effects in maximally oriented partially directed acyclic graphs

We develop a necessary and sufficient causal identification criterion for maximally oriented partially directed acyclic graphs (MPDAGs). MPDAGs as a class of graphs include directed acyclic graphs (DAGs), completed partially directed acyclic graphs (CPDAGs), and CPDAGs with added background knowledge. As such, they represent the type of graph that can be learned from observational data and background knowledge under the assumption of no latent variables. Our identification criterion can be seen as a generalization of the g-formula of Robins (1986). We further obtain the generalization of the truncated factorization formula for DAGs (Pearl, 2009) and compare our criterion to the generalized adjustment criterion of Perković et al. (2017).
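For reference, the truncated factorization (g-formula) for DAGs that the criterion generalizes can be written as follows, where X is the intervention set and pa(v_i, G) denotes the parents of V_i in the DAG G:

```latex
% Truncated factorization for a DAG G (Pearl, 2009): under do(X = x),
% drop the factors of the intervened variables and evaluate the rest at x.
P\big(v \mid do(x)\big)
  \;=\; \prod_{V_i \in V \setminus X} P\big(v_i \mid pa(v_i, \mathcal{G})\big)
  \,\Big|_{X = x},
\qquad \text{for values } v \text{ consistent with } x.
```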


Paper: Kernel-based Approach to Handle Mixed Data for Inferring Causal Graphs

Causal learning is a beneficial approach to analyzing the cause-and-effect relationships among variables in a dataset. A causal graph can be generated from a dataset using a particular causal discovery algorithm, for instance, the PC algorithm or Fast Causal Inference (FCI). Generating a causal graph from a dataset that contains different data types (mixed data) is not trivial. This research offers a simple way to handle mixed data so that existing implementations of the PC algorithm and FCI can be used to learn causal graphs, and proposes using kernel functions and Kernel Alignment to do so. The two main steps of this approach are computing a kernel matrix for each variable and calculating a pseudo-correlation matrix using Kernel Alignment. This pseudo-correlation matrix substitutes for the correlation matrix in the conditional independence test for Gaussian data used by the PC algorithm and FCI. The advantage of this idea is that any data type can be handled by choosing a suitable kernel function to compute the kernel matrix for an observed variable. The proposed method is successfully applied to learn a causal graph from mixed data containing categorical, binary, ordinal, and continuous variables.
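A minimal sketch of the two steps, assuming an RBF kernel for continuous variables and a delta (matching) kernel for categorical ones; the specific kernels and the centering convention are our assumptions, not necessarily the paper's choices:

```python
# Illustrative sketch: build a pseudo-correlation matrix from per-variable
# kernel matrices via centered kernel alignment (Cristianini et al., 2002).
import numpy as np

def rbf_kernel(x, gamma=1.0):
    """RBF kernel matrix for a 1-d continuous variable (gamma is assumed)."""
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

def delta_kernel(x):
    """Delta (matching) kernel matrix for a categorical variable."""
    return (x[:, None] == x[None, :]).astype(float)

def center(K):
    """Center a kernel matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Centered kernel alignment in [0, 1] between two kernel matrices."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def pseudo_correlation(kernels):
    """Symmetric matrix of pairwise alignments, used in place of the
    correlation matrix in the Gaussian CI test of PC / FCI."""
    p = len(kernels)
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = alignment(kernels[i], kernels[j])
    return R
```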


Paper: Combining No-regret and Q-learning

Counterfactual Regret Minimization (CFR) has found success in settings like poker that have both terminal states and perfect recall. We seek to understand how to relax these requirements. As a first step, we introduce a simple algorithm, local no-regret learning (LONR), which uses a Q-learning-like update rule to allow learning without terminal states or perfect recall. We prove its convergence for the basic case of MDPs (and limited extensions of them) and present empirical results showing that it achieves last-iterate convergence in a number of settings, most notably NoSDE games, a class of Markov games specifically designed to be challenging to learn, for which no prior algorithm is known to achieve convergence to a stationary equilibrium, even on average.
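A minimal sketch of a LONR-style update for a known MDP, in which every state runs regret matching over its actions and the backup is the expected value of the current local policy; the synchronous value-iteration form and all hyperparameters are our simplifications, not the paper's exact algorithm:

```python
# Illustrative LONR-style sketch: a Q-learning-like backup combined with
# a per-state no-regret learner (regret matching), with no terminal
# states required.
import numpy as np

def regret_matching_policy(regrets):
    """Policy proportional to positive regrets; uniform if none are positive."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(regrets, 1.0 / len(regrets))

def lonr_value_iteration(P, R, gamma=0.9, iters=5000):
    """P: transition tensor (S, A, S); R: rewards (S, A). Returns policies."""
    S, A = R.shape
    Q = np.zeros((S, A))
    regrets = np.zeros((S, A))
    for _ in range(iters):
        pi = np.array([regret_matching_policy(regrets[s]) for s in range(S)])
        v = (pi * Q).sum(axis=1)      # state values under the current local policy
        Q_new = R + gamma * P @ v     # Q-learning-like expected backup
        # accumulate local (per-state) regrets against the current policy
        regrets += Q_new - (pi * Q_new).sum(axis=1, keepdims=True)
        Q = Q_new
    return np.array([regret_matching_policy(regrets[s]) for s in range(S)])
```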


Paper: Causal Inference for Comprehensive Cohort Studies

In a comprehensive cohort study of two competing treatments (say, A and B), clinically eligible individuals are first asked to enroll in a randomized trial and, if they refuse, are then asked to enroll in a parallel observational study in which they can choose treatment according to their own preference. We consider estimation of two estimands: (1) comprehensive cohort causal effect — the difference in mean potential outcomes had all patients in the comprehensive cohort received treatment A vs. treatment B and (2) randomized trial causal effect — the difference in mean potential outcomes had all patients enrolled in the randomized trial received treatment A vs. treatment B. For each estimand, we consider inference under various sets of unconfoundedness assumptions and construct semiparametric efficient and robust estimators. These estimators depend on nuisance functions, which we estimate, for illustrative purposes, using generalized additive models. Using the theory of sample splitting, we establish the asymptotic properties of our proposed estimators. We also illustrate our methodology using data from the Bypass Angioplasty Revascularization Investigation (BARI) randomized trial and observational registry to evaluate the effect of percutaneous transluminal coronary balloon angioplasty versus coronary artery bypass grafting on 5-year mortality. To evaluate the finite sample performance of our estimators, we use the BARI dataset as the basis of a realistic simulation study.
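To give a flavor of the construction, here is a minimal cross-fitted doubly robust (AIPW-style) sketch of a treatment effect under a single unconfoundedness assumption; the paper's two estimands, assumption sets, and GAM-based nuisance estimators are richer than this stand-in, and the scikit-learn nuisance models below are assumed choices:

```python
# Illustrative sketch: cross-fitted AIPW estimator of E[Y(A)] - E[Y(B)]
# under unconfoundedness, in the spirit of the paper's sample-splitting
# construction.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(X, t, y, n_splits=2):
    """X: covariates; t in {0, 1}: treatment (1 = A, 0 = B); y: outcome."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        # fit nuisances on one fold, evaluate the influence function on the other
        e = LogisticRegression(max_iter=1000).fit(X[train], t[train])
        m1 = LinearRegression().fit(X[train][t[train] == 1], y[train][t[train] == 1])
        m0 = LinearRegression().fit(X[train][t[train] == 0], y[train][t[train] == 0])
        ps = e.predict_proba(X[test])[:, 1]
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        psi[test] = (mu1 - mu0
                     + t[test] * (y[test] - mu1) / ps
                     - (1 - t[test]) * (y[test] - mu0) / (1 - ps))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))  # estimate, std. error
```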


Paper: MIM: Mutual Information Machine

We introduce the Mutual Information Machine (MIM), an autoencoder model for learning joint distributions over observations and latent states. The model formulation reflects two key design principles: 1) symmetry, to encourage the encoder and decoder to learn consistent factorizations of the same underlying distribution; and 2) mutual information, to encourage the learning of useful representations for downstream tasks. The objective comprises the Jensen-Shannon divergence between the encoding and decoding joint distributions, plus a mutual information term. We show that this objective can be bounded by a tractable cross-entropy loss between the true model and a parameterized approximation, and relate this to maximum likelihood estimation and variational autoencoders. Experiments show that MIM is capable of learning a latent representation with high mutual information and good unsupervised clustering, while providing a data log-likelihood comparable to that of a VAE (given a sufficiently expressive architecture).
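Schematically, the objective and its tractable bound have the following form, in our notation (q_theta denotes the encoding joint, p_theta the decoding joint, and M_S their equal mixture over samples; see the paper for the precise statement):

```latex
% Regularized JSD objective, upper-bounded by a cross-entropy loss over
% the mixture sample distribution M_S = (1/2)(q_theta(x,z) + p_theta(x,z)):
\mathrm{JSD}\big(q_\theta(x,z)\,\|\,p_\theta(x,z)\big) + R_{\mathrm{MI}}
\;\le\; \mathcal{L}_{\mathrm{MIM}}
\;=\; -\tfrac{1}{2}\,\mathbb{E}_{(x,z)\sim \mathcal{M}_S}
      \big[\log q_\theta(x,z) + \log p_\theta(x,z)\big].
```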


Paper: Optimal Training of Fair Predictive Models

Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization problem into a simple one with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance, without requiring parametric models for high-dimensional feature vectors.
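To illustrate the computational payoff: once the likelihood is reparameterized so that one coordinate psi equals the fairness functional, constraining |psi| <= eps becomes a box constraint that standard optimizers handle directly. The toy logistic model below is a hypothetical stand-in, not the paper's reparameterization:

```python
# Illustrative sketch of a likelihood with a box constraint on a single
# parameter psi (params[0]), standing in for a reparameterized fairness
# functional. The model is a toy logistic regression, an assumption for
# illustration only.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, X, y):
    """Negative logistic log-likelihood; params[0] plays the role of psi."""
    logits = X @ params
    return np.sum(np.logaddexp(0.0, logits) - y * logits)

def fit_fair(X, y, eps=0.05):
    p = X.shape[1]
    bounds = [(-eps, eps)] + [(None, None)] * (p - 1)  # box constraint on psi only
    res = minimize(neg_log_lik, x0=np.zeros(p), args=(X, y),
                   method="L-BFGS-B", bounds=bounds)
    return res.x
```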


Python Library: whynot-estimators

Companion package to whynot: A collection of causal estimators.