Paper: Confounding and Regression Adjustment in Difference-in-Differences
Difference-in-differences (diff-in-diff) is a study design that compares outcomes of two groups (treated and comparison) at two time points (pre- and post-treatment) and is widely used in evaluating new policy implementations. For instance, diff-in-diff has been used to estimate the effect that increasing minimum wage has on employment rates and to assess the Affordable Care Act’s effect on health outcomes. Although diff-in-diff appears simple, potential pitfalls lurk. In this paper, we discuss one such complication: time-varying confounding. We provide rigorous definitions for confounders in diff-in-diff studies and explore regression strategies to adjust for confounding. In simulations, we show how and when regression adjustment can ameliorate confounding for both time-invariant and time-varying covariates. We compare our regression approach to those models commonly fit in applied literature, which often fail to address the time-varying nature of confounding in diff-in-diff.
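To make the two-group, two-period design concrete, below is a minimal sketch of the canonical diff-in-diff regression, in which the coefficient on the treated-by-post interaction is the diff-in-diff estimate. The data and variable names (y, treated, post) are hypothetical, and this is the unadjusted model, not the paper's confounder-adjusted specifications.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a hypothetical 2x2 diff-in-diff setting: group trends are
# parallel, and treatment adds +2.0 to outcomes in the post period.
rng = np.random.default_rng(0)
n = 1000
treated = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
y = 1.0 + 0.5 * treated + 1.5 * post + 2.0 * treated * post + rng.normal(size=n)
df = pd.DataFrame({"y": y, "treated": treated, "post": post})

# The coefficient on treated:post is the diff-in-diff estimate of the
# treatment effect (here it should recover roughly 2.0).
fit = smf.ols("y ~ treated * post", data=df).fit()
print(fit.params["treated:post"])
```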
Paper: Actionable Interpretability through Optimizable Counterfactual Explanations for Tree Ensembles
Counterfactual explanations help users understand why machine-learned models make certain decisions, and more specifically, how these decisions can be changed. In this work, we frame the problem of finding counterfactual explanations — the minimal perturbation to an input such that the prediction changes — as an optimization task. Previously, optimization techniques for generating counterfactual examples could be applied only to differentiable models, or via query access to the model, estimating gradients from randomly sampled perturbations. In order to accommodate non-differentiable models such as tree ensembles, we propose using probabilistic model approximations in the optimization framework. We introduce a novel approximation technique that is effective for finding counterfactual explanations while also closely approximating the original model. Our results show that our method produces counterfactual examples that are closer to the original instance in terms of Euclidean, Cosine, and Manhattan distance than other methods specifically designed for tree ensembles.
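To illustrate the optimization framing, here is a minimal sketch for a differentiable classifier (logistic regression), minimizing a prediction-flip loss plus a distance penalty by gradient descent. This shows the generic formulation only; it is not the paper's probabilistic tree-ensemble approximation, and all names are hypothetical.

```python
import numpy as np

def find_counterfactual(x, w, b, target=1.0, lam=0.5, lr=0.05, steps=1000):
    """Gradient descent on loss(x') = BCE(sigmoid(w.x' + b), target) + lam * ||x' - x||^2."""
    x_cf = x.astype(float)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_cf + b)))  # predicted probability
        # d(BCE)/dx' = (p - target) * w ;  d(lam * ||x' - x||^2)/dx' = 2 * lam * (x' - x)
        grad = (p - target) * w + 2.0 * lam * (x_cf - x)
        x_cf -= lr * grad
    return x_cf

# Hypothetical instance classified as 0; search for the nearest input classified as 1.
w, b = np.array([1.5, -2.0]), 0.0
x = np.array([-1.0, 0.5])
print(find_counterfactual(x, w, b))
```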
Article: An introduction to Causal inference
Causal inference goes beyond prediction by modeling the outcome of interventions and formalizing counterfactual reasoning. In this blog post, I provide an introduction to the graphical approach to causal inference in the tradition of Sewall Wright, Judea Pearl, and others. We first rehash the common adage that correlation is not causation. We then move on to climb what Pearl calls the ‘ladder of causation’, from association (seeing) to intervention (doing) to counterfactuals (imagining). We will discover how directed acyclic graphs describe conditional (in)dependencies; how the do-calculus describes interventions; and how Structural Causal Models allow us to imagine what could have been. This blog post is by no means exhaustive, but it should give you a first appreciation of the concepts that surround causal inference; references to further readings are provided below. Let’s dive in!
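The contrast between seeing and doing can be made concrete with a tiny simulated Structural Causal Model. In the hypothetical three-variable SCM below, the observational regression of Y on X mixes the causal effect with the confounding path through Z, while intervening with do(X = x) cuts the Z → X edge and recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n, do_x=None):
    # SCM: Z -> X, Z -> Y, X -> Y (Z confounds the X -> Y effect of 1.5)
    z = rng.normal(size=n)
    x = 0.8 * z + rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
    y = 1.5 * x + 2.0 * z + rng.normal(size=n)
    return x, y

# Observational slope is biased by the backdoor path X <- Z -> Y (about 2.5 here).
x, y = sample(100_000)
print("seeing:", np.polyfit(x, y, 1)[0])

# Intervening recovers the causal effect: E[Y | do(X=1)] - E[Y | do(X=0)] ~ 1.5.
print("doing: ", sample(100_000, do_x=1)[1].mean()
              - sample(100_000, do_x=0)[1].mean())
```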
Python Library: causalnex
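causalnex is a Python library for causal reasoning with Bayesian networks. A minimal sketch of its typical entry point follows: encoding a causal structure by hand, or learning one from data with the NOTEARS algorithm. The variable names and data file are hypothetical; consult the library's documentation for the full API.

```python
import pandas as pd
from causalnex.structure import StructureModel
from causalnex.structure.notears import from_pandas

# Encode domain knowledge as a directed graph by hand ...
sm = StructureModel()
sm.add_edge("diet", "weight")
sm.add_edge("weight", "blood_pressure")

# ... or learn a structure from data with NOTEARS.
df = pd.read_csv("data.csv")  # hypothetical numeric table, one column per variable
sm_learned = from_pandas(df)
```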
Paper: Causal inference of hazard ratio based on propensity score matching
Propensity score matching is commonly used to draw causal inference from observational survival data. However, there is no gold-standard approach for analyzing survival data after propensity score matching, and variance estimation after matching is open to debate. We derive the statistical properties of the propensity score matching estimator of the marginal causal hazard ratio based on matching with replacement and a fixed number of matches. We also propose a double-resampling technique for variance estimation that accounts for the uncertainty due to propensity score estimation prior to matching.
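A generic sketch of the pipeline the paper analyzes is shown below: propensity scores from logistic regression, nearest-neighbor matching with replacement, then a marginal Cox model on the matched sample. Column names are hypothetical, and this sketch does not implement the paper's double-resampling variance estimator; naive standard errors from the matched sample ignore the matching and estimation uncertainty.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

# Hypothetical data with columns: time, event, treat, and covariates x1, x2.
df = pd.read_csv("survival_data.csv")
X = df[["x1", "x2"]].to_numpy()

# 1) Estimate propensity scores with logistic regression.
ps = LogisticRegression().fit(X, df["treat"]).predict_proba(X)[:, 1]

# 2) Match each treated unit to its nearest control on the propensity
#    score, with replacement (controls may be reused).
treated, controls = df[df["treat"] == 1], df[df["treat"] == 0]
idx = [np.abs(ps[controls.index] - ps[i]).argmin() for i in treated.index]
matched = pd.concat([treated, controls.iloc[idx]])

# 3) Fit a Cox model on the matched sample; exp(coef) is the marginal hazard ratio.
cph = CoxPHFitter().fit(matched[["time", "event", "treat"]],
                        duration_col="time", event_col="event")
print(cph.hazard_ratios_["treat"])
```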
Paper: Improving Model Robustness Using Causal Knowledge
For decades, researchers in fields such as the natural and social sciences have been verifying causal relationships and investigating hypotheses that are now well established or accepted as truth. These causal mechanisms are properties of the natural world and are thus invariant regardless of the collection domain or environment. We show in this paper how prior knowledge in the form of a causal graph can be utilized to guide model selection, i.e., to identify, from a set of trained networks, the models that are the most robust and invariant to unseen domains. Our method incorporates prior knowledge (which can be incomplete) as a Structural Causal Model (SCM) and calculates a score based on the likelihood of the SCM given the target predictions of a candidate model and the provided input variables. We show on both publicly available and synthetic datasets that our method identifies models that are more robust in terms of generalizability to unseen out-of-distribution test examples and to domains where covariates have shifted.
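One loose way to picture the scoring idea (not the paper's actual score, and assuming a hypothetical linear-Gaussian mechanism for the target) is to check how well each candidate model's predictions fit the mechanism implied by the causal graph, ranking candidates by log-likelihood.

```python
import numpy as np

def scm_score(X_parents, preds):
    """Gaussian log-likelihood of predictions under a hypothetical linear-Gaussian
    mechanism preds = X_parents @ beta + noise, with beta fit by least squares."""
    beta, *_ = np.linalg.lstsq(X_parents, preds, rcond=None)
    resid = preds - X_parents @ beta
    sigma2 = resid.var()
    return -0.5 * len(preds) * (np.log(2 * np.pi * sigma2) + 1)

# Among candidate networks, keep the one whose predictions are most
# consistent with the graph-implied mechanism (higher score is better):
# best = max(candidates, key=lambda m: scm_score(X_parents, m.predict(X_all)))
```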
Marginal structural models (MSMs) are commonly used to estimate causal intervention effects in longitudinal non-randomised studies. A common issue when analysing data from observational studies is the presence of incomplete confounder data, which can bias the intervention effect estimates if not handled properly in the statistical analysis. However, there is currently no recommendation on how to address missing data on covariates in MSMs under the variety of missingness mechanisms encountered in practice. We reviewed existing methods for handling missing data in MSMs and performed a simulation study to compare the performance of complete case (CC) analysis, last observation carried forward (LOCF), the missingness pattern approach (MPA), multiple imputation (MI) and inverse-probability-of-missingness weighting (IPMW). We considered three mechanisms for non-monotone missing data that are common in observational studies using electronic health record data. Whereas CC analysis led to biased estimates of the intervention effect in almost all scenarios, the performance of the other approaches varied across scenarios. The LOCF approach led to unbiased estimates only under a specific non-random mechanism in which confounder values were missing when they had remained unchanged since the previous measurement; in this scenario, MI, the MPA and IPMW were biased. MI and IPMW led to unbiased effect estimates when data were missing at random given the covariates or the treatment, but only MI was unbiased when the outcome was a predictor of missingness. Furthermore, IPMW generally led to very large standard errors. Lastly, regardless of the missingness mechanism, the MPA led to unbiased estimates only when the failure to record a confounder at a given time point modified the subsequent relationships between the partially observed covariate and the outcome.
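As one concrete example from the comparison, inverse-probability-of-missingness weighting reweights complete cases by the inverse of their estimated probability of being fully observed. The sketch below is a generic single-time-point illustration with hypothetical column names, not the article's full longitudinal MSM workflow.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("cohort.csv")  # hypothetical; 'conf' is a partially observed confounder
obs = df["conf"].notna().astype(int)

# Model P(confounder observed | fully observed predictors of missingness).
Z = df[["age", "treatment"]]
p_obs = LogisticRegression().fit(Z, obs).predict_proba(Z)[:, 1]

# Complete cases get weight 1 / P(observed); these weights would be combined
# with the MSM's inverse-probability-of-treatment weights in the outcome model.
cc = df[obs == 1].assign(ipmw=1.0 / p_obs[obs == 1])
```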
Paper: Entropy, mutual information, and systematic measures of structured spiking neural networks
The aim of this paper is to investigate various information-theoretic measures, including entropy, mutual information, and some systematic measures based on mutual information, for a class of structured spiking neuronal networks. In order to analyze and compute these information-theoretic measures for large networks, we coarse-grain the data by ignoring the order of spikes that fall into the same small time bin. The resultant coarse-grained entropy mainly captures the information contained in the rhythm produced by a local population of the network. We first prove that these information-theoretic measures are well defined and computable by establishing stochastic stability and a law of large numbers. We then use three neuronal network examples, from simple to complex, to investigate these information-theoretic measures. Several analytical and computational results about the properties of these information-theoretic measures are given.
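The coarse-graining step can be illustrated directly: discard within-bin spike order by counting spikes per small time bin, then apply the plug-in entropy estimator to the empirical distribution of bin counts. A minimal sketch follows, with a hypothetical spike-time input; it is not the paper's full analysis of the systematic measures.

```python
import numpy as np

def coarse_grained_entropy(spike_times, t_max, bin_width):
    """Plug-in entropy (bits) of spike counts per bin, ignoring within-bin spike order."""
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    counts = np.histogram(spike_times, bins=edges)[0]
    _, freq = np.unique(counts, return_counts=True)
    p = freq / freq.sum()  # empirical distribution of per-bin spike counts
    return -(p * np.log2(p)).sum()

# Example: a hypothetical 10 s spike train, coarse-grained into 5 ms bins.
spikes = np.sort(np.random.default_rng(2).uniform(0, 10.0, size=400))
print(coarse_grained_entropy(spikes, t_max=10.0, bin_width=0.005))
```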