Paper: Bayesian Matrix Completion Approach to Causal Inference with Panel Data

This study proposes a new Bayesian approach to infer the average treatment effect. The approach treats counterfactual untreated outcomes as missing observations and infers them by completing a matrix composed of realized and potential untreated outcomes using a data augmentation technique. We also develop a tailored prior that helps in the identification of parameters and induces the matrix of the untreated outcomes to be approximately low rank. While the proposed approach is similar to synthetic control methods and other relevant methods, it has several notable advantages. Unlike synthetic control methods, the proposed approach does not require stringent assumptions. Whereas synthetic control methods do not have a statistically grounded method to quantify uncertainty about inference, the proposed approach can estimate credible sets in a straightforward and consistent manner. Our proposed approach has better finite-sample performance than the existing Bayesian and non-Bayesian approaches, as we show through a series of simulation studies.
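
As a rough illustration of treating untreated counterfactuals as missing entries, the sketch below completes a small noiseless panel with iterative truncated-SVD imputation. This is a simple non-Bayesian stand-in, not the data augmentation sampler or the tailored prior described in the abstract. The function name `complete_low_rank`, the rank-1 toy panel, and the constant effect of 2.0 are all assumptions made for the example.

```python
import numpy as np


def complete_low_rank(Y, observed, rank=1, n_iter=200):
    """Fill unobserved cells of Y by iterating a rank-`rank` SVD approximation.

    Hypothetical helper: a non-Bayesian stand-in for the matrix completion step.
    """
    filled = np.where(observed, Y, 0.0)  # start with zeros in the missing cells
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # keep observed cells fixed, refresh only the missing ones
        filled = np.where(observed, Y, low_rank)
    return filled


# Toy panel (assumed): 6 units x 8 periods, untreated outcomes are exactly rank 1.
unit_factor = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.0])
time_factor = np.array([1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 0.9])
Y0 = np.outer(unit_factor, time_factor)

# The last unit is treated in the last three periods, with an assumed effect of 2.0.
treated = np.zeros(Y0.shape, dtype=bool)
treated[5, 5:] = True
effect = 2.0
Y_obs = Y0 + effect * treated

# Untreated outcomes are observed everywhere except in the treated cells.
Y0_hat = complete_low_rank(Y_obs, observed=~treated)

# Average effect on the treated cells: observed outcome minus imputed untreated outcome.
att = float((Y_obs[treated] - Y0_hat[treated]).mean())
```

In this noiseless setting the imputed cells should match the true untreated outcomes closely, so `att` should be close to the assumed effect. Unlike the Bayesian approach in the abstract, this sketch yields a point estimate only and no credible sets.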

Paper: Stabilizing Variable Selection and Regression

We consider regression in which one predicts a response $Y$ with a set of predictors $X$ across different experiments or environments. This is a common setup in many data-driven scientific fields and we argue that statistical inference can benefit from an analysis that takes into account the distributional changes across environments. In particular, it is useful to distinguish between stable and unstable predictors, i.e., predictors which have a fixed or a changing functional dependence on the response, respectively. We introduce stabilized regression which explicitly enforces stability and thus improves generalization performance to previously unseen environments. Our work is motivated by an application in systems biology. Using multiomic data, we demonstrate how hypothesis generation about gene function can benefit from stabilized regression. We believe that a similar line of argument for exploiting heterogeneity in data can be powerful for many other applications as well. We draw a theoretical connection between multi-environment regression and causal models, which allows us to graphically characterize stable versus unstable functional dependence on the response. Formally, we introduce the notion of a stable blanket which is a subset of the predictors that lies between the direct causal predictors and the Markov blanket. We prove that this set is optimal in the sense that a regression based on these predictors minimizes the mean squared prediction error given that the resulting regression generalizes to unseen new environments.

Article: Three Essays on Statistical Inference and Estimation for Heterogeneous Causal Effects in Economics

Learning about causal relationships on the basis of empirical observation is not a trivial task. The classical scientific approach of running repeated independent controlled randomized experiments to measure effects of causes is limited in practice due to ethical, financial, political, temporal or other constraints. In particular, it can be infeasible to replicate controlled conditions to answer real-world questions regarding the impact of a policy or a program currently in place, i.e. to measure the effect of a treatment on a set of outcomes of interest. However, for empirically informed policy or general decision making, knowing about the fundamental causal relationships is crucial.

Paper: Why X rather than Y? Explaining Neural Model’ Predictions by Generating Intervention Counterfactual Samples

Even though the topic of explainable AI/ML is very popular in the text and computer vision domains, most of the previous literature is not suitable for explaining black-box models’ predictions on general data mining datasets. This is because these datasets usually come as high-dimensional feature vectors, which are not as friendly and comprehensible to end users as texts and images. In this paper, we combine the best of both worlds: ‘explanations by intervention’ from causality and ‘explanations are contrastive’ from philosophy and social science, to explain neural models’ predictions for tabular datasets. Specifically, given a model’s prediction as label X, we propose a novel idea: intervene to generate a minimally modified contrastive sample that is classified as Y, which then yields a simple natural-language answer to the question ‘Why X rather than Y?’. We carry out experiments with several datasets of different scales and compare our approach with other baselines on three criteria: fidelity, reasonableness and explainability.

Paper: Integrating Markov processes with structural causal modeling enables counterfactual inference in complex systems

This manuscript contributes a general and practical framework for casting a Markov process model of a system at equilibrium as a structural causal model, and carrying out counterfactual inference. Markov processes mathematically describe the mechanisms in the system, and predict the system’s equilibrium behavior upon intervention, but do not support counterfactual inference. In contrast, structural causal models support counterfactual inference, but do not identify the mechanisms. This manuscript leverages the benefits of both approaches. We define the structural causal models in terms of the parameters and the equilibrium dynamics of the Markov process models, and counterfactual inference flows from these settings. The proposed approach alleviates the identifiability drawback of the structural causal models, in that the counterfactual inference is consistent with the counterfactual trajectories simulated from the Markov process model. We showcase the benefits of this framework in case studies of complex biomolecular systems with nonlinear dynamics. We illustrate that, in the presence of Markov process model misspecification, counterfactual inference leverages prior data, and therefore estimates the outcome of an intervention more accurately than a direct simulation.

Paper: MBCAL: A Simple and Efficient Reinforcement Learning Method for Recommendation Systems

It is widely recognized that considering only the immediate user feedback is not sufficient for modern industrial recommendation systems. Many previous works attempt to maximize the long-term rewards with Reinforcement Learning (RL). However, model-free RL suffers from problems including significant variance in the gradient, a long convergence period, and the requirement of sophisticated online infrastructure. While model-based RL provides a sample-efficient choice, the cost of planning in an online system is unacceptable. To achieve high sample efficiency in practical situations, we propose a novel model-based reinforcement learning method, namely model-based counterfactual advantage learning (MBCAL). In the proposed method, a masking item is introduced in the environment model learning. With the masking item and the environment model, we introduce the counterfactual future advantage, which eliminates most of the noise in long-term rewards. The proposed method makes its selection by approximating the immediate reward and the future advantage separately. It is easy to implement and requires only reasonable cost in both the training and inference processes. In carefully designed experiments, we compare our method with several baselines, including supervised learning, model-free RL, and other model-based RL methods. Results show that our method outperforms all the baselines in both sample efficiency and asymptotic performance.

Paper: Group Average Treatment Effects for Observational Studies

The paper proposes an estimator to make inference on key features of heterogeneous treatment effects sorted by impact groups (GATES) for non-randomised experiments. Observational studies are standard in policy evaluation from labour markets, educational surveys, and other empirical studies. To control for a potential selection bias, we implement a doubly-robust estimator in the first stage. Keeping the flexibility to use any machine learning method to learn the conditional mean functions as well as the propensity score, we also use machine learning methods to learn a function for the conditional average treatment effect. The group average treatment effect is then estimated via a parametric linear model to provide p-values and confidence intervals. The result is a best linear predictor for effect heterogeneity based on impact groups. Cross-splitting and averaging for each observation is a further extension to avoid biases introduced through sample splitting. The advantage of the proposed method is a robust estimation of heterogeneous group treatment effects under mild assumptions, which is comparable with other models and thus keeps its flexibility in the choice of machine learning methods. At the same time, its ability to deliver interpretable results is ensured.
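
To make the first-stage idea concrete, the sketch below computes the standard doubly-robust (AIPW) score for each observation and then averages the scores within groups sorted by a proxy for the conditional average treatment effect. The abstract does not spell out its exact formulas, so the textbook AIPW form is assumed here. The nuisance predictions `mu0`, `mu1` and `e` are hard-coded toy values rather than machine learning fits, and the names `aipw_scores` and `group_averages` are hypothetical.

```python
import numpy as np


def aipw_scores(y, d, mu0, mu1, e):
    """Textbook doubly-robust (AIPW) score per observation.

    y: outcomes, d: treatment indicator (0/1),
    mu0, mu1: predicted outcomes without / with treatment,
    e: predicted propensity score P(D = 1 | X).
    """
    y, d, mu0, mu1, e = (np.asarray(a, dtype=float) for a in (y, d, mu0, mu1, e))
    return (mu1 - mu0
            + d * (y - mu1) / e
            - (1.0 - d) * (y - mu0) / (1.0 - e))


def group_averages(scores, cate_proxy, n_groups=2):
    """Sort observations by the CATE proxy, split into groups, average the scores."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(cate_proxy)
    return [float(scores[idx].mean()) for idx in np.array_split(order, n_groups)]


# Toy inputs (assumed); in practice the nuisance functions would be learned.
y = [3.0, 1.0, 5.0, 2.0]
d = [1, 0, 1, 0]
mu0 = [1.0, 1.0, 2.0, 2.0]
mu1 = [2.0, 2.0, 4.0, 4.0]
e = [0.5, 0.5, 0.5, 0.5]

scores = aipw_scores(y, d, mu0, mu1, e)
cate_proxy = np.asarray(mu1) - np.asarray(mu0)
gates = group_averages(scores, cate_proxy, n_groups=2)
```

Averaging the scores within each group gives the same point estimates as an ordinary least squares regression of the scores on group indicators without an intercept. A linear model of that kind is one way to obtain p-values and confidence intervals, though the paper's exact specification may differ.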

Paper: Deriving pairwise transfer entropy from network structure and motifs

Transfer entropy is an established method for quantifying directed statistical dependencies in neuroimaging and complex systems datasets. The pairwise (or bivariate) transfer entropy from a source to a target node in a network does not depend solely on the local source-target link weight, but on the wider network structure that the link is embedded in. This relationship is studied using a discrete-time linearly-coupled Gaussian model, which allows us to derive the transfer entropy for each link from the network topology. It is shown analytically that the dependence on the directed link weight is only a first approximation, valid for weak coupling. More generally, the transfer entropy increases with the in-degree of the source and decreases with the in-degree of the target, indicating an asymmetry of information transfer between hubs and low-degree nodes. In addition, the transfer entropy is directly proportional to weighted motif counts involving common parents or multiple walks from the source to the target, which are more abundant in networks with a high clustering coefficient than in random networks. Our findings also apply to Granger causality, which is equivalent to transfer entropy for Gaussian variables. Moreover, similar empirical results on random Boolean networks suggest that the dependence of the transfer entropy on the in-degree extends to nonlinear dynamics.
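
The equivalence with Granger causality for Gaussian variables suggests a simple estimator: the pairwise transfer entropy is half the log-ratio of the residual variances of two nested linear regressions. The sketch below applies this to a simulated two-node linearly-coupled Gaussian process. The coupling weights (0.3 and 0.5), the single time lag, the series length, and the function names are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np


def residual_mse(target, regressors):
    """Mean squared residual of an OLS fit of target on regressors plus an intercept."""
    X = np.column_stack([np.ones(len(target))] + list(regressors))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(np.mean(resid ** 2))


def gaussian_transfer_entropy(source, target):
    """Lag-1 pairwise transfer entropy (in nats) under a linear-Gaussian assumption."""
    present = target[1:]
    past_target = target[:-1]
    past_source = source[:-1]
    restricted = residual_mse(present, [past_target])             # target's own past only
    full = residual_mse(present, [past_target, past_source])      # plus the source's past
    return 0.5 * np.log(restricted / full)


# Simulate a two-node network with one directed link x -> y (assumed weights).
rng = np.random.default_rng(0)
n = 5000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.3 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

te_xy = gaussian_transfer_entropy(x, y)  # along the link
te_yx = gaussian_transfer_entropy(y, x)  # against the link
```

With only one directed link, `te_xy` should be clearly positive while `te_yx` should be near zero. This two-node toy does not reproduce the in-degree or motif effects derived in the paper, which require a larger network.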