Paper: Dis-entangling Mixture of Interventions on a Causal Bayesian Network Using Aggregate Observations
We study the problem of separating a mixture of distributions, all of which arise from interventions on a known causal Bayesian network. Given oracle access to the marginals of every distribution resulting from an intervention on the network, and estimates of the marginals of the mixture distribution, we want to recover the mixing proportions of the different mixture components. We show that, in the worst case, the mixing proportions cannot be identified from marginals alone. If the exact marginals of the mixture distribution are known, then under a simple assumption that excludes a few distributions from the mixture, we show that the mixing proportions become identifiable. Our identifiability proof is constructive and yields an efficient algorithm that recovers the mixing proportions exactly. When exact marginals are not available, we design an optimization framework to estimate the mixing proportions. Our problem is motivated by a real-world scenario in an e-commerce business, where multiple interventions occur at a given time, leading to deviations in expected metrics. We conduct experiments on the well-known, publicly available ALARM network and on a proprietary dataset from a large e-commerce company, validating the performance of our method.
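As a minimal sketch of the estimation step (not the paper's exact algorithm), one could stack the oracle marginals of the K candidate interventional distributions as columns of a matrix M, take the estimated mixture marginals m, and recover the mixing proportions by constrained least squares over the probability simplex; `estimate_mixing_proportions` is a hypothetical helper name:

```python
import numpy as np
from scipy.optimize import minimize

def estimate_mixing_proportions(M, m):
    """Recover simplex weights pi minimizing ||M @ pi - m||^2,
    where column k of M holds the stacked marginals of the k-th
    interventional distribution and m holds the (noisy) mixture
    marginals. A generic sketch, not the paper's algorithm."""
    k = M.shape[1]
    objective = lambda pi: np.sum((M @ pi - m) ** 2)
    simplex = {"type": "eq", "fun": lambda pi: pi.sum() - 1.0}
    result = minimize(objective, np.full(k, 1.0 / k),
                      bounds=[(0.0, 1.0)] * k,
                      constraints=simplex, method="SLSQP")
    return result.x
```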
Paper: Efficient adjustment sets for population average treatment effect estimation in non-parametric causal graphical models
The method of covariate adjustment is often used for estimation of population average treatment effects in observational studies. Graphical rules for determining all valid covariate adjustment sets from an assumed causal graphical model are well known. Restricting attention to causal linear models, a recent article derived two novel graphical criteria: one to compare the asymptotic variance of linear regression treatment effect estimators that control for certain distinct adjustment sets, and another to identify the optimal adjustment set that yields the least squares treatment effect estimator with the smallest asymptotic variance among consistent adjusted least squares estimators. In this paper we show that the same graphical criteria can be used in non-parametric causal graphical models when treatment effects are estimated by contrasts involving non-parametrically adjusted estimators of the interventional means. We also provide a graphical criterion for determining the optimal adjustment set among the minimal adjustment sets, which is valid for both linear and non-parametric estimators. We provide a new graphical criterion for comparing time-dependent adjustment sets, that is, sets composed of covariates that adjust for future treatments and are themselves affected by earlier treatments. We show by example that uniformly optimal time-dependent adjustment sets do not always exist. In addition, for point interventions, we provide a sound and complete graphical criterion for determining when a non-parametric optimally adjusted estimator of an interventional mean, or of a contrast of interventional means, is as efficient as an efficient estimator of the same parameter that exploits the information in the conditional independencies encoded in the non-parametric causal graphical model.
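For intuition, here is a minimal sketch (assuming the causal DAG is given as a networkx DiGraph; this is not code from the paper) of the optimal adjustment set construction for a point treatment: take the parents of the nodes on directed causal paths from treatment to outcome, then remove the forbidden nodes (the causal nodes themselves, their descendants, and the treatment):

```python
import networkx as nx

def optimal_adjustment_set(g: nx.DiGraph, treatment, outcome):
    """O-set sketch for a point treatment in a DAG:
    O = pa(cn) \ forb, where cn are the nodes on directed paths
    from treatment to outcome (excluding the treatment) and
    forb = cn, their descendants, and the treatment itself."""
    # causal nodes: descendants of the treatment that are ancestors
    # of the outcome, plus the outcome itself
    cn = nx.descendants(g, treatment) & (nx.ancestors(g, outcome) | {outcome})
    forb = set().union(*(nx.descendants(g, v) for v in cn)) | cn | {treatment}
    parents = set().union(*(set(g.predecessors(v)) for v in cn))
    return parents - forb

# Example: Z -> A, Z -> Y, A -> M, M -> Y gives O = {"Z"}.
g = nx.DiGraph([("Z", "A"), ("Z", "Y"), ("A", "M"), ("M", "Y")])
print(optimal_adjustment_set(g, "A", "Y"))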
Paper: Algorithms of Data Development For Deep Learning and Feedback Design
Recent research reveals that deep learning is an effective way of solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations. The resulting feedback control law, in the form of a neural network, is computationally efficient enough for real-time applications of optimal control. A critical part of this design method is generating data for training the neural network and validating its accuracy. In this paper, we provide a survey of existing algorithms that can be used to generate such data. All the algorithms surveyed in this paper are causality-free, i.e., the solution at a point is computed without using the value of the function at any other point. At the end of the paper, an illustrative example of optimal feedback design using deep learning is given.
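As a toy illustration of the causality-free property (an assumed linear-quadratic example, not one of the surveyed algorithms): for an LQR problem the optimal value function is available in closed form from the Riccati equation, so training data can be evaluated independently at each sampled state, with no grid or propagation between points:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Double integrator: dx/dt = A x + B u, cost integral of x'Qx + u'Ru.
# The optimal value function is V(x) = x' P x, with P from the Riccati
# equation, so each sample's label is computed independently.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P = solve_continuous_are(A, B, Q, R)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))   # sampled states
y = np.einsum("ni,ij,nj->n", X, P, X)        # V(x) at each sample
# (X, y) can now serve as supervised training data for a neural network.
```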
Article: Features correlations: data leakage, confounded features and other things that can make your Deep Learning model fail
Judging from the plot the boss is showing, the more employees have shaved heads, the more the company's sales increase. If you were that boss, would you consider the same action for your employees? Probably not. In fact, you recognize that there is no causality between the two series of events, and that their behaviour is similar just by chance. More plainly: the shaved heads do not cause the sales. So we have just spotted at least two possible categories of correlations: those without and those with causality. We also agreed that only the second one is interesting, while the first is useless, if not misleading. But let's dive deeper.
Article: How A.I. Will Help Address the Causation Problem in Economics
For decades, economists have done their research on data sets only as large as their research assistants could handle, severely limiting the scope and precision of their work. AI, however, will enable economists to dramatically enlarge these data sets and analyze them at unprecedented speed. As both computing power and the ability to collect data about the economy increase, AI algorithms will become better and better at detecting patterns and trends in the economy. The issue of causality is one economists have been grappling with for aeons; as you have probably heard a thousand times, correlation does not imply causation.
Paper: Simpson’s Paradox and the implications for medical trials
This paper describes Simpson's paradox and explains its serious implications for randomised controlled trials. In particular, we show that for any number of variables we can simulate the result of a controlled trial that uniformly points to one conclusion (such as 'drug is effective') for every possible combination of the variable states, but when a previously unobserved confounding variable is included, every possible combination of the variable states points to the opposite conclusion ('drug is not effective'). In other words, no matter how many variables are considered, and no matter how 'conclusive' the result, one cannot conclude the result is truly 'valid', since there is theoretically an unobserved confounding variable that could completely reverse it.
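A worked numerical illustration of the reversal (the classic kidney-stone figures from Charig et al. 1986, not data from this paper): the treatment wins in every stratum of the confounder yet loses in the aggregate.

```python
import pandas as pd

# Success counts by stone size (the confounder) and treatment.
rows = [
    ("small", "A", 81, 87), ("small", "B", 234, 270),
    ("large", "A", 192, 263), ("large", "B", 55, 80),
]
df = pd.DataFrame(rows, columns=["size", "treatment", "success", "n"])

# Per-stratum recovery rates: A beats B in BOTH strata.
print(df.assign(rate=df.success / df.n))

# Aggregate recovery rates: B beats A overall.
agg = df.groupby("treatment")[["success", "n"]].sum()
print(agg.success / agg.n)
```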
Paper: Confounding Adjustment Methods for Multi-level Treatment Comparisons Under Lack of Positivity and Unknown Model Specification
Imbalances in covariates between treatment groups are frequent in observational studies and can lead to biased comparisons. Various adjustment methods can be employed to correct these biases in the context of multi-level treatments (> 2 levels). However, analytical challenges, such as positivity violations and incorrect model specification, may affect their ability to yield unbiased estimates. Adjustment methods with the best potential to address these challenges were identified: overlap weights, augmented overlap weights, bias-corrected matching, and targeted maximum likelihood. A simple variance estimator for the overlap weight estimators, which can naturally be combined with machine learning algorithms, is proposed. In a simulation study, we investigated the empirical performance of these methods as well as that of simpler alternatives: standardization, inverse probability weighting, and matching. Our proposed variance estimator performed well, even at a sample size of 500. Adjustment methods that included an outcome modeling component performed better than those that only modeled the treatment mechanism. Additionally, a machine learning implementation was observed to efficiently compensate for unknown model specification for the former methods, but not the latter. Based on these results, the wildfire data were analyzed using the augmented overlap weight estimator. With respect to the effectiveness of alternative fire-suppression interventions, the results were counter-intuitive, indeed the opposite of what would be expected on subject-matter grounds. This suggests the presence of unmeasured confounding bias in the data.
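For concreteness, here is a minimal binary-treatment sketch of the overlap-weight estimator (the paper studies the multi-level case; the logistic regression is just an assumed propensity model). Each unit is weighted by the propensity of the opposite arm, which down-weights regions of poor overlap rather than inflating them the way inverse probability weighting does:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def overlap_weight_ate(X, a, y):
    """Overlap-weighted treatment effect for binary treatment a,
    outcome y, covariates X: w = 1 - e(x) for treated units and
    w = e(x) for controls, with e(x) an estimated propensity score."""
    e = LogisticRegression(max_iter=1000).fit(X, a).predict_proba(X)[:, 1]
    w = np.where(a == 1, 1.0 - e, e)
    mu1 = np.sum(w * y * (a == 1)) / np.sum(w * (a == 1))
    mu0 = np.sum(w * y * (a == 0)) / np.sum(w * (a == 0))
    return mu1 - mu0  # effect in the overlap population
```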
Paper: Counterfactual Explanation Algorithms for Behavioral and Textual Data
We study the interpretability of predictive systems that use high-dimensional behavioral and textual data. Examples include predicting product interest from online browsing data and detecting spam emails or objectionable web content. Recently, counterfactual explanations have been proposed for generating insight into model predictions; they focus on what is relevant to a particular instance. Conducting a complete search to compute counterfactuals is very time-consuming because of the high dimensionality. To our knowledge, for behavioral and text data, only one model-agnostic heuristic algorithm (SEDC) for finding counterfactual explanations has been proposed in the literature. However, there may be better algorithms for finding counterfactuals quickly. This study aligns the recently proposed Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) with the notion of counterfactual explanations, and empirically benchmarks their effectiveness and efficiency against SEDC on a collection of 13 data sets. Results show that LIME-Counterfactual (LIME-C) and SHAP-Counterfactual (SHAP-C) have low and stable computation times, but mostly they are less efficient than SEDC. However, for certain instances on certain data sets, SEDC's run time is comparatively large. With regard to effectiveness, LIME-C and SHAP-C find reasonable, if not always optimal, counterfactual explanations. SHAP-C, however, seems to have difficulties with highly unbalanced data. Because of its good overall performance, LIME-C seems to be a favorable alternative to SEDC, which for some nonlinear models failed to find counterfactuals because of the particular heuristic search algorithm it uses. A main upshot of this paper is that there is a good deal of room for further research; for example, we propose algorithmic adjustments that follow directly from the paper's findings.
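A hedged sketch of the general idea behind coupling additive importance scores with counterfactual search (not the paper's exact LIME-C/SHAP-C algorithms): rank the instance's active features by an importance score computed once, then remove them in order until the predicted class flips. Here `predict` and `scores` are hypothetical stand-ins for a classifier and a LIME/SHAP explanation.

```python
import numpy as np

def ranking_counterfactual(x, predict, scores, max_removals=30):
    """Remove the highest-scored active features of x one at a time;
    return the removed feature indices once the prediction flips,
    or None if no counterfactual is found within the budget."""
    x = np.asarray(x, dtype=float).copy()
    original_class = predict(x)
    ranked = sorted(np.flatnonzero(x), key=lambda j: scores[j], reverse=True)
    removed = []
    for j in ranked[:max_removals]:
        x[j] = 0.0
        removed.append(j)
        if predict(x) != original_class:
            return removed  # removing these features flips the prediction
    return None
```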