We discuss promising recent contributions on quantifying feature relevance using Shapley values, where we observed some confusion on which probability distribution is the right one for dropped features. We argue that the confusion is based on not carefully distinguishing between *observational* and *interventional* conditional probabilities and try a clarification based on Pearl’s seminal work on causality. We conclude that *unconditional* rather than *conditional* expectations provide the right notion of *dropping* features in contradiction to the theoretical justification of the software package SHAP. Parts of SHAP are unaffected because unconditional expectations (which we argue to be conceptually right) are used as *approximation* for the conditional ones, which encouraged others to ‘improve’ SHAP in a way that we believe to be flawed. Further, our criticism concerns TreeExplainer in SHAP, which really uses conditional expectations (without approximating them by unconditional ones)
One of the most common mistakes made when performing data analysis is attributing causal meaning to regression coefficients. Formally, a causal effect can only be computed if it is identifiable from a combination of observational data and structural knowledge about the domain under investigation (Pearl, 2000, Ch. 5). Building on the literature of instrumental variables (IVs), a plethora of methods has been developed to identify causal effects in linear systems. Almost invariably, however, the most powerful such methods rely on exponential-time procedures. In this paper, we investigate graphical conditions to allow efficient identification in arbitrary linear structural causal models (SCMs). In particular, we develop a method to efficiently find unconditioned instrumental subsets, which are generalizations of IVs that can be used to tame the complexity of many canonical algorithms found in the literature. Further, we prove that determining whether an effect can be identified with TSID (Weihs et al., 2017), a method more powerful than unconditioned instrumental sets and other efficient identification algorithms, is NP-Complete. Finally, building on the idea of flow constraints, we introduce a new and efficient criterion called Instrumental Cutsets (IC), which is able to solve for parameters missed by all other existing polynomial-time algorithms.
Recent conceptual discussion on the nature of the explainability of Artificial Intelligence (AI) has largely been limited to data-driven investigations. This paper identifies some shortcomings of this approach to help strengthen the debate on this subject. Building on recent philosophical work on the nature of explanations, I demonstrate the significance of two non-data driven, non-causal explanatory elements: (1) mathematical structures that are the grounds for capturing the decision-making situation; (2) statistical and optimality facts in terms of which the algorithm is designed and implemented. I argue that these elements feature directly in important aspects of AI explainability. I then propose a hierarchical framework that acknowledges the existence of various types of explanation, each of which reveals an aspect of explanation, and answers to a different kind of why-question. The usefulness of this framework will be illustrated by bringing it to bear on some salient normative concerns about the use of AI decision-making systems in society.
Causal inference can be formalized as Bayesian inference that combines a prior distribution over causal models and likelihoods that account for both observations and interventions. We show that it is possible to implement this approach using a sufficiently expressive probabilistic programming language. Priors are represented using probabilistic programs that generate source code in a domain specific language. Interventions are represented using probabilistic programs that edit this source code to modify the original generative process. This approach makes it straightforward to incorporate data from atomic interventions, as well as shift interventions, variance-scaling interventions, and other interventions that modify causal structure. This approach also enables the use of general-purpose inference machinery for probabilistic programs to infer probable causal structures and parameters from data. This abstract describes a prototype of this approach in the Gen probabilistic programming language.
Recent developments have linked causal inference with Algorithmic Information Theory, and methods have been developed that utilize Conditional Kolmogorov Complexity to determine causation between two random variables. However, past methods that utilize Algorithmic Information Theory do not provide off-the-shelf support for continuous data types and thus lose the essence of the shape of data. We present a method for inferring causal direction between continuous variables by using an MDL Binning technique for data discretization and complexity calculation. Our method captures the shape of the data and uses it to determine which variable has more information about the other. Its high predictive performance and robustness is shown on several real world use cases.
Reasoning about observed effects and their causes is important in multi-agent contexts. While there has been much work on causality from an objective standpoint, causality from the point of view of some particular agent has received much less attention. In this paper, we address this issue by incorporating an epistemic dimension to an existing formal model of causality. We define what it means for an agent to know the causes of an effect. Then using a counterexample, we prove that epistemic causality is a different notion from its objective counterpart.
Granger causality and variants of this concept allow the study of complex dynamical systems as networks constructed from multivariate time series. In this work, a large number of Granger causality measures used to form causality networks from multivariate time series are assessed. These measures are in the time domain, such as model-based and information measures, the frequency domain and the phase domain. The study aims also to compare bivariate and multivariate measures, linear and nonlinear measures, as well as the use of dimension reduction in linear model-based measures and information measures. The latter is particular relevant in the study of high-dimensional time series. For the performance of the multivariate causality measures, low and high dimensional coupled dynamical systems are considered in discrete and continuous time, as well as deterministic and stochastic. The measures are evaluated and ranked according to their ability to provide causality networks that match the original coupling structure. The simulation study concludes that the Granger causality measures using dimension reduction are superior and should be preferred particularly in studies involving many observed variables, such as multi-channel electroencephalograms and financial markets.
Constructing gene regulatory networks is a critical step in revealing disease mechanisms from transcriptomic data. In this work, we present NO-BEARS, a novel algorithm for estimating gene regulatory networks. The NO-BEARS algorithm is built on the basis of the NOTEARS algorithm with two improvements. First, we propose a new constraint and its fast approximation to reduce the computational cost of the NO-TEARS algorithm. Next, we introduce a polynomial regression loss to handle non-linearity in gene expressions. Our implementation utilizes modern GPU computation that can decrease the time of hours-long CPU computation to seconds. Using synthetic data, we demonstrate improved performance, both in processing time and accuracy, on inferring gene regulatory networks from gene expression data.