Paper: mgcpy: A Comprehensive High Dimensional Independence Testing Python Package

With the increase in the amount of data in many fields, a method to consistently and efficiently decipher relationships within high dimensional data sets is important. Because many modern datasets are high-dimensional, univariate independence tests are not applicable. While many multivariate independence tests have R packages available, the interfaces are inconsistent, most are not available in Python. mgcpy is an extensive Python library that includes many state of the art high-dimensional independence testing procedures using a common interface. The package is easy-to-use and is flexible enough to enable future extensions. This manuscript provides details for each of the tests as well as extensive power and run-time benchmarks on a suite of high-dimensional simulations previously used in different publications. The appendix includes demonstrations of how the user can interact with the package, as well as links and documentation.

Paper: Interpretable Counterfactual Explanations Guided by Prototypes

We propose a fast, model agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained using either an encoder or through class specific k-d trees, significantly speed up the the search for counterfactual instances and result in more interpretable explanations. We introduce two novel metrics to quantitatively evaluate local interpretability at the instance level. We use these metrics to illustrate the effectiveness of our method on an image and tabular dataset, respectively MNIST and Breast Cancer Wisconsin (Diagnostic). The method also eliminates the computational bottleneck that arises because of numerical gradient evaluation for $\textit{black box}$ models.

Paper: Explaining Predictions from Tree-based Boosting Ensembles

Understanding how ‘black-box’ models arrive at their predictions has sparked significant interest from both within and outside the AI community. Our work focuses on doing this by generating local explanations about individual predictions for tree-based ensembles, specifically Gradient Boosting Decision Trees (GBDTs). Given a correctly predicted instance in the training set, we wish to generate a counterfactual explanation for this instance, that is, the minimal perturbation of this instance such that the prediction flips to the opposite class. Most existing methods for counterfactual explanations are (1) model-agnostic, so they do not take into account the structure of the original model, and/or (2) involve building a surrogate model on top of the original model, which is not guaranteed to represent the original model accurately. There exists a method specifically for random forests; we wish to extend this method for GBDTs. This involves accounting for (1) the sequential dependency between trees and (2) training on the negative gradients instead of the original labels.

Paper: Analyses of ‘change scores’ do not estimate causal effects in observational data

Background: In longitudinal data, it is common to create ‘change scores’ by subtracting measurements taken at baseline from those taken at follow-up, and then to analyse the resulting ‘change’ as the outcome variable. In observational data, this approach can produce misleading causal effect estimates. The present article uses directed acyclic graphs (DAGs) and simple simulations to provide an accessible explanation of why change scores do not estimate causal effects in observational data. Methods: Data were simulated to match three general scenarios where the variable representing measurements of the outcome at baseline was a 1) competing exposure, 2) confounder, or 3) mediator for the total causal effect of the exposure on the variable representing measurements of the outcome at follow-up. Regression coefficients were compared between change-score analyses and DAG-informed analyses. Results: Change-score analyses do not provide meaningful causal effect estimates unless the variable representing measurements of the outcome at baseline is a competing exposure, as in a randomised experiment. Where such variables (i.e. baseline measurements of the outcome) are confounders or mediators, the conclusions drawn from analyses of change scores diverge (potentially substantially) from those of DAG-informed analyses. Conclusions: Future observational studies that seek causal effect estimates should avoid analysing change scores and adopt alternative analytical strategies.

Paper: Invariant Risk Minimization

We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.

Paper: Kolmogorov’s Algorithmic Mutual Information Is Equivalent to Bayes’ Law

Given two events $A$ and $B$, Bayes’ law is based on the argument that the probability of $A$ given $B$ is proportional to the probability of $B$ given $A$. When probabilities are interpreted in the Bayesian sense, Bayes’ law constitutes a learning algorithm which shows how one can learn from a new observation to improve their belief in a theory that is consistent with that observation. Kolmogorov’s notion of algorithmic information, which is based on the theory of algorithms, proposes an objective measure of the amount of information in a finite string about itself and concludes that for any two finite strings $x$ and $y$, the amount of information in $x$ about $y$ is almost equal to the amount of information in $y$ about $x$. We view this conclusion of Kolmogorov as the algorithmic information version of Bayes’ law. This can be easily demonstrated if one considers the work of Levin on prefix Kolmogorov complexity and then expresses the amount of Kolmogorov mutual information between two finite strings using Solomonoff’s a priori probability.