Paper: Compositional Hierarchical Tensor Factorization: Representing Hierarchical Intrinsic and Extrinsic Causal Factors

Visual objects are composed of a recursive hierarchy of perceptual wholes and parts, whose properties, such as shape, reflectance, and color, constitute a hierarchy of intrinsic causal factors of object appearance. However, object appearance is the compositional consequence of both an object’s intrinsic and extrinsic causal factors, where the extrinsic causal factors are related to illumination and imaging conditions. Therefore, this paper proposes a unified tensor model of wholes and parts, and introduces a compositional hierarchical tensor factorization that disentangles the hierarchical causal structure of object image formation and subsumes multilinear block tensor decomposition as a special case. The resulting object representation is an interpretable combinatorial choice of wholes’ and parts’ representations that renders object recognition robust to occlusion and reduces training data requirements. We demonstrate our approach in the context of face recognition by training on an extremely reduced dataset of synthetic images, and report encouraging face verification results on two datasets, the Freiburg dataset and the Labeled Faces in the Wild (LFW) dataset of real-world images, thus substantiating the suitability of our approach for data-starved domains.
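Below is a minimal, illustrative sketch of a plain multilinear (Tucker/HOSVD) decomposition, the kind of block tensor factorization the paper says its compositional hierarchical model subsumes as a special case; the function names, ranks, and toy data are our own, not the authors’ code.

import numpy as np

def unfold(tensor, mode):
    # Matricize the tensor along the given mode.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor, ranks):
    # Higher-order SVD: one factor matrix per mode plus a core tensor.
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        moved = np.moveaxis(core, mode, 0)
        core = np.moveaxis(np.tensordot(u.T, moved, axes=1), 0, mode)
    return core, factors

# Toy "people x illumination x pixels" tensor standing in for face images.
rng = np.random.default_rng(0)
data = rng.standard_normal((10, 5, 64))
core, factors = hosvd(data, ranks=(4, 3, 16))
print(core.shape, [f.shape for f in factors])   # (4, 3, 16) plus the three factor shapes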


Paper: A Unified Framework for Causal Inference with Multiple Imputation Using Martingale

Multiple imputation is widely used to handle confounders missing at random in causal inference. Although Rubin’s combining rule is simple, it is not clear whether the standard multiple imputation inference is consistent when coupled with the commonly used average causal effect (ACE) estimators. This article establishes a unified martingale representation for ACE estimators after multiple imputation. This representation invokes wild bootstrap inference to provide consistent variance estimation. Our framework applies to asymptotically normal ACE estimators, including the regression imputation, weighting, and matching estimators. We extend the framework to scenarios in which both the outcome and confounders are subject to missingness and in which the data are missing not at random.
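As a rough illustration of the setting (not the paper’s method), the sketch below simulates a confounder missing at random, imputes it multiple times with a simple regression imputation model, and averages a regression-based ACE estimate over the imputations; variable names and the data-generating process are invented for the example, and the martingale representation and wild-bootstrap variance estimator are not reproduced here.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, m = 2000, 20                               # sample size, number of imputations
x = rng.normal(size=n)                        # confounder
a = rng.binomial(1, 1 / (1 + np.exp(-x)))     # treatment depends on the confounder
y = 1.0 * a + 2.0 * x + rng.normal(size=n)    # outcome, true ACE = 1.0
miss = rng.random(n) < 0.3                    # confounder missing at random
x_obs = np.where(miss, np.nan, x)

# Fit an imputation model on complete cases, then draw m stochastic imputations.
cc = ~miss
design_cc = np.column_stack([a[cc], y[cc]])
imp_model = LinearRegression().fit(design_cc, x_obs[cc])
resid_sd = np.std(x_obs[cc] - imp_model.predict(design_cc))

ace_draws = []
design_miss = np.column_stack([a[miss], y[miss]])
for _ in range(m):
    x_imp = x_obs.copy()
    x_imp[miss] = imp_model.predict(design_miss) + rng.normal(scale=resid_sd, size=miss.sum())
    # Regression-imputation ACE estimate on the completed data.
    fit = LinearRegression().fit(np.column_stack([a, x_imp]), y)
    ace_draws.append(fit.coef_[0])

print("Multiple-imputation ACE estimate:", np.mean(ace_draws))   # close to 1.0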


Paper: Causality-based tests to detect the influence of confounders on mobile health diagnostic applications: a comparison with restricted permutations

Machine learning practice is often impacted by confounders. Confounding can be particularly severe in remote digital health studies where the participants self-select to enter the study. While many different confounding adjustment approaches have been proposed in the literature, most of these methods rely on modeling assumptions, and it is unclear how robust they are to violations of these assumptions. This realization has recently motivated the development of restricted permutation methods to quantify the influence of observed confounders on the predictive performance of a machine learning model and to evaluate whether confounding adjustment methods are working as expected. In this paper we show, nonetheless, that restricted permutations can generate biased estimates of the contribution of the confounders to the predictive performance of a learner, and we propose an alternative approach to tackle this problem. By viewing a classification task from a causality perspective, we are able to leverage conditional independence tests involving the predictions, test set labels, and confounders in order to detect the influence of confounding on the predictive performance of a classifier. We illustrate the application of our causality-based approach to data collected from an mHealth study in Parkinson’s disease.
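The sketch below illustrates the flavor of such a test on synthetic data: a simple partial-correlation z-test of whether a classifier’s predictions are conditionally independent of an observed confounder given the true labels; the specific test, variable names, and data-generating process are ours, and the paper’s procedure may differ.

import numpy as np
from scipy import stats

def partial_corr_test(pred, conf, label):
    # Partial correlation of pred and conf given label, with a Fisher-z p-value.
    def residual(v, z):
        design = np.column_stack([np.ones_like(z), z])
        beta, *_ = np.linalg.lstsq(design, v, rcond=None)
        return v - design @ beta
    r = np.corrcoef(residual(pred, label), residual(conf, label))[0, 1]
    n = len(pred)
    z = np.arctanh(r) * np.sqrt(n - 4)           # one conditioning variable
    return r, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(2)
n = 500
conf = rng.normal(size=n)                         # e.g. age as a confounder
label = rng.binomial(1, 1 / (1 + np.exp(-conf))).astype(float)
pred = 0.6 * label + 0.4 * conf + rng.normal(scale=0.3, size=n)   # confounded classifier score

r, p = partial_corr_test(pred, conf, label)
print(f"partial correlation = {r:.2f}, p = {p:.3g}")   # small p flags confounding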


Paper: Long-range Event-level Prediction and Response Simulation for Urban Crime and Global Terrorism with Granger Networks

Large-scale trends in urban crime and global terrorism are well predicted by socio-economic drivers, but focused, event-level predictions have had limited success. Standard machine learning approaches are promising, but lack interpretability, are generally interpolative, and are ineffective for precise future interventions, with costly and wasteful false positives. Here, we introduce Granger network inference as a new forecasting approach for individual infractions, with demonstrated performance far surpassing past results, yet transparent enough to validate and extend social theory. Considering the problem of predicting crime in the City of Chicago, we achieve an average AUC of ~90% for events predicted a week in advance within spatial tiles approximately 1000 ft across. Instead of pre-supposing that crimes unfold across contiguous spaces akin to diffusive systems, we learn the local transport rules from data. As key insights, we uncover indications of suburban bias, showing how law-enforcement response is modulated by socio-economic contexts with disproportionately negative impacts in the inner city, and we show how the dynamics of violent and property crimes co-evolve and constrain each other, lending quantitative support to controversial pro-active policing policies. To demonstrate broad applicability to spatio-temporal phenomena, we analyze recent terror attacks in the Middle East, achieving an AUC of ~80% for predictions made a week in advance within spatial tiles measuring approximately 120 miles across. We conclude that while crime operates near an equilibrium, quickly dissipating perturbations, terrorism does not. Indeed, terrorism aims to destabilize social order, as shown by its dynamics being susceptible to runaway increases in event rates under small perturbations.
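To make the underlying notion of Granger causality concrete, here is a toy sketch (our own, not the authors’ pipeline): an F-test of whether past event counts in one spatial tile improve prediction of counts in another tile beyond that tile’s own history, on synthetic data.

import numpy as np
from scipy import stats

def lag_matrix(v, lags):
    # Columns are v lagged by 1..lags, aligned with targets v[lags:].
    n = len(v)
    return np.column_stack([v[lags - k:n - k] for k in range(1, lags + 1)])

def granger_f_test(a, b, lags=3):
    # Do lagged values of b add predictive power for a beyond a's own lags?
    n = len(a)
    y = a[lags:]
    x_restricted = np.column_stack([np.ones(n - lags), lag_matrix(a, lags)])
    x_full = np.column_stack([x_restricted, lag_matrix(b, lags)])
    rss = lambda x: np.sum((y - x @ np.linalg.lstsq(x, y, rcond=None)[0]) ** 2)
    df1, df2 = lags, (n - lags) - x_full.shape[1]
    f = ((rss(x_restricted) - rss(x_full)) / df1) / (rss(x_full) / df2)
    return f, stats.f.sf(f, df1, df2)

rng = np.random.default_rng(3)
T = 400
b = rng.poisson(2.0, size=T).astype(float)        # event counts in tile B
a = np.zeros(T)
for t in range(1, T):
    a[t] = rng.poisson(1.0 + 0.5 * b[t - 1])      # tile A driven by B's recent past
f, p = granger_f_test(a, b)
print(f"F = {f:.1f}, p = {p:.3g}")                # small p: B Granger-causes A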


Paper: Guidelines for estimating causal effects in pragmatic randomized trials

Pragmatic randomized trials are designed to provide evidence for clinical decision-making rather than regulatory approval. Common features of these trials include the inclusion of heterogeneous or diverse patient populations in a wide range of care settings, the use of active treatment strategies as comparators, unblinded treatment assignment, and the study of long-term, clinically relevant outcomes. These features can greatly increase the usefulness of the trial results for patients, clinicians, and other stakeholders. However, these features also introduce an increased risk of non-adherence, which reduces the value of the intention-to-treat effect as a patient-centered measure of causal effect. In these settings, the per-protocol effect provides useful complementary information for decision making. Unfortunately, there is little guidance for valid estimation of the per-protocol effect. Here, we present our full guidelines for analyses of pragmatic trials that will result in more informative causal inferences for both the intention-to-treat effect and the per-protocol effect.
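As one concrete (and deliberately simplified) illustration of why the intention-to-treat contrast is not enough, the sketch below simulates a point-treatment trial in which sicker patients are less likely to adhere, and compares a naive per-protocol contrast with an inverse-probability-weighted one; the variable names and data-generating process are invented, and the paper’s guidelines cover much more (time-varying adherence, grace periods, sensitivity analyses).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 5000
z = rng.binomial(1, 0.5, size=n)                  # randomized assignment
frail = rng.normal(size=n)                        # baseline frailty
p_adhere = 1 / (1 + np.exp(-(1.5 - 1.0 * frail))) # sicker patients adhere less
adhere = np.where(z == 1, rng.binomial(1, p_adhere), 1)
a = z * adhere                                    # treatment actually received
y = 1.0 * a - 0.8 * frail + rng.normal(size=n)    # true per-protocol effect = 1.0

# Naive per-protocol contrast: biased, because adherers are less frail.
pp_naive = y[(z == 1) & (adhere == 1)].mean() - y[z == 0].mean()

# Inverse-probability-of-adherence weighting among the uncensored.
ps_model = LogisticRegression().fit(frail[z == 1].reshape(-1, 1), adhere[z == 1])
w = np.ones(n)
w[z == 1] = 1 / ps_model.predict_proba(frail[z == 1].reshape(-1, 1))[:, 1]
treated = (z == 1) & (adhere == 1)
pp_ipw = np.average(y[treated], weights=w[treated]) - y[z == 0].mean()

print(f"naive per-protocol: {pp_naive:.2f}, IP-weighted: {pp_ipw:.2f}")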


Python Library: cause-ml

Causal ML benchmarking and development tools


Article: The 10 Bias and Causality Techniques That Everyone Needs to Master

In the end, what does Causality have to do with machine learning? Machine Learning is about prediction and causality is about real effects, so do these two themes have something in common? Yes, they have a lot in common, and this series of posts tries to bridge these two sub-areas of Data Science. I like to think that Machine Learning is just a Data Grinder: if you put in good-quality data, you get good-quality predictions, but if you put in garbage, it will keep grinding without producing good predictions, because it is just ground garbage. That is what we will talk about in this post.


Paper: Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling

Convolutional neural networks (CNNs) with dilated filters such as the Wavenet or the Temporal Convolutional Network (TCN) have shown good results in a variety of sequence modelling tasks. However, efficiently modelling long-term dependencies in these sequences is still challenging. Although the receptive field of these models grows exponentially with the number of layers, computing the convolutions over very long sequences of features in each layer is time- and memory-intensive, prohibiting the use of longer receptive fields in practice. To increase efficiency, we make use of the ‘slow feature’ hypothesis stating that many features of interest are slowly varying over time. For this, we use a U-Net architecture that computes features at multiple time-scales and adapt it to our auto-regressive scenario by making convolutions causal. We apply our model (‘Seq-U-Net’) to a variety of tasks including language and audio generation. In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with speed-ups for training and inference of over 4x in the audio generation experiment in particular, while achieving a comparable performance in all tasks.
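The central trick the abstract describes (making convolutions causal so no output depends on future timesteps) can be sketched in a few lines; the snippet below is our own minimal PyTorch illustration of a dilated causal 1-D convolution, not the authors’ Seq-U-Net code.

import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation       # pad only on the left
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                                  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.left_pad, 0))       # shift all context into the past
        return self.conv(x)

x = torch.randn(2, 16, 128)                                # a batch of 128-step sequences
layer = CausalConv1d(16, 32, kernel_size=3, dilation=4)
print(layer(x).shape)                                      # torch.Size([2, 32, 128])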