Paper: Predicting Treatment Initiation from Clinical Time Series Data via Graph-Augmented Time-Sensitive Model

Many computational models were proposed to extract temporal patterns from clinical time series for each patient and among patient group for predictive healthcare. However, the common relations among patients (e.g., share the same doctor) were rarely considered. In this paper, we represent patients and clinicians relations by bipartite graphs addressing for example from whom a patient get a diagnosis. We then solve for the top eigenvectors of the graph Laplacian, and include the eigenvectors as latent representations of the similarity between patient-clinician pairs into a time-sensitive prediction model. We conducted experiments using real-world data to predict the initiation of first-line treatment for Chronic Lymphocytic Leukemia (CLL) patients. Results show that relational similarity can improve prediction over multiple baselines, for example a 5% incremental over long-short term memory baseline in terms of area under precision-recall curve.

Paper: Adjustment Criteria for Recovering Causal Effects from Missing Data

Confounding bias, missing data, and selection bias are three common obstacles to valid causal inference in the data sciences. Covariate adjustment is the most pervasive technique for recovering casual effects from confounding bias. In this paper, we introduce a covariate adjustment formulation for controlling confounding bias in the presence of missing-not-at-random data and develop a necessary and sufficient condition for recovering causal effects using the adjustment. We also introduce an adjustment formulation for controlling both confounding and selection biases in the presence of missing data and develop a necessary and sufficient condition for valid adjustment. Furthermore, we present an algorithm that lists all valid adjustment sets and an algorithm that finds a valid adjustment set containing the minimum number of variables, which are useful for researchers interested in selecting adjustment sets with desired properties.

Paper: Causal models on probability spaces

We describe the interface between measure theoretic probability and causal inference by constructing causal models on probability spaces within the potential outcomes framework. We find that measure theory provides a precise and instructive language for causality and that consideration of the probability spaces underlying causal models offers clarity into central concepts of causal inference. By closely studying simple, instructive examples, we demonstrate insights into causal effects, causal interactions, matching procedures, and randomization. Additionally, we introduce a simple technique for visualizing causal models on probability spaces that is useful both for generating examples and developing causal intuition. Finally, we provide an axiomatic framework for causality and make initial steps towards a formal theory of general causal models.

Paper: Circuit-Based Intrinsic Methods to Detect Overfitting

The focus of this paper is on intrinsic methods to detect overfitting. These rely only on the model and the training data, as opposed to traditional extrinsic methods that rely on performance on a test set or on bounds from model complexity. We propose a family of intrinsic methods called Counterfactual Simulation (CFS) which analyze the flow of training examples through the model by identifying and perturbing rare patterns. By applying CFS to logic circuits we get a method that has no hyper-parameters and works uniformly across different types of models such as neural networks, random forests and lookup tables. Experimentally, CFS can separate models with different levels of overfit using only their logic circuit representations without any access to the high level structure. By comparing lookup tables, neural networks, and random forests using CFS, we get insight into why neural networks generalize. In particular, we find that stochastic gradient descent in neural nets does not lead to ‘brute force’ memorization, but finds common patterns (whether we train with actual or randomized labels), and neural networks are not unlike forests in this regard. Finally, we identify a limitation with our proposal that makes it unsuitable in an adversarial setting, but points the way to future work on robust intrinsic methods.

Paper: On Open-Universe Causal Reasoning

We extend two kinds of causal models, structural equation models and simulation models, to infinite variable spaces. This enables a semantics for conditionals founded on a calculus of intervention, and axiomatization of causal reasoning for rich, expressive generative models—including those in which a causal representation exists only implicitly—in an open-universe setting. Further, we show that under suitable restrictions the two kinds of models are equivalent, perhaps surprisingly as their axiomatizations differ substantially in the general case. We give a series of complete axiomatizations in which the open-universe nature of the setting is seen to be essential.

Paper: Graphical Criteria for Efficient Total Effect Estimation via Adjustment in Causal Linear Models

Covariate adjustment is commonly used for total causal effect estimation. In recent years, graphical criteria have been developed to identify all covariate sets that can be used for this purpose. Different valid adjustment sets typically provide causal effect estimates of varying accuracies. We introduce a graphical criterion to compare the asymptotic variance provided by certain valid adjustment sets in a causal linear model. We employ this result to develop two further graphical tools. First, we introduce a simple variance reducing pruning procedure for any given valid adjustment set. Second, we give a graphical characterization of a valid adjustment set that provides the optimal asymptotic variance among all valid adjustment sets. Our results depend only on the graphical structure and not on the specific error variances or the edge coefficients of the underlying causal linear model. They can be applied to DAGs, CPDAGs and maximally oriented PDAGs. We present simulations and a real data example to support our results and show their practical applicability.