Paper: Unified model selection approach based on minimum description length principle in Granger causality analysis

Granger causality analysis (GCA) provides a powerful tool for uncovering the patterns of brain connectivity mechanism using neuroimaging techniques. Conventional GCA applies two different mathematical theories in a two-stage scheme: (1) the Bayesian information criterion (BIC) or Akaike information criterion (AIC) for the regression model orders associated with endogenous and exogenous information; (2) F-statistics for determining the causal effects of exogenous variables. While specifying endogenous and exogenous effects are essentially the same model selection problem, this could produce different benchmarks in the two stages and therefore degrade the performance of GCA. In this course, we present a unified model selection approach based on the minimum description length (MDL) principle for GCA in the context of the general regression model paradigm. Compared with conventional methods, our approach emphasize that a single mathematical theory should be held throughout the GCA process. Under this framework, all candidate models within the model space might be compared freely in the context of the code length, without the need for an intermediate model. We illustrate its advantages over conventional two-stage GCA approach in a 3-node network and a 5-node network synthetic experiments. The unified model selection approach is capable of identifying the actual connectivity while avoiding the false influences of noise. More importantly, the proposed approach obtained more consistent results in a challenge fMRI dataset for causality investigation, mental calculation network under visual and auditory stimulus, respectively. The proposed approach has potential to accommodate other Granger causality representations in other function space. The comparison between different GC representations in different function spaces can also be naturally deal with in the framework.


Article: Model-Robust Inference for Clinical Trials that Improve Precision by Stratified Randomization and Adjustment for Additional Baseline Variables

We focus on estimating the average treatment effect in clinical trials that involve stratified randomization, which is commonly used. It is important to understand the large sample properties of estimators that adjust for stratum variables (those used in the randomization procedure) and additional baseline variables, since this can lead to substantial gains in precision and power. Surprisingly, to the best of our knowledge, this is an open problem. It was only recently that a simpler problem was solved by Bugni et al. (2018) for the case with no additional baseline variables, continuous outcomes, the analysis of covariance (ANCOVA) estimator, and no missing data. We generalize their results in three directions. First, in addition to continuous outcomes, we handle binary and time-to-event outcomes; this broadens the applicability of the results. Second, we allow adjustment for an additional, preplanned set of baseline variables, which can improve precision. Third, we handle missing outcomes under the missing at random assumption. We prove that a wide class of estimators is asymptotically normally distributed under stratified randomization and has equal or smaller asymptotic variance than under simple randomization. For each estimator in this class, we give a consistent variance estimator. This is important in order to fully capitalize on the combined precision gains from stratified randomization and adjustment for additional baseline variables. The above results also hold for the biased-coin covariate-adaptive design. We demonstrate our results using completed trial data sets of treatments for substance use disorder, where adjustment for additional baseline variables brings substantial variance reduction.


Python Library: pycute

Information-Theoretic Causal Inference on Event Sequences. Pycute is a infromation-theoretic causal inference method for event sequences based on Granger-causality.


Paper: CXPlain: Causal Explanations for Model Interpretation under Uncertainty

Feature importance estimates that inform users about the degree to which given inputs influence the output of a predictive model are crucial for understanding, validating, and interpreting machine-learning models. However, providing fast and accurate estimates of feature importance for high-dimensional data, and quantifying the uncertainty of such estimates remain open challenges. Here, we frame the task of providing explanations for the decisions of machine-learning models as a causal learning task, and train causal explanation (CXPlain) models that learn to estimate to what degree certain inputs cause outputs in another machine-learning model. CXPlain can, once trained, be used to explain the target model in little time, and enables the quantification of the uncertainty associated with its feature importance estimates via bootstrap ensembling. We present experiments that demonstrate that CXPlain is significantly more accurate and faster than existing model-agnostic methods for estimating feature importance. In addition, we confirm that the uncertainty estimates provided by CXPlain ensembles are strongly correlated with their ability to accurately estimate feature importance on held-out data.


Paper: Deep causal representation learning for unsupervised domain adaptation

Studies show that the representations learned by deep neural networks can be transferred to similar prediction tasks in other domains for which we do not have enough labeled data. However, as we transition to higher layers in the model, the representations become more task-specific and less generalizable. Recent research on deep domain adaptation proposed to mitigate this problem by forcing the deep model to learn more transferable feature representations across domains. This is achieved by incorporating domain adaptation methods into deep learning pipeline. The majority of existing models learn the transferable feature representations which are highly correlated with the outcome. However, correlations are not always transferable. In this paper, we propose a novel deep causal representation learning framework for unsupervised domain adaptation, in which we propose to learn domain-invariant causal representations of the input from the source domain. We simulate a virtual target domain using reweighted samples from the source domain and estimate the causal effect of features on the outcomes. The extensive comparative study demonstrates the strengths of the proposed model for unsupervised domain adaptation via causal representations.


Paper: Estimation and inference for the indirect effect in high-dimensional linear mediation models

Mediation analysis is difficult when the number of potential mediators is larger than the sample size. In this paper we propose new inference procedures for the indirect effect in the presence of high-dimensional mediators for linear mediation models. We develop methods for both incomplete mediation, where a direct effect may exist, as well as complete mediation, where the direct effect is known to be absent. We prove consistency and asymptotic normality of our indirect effect estimators. Under complete mediation, where the indirect effect is equivalent to the total effect, we further prove that our approach gives a more powerful test compared to directly testing for the total effect. We confirm our theoretical results in simulations, as well as in an integrative analysis of gene expression and genotype data from a pharmacogenomic study of drug response. We present a novel analysis of gene sets to understand the molecular mechanisms of drug response, and also identify a genome-wide significant noncoding genetic variant that cannot be detected using standard analysis methods.


Article: Information-theoretic causal inference of lexical flow

This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible for the framework to be enhanced e.g. by confidence values for each directionality decision.


Paper: Adaptive Causal Network Coding with Feedback for Multipath Multi-hop Communications

We propose a novel multipath multi-hop adaptive and causal random linear network coding (AC-RLNC) algorithm with forward error correction. This algorithm generalizes our joint optimization coding solution for point-to-point communication with delayed feedback. AC-RLNC is adaptive to the estimated channel condition, and is causal, as the coding adjusts the retransmission rates using a priori and posteriori algorithms. In the multipath network, to achieve the desired throughput and delay, we propose to incorporate an adaptive packet allocation algorithm for retransmission, across the available resources of the paths. This approach is based on a discrete water filling algorithm, i.e., bit-filling, but, with two desired objectives, maximize throughput and minimize the delay. In the multipath multi-hop setting, we propose a new decentralized balancing optimization algorithm. This balancing algorithm minimizes the throughput degradation, caused by the variations in the channel quality of the paths at each hop. Furthermore, to increase the efficiency, in terms of the desired objectives, we propose a new selective recoding method at the intermediate nodes. Through simulations, we demonstrate that the performance of our adaptive and causal approach, compared to selective repeat (SR)-ARQ protocol, is capable of gains up to a factor two in throughput and a factor of more than three in delay.