Paper: Sampling distribution for single-regression Granger causality estimators

We show for the first time that, under the null hypothesis of vanishing Granger causality, the single-regression Granger-Geweke estimator converges to a generalised $\chi^2$ distribution, which may be well approximated by a $\Gamma$ distribution. We show that this holds too for Geweke’s spectral causality averaged over a given frequency band, and derive explicit expressions for the generalised $\chi^2$ and $\Gamma$-approximation parameters in both cases. We present an asymptotically valid Neyman-Pearson test based on the single-regression estimators, and discuss in detail how it may be usefully employed in realistic scenarios where autoregressive model order is unknown or infinite. We outline how our analysis may be extended to the conditional case, point-frequency spectral Granger causality, state-space Granger causality, and the Granger causality $F$-test statistic. Finally, we discuss approaches to approximating the distribution of the single-regression estimator under the alternative hypothesis.

Paper: An Innovative Approach to Addressing Childhood Obesity: A Knowledge-Based Infrastructure for Supporting Multi-Stakeholder Partnership Decision-Making in Quebec, Canada

The purpose of this paper is to describe and analyze the development of a knowledge-based infrastructure to support MSP decision-making processes. The paper emerged from a study to define specifications for a knowledge-based infrastructure to provide decision support for community-level MSPs in the Canadian province of Quebec. As part of the study, a process assessment was conducted to understand the needs of communities as they collect, organize, and analyze data to make decisions about their priorities. The result of this process is a portrait, which is an epidemiological profile of health and nutrition in their community. Portraits inform strategic planning and development of interventions and are used to assess the impact of interventions. Our key findings indicate ambiguities and disagreement among MSP decision-makers regarding causal relationships between actions and outcomes, and the relevant data needed for making decisions. MSP decision-makers expressed a desire for easy-to-use tools that facilitate the collection, organization, synthesis, and analysis of data, to enable decision-making in a timely manner. Findings inform conceptual modeling and ontological analysis to capture the domain knowledge and specify relationships between actions and outcomes. This modeling and analysis provide the foundation for an ontology, encoded using OWL 2 Web Ontology Language. The ontology is developed to provide semantic support for the MSP process, defining objectives, strategies, actions, indicators, and data sources. In the future, software interacting with the ontology can facilitate interactive browsing by decision-makers in the MSP in the form of concepts, instances, relationships, and axioms. Our ontology also facilitates the integration and interpretation of community data and can help in managing semantic interoperability between different knowledge sources.

Paper: Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality

Granger causality is a widely-used criterion for analyzing interactions in large-scale networks. As most physical interactions are inherently nonlinear, we consider the problem of inferring the existence of pairwise Granger causality between nonlinearly interacting stochastic processes from their time series measurements. Our proposed approach relies on modeling the embedded nonlinearities in the measurements using a component-wise time series prediction model based on Statistical Recurrent Units (SRUs). We make a case that the network topology of Granger causal relations is directly inferrable from a structured sparse estimate of the internal parameters of the SRU networks trained to predict the processes$’$ time series measurements. We propose a variant of SRU, called economy-SRU, which, by design has considerably fewer trainable parameters, and therefore less prone to overfitting. The economy-SRU computes a low-dimensional sketch of its high-dimensional hidden state in the form of random projections to generate the feedback for its recurrent processing. Additionally, the internal weight parameters of the economy-SRU are strategically regularized in a group-wise manner to facilitate the proposed network in extracting meaningful predictive features that are highly time-localized to mimic real-world causal events. Extensive experiments are carried out to demonstrate that the proposed economy-SRU based time series prediction model outperforms the MLP, LSTM and attention-gated CNN-based time series models considered previously for inferring Granger causality.

Paper: Interventional Markov Equivalence for Mixed Graph Models

We study the problem of characterizing Markov equivalence of graphical models under general interventions. For DAGs, this problem is solved using data from an interventional setting to refine MECs of DAGs into smaller, interventional MECs. A recent graphical characterization of interventional MECs of DAGs relates to their global Markov property. Motivated by this, we generalize interventional MECs to all loopless mixed graphs via their global Markov property and generalize the graphical characterization given for DAGs to ancestral graphs. We also extend the notion of interventional Markov equivalence probabilistically: via invariance properties of distributions Markov to acyclic directed mixed graphs (ADMGs). We show that this generalization aligns with the standard causal interpretation of ADMGs. Finally, we show the two generalizations coincide at their intersection, thereby completely generalizing the characterization for DAGs to directed ancestral graphs.

Paper: Instance Cross Entropy for Deep Metric Learning

Loss functions play a crucial role in deep metric learning thus a variety of them have been proposed. Some supervise the learning process by pairwise or tripletwise similarity constraints while others take advantage of structured similarity information among multiple data points. In this work, we approach deep metric learning from a novel perspective. We propose instance cross entropy (ICE) which measures the difference between an estimated instance-level matching distribution and its ground-truth one. ICE has three main appealing properties. Firstly, similar to categorical cross entropy (CCE), ICE has clear probabilistic interpretation and exploits structured semantic similarity information for learning supervision. Secondly, ICE is scalable to infinite training data as it learns on mini-batches iteratively and is independent of the training set size. Thirdly, motivated by our relative weight analysis, seamless sample reweighting is incorporated. It rescales samples’ gradients to control the differentiation degree over training examples instead of truncating them by sample mining. In addition to its simplicity and intuitiveness, extensive experiments on three real-world benchmarks demonstrate the superiority of ICE.

Paper: Two Causal Principles for Improving Visual Dialog

This paper is a winner report from team MReaL-BDAI for Visual Dialog Challenge 2019. We present two causal principles for improving Visual Dialog (VisDial). By ‘improving’, we mean that they can promote almost every existing VisDial model to the state-of-the-art performance on Visual Dialog 2019 Challenge leader-board. Such a major improvement is only due to our careful inspection on the causality behind the model and data, finding that the community has overlooked two causalities in VisDial. Intuitively, Principle 1 suggests: we should remove the direct input of the dialog history to the answer model, otherwise the harmful shortcut bias will be introduced; Principle 2 says: there is an unobserved confounder for history, question, and answer, leading to spurious correlations from training data. In particular, to remove the confounder suggested in Principle 2, we propose several causal intervention algorithms, which make the training fundamentally different from the traditional likelihood estimation. Note that the two principles are model-agnostic, so they are applicable in any VisDial model.

Paper: Causality for Machine Learning

Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.

Paper: Causally Denoise Word Embeddings Using Half-Sibling Regression

Distributional representations of words, also known as word vectors, have become crucial for modern natural language processing tasks due to their wide applications. Recently, a growing body of word vector postprocessing algorithm has emerged, aiming to render off-the-shelf word vectors even stronger. In line with these investigations, we introduce a novel word vector postprocessing scheme under a causal inference framework. Concretely, the postprocessing pipeline is realized by Half-Sibling Regression (HSR), which allows us to identify and remove confounding noise contained in word vectors. Compared to previous work, our proposed method has the advantages of interpretability and transparency due to its causal inference grounding. Evaluated on a battery of standard lexical-level evaluation tasks and downstream sentiment analysis tasks, our method reaches state-of-the-art performance.