Twitter messages (tweets) contain various types of information, including health-related information. Analyzing health-related tweets can help us understand health conditions and concerns encountered in daily life. In this work, we evaluated an approach to extracting causal relations from tweets using natural language processing (NLP) techniques. We focused on three health-related topics: ‘stress’, ‘insomnia’, and ‘headache’. We proposed a set of lexico-syntactic patterns based on dependency parser outputs to extract causal information. A large dataset consisting of 24 million tweets was used. The results show that our approach achieved an average precision between 74.59% and 92.27%. Analysis of the extracted relations revealed interesting findings about health-related topics on Twitter.
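As a minimal sketch of the pattern-matching idea, the following uses surface regular expressions standing in for the paper's dependency-based lexico-syntactic rules; the specific patterns and the topic list below are illustrative assumptions, not the authors' actual rule set:

```python
import re

# Simplified surface patterns for causal phrases (illustrative only;
# the paper's approach operates on dependency parser outputs).
CAUSAL_PATTERNS = [
    re.compile(
        r"(?P<cause>[\w\s]+?)\s+(?:causes?|caused|leads? to|gives? me)"
        r"\s+(?P<effect>stress|insomnia|headache)",
        re.IGNORECASE,
    ),
]

def extract_causal_pairs(tweet: str):
    """Return (cause, effect) pairs matched by any causal pattern."""
    pairs = []
    for pat in CAUSAL_PATTERNS:
        for m in pat.finditer(tweet):
            pairs.append((m.group("cause").strip(), m.group("effect").lower()))
    return pairs

print(extract_causal_pairs("Too much coffee gives me insomnia"))
```

A dependency-based version would additionally check that the matched spans stand in the right grammatical relation (e.g. subject and object of the causal verb), which is what lets the paper's patterns reach high precision.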
The de facto standard for causal inference is the randomized controlled trial, where one compares a manipulated group with a control group in order to determine the effect of an intervention. However, this research design is not always realistically possible due to pragmatic or ethical concerns. In these situations, quasi-experimental designs may provide a solution, as these allow for causal conclusions at the cost of additional design assumptions. In this paper, we provide a generic framework for quasi-experimental design using Bayesian model comparison, and we show how it can be used as an alternative to several common research designs. We provide a theoretical motivation for a Gaussian process based approach and demonstrate its convenient use in a number of simulations. Finally, we apply the framework to determine the effect of population-based thresholds for municipality funding in France, of the 2005 smoking ban in Sicily on the number of acute coronary events, and of an alleged historical phantom border in the Netherlands on Dutch voting behaviour.
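As a minimal illustration of the model-comparison idea (not the paper's Gaussian process framework), one can compare an interrupted-time-series model with and without an intervention effect using BIC, a crude approximation to Bayesian model comparison; all data and parameters below are simulated assumptions:

```python
import numpy as np

def bic(y, X):
    """BIC of an ordinary least-squares fit under a Gaussian likelihood."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return k * np.log(n) - 2 * loglik

rng = np.random.default_rng(0)
t = np.arange(100.0)
intervention = (t >= 50).astype(float)
y = 2.0 + 0.01 * t + 1.5 * intervention + rng.normal(0, 0.5, t.size)

X0 = np.column_stack([np.ones_like(t), t])                # M0: no effect
X1 = np.column_stack([np.ones_like(t), t, intervention])  # M1: level shift
print(bic(y, X0) > bic(y, X1))  # lower BIC favors the effect model
```

The paper's Gaussian process formulation replaces the two linear models with flexible GP regressions, but the decision logic (compare the evidence for a model with and without the causal effect) is the same.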
Python Library: statinf
A library for statistics and causal inference
Due to the increasing use of machine learning in practice, it becomes more and more important to be able to explain the predictions and behavior of machine learning models. One such class of explanations are counterfactual explanations, which provide intuitive and useful explanations of machine learning models. In this survey, we review model-specific methods for efficiently computing counterfactual explanations of many different machine learning models and propose methods for models that have not been considered in the literature so far.
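For linear classifiers, a counterfactual explanation even has a closed form: the smallest L2 perturbation that flips the prediction lies along the weight vector. The sketch below illustrates this special case under assumed weights and inputs; it is not one of the survey's methods:

```python
import numpy as np

def counterfactual(x, w, b, margin=1e-3):
    """Return x' closest to x (in L2) with sign(w @ x' + b) flipped."""
    score = w @ x + b
    step = -(score / (w @ w)) * w   # projection onto the decision boundary
    return x + step * (1 + margin)  # nudge just past the boundary

w = np.array([1.0, -2.0])           # assumed model weights
b = 0.5
x = np.array([2.0, 0.0])            # w @ x + b = 2.5 > 0
x_cf = counterfactual(x, w, b)
print(np.sign(w @ x + b), np.sign(w @ x_cf + b))
```

For nonlinear models no such closed form exists, which is why the model-specific algorithms the survey reviews are needed.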
Propensity score methods are increasingly being used to reduce estimation bias of treatment effects for observational studies. Previous research has shown that propensity score methods consistently estimate the marginal hazard ratio for time-to-event data. However, recurrent event data frequently arise in the biomedical literature, and there is a paucity of research into the use of propensity score methods when data are recurrent in nature. The objective of this paper is to extend the existing propensity score methods to the recurrent data setting. We illustrate our methods through a series of Monte Carlo simulations. The simulation results indicate that without the presence of censoring, the IPTW estimators allow us to consistently estimate the marginal hazard ratio for each event. Under an administrative censoring regime, the stabilized IPTW estimator yields a biased estimate of the marginal hazard ratio, and the degree of bias depends on the proportion of subjects being censored. For variance estimation, the naïve variance estimator often substantially underestimates the variance of the IPTW estimator, while the robust variance estimator significantly reduces the estimation bias of the variance.
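To make the stabilized IPTW weights concrete, here is a minimal sketch with a single binary confounder, where the propensity score can be estimated by empirical frequencies; real analyses would fit a propensity model (e.g. logistic regression), and the data below are made up:

```python
import numpy as np

def stabilized_iptw(treat, x):
    """Stabilized IPTW weights: P(T=t) / P(T=t | X), with a binary X."""
    p_treat = treat.mean()                   # marginal P(T=1)
    ps = np.empty(treat.size, dtype=float)
    for v in np.unique(x):
        mask = x == v
        ps[mask] = treat[mask].mean()        # e(X=v) = P(T=1 | X=v)
    num = np.where(treat == 1, p_treat, 1 - p_treat)
    den = np.where(treat == 1, ps, 1 - ps)
    return num / den

treat = np.array([1, 1, 0, 0, 1, 0])         # hypothetical treatment indicator
x = np.array([1, 1, 1, 0, 0, 0])             # hypothetical binary confounder
w = stabilized_iptw(treat, x)
print(w.round(3))  # stabilized weights sum to n
```

Stabilization (the marginal probability in the numerator) keeps the weights near 1, which is what reduces the variance inflation of plain inverse-probability weights.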
We consider the problem of learning from demonstrated trajectories with inverse reinforcement learning (IRL). Motivated by a limitation of the classical maximum entropy model in capturing the structure of the network of states, we propose an IRL model based on a generalized version of the causal entropy maximization problem, which allows us to generate a class of maximum entropy IRL models. Our generalized model has the advantage of being able to recover, in addition to a reward function, another expert function that would (partially) capture the impact of the connecting structure of the states on experts’ decisions. Empirical evaluation on a real-world dataset and a grid-world dataset shows that our generalized model outperforms the classical ones in terms of recovering reward functions and demonstrated trajectories.
This paper studies causal inference in randomized experiments under network interference. Most existing models of interference posit that treatments assigned to alters only affect the ego’s response through a low-dimensional exposure mapping, which only depends on units within some known network radius around the ego. We propose a substantially weaker ‘approximate neighborhood interference’ (ANI) assumption, which allows treatments assigned to alters far from the ego to have a small, but potentially nonzero, impact on the ego’s response. Unlike the exposure mapping model, we can show that ANI is satisfied in well-known models of social interactions. Despite its generality, inference in a single-network setting is still possible under ANI, as we prove that standard inverse-probability weighting estimators can consistently estimate treatment and spillover effects and are asymptotically normal. For practical inference, we propose a new conservative variance estimator based on a network bootstrap and suggest a data-dependent bandwidth using the network diameter. Finally, we illustrate our results in a simulation study and an empirical application.
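The inverse-probability weighting idea underlying the paper's estimators can be sketched in its simplest form, a Horvitz-Thompson-style estimate of the average direct effect under Bernoulli randomization; exposure mappings, interference, and the paper's variance estimator are all omitted, and the simulated data below are assumptions:

```python
import numpy as np

def ipw_direct_effect(y, treat, p):
    """Horvitz-Thompson IPW estimate of the average treatment effect
    under Bernoulli(p) treatment assignment."""
    n = y.size
    return np.sum(y * treat / p - y * (1 - treat) / (1 - p)) / n

rng = np.random.default_rng(1)
p = 0.5
treat = rng.binomial(1, p, size=10_000)
y = 1.0 + 2.0 * treat + rng.normal(0, 1, size=10_000)  # true effect: 2

est = ipw_direct_effect(y, treat, p)
print(est)
```

Under interference, the same weighting logic is applied to the probability of an ego's entire (approximate) exposure configuration rather than its own treatment alone, which is where the ANI assumption does its work.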
We study Granger causality in the context of wide-sense stationary time series, where our focus is on the topological aspects of the underlying causality graph. We establish sufficient conditions (in particular, we develop the notion of a ‘strongly causal’ graph topology) under which the true causality graph can be recovered via pairwise causality testing alone, and provide examples from the gene regulatory network literature suggesting that our concept of a strongly causal graph may be applicable to this field. We implement and detail finite-sample heuristics derived from our theory, and establish through simulation the efficiency gains (both statistical and computational) which can be obtained (in comparison to LASSO-type algorithms) when structural assumptions are met.
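A single pairwise Granger test of the kind the theory builds on can be sketched as a nested-OLS F-test at lag 1; the simulated system below (where x drives y but not vice versa) is an illustrative assumption, not one of the paper's experiments:

```python
import numpy as np

def granger_f(x, y, lag=1):
    """F statistic for 'x Granger-causes y' at the given lag:
    does adding lagged x reduce the residual sum of squares of
    an AR model for y?"""
    yt, ylag, xlag = y[lag:], y[:-lag], x[:-lag]
    ones = np.ones_like(yt)

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, yt, rcond=None)
        r = yt - X @ beta
        return r @ r

    rss_r = rss(np.column_stack([ones, ylag]))        # restricted model
    rss_u = rss(np.column_stack([ones, ylag, xlag]))  # unrestricted model
    n = yt.size
    return (rss_r - rss_u) / (rss_u / (n - 3))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):  # x Granger-causes y, but not the reverse
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

print(granger_f(x, y) > granger_f(y, x))
```

The paper's contribution is the graph-level question: under a strongly causal topology, running such pairwise tests over all ordered pairs suffices to recover the full causality graph, without the joint multivariate fit that LASSO-type approaches require.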