Paper: Detecting Bias with Generative Counterfactual Face Attribute Augmentation

We introduce a simple framework for identifying biases of a smiling attribute classifier. Our method poses counterfactual questions of the form: how would the prediction change if this face characteristic had been different? We leverage recent advances in generative adversarial networks to build a realistic generative model of face images that affords controlled manipulation of specific image characteristics. We introduce a set of metrics that measure the effect of manipulating a specific property of an image on the output of a trained classifier. Empirically, we identify several different factors of variation that affect the predictions of a smiling classifier trained on CelebA.

Paper: Non parametric estimation of Joint entropy and Shannon mutual information, Asymptotic limits: Application to statistic tests

This paper proposes a new method for estimating the joint probability mass function of a pair of discrete random variables. This estimator is used to construct joint entropy and Shannon mutual information estimates of a pair of discrete random variables. Almost sure consistency and central limit Theorems are established. Theorical results are validated by simulations.

Library: Structural Equation Modeling and Confirmatory Network Analysis (psychonetrics)

Multi-group (dynamical) structural equation models in combination with confirmatory network models from cross-sectional, time-series and panel data. Allows for confirmatory testing and fit as well as exploratory model search.

Article: Feature Importance with Neural Network

One of the best challenge in Machine Learning tends to let model speak them self. Not also is important to develop a strong solution with great predicting power, but also in lot of business applications is interesting to know how the model provides these results: which variables are engage the most, the presence of correlations, the possible causation relationships and so on. These needs made Tree based model a good weapon in this field. They are scalable and permits to compute variable explanation very easy. Every software provide this option and each of us has at least once tried to compute the variable importance report with Random Forest or similar. With Neural Net this kind of benefit is considered as taboo. Neural Network are often seen as black box, from which is very difficult to extract usefull information for other purpose like feature explatations. In this post I try to provide an elegant and clever solution, that with few lines of codes, permits you to squeeze your Machine Learnig Model and extract as much information as possible, in order to provide feature importance, individuate the significant correlations and try to explain causation.

Article: Generalized interventional approach for causal mediation analysis with causally ordered multiple mediators

Causal mediation analysis has demonstrated the advantage of mechanism investigation. In conditions with causally ordered mediators, path-specific effects (PSEs) are introduced for specifying the effect subject to a certain combination of mediators. However, most PSEs are unidentifiable. To address this, an alternative approach termed interventional analogue of PSE (iPSE), is widely applied to effect decomposition. Previous studies that have considered multiple mediators have mainly focused on two-mediator cases due to the complexity of the mediation formula. This study proposes a generalized interventional approach for the settings, with the arbitrary number of ordered multiple mediators to study the causal parameter identification as well as statistical estimation. It provides a general definition of iPSEs with a recursive formula, assumptions for nonparametric identification, a regression-based method, and a g-computation algorithm to estimate all iPSEs. We demonstrate that each iPSE reduces to the result of linear structural equation modeling subject to linear or log-linear models. This approach is applied to a Taiwanese cohort study for exploring the mechanism by which hepatitis C virus infection affects mortality through hepatitis B virus infection, liver function, and hepatocellular carcinoma. Software based on a g-computation algorithm allows users to easily apply this method for data analysis subject to various model choices according to the substantive knowledge for each variable. All methods and software proposed in this study contribute to comprehensively decompose a causal effect confirmed by data science and help disentangling causal mechanisms when the natural pathways are complicated.

Paper: Replacing the do-calculus with Bayes rule

The concept of causality has a controversial history. The question of whether it is possible to represent and address causal problems with probability theory, or if fundamentally new mathematics such as the do calculus is required has been hotly debated, e.g. Pearl (2001) states ‘the building blocks of our scientific and everyday knowledge are elementary facts such as ‘mud does not cause rain’ and ‘symptoms do not cause disease’ and those facts, strangely enough, cannot be expressed in the vocabulary of probability calculus’. This has lead to a dichotomy between advocates of causal graphical modeling and the do calculus, and researchers applying Bayesian methods. In this paper we demonstrate that, while it is critical to explicitly model our assumptions on the impact of intervening in a system, provided we do so, estimating causal effects can be done entirely within the standard Bayesian paradigm. The invariance assumptions underlying causal graphical models can be encoded in ordinary Probabilistic graphical models, allowing causal estimation with Bayesian statistics, equivalent to the do calculus. Elucidating the connections between these approaches is a key step toward enabling the insights provided by each to be combined to solve real problems.