Magister Dixit

“The ‘Age of Automation’ is upon us. Companies strive to reduce their costs by using technology to replace humans at every opportunity. Business executives fight over experts in artificial intelligence and data science in hopes of attaining a competitive edge over their rivals. Even the wary flock to siren calls of ever-greater efficiency via investments in computers, robotics and software.” Mark Dickson ( 24.08.2017 )


If you did not already know

HCqa
Question Answering (QA) systems provide easy access to the vast amount of knowledge without having to know the underlying complex structure of the knowledge. The research community has provided ad hoc solutions to the key QA tasks, including named entity recognition and disambiguation, relation extraction and query building. Furthermore, some have integrated and composed these components to implement many tasks automatically and efficiently. However, in general, the existing solutions are limited to simple and short questions and still do not address complex questions composed of several sub-questions. Exploiting the answer to complex questions is further challenged if it requires integrating knowledge from unstructured data sources, i.e., textual corpus, as well as structured data sources, i.e., knowledge graphs. In this paper, an approach (HCqa) is introduced for dealing with complex questions requiring federating knowledge from a hybrid of heterogeneous data sources (structured and unstructured). We contribute in developing (i) a decomposition mechanism which extracts sub-questions from potentially long and complex input questions, (ii) a novel comprehensive schema, first of its kind, for extracting and annotating relations, and (iii) an approach for executing and aggregating the answers of sub-questions. The evaluation of HCqa showed a superior accuracy in the fundamental tasks, such as relation extraction, as well as the federation task. …

Neural Nearest Neighbors Network
Non-local methods exploiting the self-similarity of natural signals have been well studied, for example in image analysis and restoration. Existing approaches, however, rely on k-nearest neighbors (KNN) matching in a fixed feature space. The main hurdle in optimizing this feature space w.r.t. application performance is the non-differentiability of the KNN selection rule. To overcome this, we propose a continuous deterministic relaxation of KNN selection that maintains differentiability w.r.t. pairwise distances, but retains the original KNN as the limit of a temperature parameter approaching zero. To exploit our relaxation, we propose the neural nearest neighbors block (N3 block), a novel non-local processing layer that leverages the principle of self-similarity and can be used as building block in modern neural network architectures. We show its effectiveness for the set reasoning task of correspondence classification as well as for image restoration, including image denoising and single image super-resolution, where we outperform strong convolutional neural network (CNN) baselines and recent non-local models that rely on KNN selection in hand-chosen features spaces. …
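The temperature-controlled relaxation described above can be illustrated with a minimal sketch. This is not the paper's N3 block: it only shows, for the single-nearest-neighbor case, how a softmax over negative distances yields smooth selection weights that approach a hard choice as the temperature shrinks. The function name `soft_nearest_weights` and the distance values are assumptions made for illustration.

```python
import math

def soft_nearest_weights(distances, temperature):
    """Softmax over negative distances: a smooth stand-in for picking the nearest item."""
    scaled = [-d / temperature for d in distances]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up distances from one query to three candidates (hypothetical values).
distances = [0.9, 0.2, 0.5]
for temperature in (1.0, 0.1, 0.01):
    weights = soft_nearest_weights(distances, temperature)
    print(temperature, [round(w, 3) for w in weights])
```

At a high temperature the weights are spread across all candidates; as the temperature approaches zero nearly all of the weight moves to the closest candidate, which is the hard nearest-neighbor rule.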

Variational Bayesian Monte Carlo (VBMC)
Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. We introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound of the model evidence, useful for model selection. We demonstrate VBMC both on several synthetic likelihoods and on a neuronal model with data from real neurons. Across all tested problems and dimensions (up to $D = 10$), VBMC performs consistently well in reconstructing the posterior and the model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework shows great promise as a novel tool for posterior and model inference with expensive, black-box likelihoods. …

Semi-Supervised Conditional Generative Adversarial Network (scGAN)
One of the frontier issues that severely hampers the development of automatic snore sound classification (ASSC) is the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach based on semi-supervised conditional Generative Adversarial Networks (scGANs), which aims to automatically learn a mapping strategy from a random noise space to the original data distribution. The proposed approach is capable of synthesizing ‘realistic’ high-dimensional data well, while requiring no additional annotation process. To handle the mode collapse problem of GANs, we further introduce an ensemble strategy to enhance the diversity of the generated data. The systematic experiments conducted on the widely used Munich-Passau snore sound corpus demonstrate that the scGANs-based systems can remarkably outperform other classic data augmentation systems, and are also competitive with other recently reported systems for ASSC. …

R Packages worth a look

Latin Hypercube Designs (LHDs) Algorithms (LHD)
Contains functions for finding space-filling Latin Hypercube Designs (LHDs), e.g. maximin distance LHDs. Unlike other packages, our package is particularly useful in the area of Design and Analysis of Experiments (DAE). More specifically, it is very useful in the design of computer experiments. One advantage of our package is its comprehensiveness. It contains a variety of heuristic algorithms (and their modifications) for searching maximin distance LHDs. In addition to that, it also contains other useful tools for developing and constructing maximin distance LHDs. In the future, algebraic construction methods will be added. Please refer to the function documentation for the detailed references of each function. Among all the references we used, one reference should be highlighted here, which is Ruichen Jin, Wei Chen, Agus Sudjianto (2005) <doi:10.1016/j.jspi.2004.02.014>. They provided a new form of the phi_p criterion, which does not lose the space-filling property and simultaneously reduces the computational complexity when evaluating (or re-evaluating) an LHD. Their new phi_p criterion is a fundamental component of many of our functions. Moreover, the computational nature of the new phi_p criterion lets our functions use less CPU time.
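To make the phi_p criterion concrete, here is a minimal Python sketch (the package itself is written in R and its functions are not used here). It assumes the common form phi_p = (sum over point pairs of d^(-p))^(1/p) with Euclidean distance, where smaller values indicate a better maximin spread. The plain random search below is an illustrative stand-in, not one of the package's heuristic algorithms, and the names `random_lhd` and `phi_p` are hypothetical.

```python
import itertools
import random

def random_lhd(n, k, rng):
    """Random n-run, k-factor Latin hypercube: each column is a permutation of 1..n."""
    columns = [rng.sample(range(1, n + 1), n) for _ in range(k)]
    return [[columns[j][i] for j in range(k)] for i in range(n)]

def phi_p(design, p=15):
    """phi_p criterion with Euclidean distances; smaller values mean better spread."""
    total = 0.0
    for a, b in itertools.combinations(design, 2):
        dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        total += dist ** (-p)
    return total ** (1.0 / p)

# Naive random search (an assumption for illustration): keep the best of 200 draws.
rng = random.Random(0)
candidates = [random_lhd(6, 2, rng) for _ in range(200)]
best = min(candidates, key=phi_p)
print(best, phi_p(best))
```

Because every column of an LHD is a permutation, no two rows coincide, so all pairwise distances are positive and the criterion is always finite.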

Generalized Gauss Markov Regression (ggmr)
Implements generalized Gauss Markov regression, which is useful when both predictor and response have uncertainty attached to them and when covariance is present within the predictor, within the response, and between the predictor and the response. Based on the results published in guide ISO/TS 28037 (2010) <https://…/44473.html>.

Tools for Matrix Algebra, Optimization and Inference (maotai)
The matrix is a universal and sometimes primary object/unit in applied mathematics and statistics. We provide a number of algorithms for selected problems in optimization and statistical inference. For a general exposition of the topic with a focus on the statistical context, see the book by Banerjee and Roy (2014, ISBN:9781420095388).

Profiling Compliers and Non-Compliers for Instrumental Variable Analysis (ivdesc)
Estimating the mean and variance of a covariate for the complier, never-taker and always-taker subpopulation in the context of instrumental variable estimation. This package implements the method described in Marbach and Hangartner (2019) <doi:10.2139/ssrn.3380247>.

Kumaraswamy Complementary Weibull Geometric (Kw-CWG) Probability Distribution (elfDistr)
Density, distribution function, quantile function and random generation for the Kumaraswamy Complementary Weibull Geometric (Kw-CWG) lifetime probability distribution proposed in Afify, A.Z. et al (2017) <doi:10.1214/16-BJPS322>.

What’s going on on PyPI

Scanning all newly published packages on PyPI, I know that the quality is often quite bad. I try to filter out the worst ones and list here the ones that might be worth a look, worth following, or that might inspire you in some way.

A small example package Anomaly detection using hierarchical clustering, anomaly detector, classifiers and fast model rebuilding

Complex values in Keras – Deep learning for humans. Complex-valued convolutions (https://…/Convolution#Domain_of_definition) could provide some interesting results in signal processing-based deep learning. A simple(-ish) idea is to include explicit phase information of time series in neural networks. This code enables complex-valued convolution in convolutional neural networks in Keras with the ( ) backend. This makes the network modular and interoperable with standard Keras layers and operations.

konduit: Enterprise Runtime for Machine Learning Models

Python API for LANA Process Mining

Waymo Open Dataset libraries.

Companion package to whynot: A collection of causal estimators in R.

AGNES – Flexible Reinforcement Learning Framework with PyTorch

A python implementation of laravel framework for machine learning, AI, datascience and data intensive work.

Data Representation Language for Reading Heterogeneous Datasets. This library allows reading heterogeneous datasets of different formats and layouts. For more information, please visit (https://…/d-repr).

NLP question answering service

Document worth reading: “Many perspectives on Deborah Mayo’s ‘Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars’”

The new book by philosopher Deborah Mayo is relevant to data science for topical reasons, as she takes various controversial positions regarding hypothesis testing and statistical practice, and also as an entry point to thinking about the philosophy of statistics. The present article is a slightly expanded version of a series of informal reviews and comments on Mayo’s book. We hope this discussion will introduce people to Mayo’s ideas along with other perspectives on the topics she addresses.

Finding out why

Paper: Learning from Bandit Feedback: An Overview of the State-of-the-art

In machine learning we often try to optimise a decision rule that would have worked well over a historical dataset; this is the so-called empirical risk minimisation principle. In the context of learning from recommender system logs, applying this principle becomes a problem because we do not have available the reward of decisions we did not take. In order to handle this ‘bandit-feedback’ setting, several Counterfactual Risk Minimisation (CRM) methods have been proposed in recent years that attempt to estimate the performance of different policies on historical data. Through importance sampling and various variance reduction techniques, these methods allow more robust learning and inference than classical approaches. It is difficult to accurately estimate the performance of policies that frequently perform actions that were infrequently taken in the past, and a number of different types of estimators have been proposed. In this paper, we review several methods, based on different off-policy estimators, for learning from bandit feedback. We discuss key differences and commonalities among existing approaches, and compare their empirical performance on the RecoGym simulation environment. To the best of our knowledge, this work is the first comparison study for bandit algorithms in a recommender system setting.
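The importance-sampling idea mentioned in the abstract can be sketched with the basic inverse propensity scoring (IPS) estimator. This is a generic textbook form, not code from the paper or from RecoGym; the log format, the function name `ips_estimate`, and the toy data are assumptions for illustration.

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's average reward.

    logs: list of (context, action, reward, logging_probability) tuples.
    target_policy(context, action): probability that the new policy takes the action.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) often
        # the target policy would have taken the logged action.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)

# Toy logs (hypothetical): a uniform logging policy over two actions,
# where action "a" always pays 1 and action "b" always pays 0.
logs = [("u1", "a", 1.0, 0.5), ("u2", "b", 0.0, 0.5),
        ("u3", "a", 1.0, 0.5), ("u4", "b", 0.0, 0.5)]

def always_a(context, action):
    return 1.0 if action == "a" else 0.0

def uniform(context, action):
    return 0.5

print(ips_estimate(logs, always_a))  # estimated value of always choosing "a"
print(ips_estimate(logs, uniform))   # estimated value of the logging policy itself
```

When the logging probability of an action is small, its weight becomes large, which is the source of the high variance that the estimators reviewed in the paper try to reduce.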

Python Library: whynot

A framework for benchmarking causal inference.

Paper: Uncovering Sociological Effect Heterogeneity using Machine Learning

Individuals do not respond uniformly to treatments, events, or interventions. Sociologists routinely partition samples into subgroups to explore how the effects of treatments vary by covariates like race, gender, and socioeconomic status. In so doing, analysts determine the key subpopulations based on theoretical priors. Data-driven discoveries are also routine, yet the analyses by which sociologists typically go about them are problematic and seldom move us beyond our expectations, and biases, to explore new meaningful subgroups. Emerging machine learning methods allow researchers to explore sources of variation that they may not have previously considered, or envisaged. In this paper, we use causal trees to recursively partition the sample and uncover sources of treatment effect heterogeneity. We use honest estimation, splitting the sample into a training sample to grow the tree and an estimation sample to estimate leaf-specific effects. Assessing a central topic in the social inequality literature, college effects on wages, we compare what we learn from conventional approaches for exploring variation in effects to causal trees. Given our use of observational data, we use leaf-specific matching and sensitivity analyses to address confounding and offer interpretations of effects based on observed and unobserved heterogeneity. We encourage researchers to follow similar practices in their work on variation in sociological effects.

Paper: Causal Modeling for Fairness in Dynamical Systems

In this work, we present causal directed acyclic graphs (DAGs) as a unifying framework for the recent literature on fairness in dynamical systems. We advocate for the use of causal DAGs as a tool in both designing equitable policies and estimating their impacts. By visualizing models of dynamic unfairness graphically, we expose implicit causal assumptions which can then be more easily interpreted and scrutinized by domain experts. We demonstrate that this method of reinterpretation can be used to critique the robustness of an existing model/policy, or uncover new policy evaluation questions. Causal models also enable a rich set of options for evaluating a new candidate policy without incurring the risk of implementing the policy in the real world. We close the paper with causal analyses of several models from the recent literature, and provide an in-depth case study to demonstrate the utility of causal DAGs for modeling fairness in dynamical systems.

Paper: Explaining Visual Models by Causal Attribution

Model explanations based on pure observational data cannot compute the effects of features reliably, due to their inability to estimate how each factor alteration could affect the rest. We argue that explanations should be based on the causal model of the data and the derived intervened causal models, that represent the data distribution subject to interventions. With these models, we can compute counterfactuals, new samples that will inform us how the model reacts to feature changes on our input. We propose a novel explanation methodology based on Causal Counterfactuals and identify the limitations of current Image Generative Models in their application to counterfactual creation.

Article: The Turf War Between Causality and Correlation In Data Science: Which One Is More Important?

Data scientists have tried to differentiate causality from correlation. Last month alone, I’ve seen 20+ posts referencing the catchphrase ‘correlation is not causality.’ What they actually want to say is correlation is not as good as causality. The ‘bias towards’ causality in the data world is understandable. It takes more training in data skills (e.g. potential outcomes framework, hypothesis testing, counterfactual, etc.) than correlation research. On a personal note, I make a judgment call on someone’s work based on the strength of his causal story. Causal inference generates actionable insights for end users, pointing the direction for the product team. However, this shouldn’t be a reason to treat correlational studies lightly or with less appreciation. There are a ton of business scenarios that require both types of research.

Python Library: whynotr

Companion package to whynot: A collection of causal estimators in R.

Article: Everything you need to know about interpreting correlations

Correlation is the most widely used statistical measure to assess relationships among variables. However, correlation must be interpreted cautiously; otherwise, it could lead to wrong interpretations and conclusions. One case where correlation can mislead is when you are working with sample data: an apparent correlation in a sample is not necessarily present in the population from which the sample came, and might be due only to chance (random sampling error). That’s why a correlation must be accompanied by a significance test to assess its reliability. Also, when interpreting a relationship, one should be careful not to confuse correlation with causality: although a correlation demonstrates that a relationship exists between two variables, it does not automatically imply that one causes the other (cause-and-effect relationship). This post will define correlation and its types, explain how to measure correlation using a correlation coefficient, and especially show how to assess the reliability of a linear correlation using a significance test. If you are familiar with correlation, you can skip the introduction.
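As a small illustration of pairing a correlation coefficient with a significance test, the sketch below computes Pearson's r and a permutation-based p-value using only the Python standard library. The permutation test is swapped in here for simplicity and may differ from the test used in the post; the data values and function names are made up for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled data is at least as correlated."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Hypothetical sample with a strong linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
print(pearson_r(xs, ys), permutation_p_value(xs, ys))
```

A small p-value says the sample correlation is unlikely to arise from chance pairing alone; it says nothing about which variable, if either, causes the other.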

Distilled News

Trade Smarter w/ Reinforcement Learning

A deep dive into TensorTrade – the Python framework for trading and investing using deep reinforcement learning.

Enhancing the power of SoftMax for image classification

SoftMax has achieved state-of-the-art results in many classification tasks. However, it won’t perform as expected on datasets whose classes have similar features. The reason is that the features SoftMax learns are separable but not discriminative enough. To tackle this problem, many approaches have been proposed. A very efficient one is separating learned features with an angular distance (margin), which will be explained in this article.
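One way to picture an angular margin is the additive variant sketched below, in which the angle between a feature and its true class weight is enlarged by a margin before the softmax loss is computed. The article's exact formulation is not given here, so this variant, the function names, and the margin and scale values are all assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def margin_logits(feature, class_weights, true_class, margin=0.5, scale=10.0):
    """Scaled cosine logits with an additive angular margin on the true class.

    Simplification: the case theta + margin > pi, where the cosine stops
    decreasing, is not handled in this sketch.
    """
    logits = []
    for k, weight in enumerate(class_weights):
        cos_theta = max(-1.0, min(1.0, cosine(feature, weight)))
        theta = math.acos(cos_theta)
        if k == true_class:
            theta += margin  # make the true class harder to satisfy
        logits.append(scale * math.cos(theta))
    return logits

def cross_entropy(logits, true_class):
    """Softmax cross-entropy computed with a stable log-sum-exp."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[true_class]

feature = [1.0, 0.2]                      # hypothetical 2-D feature
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical class directions
plain = cross_entropy(margin_logits(feature, class_weights, 0, margin=0.0), 0)
with_margin = cross_entropy(margin_logits(feature, class_weights, 0, margin=0.5), 0)
print(plain, with_margin)
```

The same feature incurs a larger loss once the margin is applied, so training has to pull features closer in angle to their own class direction, which is what makes them more discriminative.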

Get the Optimal K in K-Means Clustering

A quick-start guide to finding the optimal number of clusters in K-means clustering.
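One common heuristic for this is the elbow method: run K-means for several values of K and look for the point where the within-cluster sum of squares (WCSS) stops dropping sharply. Whether the linked guide uses this method is an assumption; the sketch below implements plain Lloyd's algorithm with the standard library only, on synthetic data, with hypothetical function names.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wcss(points, k, rng, n_iter=50):
    """Run Lloyd's algorithm once and return the within-cluster sum of squares."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(v) / len(members) for v in zip(*members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def elbow_curve(points, max_k, restarts=20, seed=0):
    """Best WCSS over several random restarts for each k from 1 to max_k."""
    rng = random.Random(seed)
    return {k: min(kmeans_wcss(points, k, rng) for _ in range(restarts))
            for k in range(1, max_k + 1)}

# Synthetic data (an assumption): three Gaussian blobs of 30 points each.
data_rng = random.Random(1)
blobs = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in blobs for _ in range(30)]
for k, wcss in elbow_curve(points, 6).items():
    print(k, round(wcss, 1))
```

Random restarts matter because a single run of Lloyd's algorithm can settle in a poor local minimum, which would distort the curve.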

Building a Machine Learning Model When Data Isn’t Available

Before performing any data science task such as exploratory data analysis or building a model, you must ask yourself the following important questions: What do you want to find out or discover using your data? Do you have the appropriate data to analyze? Data is key to any data science and machine learning task. Data comes in different flavors such as numerical data, categorical data, text data, image data, sound data, and video data. The predictive power of a model depends on the quality of data used in building the model.

Cluster multiple time series using K-means

I was recently confronted with the issue of finding similarities among time series and thought about using k-means to cluster them. To illustrate the method, I’ll be using data from the Penn World Tables …

Selection bias, death, and dying

I am collaborating with a number of folks who think a lot about palliative or supportive care for people who are facing end-stage disease, such as advanced dementia, cancer, COPD, or congestive heart failure. A major concern for this population (which really includes just about everyone at some point) is the quality of life at the end of life and what kind of experiences, including interactions with the health care system, they have (and don’t have) before death. A key challenge for researchers is figuring out how to analyze events that occur just before death. For example, it is not unusual to consider hospitalization in the week or month before death as a poor outcome. Here is a paper in the Journal of Palliative Care Medicine that describes an association of homecare nursing and reduced hospitalizations in the week before death. While there is no denying the strength of the association, it is less clear how much of that association is causal. In particular, there is the possibility of selection bias that may result when considering only patients who have died. In this post, I want to describe the concept of selection bias and simulate data that mimics the process of end-stage disease in order to explore how these issues might play out when we are actually evaluating the causal effect of an exposure or randomized intervention.

Choosing a Machine Learning Model

Selecting the perfect machine learning model is part art and part science. Learn how to review multiple models and pick the best in both competitive and real-world applications.

Three Things to Know About Reinforcement Learning

As an engineer, scientist, or researcher, you may want to take advantage of this new and growing technology, but where do you start? The best place to begin is to understand what the concept is, how to implement it, and whether it’s the right approach for a given problem.

Research Guide for Video Frame Interpolation with Deep Learning

In this research guide, we’ll look at deep learning papers aimed at synthesizing video frames within an existing video.

Surprise – Model Improvements Don’t Always Drive Business Impact

Data scientists share many lessons learned in the process of constantly improving their sophisticated ML models, not the least of which is that improving your models doesn’t always lead to improving business outcomes.

Pre-defined sparsity for reducing complexity in neural networks

Neural networks are quite the rage nowadays. They make deep learning possible, which powers smart systems such as speech recognition and self-driving cars. These cool end results don’t really reflect the gory complexity of most modern neural networks, which have many millions of parameters needing to be trained to make the system smart. Training costs time and a lot of computational resources, which often translates to money. This forms a barrier between the haves – big tech companies with a plethora of computing power at their disposal, and the have-nots – Ph.D. students like me who can’t afford to train dozens of networks for dozens of days. Well, research is often born out of necessity.

Multi-lingual Chatbot Using Rasa and Custom Tokenizer

Tips and tricks to enhance the Rasa NLU pipeline with your own custom tokenizer for multi-lingual chatbot.

Enable ML Experiments

This article was inspired by recent talks from Dmitry Petrov about Machine learning model and dataset versioning practices and Data versioning in machine learning projects. In the majority of current ML systems, there is a lack of efficient and systematic ways to deliver the value of data through Data Science into the market in an agile, continuous and maintainable way. A solution has been pointed out in excellent writing from Martin Fowler about Continuous Delivery Machine Learning (CD4ML). This article presents a thread of thoughts about systematically conducting ML experiments, which is an essential step towards the continuous delivery of machine learning applications.

Multiple Hypothesis Testing in R

In the first article of this series, we looked at understanding type I and type II errors in the context of an A/B test, and highlighted the issue of ‘peeking’. In the second, we illustrated a way to calculate always-valid p-values that were immune to peeking. We will now explore multiple hypothesis testing, or what happens when multiple tests are conducted on the same family of data.
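Two standard corrections for a family of tests are Bonferroni (controls the family-wise error rate) and Benjamini-Hochberg (controls the false discovery rate). The article works in R; the sketch below uses Python for illustration, and the p-values are invented. Which corrections the article itself covers is an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value sits under its threshold alpha * rank / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Hypothetical p-values from ten tests on the same family of data.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(p_values)), "rejections with Bonferroni")
print(sum(benjamini_hochberg(p_values)), "rejections with Benjamini-Hochberg")
```

Bonferroni is the more conservative of the two, so it rejects no more hypotheses than Benjamini-Hochberg at the same alpha.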

Introduction to Clinical Natural Language Processing: Predicting Hospital Readmission with Discharge Summaries

Doctors have always written clinical notes about their patients – originally, the notes were on paper and were locked away in a cabinet. Fortunately for data scientists, doctors now enter their notes in an electronic medical record. These notes represent a vast wealth of knowledge and insight that can be utilized for predictive models using Natural Language Processing (NLP) to improve patient care and hospital workflow. As an example, I will show you how to predict hospital readmission with discharge summaries.

Mixed Formal Learning

A Path to Low Shot and Zero Shot Learning. Glints of latent variables from formal models mixed with specialized models guide learning. This paper presents Mixed Formal Learning, an architecture that learns models based on formal mathematical representations of the domain of interest that expose latent variables. The second element in the architecture learns a particular skill, typically by using traditional prediction or classification mechanisms. Our key findings include that this architecture: (1) Enables Low Shot and Zero Shot training of machine learning without sacrificing accuracy or recall; (2) Is demonstrated for the extraction of phrase and numerical data from semi-structured documents; (3) Can enable other applications with Low Shot Learning in the document domain; (4) Can be applied to enable Low Shot and Zero Shot Learning in other domains.

Fundamentals of Reinforcement Learning: Markov Decision Processes, Policies, & Value Functions

Welcome to the second article in GradientCrescent’s special series on reinforcement learning. This series will serve to introduce some of the fundamental concepts in reinforcement learning, primarily using Sutton’s Reinforcement Learning Textbook and the University of Alberta’s Reinforcement Learning course as source material. This series will focus on learning concepts over demonstrating them, serving to reinforce (pun intended) my own learning in the field.

What’s going on on PyPI

Pandas DataFrame integration for spaCy. DframCy is a light-weight utility module to integrate Pandas DataFrames with spaCy’s linguistic annotation and training tasks. DframCy provides clean APIs to convert spaCy’s linguistic annotations, Matcher and PhraseMatcher information to Pandas DataFrames, and also supports training and evaluation of NLP pipelines from CSV/XLSX/XLS without any changes to spaCy’s underlying APIs.

LPC Utility for PyTorch Library. LPCTorch is a small PyTorch utility for Linear Predictive Coding. It provides a simple way to compute windowed Linear Predictive Coding coefficients on an input audio signal. The repo uses Burg’s method and is heavily inspired by the librosa audio library implementation.

The interface library for probabilistic modeling in HEP

Waymo Open Dataset libraries.

A framework for benchmarking causal inference.

Auto-PyTorch searches neural architectures using BO-HB

AutoML DNN Forecasting Models

The azureml-contrib-interpret package contains experimental functionality for the azureml-interpret package, which offers a variety of services for machine learning model interpretability. Eventually, useful pieces of this package will be moved into the full package.

Microsoft Azure Machine Learning Interpret API for Python.

Toolbox for Machine Learning using Topological Data Analysis.

If you did not already know

MARVIN
In this demo paper, we introduce the DARPA D3M program for automatic machine learning (ML) and JPL’s MARVIN tool that provides an environment to locate, annotate, and execute machine learning primitives for use in ML pipelines. MARVIN is a web-based application and associated back-end interface written in Python that enables composition of ML pipelines from hundreds of primitives from the world of Scikit-Learn, Keras, DL4J and other widely used libraries. MARVIN allows for the creation of Docker containers that run on Kubernetes clusters within DARPA to provide an execution environment for automated machine learning. MARVIN currently contains over 400 datasets and challenge problems from a wide array of ML domains, ranging from routine classification and regression to advanced video/image classification and remote sensing. …

GLocalized Anomaly Detection (GLAD)
We propose an algorithm called GLAD (GLocalized Anomaly Detection) that allows end-users to retain the use of simple and understandable global anomaly detectors by automatically learning their local relevance to specific data instances using label feedback. The key idea is to place a uniform prior over the input feature space for each member of the anomaly detection ensemble via a neural network trained on unlabeled instances, and tune the weights of the neural network to adjust the local relevance of each ensemble member using all labeled instances. Our experiments on synthetic and real-world data show the effectiveness of GLAD in learning the local relevance of ensemble members and discovering anomalies via label feedback. …

Semantic Pixel-Level Adaptation Transform (SPLAT)
Domain adaptation of visual detectors is a critical challenge, yet existing methods have overlooked pixel appearance transformations, focusing instead on bootstrapping and/or domain confusion losses. We propose a Semantic Pixel-Level Adaptation Transform (SPLAT) approach to detector adaptation that efficiently generates cross-domain image pairs. Our model uses aligned-pair and/or pseudo-label losses to adapt an object detector to the target domain, and can learn transformations with or without densely labeled data in the source (e.g. semantic segmentation annotations). Without dense labels, as is the case when only detection labels are available in the source, transformations are learned using CycleGAN alignment. Otherwise, when dense labels are available we introduce a more efficient cycle-free method, which exploits pixel-level semantic labels to condition the training of the transformation network. The end task is then trained using detection box labels from the source, potentially including labels inferred on unlabeled source data. We show both that pixel-level transforms outperform prior approaches to detector domain adaptation, and that our cycle-free method outperforms prior models for unconstrained cycle-based learning of generic transformations while running 3.8 times faster. Our combined model improves on prior detection baselines by 12.5 mAP adapting from Sim 10K to Cityscapes, recovering over 50% of the missing performance between the unadapted baseline and the labeled-target upper bound. …

Raster Time Series
The raster model is widely used in Geographic Information Systems to represent data that vary continuously in space, such as temperatures, precipitations, elevation, among other spatial attributes. In applications like weather forecast systems, not just a single raster, but a sequence of rasters covering the same region at different timestamps, known as a raster time series, needs to be stored and queried. Compact data structures have proven successful to provide space-efficient representations of rasters with query capabilities. Hence, a naive approach to save space is to use such a representation for each raster in a time series. …