Distilled News


In this post we’ll share how we used TensorFlow’s object detection API to build a custom image annotation service for eyeson. Below you can see an example where Philipp is making the ‘thinking’ pose during a meeting, which automatically triggers a GIF reaction.

Relational inductive biases, deep learning, and graph networks

Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one’s experiences–a hallmark of human intelligence from infancy–remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between ‘hand-engineering’ and ‘end-to-end’ learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias–the graph network–which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

Top 30 Python Libraries for Machine Learning

Today, Python is one of the most popular programming languages and has replaced many languages in the industry. There are various reasons for its popularity, and one of them is that Python has a large collection of libraries.

Beginner’s Guide to Machine Learning with Python

Machine Learning, a prominent topic in the Artificial Intelligence domain, has been in the spotlight for quite some time now. This area may offer an attractive opportunity, and starting a career in it is not as difficult as it may seem at first glance. Even if you have zero experience in math or programming, it is not a problem. The most important element of your success is purely your own interest and motivation to learn all those things. If you are a newcomer who does not know where to start studying, why you need Machine Learning, and why it has been gaining more and more popularity lately, you have come to the right place! I’ve gathered all the needed information and useful resources to help you gain new knowledge and accomplish your first projects.

The What, Why and How of Bias-Variance Trade-off

Building an effective Machine Learning model is all about striking the right balance between Bias (underfitting) and Variance (overfitting). But what are Bias and Variance? What do Bias and Variance mean intuitively? Let’s take a step back, understand the terms Bias and Variance on a conceptual level, and then try to relate these concepts to Machine Learning.
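The trade-off is easy to see numerically. Below is a minimal sketch (not from the article; the data and degrees are illustrative) that fits polynomials of increasing degree to noisy samples of a sine curve: the low-degree fit under-fits (high bias, high error everywhere), while the high-degree fit drives training error down without improving test error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of one period of a sine curve, plus a dense noise-free test grid.
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

results = {}
for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The "sweet spot" here is the moderate degree: low enough to ignore the noise, high enough to capture the curve.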

Basics of Reinforcement Learning, the Easy Way

Reinforcement Learning (RL) is the problem of studying an agent in an environment; the agent has to interact with the environment in order to maximize some cumulative reward. An example of RL is an agent in a labyrinth trying to find its way out: the faster it can find the exit, the better the reward it will get.
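As a toy version of the labyrinth example, here is a hedged sketch (not from the article; all names and constants are illustrative) of tabular Q-learning on a five-state corridor. The agent earns a reward of 1 at the exit, and the discount factor makes faster exits more valuable, so the learned greedy policy is to always step right.

```python
import random

random.seed(0)

# A toy "labyrinth": a corridor of states 0..4 with the exit at state 4.
# Actions: 0 = step left, 1 = step right. Reaching the exit pays reward 1;
# the discount factor GAMMA makes faster exits worth more.
N_STATES, EXIT = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(EXIT, state + 1)
    return nxt, (1.0 if nxt == EXIT else 0.0)

def greedy_action(values):
    best = max(values)
    return random.choice([a for a, v in enumerate(values) if v == best])

for _ in range(500):
    s = 0
    while s != EXIT:
        a = random.randrange(2) if random.random() < EPSILON else greedy_action(Q[s])
        s2, r = step(s, a)
        # Q-learning update: nudge Q[s][a] toward reward + discounted future value.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

policy = [greedy_action(q) for q in Q[:EXIT]]
print("greedy policy (1 = right):", policy)
```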

R 3.6.1 is now available

On July 5, the R Core Group released the source code for the latest update to R, R 3.6.1, and binaries are now available to download for Windows, Linux and Mac from your local CRAN mirror.

Regulation of Artificial Intelligence in Selected Jurisdictions

This report examines the emerging regulatory and policy landscape surrounding artificial intelligence (AI) in jurisdictions around the world and in the European Union (EU). In addition, a survey of international organizations describes the approach that United Nations (UN) agencies and regional organizations have taken towards AI. As the regulation of AI is still in its infancy, guidelines, ethics codes, and actions by and statements from governments and their agencies on AI are also addressed. While the country surveys look at various legal issues, including data protection and privacy, transparency, human oversight, surveillance, public administration and services, autonomous vehicles, and lethal autonomous weapons systems, the most advanced regulations were found in the area of autonomous vehicles, in particular for the testing of such vehicles.

9 Lessons learned from failed AI PoCs

After several AI PoCs, I realized that it is quite easy to launch AI PoCs with initially positive results, but at the same time, it is difficult to scale up AI to enterprise-wide applications and reach the production stage. In this article, I’ll share some of the reasons why I failed in a couple of projects.
1. Data
2. Compliance
3. Realistic Expectations
4. Scalability
5. Size and nature of your PoC
6. Implementation process
7. AI Accuracy / Available Data
8. PoC Evaluation
9. Time Window

Summarizing popular Text-to-Image Synthesis methods with Python

Automatic synthesis of realistic images from text has become popular with deep convolutional and recurrent neural network architectures, which aid in learning discriminative text feature representations. Attribute representations offer discriminative power and strong generalization properties, but even though they are attractive, building them is a complex process that requires domain-specific knowledge. In comparison, natural language offers a general and flexible interface for describing objects in any space of visual categories. The best approach is to combine the generality of text descriptions with the discriminative power of attributes. This blog addresses different text-to-image synthesis algorithms using GANs (Generative Adversarial Networks) that aim to directly map words and characters to image pixels, combining natural language representation and image synthesis techniques.

Document worth reading: “Abandon Statistical Significance”

In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only then is consideration–often scant–given to such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain. There have been recent proposals to change the p-value threshold, but instead we recommend abandoning the null hypothesis significance testing paradigm entirely, leaving p-values as just one of many pieces of information with no privileged role in scientific publication and decision making. We argue that this radical approach is both practical and sensible. Abandon Statistical Significance

R Packages worth a look

Fit Latent Dirichlet Allocation Models using Stochastic Variational Inference (lda.svi)
Fits Latent Dirichlet Allocation topic models to text data using the stochastic variational inference algorithm described in Hoffman et al. (2013) …

R Markdown Output Formats for Storytelling (rolldown)
R Markdown output formats based on JavaScript libraries such as ‘Scrollama’ …

General Tools for Building GLM Expectation-Maximization Models (…)
Implementation of Expectation Maximization (EM) regression of general linear models. The package currently supports Poisson and Logistic regression wit …

Talking to ‘Docker’ and ‘Singularity’ Containers (babelwhale)
Provides a unified interface to interact with ‘docker’ and ‘singularity’ containers. You can execute a command inside a container, mount a volume or co …

Document worth reading: “Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology”

Performance metrics (error measures) are vital components of the evaluation frameworks in various fields. The intention of this study was to give an overview of a variety of performance metrics and approaches to their classification. The main goal of the study was to develop a typology that will help to improve our knowledge and understanding of metrics and facilitate their selection in machine learning regression, forecasting and prognostics. Based on the analysis of the structure of numerous performance metrics, we propose a framework of metrics which includes four (4) categories: primary metrics, extended metrics, composite metrics, and hybrid sets of metrics. The paper identified three (3) key components (dimensions) that determine the structure and properties of primary metrics: the method of determining point distance, the method of normalization, and the method of aggregation of point distances over a data set. Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology
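The three components the paper identifies (point distance, normalization, aggregation) are easy to see in the most common primary metrics. A small illustrative sketch with made-up values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

err = y_pred - y_true  # signed point distances

mae  = np.mean(np.abs(err))                          # absolute distance, mean aggregation
rmse = np.sqrt(np.mean(err ** 2))                    # squared distance, root-mean aggregation
mape = np.mean(np.abs(err) / np.abs(y_true)) * 100   # distance normalized by the true value

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.1f}%")
```

Each metric is the same skeleton with a different choice in each of the three slots, which is exactly the typology's point.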

Distilled News

NLP Tutorial: MultiLabel Classification Problem using Linear Models

This article presents in detail how to predict tags for posts from StackOverflow using linear models after carefully preprocessing our text features.

What is Knowledge Distillation?

It seems fair to say that simple computer vision models easily weigh ~100 MB. A hundred MB just to be able to make an inference isn’t a viable solution for an end product. A remote API can do the trick, but now your product needs to add encryption, you need to store and upload data, and the user needs a reliable internet connection to get decent speed. We can train narrower networks, and they’ll probably fit in a small memory footprint. But chances are they won’t be good enough at extracting complex features. And we haven’t even talked about ensembles. Ensembles are a great way to extract a lot of knowledge from the training data, but at test time it can be too expensive to run a hundred different models in parallel. The knowledge-per-parameter ratio is quite low.

Distilling the Knowledge in a Neural Network

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions [3]. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators [1] have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
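The core of the distillation recipe is a loss that matches the student's temperature-softened predictions to the teacher's, plus an ordinary cross-entropy term on the true label. A minimal numpy sketch (the logits are made up, and T and alpha are typical illustrative choices, not values prescribed by the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, hard_label, T=4.0, alpha=0.7):
    # Soft targets: the teacher's softened distribution carries "dark knowledge"
    # about which wrong classes the teacher finds plausible.
    p_teacher = softmax(teacher_logits, T)
    soft_loss = -np.sum(p_teacher * np.log(softmax(student_logits, T)))
    # Hard loss: ordinary cross-entropy against the true label.
    hard_loss = -np.log(softmax(student_logits)[hard_label])
    # T**2 rescales soft-target gradients back to the hard-loss scale.
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss

teacher = np.array([10.0, 5.0, -2.0])  # confident teacher
student = np.array([2.0, 1.0, 0.0])    # student still far from the teacher
print(distillation_loss(teacher, student, hard_label=0))
```

A student whose logits already match the teacher's incurs the minimum possible soft loss (the entropy of the softened teacher distribution), which is what the training signal pushes toward.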

Predicting the Generalization Gap in Deep Neural Networks

Posted Tuesday, July 9, 2019, by Yiding Jiang, Google AI Resident. Deep neural networks (DNNs) are the cornerstone of recent progress in machine learning, and are responsible for recent breakthroughs in a variety of tasks such as image recognition, image segmentation, machine translation and more. However, despite their ubiquity, researchers are still attempting to fully understand the underlying principles that govern them. In particular, classical theories (e.g., VC-dimension and Rademacher complexity) suggest that over-parameterized functions should generalize poorly to unseen data, yet recent work has found that massively over-parameterized functions (orders of magnitude more parameters than the number of data points) generalize well. In order to improve models, a better understanding of generalization, which can lead to more theoretically grounded and therefore more principled approaches to DNN design, is required.

What’s wrong with the approach to Data Science?

The job of ‘Data Scientist’ has been around for decades; it was just not called ‘Data Scientist’. Statisticians have been using their knowledge and skills, with machine learning techniques such as Logistic Regression and Random Forest, for prediction and insight for longer than people actually realize.

Outliers detection with autoencoder, a neural network

Anyone who deals with big datasets in order to use Machine Learning techniques knows that it is vital to keep data clean and to avoid weird data points. On this point, outliers are a pain in the neck because they may lead to misinterpreted results. Several methods can be used to remove outliers from the data, but this post will focus on an unsupervised Machine Learning technique: the autoencoder, a kind of neural network. In this blog we have already seen several ways to detect outliers based on Machine Learning techniques, but now we describe a method which uses neural networks. This blog also has some explanations about neural networks and several examples of using them. I encourage you to go deeper into those posts to learn everything that has been published here.
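The idea is simple: train an autoencoder on the data, then flag the points it reconstructs worst, since the network can only reproduce the structure shared by the bulk of the data. Here is a minimal sketch with a tiny linear autoencoder on synthetic 2-D data (illustrative only; a real application would use a deeper, nonlinear network):

```python
import numpy as np

rng = np.random.default_rng(1)

# Inliers lie near the line y = 2x; five hand-placed outliers sit far from it.
t = rng.normal(0, 1, 200)
inliers = np.column_stack([t, 2 * t]) + rng.normal(0, 0.05, (200, 2))
outliers = np.array([[5.0, -5.0], [-4.0, 4.0], [3.0, -6.0], [-5.0, 2.0], [6.0, 0.0]])
X = np.vstack([inliers, outliers])

# A tiny linear autoencoder with tied weights: encode to 1-D (X @ W),
# decode back (Z @ W.T), trained by gradient descent on reconstruction error.
W = rng.normal(0, 0.1, (2, 1))
for _ in range(500):
    err = X @ W @ W.T - X  # reconstruction residual
    W -= 0.01 * (X.T @ err @ W + err.T @ X @ W) / len(X)

# Outlier score = reconstruction error: points off the learned structure
# cannot be squeezed through the 1-D bottleneck and back.
scores = np.sum((X @ W @ W.T - X) ** 2, axis=1)
suspects = np.argsort(scores)[-5:]
print("most suspicious rows:", sorted(suspects))
```

With the deep autoencoders the post discusses, the same recipe applies: train, score by reconstruction error, and threshold the scores.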

Waffles are for Breakfast

It’s been a long time since my last update and I’ve decided to start with Tableau, of all topics! Although open source advocates do not look kindly upon Tableau, I find myself using it frequently and relearning all the stuff I can do in R. For my series of ‘how-to’s’ regarding Tableau, I’d like to start with posting about how to make a waffle chart in Tableau.

Correlation and Causation – How Alcohol Affects Life Expectancy

We hear the sentence ‘correlation does not imply causation’ over and over again. But what does it actually mean? This small analysis uncovers the topic with the help of R and simple regressions, focusing on how alcohol impacts health.

How to Setup a Python Deep Learning Environment with Anaconda

For a beginner, setting up a Python environment and installing packages can be a little intimidating. Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment.

7 Important Ways Data Science Helps Human

The industries of healthcare and finance have one thing in common: they are both being disrupted by the advancement of technology, namely data science. And this disruption is widely welcomed, because data science helps humans. In 2017 alone, 3.5 million USD was invested in over 180 health companies. The core of the significant transformation in the health industry, therefore, lies in data science. More than a billion clinical records are created in the US every year, for instance. Doctors and life scientists have an immense amount of data to base their studies on. Moreover, immense volumes of health-related information are made available through the wide range of wearable gadgets. This opens the door to new innovations for more informed, better healthcare. The main objective for data scientists working with the healthcare industry is to make sense of this huge data set and derive helpful insights from it, so that the human body and its issues can be better understood by healthcare providers. Data science can therefore strongly transform healthcare.

How to Predict a Time Series Part 1

Time series forecasts are quite different from other supervised regression problems. My background is in business analytics, so I’ve learned quite a bit about classical forecasting methodology (ARIMA, exponential smoothing state space models, moving averages, etc.). When talking to many data scientists, I have found that many of them know little about predicting time series and treat it like other supervised learning problems, with little success (usually because they aren’t engineering the right features). The R forecast library is one of the most complete and popular libraries for handling and forecasting time series. While I do recognize that Python has become more popular among data scientists, this does not mean that it is the best language for everything. Time series forecasting in R is much more mature and routine. The goal of this article is to walk through the forecasting workflow and evaluation, so I will keep the math to a minimum in this blog post. There are many blog posts that deal with the math behind these methods, and I will link to a few. In this blog post I’d like to share a few things I learned from forecasting a lot of time series using the forecast package. In follow-up blog posts I will cover the workflow with Facebook Prophet and the workflow for one-step-ahead predictions using supervised machine learning algorithms.
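Independently of the language, the evaluation workflow the author describes (hold out a test window, forecast it, and compare against simple baselines) can be sketched in a few lines. The sketch below uses Python and synthetic data purely for illustration; the article itself works with R's forecast package.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic monthly series: linear trend + yearly seasonality + noise.
n = 60
t = np.arange(n)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, n)

# Workflow: hold out the final year, fit on the rest, score forecasts with MAE.
train, test = y[:-12], y[-12:]
mae = lambda fc: np.mean(np.abs(fc - test))

# Naive: repeat the last observed value for the whole horizon.
naive_fc = np.full(12, train[-1])

# Seasonal naive: repeat last season's values (period 12).
snaive_fc = train[-12:]

# Seasonal naive + drift: also extrapolate the average historical slope.
drift = (train[-1] - train[0]) / (len(train) - 1)
snaive_drift_fc = train[-12:] + 12 * drift

for name, fc in [("naive", naive_fc), ("seasonal naive", snaive_fc),
                 ("seasonal naive + drift", snaive_drift_fc)]:
    print(f"{name:>22}: MAE = {mae(fc):.2f}")
```

Any fancier model (ARIMA, exponential smoothing, Prophet) should be evaluated against exactly these kinds of cheap baselines on a held-out window before it earns its keep.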

How Machine Learning Can Lower the Search Cost for Finding Better Hikes

I recently went on a weekend camping trip in The Enchantments, which is just over a two-hour drive from where I live in Seattle, WA. To plan for the trip, we relied on the Washington Trails Association (WTA) and a few other resources to make sure we had the optimal trail routes and camping spots for each day. Many of these outdoor adventure resources can help folks plan multi-day camping trips, figure out where to go for a hike with parents, or make sure to correctly traverse Aasgard Pass, a sketchy 2,300 feet of elevation gain in less than a mile. But there is still something lacking.

Bias and Variance in Machine Learning

These concepts are important to both the theory and the practice of data science. They also come up in job interviews and academic exams. A biased predictor is eccentric, i.e. its predictions are consistently off. No matter how well it’s trained, it just doesn’t get it. Generally, such a predictor is too simple for the problem at hand. It under-fits the data, no matter how rich. A high-variance predictor is, in a sense, the opposite. It often arises when one tries to fix bias and over-compensates. One’s gone from a model that is too simple – i.e., biased – to one that is too complex – i.e. has high variance. It over-fits the data. Somewhere between these two is the ‘sweet spot’ – the best predictor. Often this is not easy to find. A data scientist can help. That’s another story …

My first year as a Project Manager for Artificial Intelligence (AI)

It has already been more than a year since I started working as a Project Manager for Artificial Intelligence (AI). I suppose you don’t notice the time passing when you love your job. I started in this role with a background in wireless communication, which is not the usual path but was mostly helpful while working at a telecom operator. Since March 2018, learning has become an integral part of my life, as I had a lot of catching up to do with data science (and still do). Since there is no college degree in AI project management, how could I adapt to this responsibility? Well, I learnt on the job.

R Packages worth a look

Detect Multiple Change Points from Time Series (offlineChange)
Detect the number and locations of change points. The locations can be either exact or in terms of ranges, depending on the available computational res …

Trend Estimation of Univariate and Bivariate Time Series with Controlled Smoothness (TSsmoothing)
It performs the smoothing approach provided by penalized least squares for univariate and bivariate time series, as proposed by Guerrero (2007) and Ger …

Hypothesis Testing Using the Overlapping Interval Estimates (intervcomp)
Performs hypothesis testing using interval estimates (e.g., confidence intervals). Non-overlapping interval estimates indicate the statistical …

Create Beautiful, Customizable, Publication-Ready Summary Tables for Statistical Models (modelsummary)
Create beautiful, customizable, publication-ready summary tables for statistical models. ‘modelsummary’ leverages the power of the ‘gt’ and ‘broom’ pac …

If you did not already know

HyperAdam
Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience in human-designed optimizers, and therefore have limited generalization ability. In this paper, a new optimizer, dubbed ‘HyperAdam’, is proposed that combines the idea of ‘learning to optimize’ with the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is justified to be state-of-the-art for various network training, such as multilayer perceptron, CNN and LSTM. …

Summarized
Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. This relationship is typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods does not take into account the multi-dimensional attributes of data, and does not address the interpretability of data in the embedding space (i.e., by favoring vector-based representation). In this work, we introduce Summarized, a framework for efficient analysis of sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectiveness of our framework. …

Teaching Risk
Learning near-optimal behaviour from an expert’s demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner’s worldview, and thus ultimately enable her to find a near-optimal policy. …

ADNet
Online video advertising gives content providers the ability to deliver compelling content, reach a growing audience, and generate additional revenue from online media. Recently, advertising strategies have been designed to look for original advert(s) in a video frame and replace them with new adverts. These strategies, popularly known as product placement or embedded marketing, greatly help marketing agencies to reach out to a wider audience. However, in the existing literature, such detection of candidate frames in a video sequence for the purpose of advert integration is done manually. In this paper, we propose a deep-learning architecture called ADNet that automatically detects the presence of advertisements in video frames. Our approach is the first of its kind to automatically detect the presence of adverts in a video frame, and it achieves state-of-the-art results on a public dataset. …

If you did not already know

TrialChain
The governance of data used for biomedical research and clinical trials is an important requirement for generating accurate results. To improve the visibility of data quality and analysis, we developed TrialChain, a blockchain-based platform that can be used to validate data integrity from large, biomedical research studies. We implemented a private blockchain using the MultiChain platform and integrated it with a data science platform deployed within a large research center. An administrative web application was built with Python to manage the platform, which was built with a microservice architecture using Docker. The TrialChain platform was integrated during data acquisition into our existing data science platform. Using NiFi, data were hashed and logged within the local blockchain infrastructure. To provide public validation, the local blockchain state was periodically synchronized to the public Ethereum network. The use of a combined private/public blockchain platform allows for both public validation of results while maintaining additional security and lower cost for blockchain transactions. Original data and modifications due to downstream analysis can be logged within TrialChain and data assets or results can be rapidly validated when needed using API calls to the platform. The TrialChain platform provides a data governance solution to audit the acquisition and analysis of biomedical research data. The platform provides cryptographic assurance of data authenticity and can also be used to document data analysis. …
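The hash-and-link mechanism behind such a platform can be illustrated in a few lines. This is a generic hash-chain sketch, not TrialChain's actual implementation: each block's hash covers its payload and the previous block's hash, so any later tampering with logged data invalidates the chain.

```python
import hashlib
import json

def add_block(chain, payload):
    """Append a block whose hash covers the payload and the previous hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"payload": payload, "prev": prev}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def valid(chain):
    """Recompute every hash and check each block links to its predecessor."""
    prev = "0" * 64
    for block in chain:
        body = {"payload": block["payload"], "prev": block["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev"] != prev or block["hash"] != digest:
            return False
        prev = block["hash"]
    return True

chain = []
add_block(chain, {"study": "trial-001",
                  "data_sha256": hashlib.sha256(b"raw assay data").hexdigest()})
add_block(chain, {"study": "trial-001", "step": "analysis", "result": "qc-pass"})
print("chain valid:", valid(chain))
```

A real deployment like the one described adds distributed consensus and periodic anchoring to a public chain on top of this basic integrity check.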

Unsupervised Temperature Scaling (UTS)
Great performances of deep learning are undeniable, with impressive results on a wide range of tasks. However, the output confidence of these models is usually not well calibrated, which can be an issue for applications where confidence in the decisions is central to trust and reliability (e.g., autonomous driving or medical diagnosis). For models using softmax at the last layer, Temperature Scaling (TS) is a state-of-the-art calibration method, with low time and memory complexity as well as demonstrated effectiveness. TS relies on a parameter T to rescale and calibrate the values of the softmax layer, using a labelled dataset to determine the value of that parameter. We propose an Unsupervised Temperature Scaling (UTS) approach, which does not depend on labelled samples to calibrate the model, allowing, for example, the use of part of the test samples to calibrate the pre-trained model before going into inference mode. We provide theoretical justifications for UTS and assess its effectiveness on a wide range of deep models and datasets. We also demonstrate calibration results of UTS on skin lesion detection, a problem where a well-calibrated output can play an important role in accurate decision-making. …
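For reference, the supervised baseline that UTS removes the label requirement from is easy to sketch: choose the temperature T that minimizes the negative log-likelihood on a labelled validation set. The example below simulates an overconfident model (logits scaled up by a factor of 3) and recovers T by grid search; all data and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    return np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

# Simulate a model whose logits are the "true" class scores scaled up by 3,
# so its softmax probabilities are too peaked (overconfident).
n, k = 1000, 5
true_scores = rng.normal(0, 1, (n, k))
labels = np.array([rng.choice(k, p=softmax(s[None])[0]) for s in true_scores])
logits = 3.0 * true_scores

# Supervised temperature scaling: grid-search T minimizing validation NLL.
grid = np.linspace(0.5, 10, 200)
best_T = min(grid, key=lambda T: nll(logits, labels, T))
print(f"learned T ~ {best_T:.2f}  (true mis-scaling was 3.0)")
```

UTS keeps the same single-parameter rescaling but determines T without the labels this baseline depends on.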

You Only Look Once (YOLO)
We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset. …

MEBoost
The class imbalance problem has been a challenging research problem in the fields of machine learning and data mining, as most real-life datasets are imbalanced. Several existing machine learning algorithms try to maximize classification accuracy by correctly identifying majority class samples while ignoring the minority class. However, the minority class instances usually represent a higher interest than the majority class. Recently, several cost-sensitive methods, ensemble models and sampling techniques have been used in the literature to classify imbalanced datasets. In this paper, we propose MEBoost, a new boosting algorithm for imbalanced datasets. MEBoost mixes two different weak learners with boosting to improve performance on imbalanced datasets. MEBoost is an alternative to existing techniques such as SMOTEBoost, RUSBoost, AdaBoost, etc. The performance of MEBoost has been evaluated on 12 benchmark imbalanced datasets against state-of-the-art ensemble methods like SMOTEBoost, RUSBoost, Easy Ensemble, EUSBoost and DataBoost. Experimental results show significant improvement over the other methods, and it can be concluded that MEBoost is an effective and promising algorithm for dealing with imbalanced datasets. …

Document worth reading: “Shannon’s entropy and its Generalizations towards Statistics, Reliability and Information Science during 1948-2018”

Starting from the pioneering works of Shannon and Weiner in 1948, a plethora of works have been reported on entropy in different directions. Entropy-related review work in the direction of statistics, reliability and information science, to the best of our knowledge, has not been reported so far. Here we have tried to collect all possible works in this direction during the period 1948-2018 so that people interested in entropy, specially the new researchers, get benefited. Shannon’s entropy and its Generalizations towards Statistics, Reliability and Information Science during 1948-2018