Forecast KPI: RMSE, MAE, MAPE and Bias

Measuring forecast accuracy (or error) is not easy, as there is no one-size-fits-all indicator. Only experimentation will show you which Key Performance Indicator (KPI) is best for you. As you will see, each indicator avoids some pitfalls but is prone to others.
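
As a rough illustration of the four indicators named in the title, here is a minimal Python sketch. It is not taken from the linked article: the function name `forecast_kpis`, the sample numbers, the sign convention (error = forecast minus demand), and the choice to skip zero-demand periods when computing MAPE are all assumptions made for this example.

```python
import math


def forecast_kpis(demand, forecast):
    """Return Bias, MAE, RMSE and MAPE for two equally long sequences.

    Assumption: error is defined as forecast - demand, so a positive
    bias means the forecast overshoots on average.
    """
    if not demand or len(demand) != len(forecast):
        raise ValueError("demand and forecast must be non-empty and equally long")

    errors = [f - d for d, f in zip(demand, forecast)]
    n = len(errors)

    bias = sum(errors) / n                            # signed errors can cancel out
    mae = sum(abs(e) for e in errors) / n             # every unit of error weighs the same
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # large errors are penalised more

    # MAPE divides by demand, so it is undefined when demand is zero.
    # Assumption: such periods are simply left out of the average.
    ratios = [abs(e) / abs(d) for e, d in zip(errors, demand) if d != 0]
    mape = sum(ratios) / len(ratios) if ratios else float("nan")

    return {"bias": bias, "mae": mae, "rmse": rmse, "mape": mape}


# Made-up numbers, for illustration only.
kpis = forecast_kpis(demand=[10, 20, 30, 40], forecast=[12, 18, 33, 40])
for name, value in kpis.items():
    print(f"{name}: {value:.3f}")
```

Even on this tiny sample the indicators disagree: the bias is small because over- and under-forecasts partly cancel, while RMSE is larger than MAE because it weights the biggest miss more heavily.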

Inferential Statistics: Understanding Hypothesis Testing Using Chi-Square Test

As a data science engineer, it is imperative that the sample data set you pick from the population is reliable, clean, and well tested for its usability in machine learning model building. So how do you do that? There are multiple statistical techniques, such as descriptive statistics, where we measure the data’s central value and how it is spread around the mean or median: is it normally distributed, or is there a skew in the data spread? Please refer to my previous article on the same for more clarity.
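
The descriptive checks mentioned above (central value, spread, skew) can be sketched with Python's standard library. This is a minimal illustration rather than the article's own code: the helper name `describe` and the sample values are made up, and skewness is computed here as the population moment coefficient, which is only one of several common definitions.

```python
import statistics


def describe(sample):
    """Return the mean, median, population standard deviation and skewness."""
    n = len(sample)
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    std = statistics.pstdev(sample)

    # Moment coefficient of skewness: the average cubed z-score.
    # A value near 0 suggests symmetry; a positive value suggests a long right tail.
    skew = sum(((x - mean) / std) ** 3 for x in sample) / n if std > 0 else 0.0

    return {"mean": mean, "median": median, "std": std, "skew": skew}


# Made-up sample with one large value that pulls the mean to the right.
summary = describe([2, 3, 3, 4, 4, 4, 5, 30])
print(summary)
```

In this sample the mean sits well above the median and the skewness is positive, which is the kind of signal that would prompt a closer look before using the data for model building.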

Harvard Data Science Review – A Microscopic, Telescopic, and Kaleidoscopic View of Data Science

HDSR is an open access journal of the Harvard Data Science Initiative published by the MIT Press.

A Turbulent Year: The 2019 Data & AI Landscape

It has been another intense year in the world of data, full of excitement but also complexity. As more of the world gets online, the ‘datafication’ of everything continues to accelerate. This mega-trend keeps gathering steam, powered by the intersection of separate advances in infrastructure, cloud computing, artificial intelligence, open source and the overall digitalization of our economies and lives.

Part II: Major Trends in the 2019 Data & AI Landscape

Part I of the 2019 Data & AI Landscape covered issues around the societal impact of data and AI, and included the landscape chart itself. In this Part II, we’re going to dive into some of the main industry trends in data and AI. The data and AI ecosystem continues to be one of the most exciting areas of technology. Not only does it have its own explosive momentum, but it also powers and accelerates innovation in many other areas (consumer applications, gaming, transportation, etc.). As such, its overall impact is immense, and goes well beyond the technical discussions below. Of course, no meaningful trend unfolds over the course of just one year, and many of the following have been years in the making. We’ll focus the discussion on trends that we have seen particularly accelerating in 2019, or gaining rapid prominence in industry conversations. We will loosely follow the order of the landscape, from left to right: infrastructure, analytics and applications.

The data that trains AI increasingly calls into question AI

After 10 years of ImageNet, AI researchers are digging into the details of test sets and some are asking just how much knowledge has really been created with machine learning.

AI analyzed 3.3 million scientific abstracts and discovered possible new materials

A new paper shows how natural-language processing can accelerate scientific discovery.
The context: Natural-language processing has seen major advancements in recent years, thanks to the development of unsupervised machine-learning techniques that are really good at capturing the relationships between words. They count how often and how closely words are used in relation to one another, and map those relationships in a high-dimensional vector space. The patterns can then be used to predict basic analogies like ‘man is to king as woman is to queen,’ or to construct sentences and power things like autocomplete and other predictive text systems.
New application: A group of researchers have now used this technique to munch through 3.3 million scientific abstracts published between 1922 and 2018 in journals that would likely contain materials science research. The resulting word relationships captured fundamental knowledge within the field, including the structure of the periodic table and the way chemicals’ structures relate to their properties. The paper was published in Nature last week.
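
The ‘man is to king as woman is to queen’ analogy from the context above comes down to simple vector arithmetic. The sketch below is a toy illustration only: the vectors are hand-made, not learned from text and not taken from the paper, and the helper names `cosine` and `analogy` are made up for this example. Learned embeddings typically have hundreds of dimensions rather than three.

```python
import math

# Hand-made toy vectors. The axes loosely stand for
# (royalty, masculinity, food) and exist only for this illustration.
VECTORS = {
    "man":   [0.0,  1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "king":  [1.0,  1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "apple": [0.0,  0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy(VECTORS, "man", "king", "woman"))  # prints "queen"
```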

Object-oriented programming for data scientists: Build your ML estimator

Implement some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.
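
To give a flavour of what a Scikit-learn-like estimator looks like, here is a minimal sketch in plain Python. It is not the estimator built in the linked article: the class name `SimpleLinearRegressor` is hypothetical and it handles a single feature only. What it does mirror are scikit-learn's conventions that hyperparameters are set in `__init__`, that `fit` returns `self`, and that learned attributes end with an underscore.

```python
class SimpleLinearRegressor:
    """One-feature least-squares regressor with a scikit-learn-style interface."""

    def __init__(self, fit_intercept=True):
        # Hyperparameters are stored as-is; nothing is learned here.
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        """Learn slope and intercept from a list of x values and a list of targets."""
        n = len(X)
        if n == 0 or n != len(y):
            raise ValueError("X and y must be non-empty and equally long")

        x_mean = sum(X) / n if self.fit_intercept else 0.0
        y_mean = sum(y) / n if self.fit_intercept else 0.0

        sxx = sum((x - x_mean) ** 2 for x in X)
        if sxx == 0:
            raise ValueError("X needs some variation to fit a slope")
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(X, y))

        # Learned attributes carry a trailing underscore.
        self.coef_ = sxy / sxx
        self.intercept_ = y_mean - self.coef_ * x_mean
        return self  # returning self allows chaining: Model().fit(X, y).predict(Z)

    def predict(self, X):
        return [self.intercept_ + self.coef_ * x for x in X]

    def score(self, X, y):
        """Coefficient of determination (R^2); fails if y is constant."""
        y_mean = sum(y) / len(y)
        ss_res = sum((t - p) ** 2 for t, p in zip(y, self.predict(X)))
        ss_tot = sum((t - y_mean) ** 2 for t in y)
        return 1 - ss_res / ss_tot


# Made-up data lying on the line y = 2x + 1.
model = SimpleLinearRegressor().fit([1, 2, 3], [3, 5, 7])
print(model.coef_, model.intercept_, model.predict([4]))
```

Keeping state in attributes and behaviour in a small, uniform set of methods is what lets objects like this be swapped for one another in the same workflow.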

A Basic NLP Tutorial for News Multiclass Categorization

Let’s walk through an approach to multiclass classification of text data in Python by identifying the type of news from headlines and short descriptions.
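
The blurb does not say which model the tutorial uses, so the sketch below swaps in a small multinomial Naive Bayes classifier written in plain Python, purely to show the shape of the task. The headlines, the category labels, and the function names `tokenize`, `train` and `predict` are all made up for this illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(samples):
    """Count documents per class and words per class from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability under Naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # class prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training are ignored
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training headlines, two per category.
SAMPLES = [
    ("team wins championship final", "SPORTS"),
    ("striker scores late goal", "SPORTS"),
    ("senate passes budget bill", "POLITICS"),
    ("president signs new law", "POLITICS"),
    ("startup launches new phone", "TECH"),
    ("chip maker unveils faster processor", "TECH"),
]

model = train(SAMPLES)
print(predict(model, "senate debates new bill"))  # prints "POLITICS"
print(predict(model, "striker scores goal"))      # prints "SPORTS"
```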

A Simple Framework for Designing Choices

Software is usually designed as a choose-your-own-adventure affair. To complete tasks, users move through an application by making a series of choices based on available options. This can include choosing an item from a menu, choosing the appropriate tool from a toolbar, or selecting a piece of content from a list. The user is always free to decide for themselves, but the design and presentation of these options have the power to greatly influence the choices they make.

Statistics for people in a hurry

Ever wished someone would just tell you what the point of statistics is and what the jargon means in plain English? Let me try to grant that wish for you! I’ll zoom through all the biggest ideas in statistics in 8 minutes! Or just 1 minute, if you stick to the large font bits.


ONNX

ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners.

BERT Explained: State of the art language model for NLP

BERT (Bidirectional Encoder Representations from Transformers) is a recent paper published by researchers at Google AI Language. It has caused a stir in the Machine Learning community by presenting state-of-the-art results in a wide variety of NLP tasks, including Question Answering (SQuAD v1.1), Natural Language Inference (MNLI), and others. BERT’s key technical innovation is applying the bidirectional training of Transformer, a popular attention model, to language modelling. This is in contrast to previous efforts which looked at a text sequence either from left to right or combined left-to-right and right-to-left training. The paper’s results show that a language model which is bidirectionally trained can have a deeper sense of language context and flow than single-direction language models. In the paper, the researchers detail a novel technique named Masked LM (MLM) which allows bidirectional training in models in which it was previously impossible.
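
The idea behind Masked LM can be sketched in a few lines of Python. The function name `mask_tokens`, the toy vocabulary and the example sentence are assumptions made for this illustration; the rates (15% of positions selected, of which 80% become `[MASK]`, 10% become a random token and 10% stay unchanged) follow the BERT paper's description and are not stated in the summary above. Real implementations work on subword IDs and skip special tokens, which this sketch does not do.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for masked language modelling.

    labels[i] holds the original token at positions the model must
    predict and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # usually hide the token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # sometimes swap in a random token
        # otherwise the token is left unchanged but must still be predicted
    return inputs, labels


# Made-up sentence and vocabulary, seeded so the run is repeatable.
sentence = "the cat sat on the mat".split()
toy_vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
inputs, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
print(inputs)
print(labels)
```

Because the model sees the words on both sides of each hidden position, it can use left and right context at once, which is what makes the training bidirectional.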

The Scariest Thing About DeepNude Wasn’t the Software

At the end of June, Motherboard reported on a new app called DeepNude, which promised – ‘with a single click’ – to transform a clothed photo of any woman into a convincing nude image using machine learning. In the weeks since this report, the app has been pulled by its creator and removed from GitHub, though open source copies have surfaced there in recent days. Most of the coverage of DeepNude has focused on the specific dangers posed by its technical advances. ‘DeepNude is an evolution of that technology that is easier to use and faster to create than deepfakes,’ wrote Samantha Cole in Motherboard’s initial report on the app. ‘DeepNude also dispenses with the idea that this technology can be used for anything other than claiming ownership over women’s bodies.’ With its promise of single-click undressing of any woman, it made it easier than ever to manufacture naked photos – and, by extension, to use those fake nudes to harass, extort, and publicly shame women everywhere. But even following the app’s removal, there’s a lingering problem with DeepNude that goes beyond its technical advances and ease of use. It’s something older and deeper, something far more intractable – and far harder to erase from the internet – than a piece of open source code.