Known Unknowns: Designing Uncertainty Into the AI-Powered System

Uncertainty may be a fearful state for many people, but for the data scientists and developers training the next wave of AI, uncertainty can be a good thing. Designing uncertainty directly into the system could help AI focus on what experts actually need, letting them leverage state-of-the-art AI and use it to inform our world.


Simple Trick to Normalize Correlations, R-squared, and so on

Many statistics, such as correlations or R-squared, depend on the sample size, making it difficult to compare values computed on two data sets of different sizes. Here, we address this issue. Below is an example with 20 observations. The last 10 observations (the second half of the data set) are a mirror of the first 10, and the two correlations, computed on each subset, are identical and equal to 0.30. The full correlation computed on all 20 observations is 0.85.
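To make the effect concrete, here is a minimal sketch with synthetic data (the article's own 20 observations, and its exact values of 0.30 and 0.85, are not reproduced here): two halves with identical internal correlation can pool into a much stronger full-sample correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# First half: weakly correlated x and y.
x1 = rng.normal(0, 1, 10)
y1 = 0.3 * x1 + rng.normal(0, 1, 10)
# Second half: a shifted copy, so its internal correlation is identical.
x2, y2 = x1 + 5, y1 + 5

r1 = np.corrcoef(x1, y1)[0, 1]                      # correlation on first half
r2 = np.corrcoef(x2, y2)[0, 1]                      # equal to r1 (shift-invariant)
r_full = np.corrcoef(np.r_[x1, x2], np.r_[y1, y2])[0, 1]
print(r1, r2, r_full)                               # r_full far exceeds r1 == r2
```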


Is automatic detection of hidden knowledge an anomaly?

The quantity of documents being published requires researchers to specialize in ever-narrower fields, meaning that inferable connections between publications (particularly from different domains) can be missed. This has given rise to automatic literature-based discovery (LBD). However, unless heavily filtered, LBD generates more potential new knowledge than can be manually verified, and another form of selection is required before the results can be passed on to a user. Since a large proportion of the automatically generated hidden knowledge is valid but generally known, we investigate the hypothesis that non-trivial, interesting, hidden knowledge can be treated as an anomaly and identified using anomaly-detection approaches.
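A minimal sketch of the hypothesis, assuming candidate connections have already been turned into feature vectors (the featurization below is entirely synthetic, not the paper's): flag the rare, unusual candidates with an off-the-shelf anomaly detector.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Valid-but-known connections cluster together; interesting ones are rare outliers.
known = rng.normal(0, 1, size=(500, 8))
interesting = rng.normal(4, 1, size=(10, 8))
candidates = np.vstack([known, interesting])

detector = IsolationForest(contamination=0.02, random_state=0)
labels = detector.fit_predict(candidates)   # -1 marks anomalies
print(np.where(labels == -1)[0])            # indices flagged as potentially interesting
```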


Clearing air around ‘Boosting’

Understanding the current go-to algorithm for best results. Although this post is somewhat math-oriented, you can still understand the core workings of Boosting and Gradient Boosting by reading only the first two sections, Introduction and History. The sections after that explain the papers behind different Gradient Boosting algorithms. … This is one of the posts from my Concepts category, which can be found in my GitHub repo here.
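For a feel of the core working the post describes, here is a minimal from-scratch sketch of gradient boosting with squared-error loss (hyperparameters are arbitrary illustrations, not the post's): each new tree fits the residuals, i.e. the negative gradients, of the current ensemble.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

learning_rate, n_rounds = 0.1, 100
prediction = np.full_like(y, y.mean())        # start from the mean prediction
for _ in range(n_rounds):
    residuals = y - prediction                # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)

print(np.mean((y - prediction) ** 2))         # training MSE shrinks round by round
```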


A gentle introduction to GA2Ms, a white box model

This post is a gentle introduction to a white box machine learning model called a GA2M. We’ll walk through:
• What is a white box model, and why would you want one?
• A classic example white box model: logistic regression
• What is a GAM, and why would you want one?
• What is a GA2M, and why would you want one?
• When should you choose a GAM, a GA2M, or something else?
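As a hedged, concrete starting point (not from the post itself): the interpret library's Explainable Boosting Machine is one GA2M implementation, learning per-feature shape functions plus a budget of pairwise interaction terms, the "2" in GA2M.

```python
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# interactions=10 allows up to 10 pairwise terms on top of the per-feature GAMs.
ebm = ExplainableBoostingClassifier(interactions=10, random_state=0)
ebm.fit(X_train, y_train)
print(ebm.score(X_test, y_test))
```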


June Edition: Probability, Statistics & Machine Learning

Everyone wants to be in the field of Data Science and Analytics, as it is challenging, fascinating, and rewarding. You have to be familiar with the core areas of Data Science, which hinge on the concepts of probability, statistics, machine learning algorithms, visualization, and more; as a data scientist, these are an essential part of your journey, which is why you have to learn them… There are so many blogs, videos, and crash courses available that it is difficult to know where to start. To assist our readers, we have assembled a list of seven amazing articles covering Machine Learning alongside probability and statistics, for data science aspirants as well as those already in the discipline who wish to stay in touch with the basics. We hope these articles can guide you in the right direction.


Introduction to recommender systems

During the last few decades, with the rise of YouTube, Amazon, Netflix and many other such web services, recommender systems have taken an ever-larger place in our lives. From e-commerce (suggesting to buyers articles that could interest them) to online advertisement (suggesting to users the right content, matching their preferences), recommender systems are today unavoidable in our daily online journeys. In a very general way, recommender systems are algorithms aimed at suggesting relevant items to users (items being movies to watch, text to read, products to buy, or anything else, depending on the industry). Recommender systems are critical in some industries, as they can generate a huge amount of income when they are efficient, or be a way to stand out significantly from competitors. As proof of their importance, we can mention that, a few years ago, Netflix organised a challenge (the ‘Netflix prize’) whose goal was to produce a recommender system that performed better than its own algorithm, with a prize of 1 million dollars to win. In this article, we will go through different paradigms of recommender systems. For each of them, we will present how they work, describe their theoretical basis, and discuss their strengths and weaknesses.
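As a taste of one paradigm the article covers, here is a minimal user-based collaborative-filtering sketch on a toy rating matrix (entirely illustrative, not the article's code): score a user's unseen items by the similarity-weighted ratings of other users.

```python
import numpy as np

# Rows are users, columns are items; 0 means "not yet rated".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

user = 0
sims = np.array([cosine(R[user], R[other]) for other in range(len(R))])
sims[user] = 0.0                              # exclude the user themself
scores = sims @ R / (sims.sum() + 1e-9)       # similarity-weighted ratings
unseen = R[user] == 0
print(np.argwhere(unseen).ravel(), scores[unseen])  # candidate items and scores
```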


Parameters in Machine Learning algorithms.

I had the pleasure of being a student of Shailesh Kumar during my association with ISB, Hyderabad. Prof. Shailesh has a unique viewpoint on how one can define a successful data scientist:
• A data scientist is one who is able to write the objective function that has to be optimized for a given problem.
• A data scientist is one who is able to understand the number of free parameters that need to be learned in solving the objective function.
• A data scientist is one who is able to understand the knobs (or hyperparameters) that control the complexity of the model.
I am writing this post for those who want to understand the role of parameters in an ML algorithm. The number of parameters to be learned directly influences the training time and the quality of the output. The information below should help you get an understanding of the various algorithms in ML.
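To make the counting concrete, a small sketch (shapes are illustrative, not from the post): a linear model on d features learns d weights plus a bias, while even a small one-hidden-layer network learns far more.

```python
d = 10                                              # number of input features
linear_params = d + 1                               # d weights + 1 bias

hidden = 32                                         # width of one hidden layer
nn_params = (d * hidden + hidden) + (hidden + 1)    # hidden layer + single output unit
print(linear_params, nn_params)                     # 11 vs. 385 free parameters
```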


Text Classification – RNNs or CNNs?

An RNN is a class of artificial neural network where connections between nodes form a directed graph along a sequence. It is basically a chain of neural network blocks that are linked to each other, each one passing a message to its successor. If you want to dive into the internal mechanics, I highly recommend Colah’s blog. This architecture allows an RNN to exhibit temporal behavior and capture sequential data, which makes it a more ‘natural’ approach when dealing with textual data, since text is naturally sequential.
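A minimal sketch of such a chain in Keras (layer widths and vocabulary size are arbitrary placeholders): an embedding feeds an LSTM whose steps pass state along the sequence, ending in a binary classifier.

```python
import numpy as np
from tensorflow import keras

vocab_size, seq_len = 10_000, 100
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 64),          # token ids -> dense vectors
    keras.layers.LSTM(64),                           # each step passes state onward
    keras.layers.Dense(1, activation="sigmoid"),     # binary text classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy batch of already-tokenized sequences, just to show the shapes involved.
x = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, 2, size=(32,))
model.fit(x, y, epochs=1, verbose=0)
```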


How to Automate Hyperparameter Optimization

A Beginner’s Guide to Using Bayesian Optimization With Scikit-Optimize. In the machine learning and deep learning paradigm, ‘parameters’ and ‘hyperparameters’ are two frequently used terms. ‘Parameters’ are configuration variables internal to the model, whose values can be estimated from the training data; ‘hyperparameters’ are configuration variables external to the model, whose values cannot be estimated from the training data (What is the Difference Between a Parameter and a Hyperparameter?). Thus, hyperparameter values need to be assigned manually by the practitioner.
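A minimal sketch with scikit-optimize, the library the guide uses (the model and search space are illustrative choices): gp_minimize builds a probabilistic surrogate of the objective and uses it to pick the next hyperparameters to try.

```python
from skopt import gp_minimize
from skopt.space import Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
space = [Integer(10, 200, name="n_estimators"),
         Integer(2, 20, name="max_depth")]

def objective(params):
    n_estimators, max_depth = params
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=0)
    return -cross_val_score(model, X, y, cv=3).mean()   # minimize negative accuracy

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print(result.x, -result.fun)    # best hyperparameters and their CV accuracy
```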


Tesla’s Deep Learning at Scale: Using Billions of Miles to Train Neural Networks

Training data is one of the fundamental factors that determine how well deep neural networks perform. (The other two are the network architecture and optimization algorithm.) As a general principle, more training data leads to better performance. This is why I believe Tesla, not Waymo, has the most promising autonomous vehicles program in the world.


Keep Your A.I. Buzzwords Straight

Artificial intelligence is having its moment. Business leaders can’t stop talking about it. New tech products invariably include it. And news headlines incessantly chronicle the buzz around it. But for many people, artificial intelligence remains a mystery. To help, we’ve created a guide that explains some of the key terms associated with the technology, an increasingly useful tool for businesses that improves as it crunches more data.


A Transformer Chatbot Tutorial with TensorFlow 2.0

The use of artificial neural networks to create chatbots is increasingly popular nowadays; however, teaching a computer to have natural conversations is very difficult and often requires large and complicated language models. With all the changes and improvements made in TensorFlow 2.0, we can build complicated models with ease. In this post, we will demonstrate how to build a Transformer chatbot. All of the code used in this post is available in this Colab notebook, which will run end to end (including installing TensorFlow 2.0).
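The operation at the heart of any Transformer is scaled dot-product attention; here is a minimal TensorFlow 2 sketch of just that piece (toy shapes, not the tutorial's full model).

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    scores = tf.matmul(q, k, transpose_b=True)              # (..., seq_q, seq_k)
    scores /= tf.math.sqrt(tf.cast(tf.shape(k)[-1], tf.float32))
    weights = tf.nn.softmax(scores, axis=-1)                # attention distribution
    return tf.matmul(weights, v)                            # weighted sum of values

q = tf.random.normal((2, 5, 8))   # (batch, sequence length, depth)
k = tf.random.normal((2, 5, 8))
v = tf.random.normal((2, 5, 8))
print(scaled_dot_product_attention(q, k, v).shape)          # (2, 5, 8)
```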


The Best and Most Current of Modern Natural Language Processing

Over the last two years, the Natural Language Processing community has witnessed an acceleration in progress on a wide range of different tasks and applications. This progress was enabled by a paradigm shift in the way we classically build an NLP system: for a long time, we used pre-trained word embeddings such as word2vec or GloVe to initialize the first layer of a neural network, followed by a task-specific architecture trained in a supervised way on a single dataset. Recently, several works demonstrated that we can learn hierarchical contextualized representations on web-scale datasets by leveraging unsupervised (or self-supervised) signals such as language modeling, and transfer this pre-training to downstream tasks (Transfer Learning). Excitingly, this shift has led to significant advances on a wide range of downstream applications, ranging from Question Answering to Natural Language Inference to Syntactic Parsing…
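A minimal sketch of the shift the post describes, using the Hugging Face transformers library as one common route (the model choice is illustrative): instead of a static vector per word, a pre-trained language model produces a contextual vector per token.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # one contextual vector per token
```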


Medical AI has a big data problem

Facing increasingly overworked doctors and labyrinthine insurance systems, hospitals are searching for a lifeline in AI systems that promise to ease hard diagnoses and treatment decisions. Reality check: the data underpinning the very first systems is often spotty, volatile, and completely lacking in critical context, leading to a poor early record in the field. The big picture: basic clinical decision support (CDS) systems have been around for decades, but skepticism of the technology leads many doctors to ignore or override them. Now, experts say a nascent generation of CDS, infused with AI in academic labs and startups, may reduce the estimated 40,000-80,000 deaths a year that result from medical errors.


Almost Unsupervised Text to Speech and Automatic Speech Recognition

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing, and both achieve impressive performance thanks to recent advances in deep learning and large amounts of aligned speech and text data. However, the lack of aligned data poses a major practical problem for TTS and ASR on low-resource languages. In this paper, by leveraging the dual nature of the two tasks, we propose an almost unsupervised learning method that leverages only a few hundred paired examples and extra unpaired data for TTS and ASR. Our method consists of the following components: (1) a denoising auto-encoder, which reconstructs speech and text sequences respectively to develop the capability of language modeling in both the speech and text domains; (2) dual transformation, where the TTS model transforms the text y into speech x̂, and the ASR model leverages the transformed pair (x̂, y) for training, and vice versa, to boost the accuracy of the two tasks; (3) bidirectional sequence modeling, which addresses error propagation, especially in long speech and text sequences, when training with few paired data; (4) a unified model structure, which combines all of the above components for TTS and ASR, based on the Transformer model. Our method achieves a 99.84% word-level intelligible rate and 2.68 MOS for TTS, and 11.7% PER for ASR, on the LJSpeech dataset, leveraging only 200 paired speech and text samples (about 20 minutes of audio), together with extra unpaired speech and text data.
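A schematic, runnable sketch of the dual-transformation component, with stub classes standing in for the real TTS and ASR networks (every name here is a hypothetical placeholder, not the authors' code): each model generates pseudo-pairs that train the other.

```python
class StubTTS:
    def synthesize(self, text):        return f"<speech for '{text}'>"
    def train_step(self, audio, text): print("TTS trained on", (audio, text))

class StubASR:
    def transcribe(self, audio):       return f"<text for '{audio}'>"
    def train_step(self, audio, text): print("ASR trained on", (audio, text))

def dual_transformation_step(tts, asr, text_y, speech_x):
    x_hat = tts.synthesize(text_y)     # TTS turns unpaired text y into speech x_hat
    asr.train_step(x_hat, text_y)      # ASR trains on the pseudo-pair (x_hat, y)
    y_hat = asr.transcribe(speech_x)   # ASR turns unpaired speech x into text y_hat
    tts.train_step(speech_x, y_hat)    # TTS trains on the pseudo-pair (x, y_hat)

dual_transformation_step(StubTTS(), StubASR(), "hello world", "<raw audio>")
```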