Machine Learning Models as Micro Services in Docker

One of the biggest underrated challenges in machine learning development is the deployment of the trained models in production that too in a scalable way. One joke on it I have read is ‘Most common way, Machine Learning gets deployed today is powerpoint slides :)’.

The best way to create and manage training data – Labelbox is a collaborative training data platform for artificial intelligence applications.

We believe that AI has the power to transform every aspect of our lives — from healthcare to agriculture. The exponential impact of artificial intelligence will mean mammograms can happen quickly and cheaply irrespective of the limited number of radiologists there are in the world and growers will know the instant that disease hits their farm without even being there. At Labelbox, we’re building a platform to accelerate the development of this future. Rather than requiring companies to create their own expensive and incomplete homegrown tools, we’ve created a training data platform that acts as a central hub for humans to interface with AI. When humans have better ways to input and manage data, machines have better ways to learn.

Data Science Software Reviews: Forrester vs. Gartner

In my previous post, I discussed Gartner’s reviews of data science software companies. In this post, I show Forrester’s coverage and discuss how radically different it is. As usual, this post is already integrated into my regularly-updated article, The Popularity of Data Science Software. Forrester Research, Inc. is another company that reviews data science software vendors. Studying their reports and comparing them to Gartner’s can provide a deeper understanding of the software these vendors provide.


Microsoft TextWorld is an open-source, extensible engine that both generates and simulates text games. You can use it to train reinforcement learning (RL) agents to learn skills such as language understanding and grounding, combined with sequential decision making.

Viewing Matrices and Probability as Graphs

Today I’d like to share an idea. It’s a very simple idea. It’s not fancy and it’s certainly not new. In fact, I’m sure many of you have thought about it already. But if you haven’t – and even if you have! – I hope you’ll take a few minutes to enjoy it with me.

Why Data Science Teams Need Generalists, Not Specialists

In The Wealth of Nations, Adam Smith demonstrates how the division of labor is the chief source of productivity gains using the vivid example of a pin factory assembly line: ‘One [person] draws out the wire, another straights it, a third cuts it, a fourth points it, a fifth grinds it.’ With specialization oriented around function, each worker becomes highly skilled in a narrow task leading to process efficiencies. Output per worker increases many fold; the factory becomes extremely efficient at producing pins.

Deploy your PyTorch model to Production

Following the last article about Training a Choripan Classifier with PyTorch and Google Colab, we will now talk about what are some steps that you can do if you want to deploy your recently trained model as an API. The discussion on how to do this with is currently ongoing (more) and will most likely continue until PyTorch releases their official 1.0 version. You can find more information in the Forums, PyTorch Documentation/Forums, and their respective GitHub repositories.

Initialization Techniques for Neural Networks

In this blog, we will see some initialization techniques used in Deep Learning. Anyone that even has little background in Machine Learning must know that we need to learn weights or hyperparameters to make the model. These parameters govern how well our algorithm will perform on unseen data. To learn the model we need to initialize the parameters, apply the loss function and then optimize it. In this blog, we will focus on the initialization part of the network. If you ever have built any machine learning algorithm you must have heard that we need to ‘randomly’ initialize our weights as the starting point and then start the learning process. The word random is quite vague in itself. We will see what actually goes behind this word random and what are different initialization techniques.
1. Zero Intiliation
2. Initialization with the same Random value
3. Initialization with small Random values
4. Initialization with large Random values

Fuzzy logic and how it is curing cancer

It is everything that is not crisp/solid. A science that gives no 100% accurate answer, but the more adequate thing is that it does not require a 100% clear input. A simple example is a program that predicts the age of a person from their photo. It studies multiple features in the photo such as hair color, skin texture, wrinkles, etc. and estimates their age. Input: A photo that might not be of high-resolution/a side view photo. Output: A number that is -supposedly- close to this person’s age. Soft computing is based on fuzzy logic, which will be explained later.

Deep Deterministic Policy Gradients Explained

This post is a thorough review of Deepmind’s publication ‘Continuous Control With Deep Reinforcement Learning’ (Lillicrap et al, 2015), in which the Deep Deterministic Policy Gradients (DDPG) is presented, and is written for people who wish to understand the DDPG algorithm. If you are interested only in the implementation, you can skip to the final section of this post. This post is written with the assumption that the reader is familiar with basic reinforcement learning concepts, value & policy learning, and actor critic methods. If you are not completely familiar with those concepts, I have also written about policy gradients and actor critic methods. Familiarity with python and PyTorch will also be really helpful for reading through this post. If you are not familiar with PyTorch, try to follow the code snippets as if they are pseudo-code.

Intuitive Understanding of Attention Mechanism in Deep Learning

This is a slightly advanced tutorial and requires basic understanding of sequence to sequence models using RNNs. Please refer my earlier blog here wherein I have explained in detail the concept of Seq2Seq models.

Do you know how to choose the right machine learning algorithm among 7 different types?

1-Categorize the problem
2-Understand Your Data
3-Find the available algorithms
4-Implement machine learning algorithms.
5-Optimize hyperparameters. There are three options for optimizing hyperparameters, grid search, random search, and Bayesian optimization.

Importance of Feature Engineering methods

A Machine Learning algorithm takes some input for which it generates an output. For example, this could be a stock’s price for the next day using the prices of the last days. From the given input data it builds a mathematical model. That model finds patterns in the data in an automated fashion. These patterns provide insights and those are used to make decisions or predictions. To improve the performance of such algorithms, Feature Engineering is beneficial. Feature Engineering is a data preparation process. One modifies the data such that Machine Learning algorithms identify more patterns. This is done by combining and transforming existing features into new features. Applied Machine Learning is fundamentally Feature Engineering. But it is also difficult and time-consuming. It requires a lot of experience and domain knowledge. Evaluating what types of engineered features perform best can save a good amount of time. If an algorithm can learn a feature on its own, there is not much need to provide it.

A Beginner’s Guide to Apache Spark

The company founded by the creators of Spark?-?Databricks?-?summarizes its functionality best in their Gentle Intro to Apache Spark eBook (highly recommended read – link to PDF download provided at the end of this article): ‘Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of the time of this writing, Spark is the most actively developed open source engine for this task; making it the de facto tool for any developer or data scientist interested in Big Data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and scale up to Big Data processing on an incredibly large scale.’

Lookalikes: Finding needles in a haystack

At Schibsted, we use data science to build models that aggregate user behaviours and preferences. Advertisers can combine these to allocate users in groups, known as targeting segments. Depending on how they are defined, some of these segments can be quite niche, limiting their reach. However, advertisers always look for ways to further extend the potential audience for their message. What could be better than finding users that behave like the potential customers they are already targeting? It is safe to assume that these users would be more receptive of their message than the average user, thus providing a well-targeted audience. We call these additional users the lookalike users. This sounds great, but how can we find them?

GraphQL Will Power the Decentralized Web

For decades, the only widely used database in town was the SQL database, and it still lies at the heart of most of organizations, software applications, and institutions. With the advent of the blockchain, however, there is finally an alternative to the relational database, and I’m not just talking about the incremental improvements brought about by the NoSQL movement. We’re on the cusp of a new paradigm, dubbed the decentralized web or Web3, where data, rather than proprietary APIs, is the fundamental substrate for interoperability. In this post, I explain why this transition is happening, and how the GraphQL query language is uniquely positioned to power this new era of data interoperability.

Fascinating New Results in the Theory of Randomness

I present here some innovative results in my most recent research on stochastic processes. chaos modeling, and dynamical systems, with applications to Fintech, cryptography, number theory, and random number generators. While covering advanced topics, this article is accessible to professionals with limited knowledge in statistical or mathematical theory. It introduces new material not covered in my recent book (available here) on applied stochastic processes. You don’t need to read my book to understand this article, but the book is a nice complement and introduction to the concepts discussed here.

Predictive modeling of the performance of asynchronous iterative methods

Asynchronous algorithms may increase the performance of parallel applications on large-scale HPC platforms due to decreased dependence among processing elements. This work investigates strategies for implementing asynchronous hybrid parallel MPI-OpenMP iterative solvers. Seven different implementations are considered, and results show that striking a balance between communication and computation that balances the number of iterations in each processing element benefits performance and solution quality. A predictive performance model that utilizes kernel density estimation to model the underlying probability density function to the collected data is then developed to optimize execution parameters for a given problem. For the majority of iteration executions, the performance model matches within about 6% of the empirical data. The different hybrid parallel implementations are examined further to find optimal parametric distributions whose parameters can be tuned to the problem at hand. The generalized extreme value distribution was selected based on a combination of quantitative and qualitative comparisons, and for the most of the iteration executions, the model matches the data within about 6.1%. Results from the parametric distribution model are examined along with results of the model on related problems, and possible further extensions to the predictive model are discussed.