Reinforcement Learning: From Grid World to Self-Driving Cars

Underlying many of the major announcements from researchers in Artificial Intelligence in the last few years is a discipline known as reinforcement learning (RL). Recent breakthroughs are mostly driven by minor twists on classic RL ideas, enabled by the availability of powerful computing hardware and software that leverages said hardware. To get an idea of just how hungry modern deep RL models are for compute, the following table is a non-exhaustive collection of recent RL advances and estimates of the computational resources required to accomplish each task.

Reinforcement Learning with Hindsight Experience Replay

Reinforcement learning has gained a lot of popularity in recent years due to some spectacular successes, such as defeating the Go world champion and (very recently) winning matches against top professionals in the popular real-time strategy game StarCraft 2. One of the impressive aspects of achievements such as that of AlphaZero (the latest Go-playing agent) is that it learns from sparse binary rewards: it either wins or loses the game. Having no intermediate rewards during the episodes makes learning extremely difficult in most cases, as the agent might never actually win and would therefore have no feedback on how to improve its performance. Apparently, games such as Go and StarCraft 2 (at least the way it was played in the matches) have some unique qualities that make it possible to learn with these binary rewards: they are symmetric zero-sum games. I am not going to go further into this right now, but I will probably devote a future article to the algorithm behind AlphaZero.

Reinforcement Learning with Exploration by Random Network Distillation

Ever since the seminal DQN work by DeepMind in 2013, in which an agent successfully learned to play Atari games at a level higher than that of an average human, Reinforcement Learning (RL) has been making headlines frequently. From Atari games to robotics, and the amazing defeat of world Go champion Lee Sedol by AlphaGo, it seemed as though RL was about to take the world by storm.

Reinforcement Learning Tutorial Part 1: Q-Learning

This is the first part of a tutorial series about reinforcement learning. We will start with some theory and then move on to more practical things in the next part. During this series, you will learn not only how to train your model, but also what the best workflow is for training it in the cloud with full version control using the Valohai deep learning management platform.

Quantile regression in R

While the conditional mean function is often what we want to model, sometimes we may want to model something else. On a recent episode of the Linear Digressions podcast, Katie and Ben talked about a situation at Uber where it might make sense to model a conditional quantile function.
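To make the difference between a mean and a quantile target concrete, here is a minimal sketch. It is written in Python rather than R and does not use any quantile regression package. It relies on the standard fact that quantile regression minimizes the pinball (quantile) loss, and it only fits a constant rather than a full conditional model. The function names `pinball_loss` and `best_constant` and the data are made up for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss at quantile level q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Observed value that minimizes the pinball loss when used as a constant prediction."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), q))


# Illustrative data with one large outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

mean_fit = sum(data) / len(data)        # pulled upward by the outlier
median_fit = best_constant(data, 0.5)   # q = 0.5 targets the median
upper_fit = best_constant(data, 0.9)    # q = 0.9 targets an upper quantile
```

In this toy data set the mean is dragged toward the outlier, while the q = 0.5 fit stays at the middle observation. A full quantile regression replaces the constant with a function of the covariates and minimizes the same loss.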

Notes on Artificial Intelligence, Machine Learning and Deep Learning for curious people

AI has been the most intriguing topic of 2018 according to McKinsey. Many people refer to AI without actually knowing what it really means. There is public debate on whether it is an evil or a savior for humanity. Thus this is yet another attempt to compile and explain the introductory AI/ML concepts to go beyond this buzz for non-practitioners and curious people. Artificial intelligence as an academic discipline was founded in the 1950s. The term ‘AI’ was coined by John McCarthy, an American computer scientist, back in 1956 at the Dartmouth Conference. According to John McCarthy, AI is ‘The science and engineering of making intelligent machines, especially intelligent computer programs’.

Modeling cumulative impact – Part I

Welcome to Modeling cumulative impact, a series that views the cumulative impact of athletic training on sports performance through a variety of modeling lenses. The journey starts here in Part I with a model from physiology, the ‘fitness-fatigue model,’ that uses convolutions of training intensities with exponential decay. The series will transition towards more general purpose techniques, including splines, Kalman filters and LSTM-powered neural networks, starting in Part II. Longitudinal data sets, especially those featuring rich event histories, are difficult to find in the public domain. This article features a simulated hammer thrower modeled after a real competitor as described in a 1994 paper by Busso, Candau, and Lacour [1], hereafter referred to as ‘BCL94.’ The hammer thrower, whom we’ll call ‘H.T.,’ was aged 20 at the time of the study with 7 years of hammer throwing experience. All code for this simulation is on GitHub.
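The phrase "convolutions of training intensities with exponential decay" can be sketched as follows. This is a minimal illustration of the commonly stated form of the fitness-fatigue model, not the article's own code. The function name `fitness_fatigue`, the choice to count only sessions before day t, and every parameter value are assumptions chosen for illustration; they are not the values from BCL94.

```python
import math


def fitness_fatigue(training, p0, k1, tau1, k2, tau2):
    """Predicted performance per day from daily training intensities.

    performance[t] = p0
                     + k1 * sum_{i<t} training[i] * exp(-(t - i) / tau1)   (fitness)
                     - k2 * sum_{i<t} training[i] * exp(-(t - i) / tau2)   (fatigue)
    """
    performance = []
    for t in range(len(training)):
        past = list(enumerate(training[:t]))  # sessions strictly before day t
        fitness = sum(w * math.exp(-(t - i) / tau1) for i, w in past)
        fatigue = sum(w * math.exp(-(t - i) / tau2) for i, w in past)
        performance.append(p0 + k1 * fitness - k2 * fatigue)
    return performance


# Hypothetical schedule: 30 days of constant training, then 30 days of rest.
training = [100.0] * 30 + [0.0] * 30

# Placeholder parameters (assumed): fatigue is larger but fades faster than fitness.
curve = fitness_fatigue(training, p0=500.0, k1=1.0, tau1=50.0, k2=2.0, tau2=10.0)
```

With these assumed parameters the predicted performance dips right after the first session, because fatigue initially outweighs fitness, and then rises during the rest period as fatigue decays faster than fitness.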

Model-Free Prediction: Reinforcement Learning

Previously, we looked at planning by dynamic programming to solve a known MDP. In this post, we will use model-free prediction to estimate the value function of an unknown MDP, i.e., we will look at policy evaluation of an unknown MDP. This series of blog posts contains a summary of concepts explained in Introduction to Reinforcement Learning by David Silver.
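As a rough illustration of model-free prediction, the sketch below estimates state values by first-visit Monte Carlo evaluation: it averages sampled returns without ever reading the transition probabilities. The environment is an assumption made for this example (a five-state random walk with reward 1 for exiting on the right and 0 on the left, under a uniformly random policy), and the names `run_episode` and `mc_prediction` are hypothetical.

```python
import random


def run_episode(rng, n_states=5):
    """Random walk over states 1..n_states, starting in the middle.

    Returns the list of visited states and the terminal reward
    (1.0 for exiting on the right, 0.0 for exiting on the left).
    """
    state = (n_states + 1) // 2
    visited = []
    while 1 <= state <= n_states:
        visited.append(state)
        state += rng.choice((-1, 1))
    reward = 1.0 if state > n_states else 0.0
    return visited, reward


def mc_prediction(n_episodes=5000, n_states=5, seed=0):
    """First-visit Monte Carlo estimate of the value of each state."""
    rng = random.Random(seed)
    returns_sum = {s: 0.0 for s in range(1, n_states + 1)}
    visits = {s: 0 for s in range(1, n_states + 1)}
    for _ in range(n_episodes):
        visited, reward = run_episode(rng, n_states)
        # Undiscounted, with the only reward at the end, so the return
        # from every visited state equals the terminal reward.
        for s in set(visited):  # count each state once per episode
            returns_sum[s] += reward
            visits[s] += 1
    return {s: returns_sum[s] / visits[s] for s in returns_sum if visits[s]}


values = mc_prediction()
```

For this particular walk the true value of state s is s/6, so the estimates can be checked against a known answer. Temporal-difference methods pursue the same goal but update from bootstrapped one-step targets instead of complete returns.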

ML Algorithms: One SD – Instance-based Algorithms

The obvious questions to ask when facing a wide variety of machine learning algorithms are ‘which algorithm is better for a specific task, and which one should I use?’ The answers to these questions vary depending on several factors, including: (1) the size, quality, and nature of the data; (2) the available computational time; (3) the urgency of the task; and (4) what you want to do with the data. This is one section of the many algorithms I wrote about in a previous article. In this part I tried to display and briefly explain, as simply as possible, the main algorithms (though not all of them) that are available for instance-based tasks.

Machine Learning Versus The News

I recently spent several months applying Natural Language Processing (NLP) techniques to a large corpus of news stories. My goal was to find a way to identify news articles that cover the same story, then evaluate those articles to understand how uniform or balanced the media’s coverage was.

Machine Learning Techniques applied to Stock Price Prediction

Machine learning has many applications, one of which is to forecast time series. One of the most interesting (or perhaps most profitable) time series to predict is, arguably, stock prices. Recently I read a blog post applying machine learning techniques to stock price prediction. You can read it here. It is a well-written article, and various techniques were explored. However, I felt the problem could be handled with a bit more academic rigor. For example, in the article the methods ‘Moving Average’, ‘Linear Regression’, ‘k-Nearest Neighbors’, ‘Auto ARIMA’ and ‘Prophet’ had a forecast horizon of 1 year, whereas ‘Long Short Term Memory (LSTM)’ had a forecast horizon of 1 day. Towards the end of the article, it is stated ‘LSTM has easily outshone any algorithm we saw so far.’ But clearly, we are not comparing apples to apples here.

Machine Learning Integration Options

Machine learning projects are inherently different from traditional IT projects in that they are significantly more heuristic and experimental, requiring skills spanning multiple domains, including statistical analysis, data analysis and application development. Most organizations have defined the process to build, train and test machine learning models. The challenge has been figuring out what to do once the model is built. Integration, deployment and monitoring are essential aspects to provide for continuous feedback once the models are in production.

Machine Learning and Particle Motion in Liquids: An Elegant Link

The gradient descent algorithm is one of the most popular optimization techniques in machine learning. It comes in three flavors: batch or ‘vanilla’ gradient descent (GD), stochastic gradient descent (SGD), and mini-batch gradient descent, which differ in the amount of data used to compute the gradient of the loss function at each iteration. The goal of this article is to describe the progress in the search for global optimizers based on Langevin Dynamics (LD), a modeling approach for molecular motion which has its origins in work by Albert Einstein and Paul Langevin on statistical mechanics in the early 1900s. I will provide an elegant explanation, from the perspective of theoretical physics, of why variants of gradient descent are efficient global optimizers.
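The three flavors and the Langevin idea can be sketched in a single update loop. This is a toy illustration and not the article's method. It assumes a one-parameter model y = w * x, and the names `gradient` and `descend`, the data, and the hyperparameters are all hypothetical. The `batch_size` argument selects the flavor, and a positive `temperature` adds the Gaussian noise term of the standard discretized (overdamped) Langevin update.

```python
import random


def gradient(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def descend(data, batch_size, lr=0.01, steps=500, temperature=0.0, seed=0):
    """Gradient descent whose flavor is set by batch_size.

    batch_size == len(data)    -> batch ('vanilla') gradient descent
    batch_size == 1            -> stochastic gradient descent
    anything in between        -> mini-batch gradient descent
    temperature > 0 adds noise with standard deviation sqrt(2 * lr * temperature),
    which turns the step into a discretized Langevin update.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        noise = (2 * lr * temperature) ** 0.5 * rng.gauss(0.0, 1.0)
        w = w - lr * gradient(w, batch) + noise
    return w


# Noise-free toy data generated with w = 3 (assumed for illustration).
data = [(x, 3.0 * x) for x in (-2.0, -1.0, 0.5, 1.0, 2.0, 3.0)]

w_batch = descend(data, batch_size=len(data))
w_sgd = descend(data, batch_size=1)
w_mini = descend(data, batch_size=3)
w_langevin = descend(data, batch_size=len(data), temperature=0.01)
```

On this convex toy problem all three flavors recover w close to 3, and the Langevin variant fluctuates around it. The example only shows the mechanics of the update; it does not demonstrate global optimization on a non-convex loss.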

Machine Learning and AI using Microsoft Cognitive Services

Machine Learning has always been a complex subject. No doubt the salaries of the Machine Learning experts are among the highest. It is equally true that it has a steep learning curve. In this course, we are going to make it super easy for any developer to embed Machine Learning and AI in their applications. You will not need all the complexities of Mathematics and Statistics. Welcome to the course on Machine Learning using Azure Cognitive Services which is part of the Artificial Intelligence services of Microsoft.