Nuts & Bolts of Reinforcement Learning: Model Based Planning using Dynamic Programming

In this article, however, we will not talk about a typical RL setup but explore Dynamic Programming (DP). DP is a collection of algorithms that can solve a problem where we have the perfect model of the environment (i.e. probability distributions of any change happening in the problem setup are known) and where an agent can only take discrete actions. DP essentially solves a planning problem rather than a more general RL problem. The main difference, as mentioned, is that for an RL problem the environment can be very complex and its specifics are not known at all initially. But before we dive into all that, let's understand why you should learn dynamic programming in the first place using an intuitive example.
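
To make the planning idea concrete, here is a minimal sketch of value iteration, one of the standard DP algorithms, on a toy problem. Everything in it is an assumption made for illustration and does not come from the article: a four-state corridor with deterministic moves, a reward of -1 per step, a terminal state at the right end, and the names `build_chain` and `value_iteration`. The point is only that, once the full transition model `P` is known, optimal state values can be computed without any trial-and-error interaction.

```python
def build_chain(n=4):
    """Hypothetical model: a corridor of n states with a known transition table.

    P[state][action] is a list of (probability, next_state, reward) tuples.
    """
    P = {}
    for s in range(n):
        P[s] = {
            "left": [(1.0, max(s - 1, 0), -1.0)],
            "right": [(1.0, min(s + 1, n - 1), -1.0)],
        }
    return P


def value_iteration(P, terminal, gamma=1.0, tol=1e-9):
    """Sweep the states, applying the Bellman optimality backup until values settle."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if s in terminal:
                continue  # terminal states keep a value of 0
            # Expected return of each action under the known model; keep the best.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


P = build_chain()
V = value_iteration(P, terminal={3})
print(V)
```

In this toy setup each value is simply the negative number of steps left to reach the terminal state, so the agent's best plan is to keep moving right.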

The Machine Learning Cheatsheet

The idea with this project is to create a simple, concise, potentially exhaustive document about the most common machine learning algorithms: a cheatsheet one could come back to for a quick read, in case of doubt or just to keep things clear. This cheatsheet focuses on how algorithms work: the learning, the predictions, the representation and even the expected inputs. I also added a few business-oriented use cases to show how these methods can be applied. Code is not included in the document, as I consider it highly dependent on the chosen language and library; the documentation that ships with a library is often a good, comprehensive knowledge base. Deep Learning and Reinforcement Learning methods are also not present here, as they surely require their own cheatsheets.

How to Put Active Learning to Work for Your Enterprise

In this eBook from Figure Eight and AWS you’ll learn what active learning is and how it works, the areas in which active learning can be particularly effective, and how active learning iteratively improves your model.

10 Big Data Trends You Should Know

1. Rapidly Growing IoT Networks
2. Accessible Artificial Intelligence
3. The Rise of Predictive Analytics
4. Dark Data Migration to the Cloud
5. Chief Data Officers Will Have Bigger Roles
6. Quantum Computing
7. Smarter and Tighter Cybersecurity
8. Open Source Solutions
9. Edge Computing
10. Smarter Chatbots

Principal Component Momentum?

So, here's the basic idea: in an allegedly balanced universe containing both aggressive assets (e.g. equity asset class ETFs) and defensive assets (e.g. fixed income asset class ETFs), principal component analysis, a cornerstone of machine learning, should be of some use in constructing an effective portfolio.

The Access Problem in AI

Machine learning algorithms are performing better than experts at their narrowly-defined tasks, and the labour market is naturally concerned about automation. When deep learning performs better than cardiologists at reading EKGs, at par with human radiologists in detecting pneumonia, and almost as well as humans at recognizing conversational speech, concerns about drastic shifts in both the education and the employment of human resources seem legitimate. The ‘future of work’ seems in disarray, and no job seems safe from disruption.

Google Dataset Search

Dataset Search enables users to find datasets stored across thousands of repositories on the Web, making these datasets universally accessible and useful. Datasets and related data tend to be spread across multiple data repositories on the web. In many cases, information about these datasets is neither linked nor has it been indexed by search engines, making data discovery tedious or, in some cases, impossible. By providing our users with a single interface that allows them to search across multiple repositories, we hope to transform how data is being published and used. We also believe that this project will have the additional benefits of a) creating a data sharing ecosystem that will encourage data publishers to follow best practices for data storage and publication and b) giving scientists a way to show the impact of their work through citation of datasets that they have produced.

Self Organizing Maps

Recently, I learned about SOMs while applying for an internship. I thought I should share it with everyone since it is a very useful technique for cluster analysis and data exploration. Also, we’ll discuss why it’s probably not the most popular technique for these purposes. Okay, let’s do this.

Building Reproducible Data Packages with DataPackageR

Sharing data sets for collaboration or publication has always been challenging, but it's become increasingly problematic as complex and high dimensional data sets have become ubiquitous in the life sciences. Studies are large and time consuming; data collection takes time, data analysis is a moving target, as is the software used to carry it out. In the vaccine space (where I work) we analyze collections of high-dimensional immunological data sets from a variety of different technologies (RNA sequencing, cytometry, multiplexed antibody binding, and others). These data often arise from clinical trials and can involve tens to hundreds of subjects. The data are analyzed by teams of researchers with a diverse variety of goals. Data from a single study will lead to multiple manuscripts by different principal investigators, and to dozens of reports, talks and presentations. There are many different consumers of data sets, and results and conclusions must be consistent and reproducible. Data processing pipelines tend to be study specific. Even though we have great tidyverse tools that make data cleaning easier, every new data set has idiosyncrasies and unique features that require some bespoke code to convert them from raw to tidy data.

The seven tools of causal inference with reflections on machine learning

In this technical report Judea Pearl reflects on some of the limitations of machine learning systems that are based solely on statistical interpretation of data. To understand ‘why?’ and to answer ‘what if?’ questions, we need some kind of a causal model. In the social sciences and especially epidemiology, a transformative mathematical framework called ‘Structural Causal Models’ (SCM) has seen widespread adoption. Pearl presents seven example tasks which the model can handle, but which are out of reach for associational machine learning systems.

Efficient data management and SQL data selection in R

Before running a data analysis, every data scientist needs to take care of data management, data cleaning and data selection.

Docstrings in Python

Get introduced to Docstrings in Python. Learn more about the different ways of writing Docstrings, such as One-line Docstrings and Multi-line Docstrings, and about popular Docstring formats and their uses, along with the built-in Docstrings.
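
As a small illustration of the ideas listed above, the sketch below shows a one-line docstring, a multi-line docstring written in the Google style (one of several popular formats), and how to read a built-in's docstring through `__doc__`. The functions `mean` and `scale` are hypothetical examples, not taken from the linked tutorial.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def scale(values, factor=1.0):
    """Multiply every value by a constant factor.

    Args:
        values: Iterable of numbers.
        factor: Multiplier applied to each value.

    Returns:
        A new list with the scaled values.
    """
    return [v * factor for v in values]


# Docstrings are stored on the object and can be read at runtime.
print(mean.__doc__)
print(scale.__doc__)

# Built-in functions carry docstrings too; help(len) shows the same text.
print(len.__doc__)
```

Calling `help(scale)` in an interactive session displays the same text together with the function signature.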

Snakes in a Package: combining Python and R with reticulate

When I first started working as a data scientist (or something like it) I was told to program in C++ and Java. Then R came along and it was liberating: my ability to do data analysis increased substantially. As my applications grew in size and complexity, I started to miss the structure of Java/C++. At the time, Python felt like a good compromise so I switched again. After joining Mango Solutions I noticed I was not an anomaly: most data scientists here know both Python and R. Nowadays whenever I do my work in R there is a constant nagging voice in the back of my head telling me ‘you should do this in Python’. And when I do my work in Python it's telling me ‘you can do this faster in R’. So when the reticulate package came out I was overjoyed, and in this blogpost I will explain to you why.

Generative Adversarial Networks (GANs) – A Beginner's Guide

It is CRAZY just how much you know about the world. You understand we live in 3D environments, objects move, people talk, animals fly. There is an extraordinary amount of data in the world and much of it is easily accessible; the difficult part is developing algorithms that can analyze and understand this abundance of data. Generative models are one of the most promising approaches towards this goal. Generative models have many short-term applications, but in the long-term they have the potential to learn the natural features of a dataset, whether that’s categories or pixels or audio samples or something else entirely.

PCA & Autoencoders: Algorithms Everyone Can Understand

The primary focus of this article is to provide easy-to-understand intuition for the Principal Components Analysis (PCA) and Autoencoder data transformation techniques. I’m not going to delve deep into the mathematical theory underpinning these models as there are a plethora of resources already available.
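
For a feel of what PCA does, here is a minimal pure-Python sketch that finds the first principal component of a tiny dataset. It is an illustration under stated assumptions, not the article's method: the four sample points are made up, the function name `first_principal_component` is hypothetical, and power iteration is used in place of a full eigendecomposition to keep the code short. It also assumes the starting vector is not orthogonal to the leading direction and that the data has non-zero variance.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector along the direction of largest variance)."""
    n, d = len(rows), len(rows[0])

    # Center the data so that variance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


# Made-up points lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
means, component = first_principal_component(data)

# Project each centered point onto the component: a 1D summary of the 2D data.
scores = [
    sum((x - m) * c for x, m, c in zip(row, means, component)) for row in data
]
print(component)
print(scores)
```

Because the sample points sit on a single line, one component captures all of the variance, and the 1D scores are enough to reconstruct the original 2D points.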