Transfer Learning – Machine Learning’s Next Frontier

In recent years, we have become increasingly good at training deep neural networks to learn a very accurate mapping from inputs to outputs, whether they are images, sentences, label predictions, etc. from large amounts of labeled data. What our models still frightfully lack is the ability to generalize to conditions that are different from the ones encountered during training. When is this necessary? Every time you apply your model not to a carefully constructed dataset but to the real world. The real world is messy and contains an infinite number of novel scenarios, many of which your model has not encountered during training and for which it is in turn ill-prepared to make predictions. The ability to transfer knowledge to new conditions is generally known as transfer learning and is what we will discuss in the rest of this post. Over the course of this blog post, I will first contrast transfer learning with machine learning’s most pervasive and successful paradigm, supervised learning. I will then outline reasons why transfer learning warrants our attention. Subsequently, I will give a more technical definition and detail different transfer learning scenarios. I will then provide examples of applications of transfer learning before delving into practical methods that can be used to transfer knowledge. Finally, I will give an overview of related directions and provide an outlook into the future.

Domain Adaptation

Domain adaptation is a field associated with machine learning and transfer learning. This scenario arises when we aim at learning from a source data distribution a well performing model on a different (but related) target data distribution. For instance, one of the tasks of the common spam filtering problem consists in adapting a model from one user (the source distribution) to a new one who receives significantly different emails (the target distribution). Note that, when more than one source distribution is available the problem is referred to as multi-source domain adaptation.

20 useful tools for open-source maintainers

• IssueHunt – Created by BoostIO
• Jenkins – Created by Jenkins
• CircleCI – Created by Circle Internet Services, Inc.,
• Travis CI – Created by Travis CI
• Probot – Created by Brandon Keepers
• Stale – Created by Brandon Keepers
• Todo – Created by Jason Etcovitch
• Release Drafter – Created by Tim Lucas
• GitHub Polls Bot – Created by Michael Hsu
• Mergeable – Created by Justin Law
• commitlint [bot] – Created by Ahmed T. Ali
• react-preview – Created by Aditya Agarwal
• ForkHub – Created by Jon Ander Peñalba
• WIP – Created by Gregor Martynus
• ImgBot – Created by Dan Butvinik
• typot – Created by Takahiro Kubo
• backlog – Created by Abi Noda
• Redmine – Created by Redmine
• Wrike – Created by Wrike, Inc.
• Asana – Created by Asana
• Flow – Created by Flow

Create easy automated dashbords with R and Markdown

In this article, you learn how to make a visualization dashboard in R. First you need to install the rmarkdown rmarkdown package into your R library. Assuming that you installed the rmarkdown rmarkdown , next you create a new rmarkdown rmarkdown script in R. I will use the visualization code from the credit modeling visualization article that I have previously published at DataScience+.

Top 10 Python Data Science Libraries

The third part of our series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.

Convert Data Frame to Dictionary List in R

In R, there are a couple ways to convert the column-oriented data frame to a row-oriented dictionary list or alike, e.g. a list of lists. In the code snippet below, I would show each approach and how to extract keys and values from the dictionary. As shown in the benchmark, it appears that the generic R data structure is still the most efficient.

Example of Overfitting

I occasionally see queries on various social media as to overfitting – what is it?, etc. I’ll post an example here. (I mentioned it at my talk the other night on our novel approach to missing values, but had a bug in the code. Here is the correct account.) The dataset is prgeng, on wages of programmers and engineers in Silicon Valley as of the 2000 Census. It’s included in our polyreg package, which we developed as an alternative to neural networks. But it is quite useful in its own right, as it makes it very convenient to fit multivariate polynomial models. (Not as easy as it sounds; e.g. one must avoid cross products of orthogonal dummy variables, powers of those variables, etc.)

The tidy caret interface in R

Among most popular off-the-shelf machine learning packages available to R, caret ought to stand out for its consistency. It reaches out to a wide range of dependencies that deploy and support model building using a uniform, simple syntax. I have been using caret extensively for the past three years, with a precious partial least squares (PLS) tutorial in my track record. A couple of years ago, the creator and maintainer of caret Max Kuhn joined RStudio where he has contributing new packages to the ongoing tidy-paranoia – the supporting recipes, yardstick, rsample and many other packages that are part of the tidyverse paradigm and I knew little about. As it happens, caret is now best used with some of these. As an aspiring data scientist with fashionable hex-stickers on my laptop cover and a tendency to start any sentence with ‘big data’, I set to learn tidyverse and going Super Mario using pipes (%>%, Ctrl + Shift + M).

Using a genetic algorithm for the hyperparameter optimization of a SARIMA model

In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In my last blog post I showed how to perform a grid search the ‘tidy’ way. As an example, I looked for the right hyperparameters of a SARIMA model. However, the goal of the post was not hyperparameter optimization per se, so I did not bother with tuning the hyperparameters on a validation set, and used the test set for both validation of the hyperparameters and testing the forecast. Of course, this is not great because doing this might lead to overfitting the hyperparameters to the test set. So in this blog post I split my data into trainig, validation and testing sets and use a genetic algorithm to look for the hyperparameters. Again, this is not the most optimal way to go about this problem, since the {forecast} package contains the very useful auto.arima() function. I just wanted to see what kind of solution a genetic algorithm would return, and also try different cost functions. If you’re interested, read on!

PortfolioAnalytics: An open source library for portfolio optimisation

PortfolioAnalytics is a robust open source library for portfolio optimisation and analytics. This library implements classical portfolio optimisation methods such as Efficient Frontier, Sharpe ratio and Mean Variance, along with modern developments in the field such as Shrinkage estimators, Maximum likelihood estimators and Kelly Criterion. It also implements a novel shrinkage estimator and optimisation with higher moments. For a full documentation on the usage, head over to ReadTheDocs. Season 1 Episode 5.3 – ‘COLLABORATIVE FILTERING USING NEURAL NETWORK’

Welcome to the Third Part of the Fifth Episode of Fastdotai where we will deal with Collaborative Filtering using Neural Network?-?A technique widely used in Recommendation System. Before we start , I would like to thank Jeremy Howard and Rachel Thomas for their efforts to democratize AI.

Implementing Spatial Batch / Instance / Layer Normalization in Tensorflow

This post is a simple review of implementing different normalization layers.

Deploying a Keras Deep Learning Model as a Web Application in Python

Building a cool machine learning project is one thing, but at the end of the day, you want other people to be able to see your hard work. Sure, you could put the whole project on GitHub, but how are your grandparents supposed to figure that out? No, what we want is to deploy our deep learning model as a web application accessible to anyone in the world. In this article, we’ll see how to write a web application that takes a trained Keras recurrent neural network and allows users to generate new patent abstracts. This project builds on work from the Recurrent Neural Networks by Example article, but knowing how to create the RNN isn’t necessary. We’ll just treat it as a black box for now: we put in a starting sequence, and it outputs an entirely new patent abstract that we can display in the browser! Traditionally, data scientists develop the models and front end engineers show them to the world. In this project, we’ll have to play both roles, and dive into web development (almost all in Python though).

Essential text correction process for NLP tasks

In previous story, Norvig’s algorithm is introduced to correct spelling error. It used 4 operations (i.e. Deletion, Transposition, Replacement and Insertion). In this story, I would like to introduce another method which only using deletion operation to find the potential correct words. When dealing with text, we may need to deal with incorrect text. Although we can still use character embeddings and word embeddings to compute a similar vectors. It is good for unseen data and out-of-vocabulary (OOV). However, it will be better if we can correct typo.

Review: Dilated Convolution (Semantic Segmentation)

This time, Dilated Convolution, from Princeton University and Intel Lab, is briefly reviewed. The idea of Dilated Convolution is come from the wavelet decomposition. It is also called ‘atrous convolution’, ‘algorithme à trous’ and ‘hole algorithm’. Thus, any ideas from the past are still useful if we can turn them into the deep learning framework.

Interpretability of Deep Learning Models

Model Interpretability of Deep Neural Networks (DNN) has always been a limiting factor for use cases requiring explanations of the features involved in modelling and such is the case for many industries such as Financial Services. Financial institution whether by regulation or by choice prefer structural models that are easy to interpret by humans that’s why deep learning models within these industries have had slow adoptions. An example of a critical use case would be risk models where usually banks prefer classic statistical methods such as Generalized Linear Models, Bayesian Models and traditional machine learning models such as Tree-based models that are easily explainable and interpret in terms of human intuition. Interpretability since the beginning has been an important area of research since Deep Learning models can achieve high accuracy but at the expense of high abstraction (i.e. accuracy vs interpretability problem). This is important also because of Trust since a model that is not trusted is a model that will not be used (i.e. try selling to upper management a black box model).

Best Resources to learn AI & Machine Learning

I have tried to make a compilation of some of the best resources one should learn from basics to advance with a sequence of courses & blogs. I have also provided many amazing additional resources and tips which I wish I had known when I started out. I have got for you almost everything that you need to be Data Scientist or ML Engineer.

The Magic of Stratification in Data Analysis

For my very first post on Medium I’m going to briefly go over what I consider the single most fundamental problem of statistics?-?that of confounding, and more importantly how we might deal with this problem using pandas, and some visualization.

Text Classification using K Nearest Neighbors

We’ll define K Nearest Neighbor algorithm for text classification with Python. KNN algorithm is used to classify by finding the K nearest matches in training data and then using the label of closest matches to predict. Traditionally, distance such as euclidean is used to find the closest match.

Conversational AI: The Imitation Machine

The development of conversational AI has been underway for more than 60 years, in large part driven by research done in the field of natural language processing (NLP). In the 1980s, the departure from hand-written rules and shift to statistical approaches enabled NLP to be more effective and versatile in handling real data (Nadkarni, P.M. et al. 2011, p. 545). Since then, this trend has only grown in popularity, notably fuelled by the wide application of deep learning technologies. NLP in recent years finds remarkable success in classification, matching, translation, and structured prediction (Li, H. 2017, p. 2), tasks easier accomplished through statistic models. Naturalistic multi-turn dialogue still proves challenging, however, which some believe will remain unsolved until we develop an artificial general intelligence that is capable of ‘natural language understanding’ (Bailey, K. 2017).

What is the C4.5 algorithm and how does it work?

The C4.5 algorithm is used in Data Mining as a Decision Tree Classifier which can be employed to generate a decision, based on a certain sample of data (univariate or multivariate predictors).

Neural Network Introduction for Software Engineers 1 – A Vanilla MLP

This first blog post will help you design a neural network in Python/Numpy. It will demonstrate the downfalls of vanilla Multi Layer Perceptrons (MLPs), propose a few simple augmentations, and show how important they are. We will conclude by demonstrating how this could look in a well organized software engineering package.