Keras Transfer Learning For Beginners

This blog consists of 3 parts:
1. What is transfer learning?
2. Why does transfer learning work so well?
3. Coding your first image recognizer using transfer learning.

What’s New in Deep Learning Research: Facebook Meta-Embeddings Allow NLP Models to Choose Their Own Architecture

Word embeddings have revolutionized the world of natural language processing (NLP). Conceptually, word embeddings are language modeling methods that map words or phrases in a sentence to vectors of numbers. One of the first steps in any NLP application is to determine which word embedding algorithm to use. Typically, NLP models resort to pretrained word embeddings such as Word2Vec, GloVe or FastText. While that approach is relatively simple, it is also highly inefficient, as it is nearly impossible to determine which word embedding will perform better as the NLP model evolves. What if the NLP model itself could select the best word embedding for a given context? In a recent paper, researchers from Facebook’s Artificial Intelligence Research lab (FAIR) proposed a method that allows NLP models to dynamically select the word embedding algorithm that performs best in a given environment. Dynamic Meta-Embeddings is a technique that combines different word embedding models in an ensemble and allows an NLP algorithm to choose which embedding to use based on their performance. Facebook’s technique essentially delays the selection of an embedding algorithm from design time to runtime, based on the specific behavior of the ensemble.
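
To make the idea of choosing among embeddings at runtime more concrete, here is a minimal sketch of an attention-weighted combination of several embeddings of the same word. It assumes the embeddings have already been projected to a common dimension, and the names meta_embedding and attention_vector are hypothetical; in a real model the attention parameters would be learned during training rather than fixed by hand.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def meta_embedding(projected, attention_vector, bias=0.0):
    """Combine several same-sized embeddings of one word into one vector.

    projected: list of embeddings (one per embedding model), same length each.
    attention_vector, bias: hypothetical attention parameters (normally learned).
    """
    scores = [dot(attention_vector, e) + bias for e in projected]
    weights = softmax(scores)
    dim = len(projected[0])
    combined = [sum(w * e[i] for w, e in zip(weights, projected)) for i in range(dim)]
    return combined, weights


# Three toy embeddings of the same word, e.g. from three different models.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
combined, weights = meta_embedding(vectors, attention_vector=[2.0, 0.0])
```

The weights show how much each source embedding contributes for this particular word, which is the sense in which the choice is made at runtime instead of at design time.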

Building a Convolutional Neural Network (CNN) in Keras

Deep Learning is becoming a very popular subset of machine learning due to its high level of performance across many types of data. A great way to use deep learning to classify images is to build a convolutional neural network (CNN). The Keras library in Python makes it pretty simple to build a CNN. Computers see images using pixels. Pixels in images are usually related. For example, a certain group of pixels may signify an edge in an image or some other pattern. Convolutions use this to help identify images.
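
As a rough illustration of how a convolution picks up an edge, the sketch below slides a small filter over a toy image in plain Python. The image, the filter values and the function name convolve2d are made up for this example; a Keras convolutional layer learns its filter values during training instead of having them set by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like CNN layers, this does not flip the kernel (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(image, vertical_edge)
print(response)  # [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The response is zero over the flat regions and large where the window straddles the boundary between dark and bright pixels.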

Building a Logistic Regression in Python

Suppose you are given the scores of two exams for various applicants, and the objective is to classify the applicants into two categories based on their scores, i.e., into Class-1 if the applicant can be admitted to the university or into Class-0 if the applicant can’t be admitted. Can this problem be solved using Linear Regression? Let’s check.
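
For a feel of how a classifier for this admission problem might look, here is a minimal logistic regression sketch trained with plain gradient descent. The exam scores (scaled to the range 0 to 1), the labels, the learning rate and the function names are all invented for illustration. Unlike a linear regression output, the sigmoid keeps every prediction between 0 and 1, so it can be read as a probability of admission.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the log loss w.r.t. the linear score
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (admit) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5 else 0


# Hypothetical data: two exam scores per applicant, scaled to [0, 1].
scores = [[0.30, 0.40], [0.40, 0.30], [0.35, 0.50], [0.80, 0.70], [0.70, 0.90], [0.90, 0.80]]
admitted = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(scores, admitted)
predictions = [predict(weights, bias, x) for x in scores]
```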

Automated Hyper-parameter Optimization in SageMaker

So you’ve built your model and are getting sensible results, and are now ready to squeeze out as much performance as possible. One possibility is Grid Search, where you try every possible combination of hyper-parameters and choose the best one. That works well if your number of choices is relatively small, but what if you have a large number of hyper-parameters, and some are continuous values that might span several orders of magnitude? Random Search works pretty well to explore the parameter space without committing to exploring all of it, but is randomly groping in the dark the best we can do? Of course not. Bayesian Optimization is a technique for optimizing a function when making sequential decisions. In this case, we’re trying to maximize performance by choosing hyper-parameter values. This sequential decision framework means that the hyper-parameters you choose for the next step will be influenced by the performance of all the previous attempts. Bayesian Optimization makes principled decisions about how to balance exploring new regions of the parameter space vs. exploiting regions that are known to perform well. This is all to say that it’s generally much more efficient to use Bayesian Optimization than alternatives like Grid Search and Random Search.
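
To see why Grid Search gets expensive and how Random Search sidesteps that, here is a small sketch of just those two strategies (it does not implement Bayesian Optimization and does not use the SageMaker API). The hyper-parameter names and ranges are assumptions chosen for illustration.

```python
import itertools
import random

# Grid Search: every combination of a few hand-picked values.
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [32, 64, 128],
    "dropout": [0.0, 0.25, 0.5],
}
grid_points = list(itertools.product(*grid.values()))
print(len(grid_points))  # 36 training runs for only three hyper-parameters


def sample_random(rng):
    """Draw one configuration; the learning rate is sampled on a log scale
    because it spans several orders of magnitude."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "batch_size": rng.choice([32, 64, 128]),
        "dropout": rng.uniform(0.0, 0.5),
    }


# Random Search: a fixed budget of runs, independent of the grid size.
rng = random.Random(0)
random_points = [sample_random(rng) for _ in range(10)]
```

Both strategies pick each configuration without looking at earlier results, which is exactly the limitation that a sequential method like Bayesian Optimization is meant to address.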

Stacked Neural Networks for Prediction

Machine learning and deep learning have found their place in financial institutions for their power in predicting time series data with a high degree of accuracy. There is a lot of research going on to improve models so that they can predict data with an even higher degree of accuracy. This post is a write-up about my project AlphaAI, which is a stacked neural network architecture that predicts the stock prices of various companies. This project was also one of the finalists at iNTUtion 2018, a hackathon for undergraduates here in Singapore.

Who am I connected to?

A problem that arises a lot when you play with data is figuring out how things are connected. It could be, for example, to determine from all your friends, your friends’ connections, and your friends’ friends’ connections, … to whom you are directly or indirectly connected, or how many degrees of separation you have from a given connection. Luckily, there are some tools at your disposal to perform such analysis. Those tools come under the umbrella of Network Theory, and I will cover some basic tricks in this post.
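
One simple way to answer both questions (who am I connected to, and at how many degrees of separation) is a breadth-first search over the friendship graph. The sketch below uses only the standard library; the names and the friends dictionary are invented for the example.

```python
from collections import deque


def degrees_of_separation(graph, start):
    """Breadth-first search: map every reachable person to their distance from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in distance:  # first visit is the shortest path
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


# Hypothetical friendship graph (each person lists their direct friends).
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
    "zed": [],
}

reachable = degrees_of_separation(friends, "ann")
print(reachable)  # {'ann': 0, 'bob': 1, 'cat': 1, 'dan': 2, 'eve': 3}
```

Anyone missing from the result (here "zed") is not connected to the starting person at all, directly or indirectly.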

Curiosity-Driven Learning made easy Part I

In recent years, we’ve seen a lot of innovations in Deep Reinforcement Learning. From DeepMind and the Deep Q learning architecture in 2014 to OpenAI playing Dota 2 with OpenAI Five in 2018, we live in an exciting and promising moment. And today we’ll learn about Curiosity-Driven Learning, one of the most exciting and promising strategies in Deep Reinforcement Learning.

AI SERIES: Looking for a ‘Cognitive Operating System’

AI is a field of study that seeks to understand, develop and implement intelligent behavior in hardware and software systems to mimic and expand human-like abilities. To deliver on its promise, AI implements various techniques from the field of Machine Learning (ML), a subset of studies that focuses on developing software systems with the ability to learn new skills from experience, by trial and error or by applying known rules. Deep Learning (DL) is, so far, the technique in Machine Learning that, by a wide margin, has delivered the most exciting results and practical use cases in domains such as speech and image recognition and language translation, and it plays a role in a wide range of current AI applications.

The Evolution of Analytics with Data

We have made tremendous progress in the field of Information Technology in recent times. Some of the revolutionary feats achieved in the tech ecosystem are truly commendable. Data and Analytics have been among the most commonly used words in the last decade or two. As such, it’s important to know why they are interrelated, which roles in the market are currently evolving and how they are reshaping businesses.

Optimization: Loss Function Under the Hood (Part III)

Continuing this journey: I discussed the loss function and optimization process of linear regression in Part I and of logistic regression in Part II, and this time we are heading to the Support Vector Machine.

Loss Function (Part II): Logistic Regression

This series aims to explain the loss functions of a few widely used supervised learning models, and some options for optimization algorithms. In Part I, I walked through the optimization process of Linear Regression in detail, using Gradient Descent with Least Squared Error as the loss function. In this part, I will move on to Logistic Regression.
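
As a reminder of what that Part I recipe looks like in practice, here is a minimal sketch of Gradient Descent on the mean squared error of a straight line. The data, learning rate and function name are assumptions made for this example and are not taken from the original post.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = slope * x + intercept by gradient descent on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1/n) * sum(error^2) w.r.t. slope and intercept.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))  # 2.0 1.0
```

Logistic Regression keeps the same update loop but swaps the squared error for the log loss and passes the linear score through a sigmoid.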

Optimization: Loss Function Under the Hood (Part I)

When building a machine learning model, questions like these usually come to mind: How is a model optimized? Why does Model A outperform Model B? To answer them, I think one entry point is understanding the loss functions of different models and, furthermore, being able to choose an appropriate loss function or define a custom one based on the goal of the project and the tolerance for each type of error. I will post a series of blogs discussing the loss functions and optimization algorithms of a few common supervised learning models. I will try to explain them in a way that is friendly to readers who don’t have a strong mathematical background. Let’s start with Part I, Linear Regression.

Big Data, the why, how and what – A thought Process and Architecture

One of the most common questions I get when talking with customers is how they can set up a good big data architecture that will allow them to process all their existing data, with the ultimate goal of performing advanced analytics and AI on top of it to extract insights that will allow them to stay relevant in today’s ever faster evolving world. To tackle this issue, I always start by asking them what their understanding of ‘Big Data’ is, because no two customers are alike. One might think that Big Data is just the way they are able to process all their different Excel files, while another might think that it is the holy grail for all their projects and intelligence. In this article, I want to explain what Big Data means to me and provide a thought process that can help you define a Big Data strategy for your organization.

Understanding and visualizing DenseNets

Counter-intuitively, by connecting layers this way, DenseNets require fewer parameters than an equivalent traditional CNN, as there is no need to learn redundant feature maps. Furthermore, some variations of ResNets have shown that many layers barely contribute and can be dropped. In fact, the number of parameters in ResNets is large because every layer has its own weights to learn. DenseNet layers, in contrast, are very narrow (e.g. 12 filters) and just add a small set of new feature maps. Another problem with very deep networks was the difficulty of training them, because of the aforementioned flow of information and gradients. DenseNets solve this issue, since each layer has direct access to the gradients from the loss function and to the original input image.
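
The "narrow layers" idea can be checked with a bit of arithmetic: if every layer adds only 12 new feature maps and receives the concatenation of everything before it, the channel counts grow linearly. The sketch below assumes a block that starts with 24 feature maps and has 4 layers; those two numbers and the function name are hypothetical, only the 12 filters come from the text.

```python
def dense_block_channels(initial_channels, growth_rate, num_layers):
    """Return the input channel count seen by each layer, and the block's output count.

    Each layer receives all earlier feature maps concatenated together and
    contributes growth_rate new ones.
    """
    inputs = [initial_channels + growth_rate * i for i in range(num_layers)]
    output = initial_channels + growth_rate * num_layers
    return inputs, output


inputs, output = dense_block_channels(initial_channels=24, growth_rate=12, num_layers=4)
print(inputs)  # [24, 36, 48, 60]
print(output)  # 72
```

Each layer only has to learn 12 new feature maps, while still seeing every feature map produced earlier in the block.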

ResNet on CIFAR10

The ImageNet dataset consists of a set of images (the authors used 1.28 million training images, 50k validation images and 100k test images) of size 224×224 belonging to 1000 different classes. CIFAR10, however, consists of a different set of images (45k training images, 5k validation images and 10k testing images) distributed into just 10 different classes. Because the sizes of the input volumes (images) are completely different, it is easy to see that the same structure will not be suitable for training on this dataset: we cannot perform the same reductions without running into dimensionality mismatches. We are going to follow the solution the ResNet authors give for training on CIFAR10, which is as tricky to follow as the one for the ImageNet dataset.
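
The mismatch is easy to see with the usual output-size formula for a convolution or pooling layer, floor((n + 2p - k) / s) + 1. The sketch below applies a simplified version of the ImageNet-style reductions (a 7×7 stride-2 convolution, a 3×3 stride-2 pooling, then three stride-2 stages modeled as 3×3 convolutions) to a 224-pixel input and to a 32-pixel input, 32×32 being the CIFAR10 image size (a fact not stated in the text above). The layer list is an approximation for illustration, not the exact architecture.

```python
def output_size(n, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# Simplified ImageNet-style downsampling path: (kernel, stride, padding).
reductions = [
    (7, 2, 3),  # initial 7x7 convolution, stride 2
    (3, 2, 1),  # 3x3 max pooling, stride 2
    (3, 2, 1),  # first stride-2 stage
    (3, 2, 1),  # second stride-2 stage
    (3, 2, 1),  # third stride-2 stage
]


def trace_sizes(n):
    """Apply every reduction in turn and record the resulting sizes."""
    sizes = []
    for kernel, stride, padding in reductions:
        n = output_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


print(trace_sizes(224))  # [112, 56, 28, 14, 7]
print(trace_sizes(32))   # [16, 8, 4, 2, 1]
```

With a 32-pixel input the feature maps collapse to a single pixel, which is why the same sequence of reductions cannot simply be reused.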

Understanding and visualizing ResNets

Researchers observed that it makes sense to say ‘the deeper the better’ when it comes to convolutional neural networks, since deeper models should be more capable (their flexibility to adapt to any space increases because they have a bigger parameter space to explore). However, it has been noticed that beyond a certain depth, performance degrades. This was one of the bottlenecks of VGG: the networks couldn’t go as deep as desired, because they started to lose generalization capability.

Text Analytics APIs, Part 2: The Smaller Players

It seems like there’s yet another cloud-based text analytics Application Programming Interface (API) on the market every few weeks. If you’re interested in building an application using these kinds of services, how do you decide which API to go for? In the previous post in this series, we looked at the text analytics APIs from the behemoths in the cloud software world: Amazon, Google, IBM and Microsoft. In this post, we survey sixteen APIs offered by smaller players in the market.

Text Analytics APIs, Part 1: The Bigger Players

If you’re in the market for an off-the-shelf text analytics API, you have a lot of options. You can choose to go with a major player in the software world, for whom each AI-related service is just another entry in their vast catalogues of tools, or you can go for a smaller provider that focusses on text analytics as their core business. In this first of two related posts, we look at what the most prominent software giants have to offer today.