The Secret Layer Behind Every Successful Deep Learning Model: Representation Learning and Knowledge Quality

Understanding the characteristics of input datasets is an essential capability of machine learning algorithms. Given an input, a model needs to infer the features of the data that are relevant to its target task. Representation learning, also known as feature learning, is the subdiscipline of machine learning that deals with extracting features from a dataset, that is, with learning a useful representation of it.
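For concreteness, here is a minimal sketch, not from the article, of learning a compact representation with a small autoencoder; it assumes TensorFlow/Keras is available and uses made-up toy data.

```python
# A minimal sketch (not from the article): learning a low-dimensional
# representation of data with a tiny autoencoder. Assumes TensorFlow/Keras.
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 20).astype("float32")  # toy 20-dimensional data

inputs = tf.keras.Input(shape=(20,))
code = tf.keras.layers.Dense(3, activation="relu")(inputs)      # learned representation
outputs = tf.keras.layers.Dense(20, activation="linear")(code)  # reconstruction

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

encoder = tf.keras.Model(inputs, code)
features = encoder.predict(X)   # 3-dimensional learned features
print(features.shape)           # (1000, 3)
```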


What’s New in Deep Learning Research: Neural Networks that Detect Relationships Between Objects

Relational reasoning is a key component of fluid intelligence. From the time we are babies, we learn to detect relationships between objects in space and time. Relational reasoning is so ubiquitous in our thinking that we barely notice it. Every time we decide which road to take, or piece together different episodes of a movie thriller or a detective novel to figure out the ending, we are effectively using this cognitive skill. To date, we know very little about the brain mechanisms that allow us to detect spatial and temporal relationships between objects and how they evolve over time. From the artificial intelligence (AI) standpoint, relational reasoning has proven to be an elusive goal for most deep neural network models. Last year, DeepMind researchers published a paper that proposed a method for enabling relational reasoning in neural networks.
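The proposed module applies a shared function g to every pair of objects and sums the results before a final function f. Below is a rough numpy sketch of that structure, not DeepMind's code, with made-up sizes and untrained weights.

```python
# A minimal sketch (not DeepMind's code) of the relation-network idea:
# apply a shared function g to every ordered pair of objects, sum the
# results, and pass the sum through f. g and f are plain one-hidden-layer MLPs here.
import numpy as np

rng = np.random.default_rng(0)
objects = rng.normal(size=(6, 8))          # 6 objects, 8 features each

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0) @ w2      # one hidden ReLU layer

w1_g, w2_g = rng.normal(size=(16, 32)), rng.normal(size=(32, 32))
w1_f, w2_f = rng.normal(size=(32, 32)), rng.normal(size=(32, 10))

# g over all ordered pairs (o_i, o_j), then sum
pair_sum = sum(mlp(np.concatenate([objects[i], objects[j]]), w1_g, w2_g)
               for i in range(len(objects)) for j in range(len(objects)))
output = mlp(pair_sum, w1_f, w2_f)         # RN(O) = f( sum_ij g(o_i, o_j) )
print(output.shape)                        # (10,)
```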


An Illustrated Explanation of Performing 2D Convolutions Using Matrix Multiplications

In this article, I will explain how 2D Convolutions are implemented as matrix multiplications. This explanation is based on the notes of the CS231n Convolutional Neural Networks for Visual Recognition (Module 2). I assume the reader is familiar with the concept of a convolution operation in the context of a deep neural network. If not, this repo has a report and excellent animations explaining what convolutions are. The code to reproduce the computations in this article can be downloaded here.
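As a preview, here is a minimal sketch, not the article's downloadable code, of a single-channel 2D convolution rewritten as one matrix multiplication via im2col (no padding, stride 1).

```python
# A minimal sketch of a 2D convolution implemented as a matrix multiplication
# via im2col. Single channel, no padding, stride 1, for clarity.
import numpy as np

def conv2d_as_matmul(image, kernel):
    H, W = image.shape
    kH, kW = kernel.shape
    oH, oW = H - kH + 1, W - kW + 1
    # im2col: each column holds one flattened kH*kW patch of the image
    cols = np.empty((kH * kW, oH * oW))
    for i in range(oH):
        for j in range(oW):
            cols[:, i * oW + j] = image[i:i + kH, j:j + kW].ravel()
    # The convolution becomes a single matrix multiplication
    return (kernel.ravel() @ cols).reshape(oH, oW)

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_as_matmul(image, kernel))
# Matches the direct sliding-window (cross-correlation) computation used in deep nets.
```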


Entropy is a measure of uncertainty

Suppose you are talking with three patients in the waiting room of a doctor’s office. All three of them have just completed a medical test which, after some processing, yields one of two possible results: the disease is either present or absent. Let’s assume we are dealing with curious and data-oriented individuals. They’ve researched the probabilities for their specific risk profiles in advance and are now eager to find out the result.
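A quick sketch of the quantity at stake, not from the article: the entropy of a binary test result as a function of the probability p that the disease is present, for three hypothetical risk profiles.

```python
# Entropy of a binary outcome: H(p) = -p*log2(p) - (1-p)*log2(1-p).
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.9, 0.99):                  # hypothetical disease probabilities
    print(f"p = {p:0.2f}  ->  entropy = {binary_entropy(p):0.3f} bits")
# The closer p is to 0.5, the more uncertain the result, and the higher the entropy.
```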


When Should The Network Be The Computer?

Researchers have repurposed programmable network devices to place small amounts of application computation in the network, sometimes yielding orders-of-magnitude performance gains. At the same time, effectively using these devices requires careful use of limited resources and managing deployment challenges. This paper provides a framework for principled use of in-network processing. We provide a set of guidelines for building robust and deployable in-network primitives, along with a taxonomy to help identify which applications can benefit from in-network processing and what types of devices they should use.


Statistics 101: Introduction to the Central Limit Theorem (with implementation in R)

What is one of the most important and core concepts of statistics, one that enables us to do predictive modeling and yet often confuses aspiring data scientists? Yes, I’m talking about the central limit theorem. It is a powerful statistical concept that every data scientist MUST know. Now, why is that? Well, the central limit theorem (CLT) is at the heart of hypothesis testing – a critical component of the data science lifecycle. That’s right, the idea that lets us explore the vast possibilities of the data we are given springs from the CLT. It’s actually a simple notion to understand, yet most data scientists flounder at this question during interviews.
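As a quick illustration (the article implements this in R; the sketch below uses Python), the means of repeated samples from a clearly non-normal distribution behave just as the CLT predicts.

```python
# A minimal CLT demo: means of samples from a skewed distribution cluster
# around the population mean with spread roughly sigma / sqrt(n).
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=100_000)   # clearly non-normal

sample_means = np.array([rng.choice(population, size=50).mean()
                         for _ in range(2_000)])

print("Population mean:", population.mean())
print("Mean of sample means:", sample_means.mean())            # close to population mean
print("SD of sample means:", sample_means.std())               # ~ population SD / sqrt(50)
print("Population SD / sqrt(n):", population.std() / np.sqrt(50))
```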


Visualising high-dimensional datasets using PCA and t-SNE in Python

The first step in any data-related challenge is to explore the data itself, for example by looking at the distributions of certain variables or at potential correlations between variables. The problem nowadays is that most datasets have a large number of variables. In other words, they have a high number of dimensions along which the data is distributed. Visually exploring the data then becomes challenging, and most of the time practically impossible to do manually. However, such visual exploration is incredibly important in any data-related problem, so it is key to understand how to visualise high-dimensional datasets. This can be achieved using techniques known as dimensionality reduction. This post will focus on two such techniques: PCA and t-SNE. More about that later. Let’s first get some (high-dimensional) data to work with.
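As a preview of what follows, here is a minimal scikit-learn sketch, not the post's exact code, that reduces a 64-dimensional dataset to two dimensions with both PCA and t-SNE.

```python
# Reducing a high-dimensional dataset to 2D with PCA and t-SNE (scikit-learn).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)         # 64-dimensional digit images

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

print(X.shape, "->", X_pca.shape, "and", X_tsne.shape)
# X_pca and X_tsne can now be scatter-plotted, colored by the digit label y.
```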


Recommendation Systems – Models and Evaluation

I’ve been involved in building several different types of recommendation systems, and one thing I’ve noticed is that each use case is different from the next, as each aims to solve a different business problem. Let’s consider a few examples:
1. Movie/Book/News Recommendations – Suggest new content that increases user engagement. The aim is to introduce users to new content that may interest them and encourage them to consume more content on our platform.
2. Stock Recommendations – Suggest the stocks that are most profitable to clients. The recommendations may be stocks they have traded historically. Novelty does not matter here; profitability of the stock does.
3. Product Recommendations – Suggest a mix of old and new products. The old products from users’ historical transactions serve as a reminder of their frequent purchases, while it is also important to suggest new products that users may like to try.
In all of these problems, the common thread is that they aim to increase customer satisfaction and in turn drive business in the form of increased commissions, greater sales, etc. Whatever the use case may be, the data is typically in the following format (a small collaborative-filtering sketch on data of this shape follows the lists below):
• Customer ID, Product ID (Movie/Stock/Product), No. of Units/Rating, Transaction Date
• Any other feature, such as details of the product or demographics of the customer
Going forward, here are the topics I will be covering:
• Methods used for building recommendation systems – Content-based, Collaborative Filtering, Clustering
• Evaluation Metrics – Statistical accuracy metrics, Decision Support accuracy metrics
• Things to keep in mind
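To make the data format above concrete, here is a minimal sketch, not from the article, of item-based collaborative filtering on a toy Customer ID / Product ID / Rating table.

```python
# Item-based collaborative filtering on a tiny hypothetical transactions table.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 3],
    "product_id":  ["A", "B", "A", "C", "B", "C", "D"],
    "rating":      [5, 3, 4, 2, 4, 5, 1],
})

# Customer x product matrix, missing ratings filled with 0
matrix = df.pivot_table(index="customer_id", columns="product_id",
                        values="rating", fill_value=0)

# Cosine similarity between products (rows of the transposed matrix)
item_vectors = matrix.T.values.astype(float)
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
similarity = (item_vectors @ item_vectors.T) / (norms * norms.T)

sim_df = pd.DataFrame(similarity, index=matrix.columns, columns=matrix.columns)
print(sim_df.round(2))   # products most similar to "A" are candidates to recommend
```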


Bias in machine learning algorithms

I was able to attend a talk by Prof. Sharad Goyal on the various types of bias in our machine learning models, with insights from some of his recent work at the Stanford Computational Policy Lab. It really got me excited, so I did some further study and created this note on bias in machine learning. Let’s talk about bias and why we need to care about it.


Optimizing any TensorFlow model using TensorFlow Graph Transform Tools and TensorRT

More than an article, this is basically a how-to on optimizing a TensorFlow model using the TF Graph Transform tools and NVIDIA TensorRT. It is a fairly heavy read, meant for data science engineers more than data scientists. Beyond the how-to, it also illustrates the bugs and unexpected results I encountered along the way. These results may be unexpected to me in the role of a DS engineer, yet have a rational explanation for someone who understands the deeper abstractions of the model or of the TF framework that implements it. I don’t have that skill yet, and I leave it to the reader to point out any mistakes I have made. I have raised two bugs, one with TF Graph Transform and one with the NVIDIA TensorRT transform tool; if they get answered, I will update this post. https://…/28276 https://…/#5320989
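For orientation, here is a minimal sketch, assuming TensorFlow 1.x, of a graph-transform step of the kind described; the file name and input/output tensor names are hypothetical placeholders, not the author's model.

```python
# A minimal sketch (assuming TensorFlow 1.x) of running the Graph Transform
# tools on a frozen GraphDef. File and tensor names are placeholders.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

with tf.gfile.GFile("frozen_model.pb", "rb") as f:      # hypothetical frozen graph
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

transforms = [
    "strip_unused_nodes",
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",
    "fold_old_batch_norms",
]
optimized_def = TransformGraph(graph_def, ["input"], ["output"], transforms)

with tf.gfile.GFile("optimized_model.pb", "wb") as f:
    f.write(optimized_def.SerializeToString())
```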


Reinforcement Learning – Motivation for a Control Engineer

Having a strong control theory background, I was never really impressed by the usual machine learning applications. Of course I understand the importance of supervised and unsupervised algorithms, and how machine learning has been used for years to drive technological development (and society, I would say). However, I cannot find most of the introductory examples appealing enough (training a neural network for several hours in order to classify cats and dogs is not something for me). What changed everything was when I started studying reinforcement learning theory more deeply. The concepts match so well with the kind of problems I used to study in control theory classes that the first thing I tried was basically to set an agent to learn how to control some dynamical system. In fact, the field has its origins in optimal control theory, with the Bellman equation as the beginning of everything.
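To make the connection concrete, here is a minimal sketch, not from the article, of value iteration on a tiny made-up MDP, i.e. repeated application of the Bellman optimality update V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ].

```python
# Value iteration on a tiny hypothetical MDP (3 states, 2 actions).
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
# Hypothetical transition probabilities P[a, s, s'] and rewards R[s, a].
P = np.array([[[0.8, 0.2, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
              [[0.1, 0.0, 0.9], [0.7, 0.3, 0.0], [0.0, 0.5, 0.5]]])
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 2.0]])

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)   # Q[s, a]
    V_new = Q.max(axis=1)                          # Bellman optimality update
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
policy = Q.argmax(axis=1)
print("Optimal values:", V, "Greedy policy:", policy)
```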


Building Data

Today, I’d like to write a little story about what I’ve experienced personally across my data career and what I’ve learned by reading stories from others across Medium. Maybe, at the same time, I’ll come up with what I would have done differently. Data engineers, in theory, aren’t responsible for ‘everything data’ at an organization. There’s no one-liner in the default role description across the internet which states that data engineers collect, transform and publish data, enabling data-driven decisions at an organization… ALSO create, maintain and evangelize the data culture we all know, love and want. But in quite a number of cases, particularly at companies in an early stage of ‘data maturity’, this is the requirement, whether or not the employee or the employer knows it.


Looking at AI-focused Case Studies

In my last post, I discussed the impact of Artificial Intelligence (AI) on the job market and underlined the need for foresight in regulating AI. I believe that our society is unable to cope with the ethical repercussions of a growing dependence on AI. Tech leadership in many countries acknowledges this issue, and in the past few years each has come out with a strategy to promote the development of AI in an effective way. An Overview of National AI Strategies briefly discusses the different AI policies proposed since the beginning of 2017.


Templated output in R

Earo Wang, who is the curator of the We are R-Ladies Twitter feed this week (the last week of April 2019), had a really nice tweet about using the whisker package to create a template incorporating text and data in R. Her example created a list of tidyverse packages with descriptions. I really liked the example, but thought that the glue package might be able to do the same thing. I used Earo’s code to generate a tibble with the package names and descriptions, and glue_data to create the templated list.
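As a rough analogue of the same templating idea in Python (the article itself uses R's whisker and glue packages), filling a text template from a list of records looks like this; the package names and titles are just illustrative data.

```python
# Filling a text template from tabular records with str.format (illustrative data).
packages = [
    {"package": "dplyr",   "title": "A Grammar of Data Manipulation"},
    {"package": "ggplot2", "title": "Create Elegant Data Visualisations"},
]
template = "* {package}: {title}"
print("\n".join(template.format(**row) for row in packages))
```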


Feature engineering

In general, you can think of data cleaning as a process of subtraction and feature engineering as a process of addition.
This is often one of the most valuable tasks a data scientist can do to improve model performance, for 3 big reasons:
1. You can isolate and highlight key information, which helps your algorithms ‘focus’ on what’s important.
2. You can bring in your own domain of expertise.
3. Most importantly, once you understand the ‘vocabulary’ of feature engineering, you can bring in other people’s domain expertise!
Before moving on, we just want to note that this is not an exhaustive compendium of all feature engineering because there are limitless possibilities for this step.
The good news is that this skill will naturally improve as you gain more experience.
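To make the idea concrete, here is a minimal pandas sketch, not from the original guide, of two common kinds of engineered features on a hypothetical transactions table: a datetime breakdown and a domain-driven ratio.

```python
# Two common engineered features on a hypothetical transactions table.
import pandas as pd

df = pd.DataFrame({
    "transaction_date": pd.to_datetime(["2023-01-05", "2023-02-14", "2023-03-20"]),
    "purchase_amount": [120.0, 40.0, 300.0],
    "num_items": [3, 1, 6],
})

# Isolate and highlight information the raw columns only imply
df["transaction_month"] = df["transaction_date"].dt.month
df["is_weekend"] = df["transaction_date"].dt.dayofweek >= 5
df["price_per_item"] = df["purchase_amount"] / df["num_items"]   # domain-knowledge ratio

print(df)
```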