Explaining AI from the Life Cycle of Data

When I was teaching a session on AI in an MBA program at the London School of Economics, I thought of explaining AI from the perspective of the life cycle of data. This framing is useful because more people are familiar with data than with code. I welcome comments on this approach. Essentially, we consider how data is used and transformed for AI, and what its implications are.


The best resources in Machine Learning & AI

Get exposure to the Machine Learning community


Python Tutorial: Fuzzy Name Matching Algorithms

How to cope with the variability and complexity of person name variables used as identifiers.
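
The article surveys several matching strategies; as a minimal sketch of the core idea (not the article's own code), here is a similarity score built on Python's standard difflib, with a token-sort normalisation so that reordered names still match:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; sorting tokens makes 'Smith, John' match 'John Smith'."""
    norm = lambda s: " ".join(sorted(s.lower().replace(",", " ").split()))
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

print(name_similarity("Jon Smith", "John Smith"))    # high but below 1.0 (spelling variant)
print(name_similarity("Smith, John", "John Smith"))  # 1.0 after token sorting
```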


Can you learn Data Science and Machine Learning without Maths?

Data scientist is the most promising job in America for 2019, according to a recent report from LinkedIn. This comes as no surprise: data scientist has topped Glassdoor’s list of Best Jobs in America for the past three years, with professionals in the field reporting high demand, high salaries, and high job satisfaction. With that demand, employers are also looking for a broader skill set: a modern-day data scientist needs to be strong in maths, programming, communication, and problem-solving. In this blog, we explore whether knowledge of mathematics is really necessary to become a good data scientist, and whether there are ways, if any, to become one without learning maths.


Putting ML in production I: using Apache Kafka in Python.

A company collects data using a series of services that generate events as users/customers interact with the company’s website or app. As these interactions happen, an algorithm needs to run in real time, and some immediate action needs to be taken based on the algorithm’s outputs (or predictions). On top of that, after N interactions (or observations) the algorithm needs to be retrained without stopping the prediction service, since users will keep interacting. For the exercise here we have used the Adult dataset, where the goal is to predict whether individuals earn an income higher or lower than 50k based on their age, native country, etc. To adapt this dataset to the scenario described above, one could assume that age, native country, etc. are collected through an online questionnaire/form, and that we need to predict whether users have high/low income in real time. If high income is predicted, we immediately call or email them with some offer, for example. Then, after N new observations, we retrain the algorithm while continuing to predict on new users.
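
The post builds a full pipeline; as a minimal sketch of just the messaging layer, using the kafka-python package (the topic name and localhost broker below are illustrative assumptions), a producer/consumer pair might look like this:

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

# Producer side: the app emits one event per user interaction.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("user_events", {"age": 37, "native_country": "United-States"})
producer.flush()

# Consumer side: the prediction service scores events as they arrive.
consumer = KafkaConsumer(
    "user_events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
for message in consumer:
    features = message.value
    # score = model.predict(features)  # model trained on the Adult dataset
    # act on the prediction here; retrain after every N consumed events
```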


How to do regression analysis for multiple independent or dependent variables

In this post, I will show how to run a linear regression analysis with multiple independent or dependent variables. This should not be confused with a multivariable-adjusted model; this tutorial is not about multivariable models.
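
The tutorial's own code isn't reproduced here; as a quick illustration of the idea in Python, scikit-learn's LinearRegression accepts several dependent variables at once and fits one set of coefficients per output column (the data below is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))                      # three independent variables
B = rng.normal(size=(3, 2))                        # true coefficients
Y = X @ B + rng.normal(scale=0.1, size=(100, 2))   # two dependent variables

model = LinearRegression().fit(X, Y)
print(model.coef_.shape)      # (2, 3): one row of coefficients per outcome
print(model.predict(X[:2]))   # predictions for both outcomes at once
```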


A Simple Guide to Semantic Segmentation

A comprehensive review of Classical and Deep Learning methods for Semantic Segmentation
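
As a taste of what semantic segmentation produces (a class label for every pixel), here is a minimal inference sketch with a pretrained torchvision model; "street.jpg" is a placeholder image path, and the guide itself covers many more methods:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained FCN-ResNet50 from torchvision (21-class Pascal VOC label set).
model = models.segmentation.fcn_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("street.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    out = model(img)["out"]    # shape (1, 21, H, W): per-class score maps
mask = out.argmax(1)           # shape (1, H, W): a class index for every pixel
```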


Interpretable AI or How I Learned to Stop Worrying and Trust AI

In the last five years alone, AI researchers have made significant breakthroughs in areas such as image recognition, natural language understanding, and board games. As companies consider handing over critical decisions to AI in industries like healthcare and finance, the lack of understanding of complex machine-learned models is hugely problematic. This lack of understanding could result in models propagating bias, and we have seen quite a few examples of this in criminal justice, politics, retail, facial recognition, and language understanding.


The Machine That Programmed Humans

Many high-water marks have been breached in the last century, but perhaps the most important will be that, for the first time in the history of the human species, we are encountering tools whose interests diverge from those of their users. I speak of artificial intelligence, and in particular advertising-enabled software such as Facebook and Gmail. In such examples, the software’s intended purpose has departed subtly from that of its users. While this may seem innocuous at present, brought to its furthest conclusions it portends a raft of disturbing consequences. Namely, we may find ourselves at a crossroads in which humans are programmed by machines rather than vice versa.


Twitter Sentiment Analysis using fastText

FastText is an NLP library developed by Facebook AI. It is an open-source, free, lightweight library that lets users learn text representations and train text classifiers. It works on standard, generic hardware, and models can later be reduced in size so that they fit even on mobile devices.
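
To make that concrete, here is a minimal supervised-classification sketch with the fasttext Python package; "tweets.train" is a hypothetical file holding one labelled tweet per line in fastText's __label__ format:

```python
import fasttext  # pip install fasttext

# tweets.train (hypothetical) holds lines like:
#   __label__positive loving the new update
#   __label__negative my flight got cancelled again
model = fasttext.train_supervised(input="tweets.train", epoch=5, wordNgrams=2)

labels, probs = model.predict("what a great day", k=1)
print(labels[0], round(float(probs[0]), 3))

# Quantisation shrinks the trained model enough to ship on mobile devices.
model.quantize(input="tweets.train", retrain=True)
model.save_model("sentiment.ftz")
```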


Better Preference Predictions: Tunable and Explainable Recommender Systems

Ad recommendations should be understandable to the individual consumer, but is it possible to increase interpretability without sacrificing accuracy?


Machine Learning for Beginners: An Introduction to Neural Networks

A simple explanation of how they work and how to implement one from scratch in Python.
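
In the spirit of that from-scratch approach, a single neuron's forward pass takes only a few lines of numpy; the weights and bias below are arbitrary example values:

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into (0, 1)."""
    return 1 / (1 + np.exp(-x))

class Neuron:
    """A neuron: weighted sum of inputs plus a bias, passed through a sigmoid."""
    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = bias

    def feedforward(self, inputs):
        return sigmoid(np.dot(self.weights, inputs) + self.bias)

n = Neuron(weights=[0.0, 1.0], bias=4.0)
print(n.feedforward(np.array([2.0, 3.0])))  # sigmoid(0*2 + 1*3 + 4) ~ 0.999
```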


Graph analysis using the tidyverse

It is because I am not a graph analysis expert that I thought it important to write this article. For someone who thinks in terms of single rectangular data sets, it is a bit of a mental leap to understand how to apply tidy principles to a more robust object, such as a graph table.


A ‘full-stack’ data science project

I’ve been dabbling in data science for quite a while now: I would download a dataset from Kaggle, start a kernel, do exploratory analysis and data cleaning, and build a baseline machine learning model or neural network using either sklearn or fast.ai, only to be distracted by work, temporarily lose interest, and later repeat the same process for a different competition or dataset all over again. After talking to some practicing data professionals, I’ve come to realize that while Kaggle is highly competitive and provides an unmatched opportunity to learn from peers who are some of the top data scientists in the world, Kaggle datasets start somewhere in the middle of the data science life cycle: the mundane but most important steps of data collection, interpretation, cleaning, and masking (de-identification) are all done already.

I wanted to build a data science project that incorporated all aspects of the life cycle: data collection, cleaning, analysis, and model building. I also wanted it to serve as a playground for me to learn the different aspects of data science and neural networks. And thus started the quest for a ‘good’ dataset. During work-related travel, I was browsing through a bookstore at the airport, and I wondered if I could build a data science project around book-related data. What information can be gleaned from the blurb of a book? Or its cover? As I mulled over this idea, I realized that other information about a book, such as ratings, reviews, and description, may also be valuable for running a few experiments. The objective was to gain experience building a model from scratch. I will try to follow the standard workflow for a data science project, as illustrated below.


11 Steps to Transition into Data Science (for Reporting / MIS / BI Professionals)

1. Start performing detective analytics and generate insights from reports
2. Learn statistics to support your insights about reports
3. Present your findings to the right group
4. Explore an open source tool to generate reports OR to perform detective analysis
5. Understand the model building / predictive modeling steps
6. Learn methods to evaluate your model’s performance
7. Get an introduction to predictive modeling with linear and logistic regression
8. Identify the business problem (related to your role), convert it to a data problem and make predictions
9. Share your model’s results with the business owners and earn their trust
10. Keep learning new algorithms, engage in the data science community and focus on profile building
11. Focus on transitioning to a data science role within your organization


Don’t Do Data Science, Solve Business Problems

Here are a few suggestions for Data Scientists, or those on analytics teams, to apply this idea more fully:
1. Become a scientist of the business. Spend a little less time learning new algorithms and Python packages and more time learning the levers that make your specific business go up or down, and the variables that impact those levers. Identify the data sources contributing to those variables; usually at the intersection you will find high-value opportunities.
2. Be ruthless in prioritizing and accepting projects. Prior to moving forward on a DS project, evaluate 1) the action that will be taken with the output and 2) the business value that will be created based on that action. If the action isn’t clear or the value isn’t high, don’t waste your time. Side note: Data Science is NOT Business Intelligence. BI is an important IT function that maintains the integrity of data sources and dashboards; your job as a Data Scientist is to solve problems in the business.
3. Don’t expect stakeholders to always (or ever) be able to define the problem. In my opinion, this is the single most important skill for a Data Scientist, above any technical expertise: the ability to clearly evaluate and define a problem. Most business stakeholders have problems but haven’t thought about them long enough to be able to define the process behind them. This is where you will make Machine Learning and AI work for your organization: by translating the needs of the business into a process where Data Science can be applied effectively.
4. Make yourself part of the business. Do not under any circumstances become siloed. Proactively get involved with the business unit as a partner, not a support function.


Gini Coefficient and Lorenz curve

This post explains the Gini coefficient’s usage and relevance for data science professionals, and also covers the Lorenz curve, which is a way to determine the Gini coefficient graphically.
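
To preview the idea, the Gini coefficient can be computed directly from the cumulative shares that make up the Lorenz curve; a minimal numpy sketch, assuming non-negative values such as incomes:

```python
import numpy as np

def gini(values):
    """Gini coefficient (0 = perfect equality, near 1 = maximal inequality)."""
    v = np.sort(np.asarray(values, dtype=float))  # incomes in ascending order
    n = v.size
    cum = np.cumsum(v)                            # cumulative income (Lorenz curve heights)
    # Equivalent to twice the area between the Lorenz curve and the equality line.
    return (n + 1 - 2 * cum.sum() / cum[-1]) / n

print(gini([1, 1, 1, 1]))    # 0.0: everyone earns the same
print(gini([0, 0, 0, 10]))   # 0.75: one person holds everything
```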