Extending churn analysis to revenue forecasting using R

In this article we review the application of clustering to customer order data in three parts. First, we define the approach to developing the cluster model, including derived predictors and dummy variables. Second, we extend beyond a typical “churn” model by using the model cumulatively to predict customer re-ordering at a set of future time cutoffs. Last, we use the cluster model to forecast actual revenue by estimating the ordering parameter distributions for each cluster, then sampling those distributions to predict new orders and order values over a time interval.
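The third step, sampling per-cluster distributions to forecast revenue, can be sketched roughly as follows. The article itself uses R; this is a minimal Python sketch under stated assumptions, not the article's method. The cluster names, customer counts, order rates, the choice of Poisson order arrivals, and the lognormal order values are all hypothetical.

```python
import random

# Hypothetical per-cluster parameters (illustrative only, not from the article):
# number of customers, mean orders per customer per day, and the lognormal
# parameters (mu, sigma) of the order value.
CLUSTERS = {
    "frequent": {"customers": 50, "orders_per_day": 0.10, "mu": 3.0, "sigma": 0.5},
    "occasional": {"customers": 200, "orders_per_day": 0.01, "mu": 4.0, "sigma": 0.8},
}


def simulate_customer_revenue(rng, orders_per_day, mu, sigma, horizon_days):
    """Sample one customer's revenue over the forecast horizon.

    Assumption: orders arrive as a Poisson process, simulated by drawing
    exponential gaps between orders; each order value is lognormal.
    """
    revenue = 0.0
    if orders_per_day <= 0:
        return revenue
    t = rng.expovariate(orders_per_day)  # time of the first order
    while t <= horizon_days:
        revenue += rng.lognormvariate(mu, sigma)
        t += rng.expovariate(orders_per_day)  # gap to the next order
    return revenue


def forecast_revenue(clusters, horizon_days, n_runs=200, seed=0):
    """Monte Carlo forecast: repeat the simulation and summarise total revenue."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = 0.0
        for p in clusters.values():
            for _ in range(p["customers"]):
                total += simulate_customer_revenue(
                    rng, p["orders_per_day"], p["mu"], p["sigma"], horizon_days
                )
        totals.append(total)
    totals.sort()
    return {
        "mean": sum(totals) / n_runs,
        "p05": totals[int(0.05 * n_runs)],  # rough 5th percentile
        "p95": totals[int(0.95 * n_runs)],  # rough 95th percentile
    }


if __name__ == "__main__":
    print(forecast_revenue(CLUSTERS, horizon_days=30))
```

Reporting a range (here rough 5th and 95th percentiles) rather than a single number reflects that the forecast comes from sampled distributions, not a point estimate.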

GDPR Drives Real Time Analytics

New reforms under the General Data Protection Regulation (GDPR) started in 2012 as an attempt to standardize data protection regulations. The European Union intends to make Europe “fit for the digital age.” It took four years to finalize the agreements and settle on a roadmap for how the laws will be enforced. The GDPR presents new opportunities as well as difficulties for businesses, digital companies, data collectors, and digital marketers. On the one hand, these regulations will make it more difficult for businesses and data mining firms to collect and analyze customer data for marketers; on the other, they will present an opportunity for data collectors to innovate and enhance their techniques. This will lead to better collection of more meaningful data, as customers will be directly involved.

Announcing AIRI: Industry’s First Integrated AI-Ready Infrastructure for Deploying Deep Learning at Scale

Pure Storage (NYSE: PSTG), the all-flash storage platform that helps innovators build a better world with data, today announced the industry’s first comprehensive AI-Ready Infrastructure, AIRI, powered by NVIDIA. Architected by Pure Storage and NVIDIA, AIRI is purpose-built to enable data architects, scientists and business leaders to extend the power of the NVIDIA DGX-1 and operationalize AI-at-scale for every enterprise. With AIRI, cloud, enterprise and government organizations can accelerate time-to-insight and bring new, impactful innovations to humanity, faster. AIRI enables organizations to turn data into innovation at an unprecedented pace. AI represents an opportunity for enterprises to innovate not only at a product level, but within day-to-day operations as they lead their industries through periods of tremendous change. According to Gartner, 80 percent of enterprises will deploy AI by 2020. AIRI provides a simple, yet powerful, architecture that empowers organizations with the data-centric infrastructure needed to harness the true power of AI.

Discriminant Analysis: Statistics All The Way

Discriminant analysis is used when the variable to be predicted is categorical. It requires that the assignment of data points to their respective categories is known, which distinguishes it from cluster analysis, where the classification criteria are not known. It works by calculating a score from all the predictor variables and selecting the class that corresponds to the value of that score. Hence the name: discriminant analysis discriminates between data points and assigns them to classes or categories based on analysis of the predictor variables. This article delves into the linear discriminant analysis function in R and delivers an in-depth explanation of the process and concepts. Before we move further, let us look at the assumptions of discriminant analysis, which are quite similar to those of MANOVA.
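As a rough illustration of the score-based classification described above, here is a minimal Python sketch of linear discriminant analysis with a single predictor, which keeps it free of dependencies. It is not the R function the article covers. The class labels and sample values are hypothetical, and the sketch assumes normally distributed classes that share one variance.

```python
import math
from statistics import mean


def fit_lda_1d(samples_by_class):
    """Estimate class means, class priors, and a pooled variance from labelled data."""
    n_total = sum(len(xs) for xs in samples_by_class.values())
    means = {k: mean(xs) for k, xs in samples_by_class.items()}
    # Pooled (shared) variance: LDA assumes every class has the same variance.
    ss = sum((x - means[k]) ** 2 for k, xs in samples_by_class.items() for x in xs)
    pooled_var = ss / (n_total - len(samples_by_class))
    priors = {k: len(xs) / n_total for k, xs in samples_by_class.items()}
    return means, priors, pooled_var


def discriminant_scores(x, means, priors, pooled_var):
    """Linear discriminant score of x for each class."""
    return {
        k: x * means[k] / pooled_var
        - means[k] ** 2 / (2 * pooled_var)
        + math.log(priors[k])
        for k in means
    }


def classify(x, model):
    """Pick the class with the highest discriminant score."""
    scores = discriminant_scores(x, *model)
    return max(scores, key=scores.get)


# Hypothetical training data: one numeric predictor, two known classes.
TRAINING = {
    "low": [1.0, 1.5, 2.0, 2.5],
    "high": [4.0, 4.5, 5.0, 5.5],
}
MODEL = fit_lda_1d(TRAINING)

if __name__ == "__main__":
    for x in (2.0, 4.2):
        print(x, classify(x, MODEL))
```

With several predictors the same idea applies, with the pooled variance replaced by a pooled covariance matrix.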

Ensemble Machine Learning Tutorial

Here are the slides from a two-part lecture I’m giving on ensemble learning at Indiana University. They include a discussion of the Netflix Prize competition and the use of ensemble techniques in that competition.

Commoditisation of AI, digital forgery and the end of trust: how we can fix it

It is becoming widely evident that technology will enable total manipulation of video and audio content, as well as its digital creation from scratch. As a consequence, the meaning of evidence and trust will be critically challenged, and pillars of modern society such as information, justice and democracy will be shaken and go through a period of crisis. Once tools for fabrication become a commodity, the effects will be more dramatic than the current phenomenon of fake news. In tech circles the issue is discussed only at a philosophical level; no clear solution is known at present. This post discusses the pros and cons of two classes of potential solutions: digital signatures and learning-based detection systems. We also ran a brief “weekend experiment” to measure the effectiveness of machine learning for detecting face manipulation, on the wave of deepfakes. In the limited scope of the experiment, our model is able to spot image manipulation that is imperceptible to the human eye.

20 Useful Visualization Libraries

Well, not entirely limited to libraries. Useful stuff for visualization practitioners sounded a little non-specific, though. These are all freely available.

Python Object-Oriented Programming (OOP): Tutorial

Tackle the basics of Object-Oriented Programming (OOP) in Python: explore classes, objects, instance methods, attributes and much more!
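For readers new to the terms, the following minimal sketch shows a class, an object created from it, instance and class attributes, and an instance method. The `Account` class and its fields are hypothetical and are not taken from the tutorial.

```python
class Account:
    """A hypothetical bank account used to illustrate classes and objects."""

    interest_rate = 0.02  # class attribute, shared by every instance

    def __init__(self, owner, balance=0.0):
        # Instance attributes: each object keeps its own copy.
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        """Instance method: operates on the object it is called on."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        return self.balance


if __name__ == "__main__":
    account = Account("Ada")  # create an object (instance) of the class
    account.deposit(50)
    print(account.owner, account.balance, account.interest_rate)
```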

Would You Survive the Titanic? Getting Started in Python

The Titanic: Machine Learning from Disaster dataset is indispensable for beginners in data science. It lets you work on a supervised learning problem, more precisely a classification problem. That is why I would like to introduce you to an analysis of it.

Vectors and Functions in R

My last post answered some of the common questions about R that a person who has just begun exploring the language needs to know. As we advance and immerse ourselves further, this post covers some essential components whose basic understanding is key to mastering R.

Understanding Feature Engineering (Part 4) – Deep Learning Methods for Text Data

Working with unstructured text data is hard, especially when you are trying to build an intelligent system that interprets and understands free-flowing natural language just like humans do. You need to be able to process and transform noisy, unstructured textual data into structured, vectorized formats that any machine learning algorithm can understand. Principles from Natural Language Processing, Machine Learning and Deep Learning, all of which fall under the broad umbrella of Artificial Intelligence, are effective tools of the trade. Based on my previous posts, an important point to remember here is that any machine learning algorithm is based on principles of statistics, math and optimization. Hence, these algorithms are not intelligent enough to start processing text in its raw, native form. We covered some traditional strategies for extracting meaningful features from text data in Part-3: Traditional Methods for Text Data. I encourage you to check it out for a brief refresher. In this article, we will look at more advanced feature engineering strategies, which often leverage deep learning models. More specifically, we will cover the Word2Vec, GloVe and FastText models.
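To give a flavour of how raw text becomes training data for an embedding model, here is a minimal sketch of the (target, context) pairs that the skip-gram variant of Word2Vec is trained on. The function name, the whitespace tokenisation, and the example sentence are assumptions for illustration; real pipelines add steps such as vocabulary building, subsampling, and negative sampling, which are omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token and its neighbours
    within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Naive whitespace tokenisation, purely for illustration.
    sentence = "the quick brown fox".split()
    for pair in skipgram_pairs(sentence, window=1):
        print(pair)
```

A model trained to predict the context word from the target word ends up placing words that occur in similar contexts close together in the vector space.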