Data Lake Business Model Maturity Index

“Our organization is abuzz with the concept of data lakes!” a customer recently told me. And rightfully so: the data lake has the potential to help organizations become more effective at leveraging data and analytics to power their business models. That is exactly what we propose with the Big Data Business Model Maturity Index: helping organizations exploit the power of predictive, prescriptive, and cognitive (self-learning) analytics to advance up the business model maturity index.

Docker for data science: building a simple Jupyter container

This is the first in a series of posts where I’ll be noting down my findings while exploring Docker, and whether and how it can help with everything related to data science. In this post I’ll try to build a simple container with a Jupyter notebook installed.

Create Real Value with Augmented (not Artificial) Intelligence

As long as human and artificial intelligence work in tandem, we’ll continue to make each other better at what we naturally do best.

The rise of the streaming platform

Neha Narkhede explains why streaming platforms have become the central nervous system of modern digital businesses.

Approximate Nearest Neighbours for Recommender Systems

One challenge that recommender systems face is quickly generating a list of the best recommendations to show the user. These days many libraries can quickly train models that handle millions of users and millions of items, but the naive way of generating recommendations from these models involves ranking every single item for every single user, which can be extremely expensive. As an example, my implicit recommendation library can train a model on the dataset in 24 seconds on my desktop, but takes over an hour to use that model to generate recommendations for each user.

This post evaluates a couple of different approximate nearest neighbours libraries for speeding up recommendations from matrix factorization models. In particular, the libraries I’m looking at are Annoy, NMSLib and Faiss. I’ve used Annoy successfully in a couple of different projects in the past, but was recently intrigued when I read that NMSLib can be up to 10x faster when using its Hierarchical Navigable Small World Graph (HNSW) index option. I also wanted to try out Faiss after reading the blog post Facebook Research wrote about it, which claimed that the GPU-enabled version of Faiss was the fastest available option.

Both NMSLib and Faiss turn out to be extremely good at this task, and I’ve added code to implicit to use these libraries for generating recommendations.
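To make the cost of the naive approach concrete, here is a minimal brute-force sketch in plain Python. It assumes a matrix factorization model where the score of a (user, item) pair is the inner product of their latent-factor vectors. The names `dot` and `recommend_brute_force` are hypothetical and not part of implicit's API; this is the exhaustive baseline that an approximate nearest neighbours index is meant to replace, not the code used in the post.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two latent-factor vectors (the MF relevance score)."""
    return sum(a * b for a, b in zip(u, v))


def recommend_brute_force(
    user_factors: Sequence[Sequence[float]],
    item_factors: Sequence[Sequence[float]],
    n: int = 10,
) -> List[List[Tuple[int, float]]]:
    """Hypothetical naive top-n recommender: score every item for every user.

    Work is O(users * items * factors), which is why this exhaustive scan
    becomes slow once users and items both number in the millions.
    """
    results = []
    for u in user_factors:
        # Score every item for this user; nothing is pruned.
        scores = ((item_id, dot(u, v)) for item_id, v in enumerate(item_factors))
        # Keep only the n highest-scoring (item_id, score) pairs.
        results.append(heapq.nlargest(n, scores, key=lambda pair: pair[1]))
    return results


if __name__ == "__main__":
    # Toy factors: two users, three items, two latent dimensions.
    users = [[1.0, 0.0], [0.0, 1.0]]
    items = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    print(recommend_brute_force(users, items, n=2))
    # prints [[(0, 0.9), (2, 0.5)], [(1, 0.9), (2, 0.5)]]
```

Every user is scored against every item, so the work grows as users × items × factors: with a million users and a million items that is 10^12 inner products before any ranking. Approximate nearest neighbours libraries such as Annoy, NMSLib and Faiss avoid this full scan by querying a prebuilt index for likely top candidates, trading a little accuracy for a large speedup.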