The Two (Conflicting) Definitions of AI

Two definitions of AI are currently in use, the popular definition and the data science definition, and they conflict in fundamental ways. If you’re going to explain or recommend AI to a non-data scientist, it’s important to understand the difference.

Secure and fast microVMs for serverless computing

Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant containers and functions-based services.


A simple tool to make a bot that speaks like you, simply by learning from your WhatsApp chats.

Horizon: The first open source reinforcement learning platform for large-scale products and services

Today we are open-sourcing Horizon, an end-to-end applied reinforcement learning platform that uses RL to optimize products and services used by billions of people. We developed this platform to bridge the gap between RL’s growing impact in research and its traditionally narrow range of uses in production. We deployed Horizon at Facebook over the past year, improving the platform’s ability to adapt RL’s decision-based approach to large-scale applications. While others have worked on applications for reinforcement learning, Horizon is the first open source RL platform for production.

Automated Machine Learning with Auto-Keras

Machine learning is no longer an uncommon term, because organizations like DataCamp, Coursera, Udacity and many more are constantly working on how efficiently and flexibly they can bring machine learning education to the general public. Thanks to their platforms, it is now easy to get started in this field with almost no prerequisites. However, the term Automated Machine Learning is making a lot of headlines these days on popular data science education forums. Many organizations, Google among them, are working commendably in this area. It is still a less common topic than machine learning itself. Because machine learning already deals with automation, the natural first question is: ‘Can machine learning also be automated?’

Deep Learning cheatsheets for Stanford’s CS 230

This repository aims at summing up in the same place all the important notions that are covered in Stanford’s CS 230 Deep Learning course, and includes:
• Cheatsheets detailing everything about convolutional neural networks, recurrent neural networks, as well as the tips and tricks to have in mind when training a deep learning model.
• All elements of the above combined in an ultimate compilation of concepts, to have with you at all times!

Mode is the source of truth for your data.

Choose the right language for the job: SQL, Python, and R, all in one platform.

Which Python editors or IDEs did you use the most in 2018?

Vote in the new KDnuggets Poll – what are your favorite Python editors or IDEs?

Slack and Plumber, Part Two

This is the final entry in a three-part series about the plumber package. The first post introduces plumber as an R package for building REST API endpoints in R. The second post builds a working example of a plumber API that powers a Slack slash command. In this final entry, we will secure the API created in the previous post so that it only responds to authenticated requests, and deploy it using RStudio Connect.

How to build a Movie Recommender System in Python using LightFm

In this blog post, we will be creating a movie recommender system in Python that suggests new movies to the user based on their viewing history. Before we start, let’s have a quick look at what a recommender system is.

A Universal Knowledge Bank

Today, we live in a world with the internet. Unfathomable amounts of knowledge are only a couple of clicks away. When else in history could a 17-year-old like myself be working on cutting-edge technology such as Machine Learning, Brain Machine Interfaces, and more?

Disentanglement with Variational Autoencoder: A Review

Learning interpretable factorized representations has been around in machine learning for quite some time. But with recent advances in deep generative models like the Variational Autoencoder (VAE), there has been an explosion of interest in learning such disentangled representations. Since the objective of any generative model is essentially to capture the underlying generative factors of the data, a disentangled representation would mean that a single latent unit is sensitive to variations in a single generative factor.

Temporal Difference in Reinforcement Learning, the Easy Way

Suppose you are driving a car equipped with a GPS. At the start of your journey the GPS gives you an estimated arrival time (based on statistical data). As you drive and hit traffic jams (or not), it refines its estimate and gives you updated arrival times, so at each portion of the trip you have some estimate of the arrival time. Now suppose that your GPS does not give you any estimate but stores the data until you arrive, then gives you a detailed report on how much time each part of the road took. Would this be useful to you? The answer is: it depends on what you want to do. But you will surely appreciate having early feedback, even if it is not very accurate.
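The GPS analogy can be read as the difference between temporal-difference and Monte Carlo updates. The following is a minimal sketch under stated assumptions, not code from the post: the three-segment trip, the travel times, the step size of 0.5, and the function names `td0_update` and `monte_carlo_update` are all hypothetical, and the estimates are undiscounted minutes remaining rather than rewards.

```python
# Hypothetical route: each state is a point along the trip; "end" is the destination.
STATES = ["home", "highway", "city", "end"]


def td0_update(values, state, next_state, minutes, alpha=0.5):
    """One TD(0) step: nudge the estimate for `state` toward
    (minutes observed on this segment + current estimate for `next_state`)."""
    target = minutes + values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


def monte_carlo_update(values, trip, alpha=0.5):
    """Monte Carlo: wait until arrival, then nudge each state's estimate
    toward the actual remaining time observed from that state."""
    remaining = 0.0
    for state, minutes in reversed(trip):
        remaining += minutes
        values[state] += alpha * (remaining - values[state])
    return values


# Initial estimates of minutes remaining from each state (made-up numbers).
td_values = {"home": 30.0, "highway": 20.0, "city": 10.0, "end": 0.0}
mc_values = dict(td_values)

# One observed trip as (state, minutes spent on the segment leaving it).
trip = [("home", 15.0), ("highway", 25.0), ("city", 10.0)]

# TD(0) refines estimates segment by segment, like the GPS during the drive.
for (state, minutes), next_state in zip(trip, STATES[1:]):
    td0_update(td_values, state, next_state, minutes)

# Monte Carlo only updates once the whole trip is finished.
monte_carlo_update(mc_values, trip)

print("TD(0):      ", td_values)
print("Monte Carlo:", mc_values)
```

In this sketch the TD(0) estimate for "home" moves as soon as the first segment is driven, using the still-imperfect estimate for "highway", whereas the Monte Carlo estimate for "home" changes only after arrival, when the full trip time is known.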

Speeding up your Algorithms Part 4 – Dask

This is the fourth post in a series:
1. Speed Up your Algorithms Part 1 – PyTorch
2. Speed Up your Algorithms Part 2 – Numba
3. Speed Up your Algorithms Part 3 – Parallelization
4. Speed Up your Algorithms Part 4 – Dask

Speed Up Your Algorithms Part 3 – Parallelization

This is the third post in a series:
1. Speed Up your Algorithms Part 1 – PyTorch
2. Speed Up your Algorithms Part 2 – Numba
3. Speed Up your Algorithms Part 3 – Parallelization
4. Speed Up your Algorithms Part 4 – Dask

Making Your Neural Network Say ‘I Don’t Know’ – Bayesian NNs using Pyro and PyTorch

Building an image classifier has become the new ‘hello world’. Remember the day when you first came across Python and your print ‘hello world’ felt magical? I got the same feeling a couple months back when I followed the PyTorch official tutorial and built myself a simple classifier that worked pretty well.

Database-Inspired Optimizations for Statistical Analysis

Computing complex statistics on large amounts of data is no longer a corner case, but a daily challenge. However, current tools such as GNU R were not built to efficiently handle large data sets. We propose to vastly improve the execution of R scripts by interpreting them as a declaration of intent rather than an imperative order set in stone. This allows us to apply optimization techniques from the columnar data management research field. We have implemented several of these optimizers in Renjin, an open-source execution environment for R scripts targeted at the Java virtual machine. The demonstration of our approach using a series of micro-benchmarks and experiments on complex survey analysis shows orders-of-magnitude improvements in analysis cost.

21 Statistical Concepts Explained in Simple English – Part 4

• Content Validity (Logical or Rational Validity)
• Contingency Coefficient: Definition
• Continuous Probability Distribution
• Continuous Variable Definition (Continuous Data)
• Contour Plots: Definition, Examples
• Control Group: Definition, Examples and Types
• Control Variable: Simple Definition
• Convenience Sampling (Accidental Sampling): Definition, Examples
• Convergent Validity and Discriminant Validity: Definition, Examples
• Cook’s Distance / Cook’s D: Definition, Interpretation
• Correlation Matrix: Definition
• Counterbalancing in Research
• Covariance in Statistics: What is it? Example
• Covariate Definition in Statistics
• Cramer-Rao Lower Bound
• Criterion Validity: Definition, Types of Validity
• Criterion Variable: Definition, Use and Examples
• Critical Z Value TI 83: Easy Steps for the InvNorm Function
• Cronbach’s Alpha: Simple Definition, Use and Interpretation
• C-Statistic: Definition, Examples, Weighting and Significance
• Cumulative Distribution Function CDF

The broader the canvas of visualization, the better the understanding. That’s exactly what happens when one visualizes big data through Augmented Reality (AR) and Virtual Reality (VR). A combination of AR and VR could open a world of possibilities for making better use of the data at hand. VR and AR can improve the way data is perceived and could be the solution for making use of large amounts of unused data.

Estimating Probabilities with Bayesian Modeling in Python

A simple application of Probabilistic Programming with PyMC3 in Python