Questions to Ask When Applying Deep Learning

• Is my problem supervised or unsupervised? If supervised, is it a classification or regression problem?
• If supervised, how many labels am I dealing with?
• What’s my batch size?
• How many features am I dealing with?
• Another way to ask that same question is: What is my architecture?
• How am I going to tune my neural net?
• How much data will be sufficient to train my model? How do I go about finding that data?
• Hardware: Will I be using GPUs, CPUs or both? Am I going to rely on a single-system GPU or a distributed system?
• What’s my data pipeline? How do I plan to extract, transform and load the data (ETL)? Is it in an Oracle DB? Is it on a Hadoop cluster? Is it local or in the cloud?
• How will I featurize that data?
• What kind of non-linearity, loss function and weight initialization will I use?
• What is the simplest architecture I can use for this problem?
• Where will my net be trained and where will the model be deployed? What does it need to integrate with?


End-To-End Memory Networks for question-answering tasks with demo

This is an implementation of the MemN2N model in Python for the bAbI question-answering tasks, as shown in Section 4 of the paper ‘End-To-End Memory Networks’. It is based on Facebook’s MATLAB code.


Data Visualization Tools (an interactive listing)


Sentiment Analysis using R

Today I will explain how to create a basic movie review engine in R, based on people’s tweets.


Implicit Recommender Systems: Biased Matrix Factorization

In today’s post, we will explain an algorithm for fitting matrix factorization models for recommender systems, which goes by the name Alternating Least Squares (there are others, for example ones based on stochastic gradient descent). We will go through the basic ALS algorithm, as well as how one can modify it to incorporate user and item biases. We will also go through the ALS algorithm for implicit feedback, and then explain how to modify that to incorporate user and item biases. The basic ALS model and the version for implicit feedback are discussed in many places (both online and in freely available research papers), but we aren’t aware of any good source for implicit ALS with biases… hence, this post.
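To make the idea concrete, here is a minimal sketch of basic ALS for explicit ratings, without biases and without the implicit-feedback weighting. It is not the code from the post: the function names `als` and `als_loss`, the toy ratings matrix, and the hyperparameters are all assumptions chosen for illustration. With the item factors fixed, each user's factor vector is the solution of a small ridge regression, and vice versa, so the two updates alternate.

```python
import numpy as np


def als_loss(R, mask, X, Y, reg):
    """Squared error on observed entries plus L2 penalty on both factor matrices."""
    err = (R - X @ Y.T)[mask]
    return float(err @ err + reg * ((X ** 2).sum() + (Y ** 2).sum()))


def als(R, mask, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Basic ALS (hypothetical helper): R is users x items, mask marks observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Item factors fixed: solve one ridge regression per user.
        for u in range(n_users):
            Yo = Y[mask[u]]
            X[u] = np.linalg.solve(Yo.T @ Yo + ridge, Yo.T @ R[u, mask[u]])
        # User factors fixed: solve one ridge regression per item.
        for i in range(n_items):
            Xo = X[mask[:, i]]
            Y[i] = np.linalg.solve(Xo.T @ Xo + ridge, Xo.T @ R[mask[:, i], i])
    return X, Y


# Toy data (made up): 0 means "not rated".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

X1, Y1 = als(R, mask, n_iters=1)
X20, Y20 = als(R, mask, n_iters=20)
loss_1 = als_loss(R, mask, X1, Y1, reg=0.1)
loss_20 = als_loss(R, mask, X20, Y20, reg=0.1)
print(loss_1, loss_20)
```

Because each half-step minimizes the regularized objective exactly over one block of variables, the loss cannot increase from one sweep to the next, which is a handy sanity check. Adding biases or the confidence weights used for implicit feedback changes the per-row linear systems but keeps the same alternating structure.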


Are Turbocharged Engines More Fuel Efficient?

In this post I will show the effect of a change in values by simulating a dataset, keeping only variables of interest fixed while allowing all other variables to change. I will do this by simulating data points from a method described here. I’ll compare this method with existing techniques.


Genre-based Music Recommendations Using Open Data (and the problem with recommender systems)

After a long 12 months of pouring my soul into it, my book, Data Analysis with R, was finally published. After the requisite 2-4 day breather, I started thinking about how I was going to get back into the swing of regular blog posts and decided that the easier and softer way is to cannibalize and expand on an example in the book.


Avoid overlapping labels in ggplot2 charts

If you’ve ever created a scatterplot with text labels using the text function in R, or the geom_text function in the ggplot2 package, you’ve probably found that the text labels can easily overlap, rendering some of them unreadable. Now, thanks to the new extensibility capabilities of the ggplot2 package, R user Kamil Slowikowski has created an R package, ggrepel, that adds alternative text labeling functions to ggplot2. These functions ‘repel’ labels from data points and from other labels to avoid overlapping. The new geom_text_repel replaces the standard geom_text for plain text labels, and you can also use geom_label_repel instead of geom_label for rounded, color-coded labels.


Combining Admin 1 Choropleths and Reference Maps

A new version of choroplethr (v3.4.0) is now on CRAN. It allows you to combine Administrative Level 1 choropleths with reference maps. For reference, this functionality has been present for US maps for a while now (1, 2). This update just extends that functionality to the Administrative Level 1 mapping function, admin1_choropleth. To do this, just set the parameter reference_map=TRUE.


Revolution R Open Performance Improvements

Since last year, Revolution Analytics has been publishing beta versions of Revolution R Open, and in April this year they finally released RRO 8.0.3. The current release is RRO 3.2.2 (the naming was changed to match the R version it is built upon). This post gives you an introduction to my favorite new features, explains how to install RRO, and presents performance benchmarks.