R: Data Visualization cheatsheet
We’ve added a new cheatsheet to our collection. Data Visualization with ggplot2 describes how to build a plot with ggplot2 and the grammar of graphics. You will find helpful reminders of how to use: {geoms, stats, scales, coordinate systems, facets, position adjustments, legends, themes }. The cheatsheet also documents tips on zooming.

Understanding Convolution in Deep Learning
Convolution is probably the most important concept in deep learning right now. It was convolution and convolutional nets that catapulted deep learning to the forefront of almost any machine learning task there is. But what makes convolution so powerful? How does it work? In this blog post I will explain convolution and relate it to other concepts that will help you to understand convolution thoroughly. There are already some blog post regarding convolution in deep learning, but I found all of them highly confusing with unnecessary mathematical details that do not further the understanding in any meaningful way. This blog post will also have many mathematical details, but I will approach them from a conceptual point of view where I represent the underlying mathematics with images everybody should be able to understand. The first part of this blog post is aimed at anybody who wants to understand the general concept of convolution and convolutional nets in deep learning. The second part of this blog post includes advanced concepts and is aimed to further and enhance the understanding of convolution for deep learning researchers and specialists.

Nonsensical beer reviews via Markov chains
We can continue picking ‘next states’ and eventually we’ll have generated a random, yet probabilistic sequence of weather. These same principles can be used to generate a sentence from text data – pick a random beginning state (word) from the text and then pick the next word based on the likelihood of it occurring, given the current word. A first-order Markov sentence would have a one word current state, a second-order would have a two word current state, … and so on. The larger the corpus and the higher the order, the more sense these Markov generated sentences make. Good thing I have a lot of beer reviews.

Interactive pivot tables with R
I love interactive pivot tables. That is the number one reason why I keep using spreadsheet software. The ability to look at data quickly in lots of different ways, without a single line of code helps me to get an understanding of the data really fast.

rPithon vs. rPython
Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange the data between Python and R. However, the underlying mechanisms between these two packages are fundamentally different. Wihle rPithon communicates with Python from R through pipes, rPython accomplishes the same task with json. A major advantage of rPithon over rPython is that multiple Python processes can be started within a R session. However, rPithon is not very robust while exchanging large data objects between R and Python.

Fastest Growing Software for Scholarly Analytics: Python, R, KNIME…
The three fastest growing packages are all free and open source: Python, R and KNIME. All three saw more than 25% growth. Note that the Python figures are strictly for analytics use as defined here. At the other end of the scale are SPSS and SAS, both of which declined in use by around 25%. Recall that Fig. 2a shows that despite recent years of decline, SPSS is still extremely dominant for scholarly use.

Mapping Flows in R
Last year I published the above graphic, which then got converted into the below for the book London: The Information Capital. I have had many requests for the code I used to create the plot so here it is!

Need for Processing Speed: data.table
The first time I discovered data.table it felt like magic. I was waiting on a process that was projected to take the better part of an afternoon. In the meantime, I followed the data.table tutorial, rewrote my code using the data.table structure, and fully executed said code, all while the data.frame equivalent was wheezing along. In the last year, data.table has gotten even faster.

Another Interactive Map for the Cholera Dataset
Following my previous post, François posted a comment mentionning another package I could use to get an interactive map, the rleafmap package. And the heatmap was here easy to include. This time, we do not use openstreetmap.