The Bayesian New Statistics: Two Historical Trends Converge
If not null hypothesis significance testing, then what? If not p values, then confidence intervals? If not NHST, then Bayes factors? Both? Neither? These issues are addressed in a new manuscript titled The Bayesian New Statistics: Two Historical Trends Converge.

Seven Techniques for Data Dimensionality Reduction
A comparative study of dimensionality reduction and feature selection techniques for data mining on high-dimensional data sets, including Missing Values Ratio, Low Variance Filter, PCA, and Random Forests / Ensemble Trees.
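To make the two simplest techniques in that list concrete, here is a minimal plain-Python sketch of a missing values ratio filter combined with a low variance filter. It is not taken from the study: the function names, the toy data, and both thresholds (`max_missing`, `min_variance`) are assumptions chosen only for illustration.

```python
from statistics import pvariance


def missing_ratio(column):
    """Fraction of entries in a column that are missing (None)."""
    return sum(v is None for v in column) / len(column)


def filter_columns(columns, max_missing=0.4, min_variance=0.01):
    """Keep columns that pass both filters.

    columns: dict mapping a column name to a list of numbers or None.
    Both thresholds are illustrative assumptions, not values from the study.
    Variance depends on scale, so real data would normally be normalized
    before the low variance filter is applied.
    """
    kept = {}
    for name, values in columns.items():
        if missing_ratio(values) > max_missing:
            continue  # missing values ratio: too little information
        present = [v for v in values if v is not None]
        if len(present) < 2 or pvariance(present) < min_variance:
            continue  # low variance filter: nearly constant column
        kept[name] = values
    return kept


# Hypothetical toy data set with four candidate columns.
data = {
    "age": [23, 35, 41, None, 58],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
    "mostly_missing": [None, None, None, 2.0, None],
    "income": [30.0, 52.5, 47.0, 61.0, 38.5],
}
selected = filter_columns(data)
print(sorted(selected))  # ['age', 'income']
```

Here "constant" is dropped by the low variance filter and "mostly_missing" by the missing values ratio, so only "age" and "income" survive.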

Python 101 for Aspiring Data Nerds
Here, we’ll build an app in Python from A to Z, iterate on it to make it more robust, and finally add application event logging with Fluentd and Treasure Data. We chose Python because it’s quickly becoming the language of choice among aspiring data scientists. In our examples, we’ll use Python version 2.7.

Data Science 101: Introduction to Deep Learning with Python
In the presentation below, Alec Radford, Head of Research at indico Data Solutions, talks about deep learning with Python and the Theano library. The emphasis of the talk is on high performance computing, natural language processing using recurrent neural nets, and large scale learning with GPUs.

Reinventing the wheel for ordination biplots with ggplot2
I’ll be the first to admit that the topic of plotting ordination results using ggplot2 has been visited many times over. As is my typical fashion, I started creating a package for this purpose without completely searching for existing solutions. Specifically, the ggbiplot and factoextra packages already provide almost complete coverage of plotting results from multivariate and ordination analyses in R. Being a stubborn individual, I couldn’t give up on my own package, so I started exploring ways to improve some of the functionality of biplot methods in these existing packages. For example, ggbiplot and factoextra work almost exclusively with results from principal components analysis, whereas numerous other multivariate analyses can be visualized using the biplot approach. I started to write methods to create biplots for some of the more common ordination techniques, in addition to all of the functions I could find in R that conduct PCA. This exercise became very boring very quickly, so I stopped adding methods after the first eight or so. That being said, I present this blog as a sinking ship that was doomed from the beginning, but I’m also hopeful that these functions can be built on by others more ambitious than myself.

A first look at htmlwidgets
A strong case can be made that base R graphics, supplemented with either the lattice library or ggplot2 for plotting by subgroups, provides everything a statistician might need for both exploratory data analysis and for developing clear, crisp graphics for communicating results. However, it is abundantly clear that web-based graphics, driven to a large extent by JavaScript-enhanced web design, are opening up new vistas for data visualization. The ability to interact with graphs, view them from different points of view, and establish real-time relationships between different plots and other graphical elements provides opportunities to extract new insights from data. To be fair, many of these capabilities have existed in R for quite some time, some from the very beginning. For example, the identify() function in the graphics package lets you mouse over a point on a plot and click to determine the associated value, and what could be easier than the plot3d() function in the rgl package, which uses OpenGL technology to let you grab a 3D scatter plot with your mouse and rotate it any which way?

scikit-learn video #5: Choosing a machine learning model
Welcome back to my video series on machine learning in Python with scikit-learn. In the previous video, we learned how to train three different models and make predictions using those models. However, we still need a way to choose the ‘best’ model, meaning the one that is most likely to make correct predictions when faced with new data. That’s the focus of this week’s video.
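One common way to choose the model most likely to predict new data correctly is to hold out part of the data, train each candidate on the rest, and compare accuracy on the held-out part. The sketch below shows that idea in plain Python rather than scikit-learn, and it is not taken from the video: the two toy models, the data, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def split(rows, test_fraction=0.4, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority(train):
    """Baseline model: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbour(train):
    """1-nearest-neighbour: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(model, test):
    """Fraction of held-out rows the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


# Hypothetical one-feature data set with two well-separated classes.
rows = [(0.0, "a"), (0.5, "a"), (1.0, "a"), (1.5, "a"), (2.0, "a"),
        (10.0, "b"), (10.5, "b"), (11.0, "b"), (11.5, "b"), (12.0, "b")]

train, test = split(rows)
scores = {
    "majority": accuracy(fit_majority(train), test),
    "1-NN": accuracy(fit_nearest_neighbour(train), test),
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because the models are scored only on rows they never saw during training, the comparison estimates performance on new data rather than rewarding memorization. On this toy data the nearest-neighbour model wins.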

All Is Not Lost: Finding Value In Marketing Attribution Data
In my last blog, I laid out some facts that call into question the extensive effort many organizations put into attributing individual customer sales to individual marketing touch points via common attribution methods. To summarize, Suresh Pillai, head of Customer Analytics & Insights for Europe at eBay, showed that all reasonable attribution algorithms led to effectively the same aggregate credit to each marketing lever and also the same credit as a random method.