Linux Foundation Announces R Consortium to Support Millions of Users Around the World
The Linux Foundation, the nonprofit organization dedicated to accelerating the growth of Linux and collaborative development, announced the R Consortium. This new organization will strengthen both the technical and user communities as a Collaborative Project hosted at The Linux Foundation.

Questions About the Size and Power of a Test
Osman sent a comment in relation to my recent post on the effects of temporal aggregation on t-tests and the like. Rather than just bury it, with a short response, in the ‘Comments’ section of that post, I thought I’d give it proper attention here.

Must Watch Data Science Videos from SciPy Conference 2015
Why is this relevant here? Well, I had a similar feeling when I looked at the videos from the recent SciPy conference.

Drag and Drop Visuals in your Interactive Dashboard
Gridster is a really cool JavaScript library that enables drag and drop as well as resizing for your HTML placeholders (divs).

The Big List of D3.js Examples

Pandarize your Spark DataFrames
DataFrames are a great abstraction for working with structured and semi-structured data. They are basically a collection of rows, organized into named columns. Think of relational database tables: DataFrames are very similar and allow you to do similar operations on them:
• slice data: select subset of rows or columns based on conditions (filters)
• sort data by one or more columns
• aggregate data and compute summary statistics
• join multiple DataFrames
What makes them much more powerful than SQL is that this nice, SQL-like API is exposed in a full-fledged programming language, which means we can mix declarative SQL-like operations with arbitrary code written in a general-purpose programming language.
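As a rough illustration of the four operations listed above, here is a minimal sketch using pandas rather than Spark (the Spark DataFrame API uses different method names). The tables, column names, and the `tier` helper are all hypothetical, invented only for this example.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [20.0, 35.5, 12.5, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
})

# Slice: keep rows that satisfy a condition, and only some columns.
large = orders.loc[orders["amount"] > 15, ["customer_id", "amount"]]

# Sort by one or more columns.
ranked = orders.sort_values(["customer_id", "amount"], ascending=[True, False])

# Aggregate: total amount per customer.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Join two DataFrames on a shared key.
report = totals.merge(customers, on="customer_id", how="inner")

# Mix in arbitrary Python: an ordinary function applied to a column.
def tier(total):
    """Hypothetical business rule, not part of any library."""
    return "gold" if total >= 40 else "standard"

report["tier"] = report["amount"].apply(tier)
print(report)
```

The last step is the part plain SQL cannot express as directly: `tier` is ordinary Python and could contain any logic.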

7 Common Biases That Skew Big Data Results
1. Confirmation Bias
2. Selection Bias
4. Simpson’s Paradox
5. Overfitting and Underfitting
6. Confounding Variables
7. Non-normality: The Bell Does Not Toll

scikit-learn video #8: Efficiently searching for optimal tuning parameters
In this video, you’ll learn how to efficiently search for the optimal tuning parameters (or ‘hyperparameters’) for your machine learning model in order to maximize its performance. I’ll start by demonstrating an exhaustive ‘grid search’ process using scikit-learn’s GridSearchCV class, and then I’ll compare it with RandomizedSearchCV, which can often achieve similar results much more quickly.
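The comparison described above might look roughly like the following sketch. The dataset (iris), the model (k-nearest neighbors), and the parameter ranges are assumptions chosen for illustration, not necessarily what the video uses. The imports come from `sklearn.model_selection`, where current scikit-learn releases keep both classes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumed search space: 30 values of k times 2 weighting schemes = 60 candidates.
param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

# Exhaustive search: evaluates every candidate with 10-fold cross-validation.
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
grid.fit(X, y)

# Randomized search: evaluates only n_iter candidates sampled from the same space.
rand = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_grid,
    n_iter=10,
    cv=10,
    scoring="accuracy",
    random_state=5,  # fixed seed so the sampled candidates are reproducible
)
rand.fit(X, y)

print("grid:  ", grid.best_params_, round(grid.best_score_, 4))
print("random:", rand.best_params_, round(rand.best_score_, 4))
```

Here the randomized search fits one sixth as many candidates as the grid search. Whether its best score comes close depends on the data and the seed.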

How Does #DeepDream Work?
If you’ve been browsing the net recently, you might have stumbled on some strange-looking images, with pieces of dog heads, eyes, legs and what look like buildings, sometimes superimposed on a normal picture, sometimes not. Although they can be nightmare-inducing (or because of that), they have gained a lot of popularity on the Internet. Often tagged #deepdream, they are made by a neural network trained on a huge set of categorized images and set free to generate new ones. The network comes from Google Research and its code is currently available on GitHub, spawning more home-made neural image generators.