Murphy diagrams in R
At the recent International Symposium on Forecasting, held in Riverside, California, Tilmann Gneiting gave a great talk on ‘Evaluating forecasts: why proper scoring rules and consistent scoring functions matter’. It will be the subject of an IJF invited paper in due course. One of the things he talked about was the ‘Murphy diagram’ for comparing forecasts, as proposed in Ehm et al. (2015). Here’s how it works for comparing mean forecasts.
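The idea behind the Murphy diagram is to plot the average elementary score of each forecast against a threshold θ, so that one forecast dominates another if its curve lies below the other's everywhere. A minimal NumPy sketch for the mean functional, following the form of the elementary score in Ehm et al. (the data and forecasts here are made up for illustration):

```python
import numpy as np

def elementary_score(x, y, theta):
    """Elementary score for the mean functional: the forecast x pays
    |y - theta| whenever theta separates the forecast from the realisation y."""
    return np.abs(y - theta) * ((np.minimum(x, y) <= theta) & (theta < np.maximum(x, y)))

# Illustrative example: two mean forecasts of the same series.
rng = np.random.default_rng(1)
y = rng.normal(0, 1, 500)            # realisations
f1 = np.zeros_like(y)                # forecast 1: the true mean
f2 = y + rng.normal(0, 0.5, 500)     # forecast 2: a noisy "perfect" forecast

thetas = np.linspace(-3, 3, 121)
s1 = [elementary_score(f1, y, t).mean() for t in thetas]
s2 = [elementary_score(f2, y, t).mean() for t in thetas]
# Plotting s1 and s2 against thetas gives the Murphy diagram.
```

Averaging the elementary score over a grid of thresholds is all the diagram needs; no distributional assumptions about the forecasts are required.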

Talk: How to Visualize Data
Last week, I gave one of the visualization primer talks at BioVis in Dublin. My goal was to show people some examples, but also criticize the rather poor visualization culture in bioinformatics and challenge people to do better. Here is a write-up of that talk.

Python: Intro to Linear Algebra for Data Scientists
It’s important to know what goes on inside a machine learning algorithm, but that’s hard: there is some fairly intense math involved, much of it linear algebra. When I took Andrew Ng’s course on machine learning, I found the linear algebra to be the hardest part. I’m writing this as much for myself as for you. So here is a quick review, so that the next time you look under the hood of an algorithm, you’re more confident. You can view the IPython notebook (usually easier to follow along with code) on my GitHub.
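To give a flavour of why linear algebra shows up everywhere in machine learning, here is a small NumPy sketch (the data and names are illustrative): a linear model's predictions are just a matrix-vector product, and least squares reduces to solving the normal equations.

```python
import numpy as np

# Each row of X is an example, each column a feature.
X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])        # 3 examples, 2 features
w = np.array([0.5, -1.])        # weight vector

# Predictions are a single matrix-vector product.
y_hat = X @ w                   # [-1.5, -2.5, -3.5]

# Least squares: solve the normal equations (X^T X) w = X^T y.
y = np.array([1., 2., 3.])
w_ls = np.linalg.solve(X.T @ X, X.T @ y)
```

One matrix multiplication replaces an explicit loop over examples, which is exactly the kind of rewriting a course like Ng's leans on.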

Awesome Random Forest
A curated list of resources regarding tree-based methods and more, including but not limited to random forest, bagging and boosting.

Seven Python Tools All Data Scientists Should Know How to Use
1. IPython
2. GraphLab Create
3. Pandas
4. PuLP
5. Matplotlib
6. Scikit-Learn
7. Spark
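As a small taste of one tool from the list above, here is a minimal Pandas snippet (the data frame is made up) showing the group-and-aggregate pattern that makes it a staple for data scientists:

```python
import pandas as pd

# Toy data: sales by city.
df = pd.DataFrame({
    "city": ["Dublin", "Riverside", "Dublin", "Riverside"],
    "sales": [10, 20, 30, 40],
})

# Group rows by city and sum the sales in each group.
totals = df.groupby("city")["sales"].sum()
```

The same few lines scale from toy examples like this to data frames with millions of rows.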

A Step by Step Backpropagation Example
Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few include an example with actual numbers. This post is my attempt to explain it with a concrete example that readers can check their own calculations against, to make sure they understand backpropagation correctly.
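In the same numbers-first spirit, here is a minimal sketch (the values are illustrative, not taken from the post): one sigmoid neuron with a squared-error loss, a forward pass, a backward pass written out term by term via the chain rule, and one gradient-descent step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One input, one sigmoid neuron, loss L = 0.5 * (out - target)^2.
x, target = 1.5, 0.0
w, b = 0.8, 0.1

# Forward pass
z = w * x + b
out = sigmoid(z)
loss = 0.5 * (out - target) ** 2

# Backward pass: chain rule, one factor at a time.
dL_dout = out - target        # dL/d(out)
dout_dz = out * (1 - out)     # derivative of the sigmoid
dL_dw = dL_dout * dout_dz * x # dz/dw = x
dL_db = dL_dout * dout_dz     # dz/db = 1

# One gradient-descent step reduces the loss.
lr = 0.5
w, b = w - lr * dL_dw, b - lr * dL_db
loss_new = 0.5 * (sigmoid(w * x + b) - target) ** 2
```

Stacking layers only adds more applications of the same chain rule, which is why a single-neuron example already contains the core of the method.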

Deep Learning Adversarial Examples – Clarifying Misconceptions
A Google scientist clarifies misconceptions and myths around deep learning adversarial examples, including the myths that they do not occur in practice, that deep learning is more vulnerable to them than other models, that they can easily be solved, and that human brains make similar mistakes.
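To make the phenomenon concrete, here is a minimal fast-gradient-sign sketch on a hand-built logistic model (the weights, input, and perturbation size are illustrative, not a trained network): a small perturbation of the input in the direction of the loss gradient's sign noticeably reduces the model's confidence in the true class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -3.0, 1.0])     # fixed model weights
x = np.array([0.5, -0.5, 1.0])     # a clean input
y = 1.0                            # its true label

p = sigmoid(w @ x)                 # confidence in class 1 on the clean input

# Gradient of the log loss with respect to the *input* (not the weights).
grad_x = (p - y) * w

eps = 0.3
x_adv = x + eps * np.sign(grad_x)  # fast-gradient-sign perturbation
p_adv = sigmoid(w @ x_adv)         # confidence on the perturbed input
```

Even this linear toy model is fooled, which illustrates one of the clarified points: adversarial examples are not a quirk unique to deep networks.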

6 reasons why I like KeystoneML
As we put the finishing touches on what promises to be another outstanding Hardcore Data Science Day at Strata + Hadoop World in New York, I sat down with my co-organizer Ben Recht for the latest episode of the O’Reilly Data Show Podcast. Recht is a UC Berkeley faculty member and member of AMPLab, and his research spans many areas of interest to data scientists, including optimization, compressed sensing, statistics, and machine learning.

Seeing Data as the Product of Underlying Structural Forms
Matrix factorization follows from the realization that nothing forces us to accept the data as given. We start with objects placed in rows and record observations on those objects arrayed along the top in columns. Neither the objects nor the measurements need to be preserved in their original form.

10 Pains Businesses Feel When Working With Data
The 10 most common data issues facing businesses, and how to cure them.
1. Inability to compare data held in different locations and from different sources.
2. Non-conforming data, e.g. invoice discrepancies.