Why I use Panel/Multilevel Methods
I don’t understand why any researcher would choose not to use panel/multilevel methods on panel/hierarchical data.

The Big ‘Big Data’ Question: Hadoop or Spark?
One question I get asked a lot by my clients recently is: Should we go for Hadoop or Spark as our big data framework? Spark has overtaken Hadoop as the most active open source Big Data project. While they are not directly comparable products, they both have many of the same uses.

The Ultimate Beginner’s Guide to Data Quality and Business Intelligence
Today’s marketers are becoming technically savvier. They understand the need to improve customer experiences or implement digital marketing strategies to engage consumers across channels. Customer retention and acquisition, Big Data, social media marketing, and content marketing are just a few of the goals and strategies in today’s marketing toolbox. However, one important fundamental is perhaps not so widely discussed: high-quality marketing data.

The Rise of Computer-Aided Explanation
Computers can translate French and prove mathematical theorems. But can they make deep conceptual insights into the way the world works?

Beginner’s Guide to Time Series Analysis
Over the last few years we’ve looked at various tools to help us identify exploitable patterns in asset prices. In particular we have considered basic econometrics, statistical machine learning and Bayesian statistics. While these are all great modern tools for data analysis, the vast majority of asset modeling in the industry still makes use of statistical time series analysis. In this article we are going to examine what time series analysis is, outline its scope and learn how we can apply the techniques to various frequencies of financial data.

Learn regular expressions in about 55 minutes
Regular expressions (‘regexes’) are supercharged Find/Replace string operations. Regular expressions are used when editing text in a text editor, to:
• check whether the text contains a certain pattern
• find those pattern matches, if there are any
• pull information (i.e. substrings) out of the text
• make modifications to the text.
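
The four uses above can be sketched with Python's standard `re` module. This is only an illustration, not taken from the linked tutorial: the sample text and the date pattern are made up for the example.

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = "Order 1234 shipped on 2015-07-21; order 5678 shipped on 2015-08-02."
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # ISO-style date: YYYY-MM-DD

# 1. Check whether the text contains the pattern.
has_date = date_pattern.search(text) is not None

# 2. Find every match of the pattern.
all_dates = [m.group(0) for m in date_pattern.finditer(text)]

# 3. Pull substrings (the captured groups) out of the first match.
first_match = date_pattern.search(text)
year, month, day = first_match.groups()

# 4. Modify the text: rewrite each date as DD/MM/YYYY using backreferences.
reformatted = date_pattern.sub(r"\3/\2/\1", text)

print(has_date)
print(all_dates)
print(year, month, day)
print(reformatted)
```

The same pattern object serves all four steps, which is usually why a regex is compiled once and reused.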

Taxi Trip Time Winners’ Interview: 3rd place, BlueTaxi
This spring, Kaggle hosted two competitions with the ECML PKDD conference in Porto, Portugal. The competitions shared a dataset but focused on different problems. Taxi Trajectory asked participants to predict where a taxi would drop off a customer given partial information on their journey, while Taxi Trip Time’s goal was to predict the amount of time a journey would take given the same dataset.

R #6 in IEEE 2015 Top Programming Languages, Rising 3 Places
IEEE Spectrum has published its 2015 list of Top Programming Languages, and R ranks in 6th place, jumping 3 places from its 2014 ranking.

mapView: basic interactive viewing of spatial data in R
When working with spatial data in R, I often need to check quickly and visually whether a certain analysis has produced reasonable results.

stringsAsFactors: An unauthorized biography
Recently, I was listening in on the conversation of some colleagues who were discussing a bug in their R code. The bug was ultimately traced back to the well-known phenomenon that functions like ‘read.table()’ and ‘read.csv()’ in R convert columns detected as character strings into factor variables.

Text encoding is a convoluted mess
Modern text encoding is a convoluted mess where costs can easily exceed benefits. I admit we are in a world that has moved beyond ASCII (which at best served only English, and even then without full punctuation). But modern text encoding standards (UTF-x, Unicode) have metastasized to the point where you spend more time working around them than benefiting from them.