Markov Chain Monte Carlo Without all the Bullshit
It seems very difficult to find an explanation of Markov Chain Monte Carlo without any superfluous jargon. The ‘bullshit’ here is the implicit claim of an author that such jargon is needed. Maybe it is needed to explain advanced applications (like attempts to do ‘inference in Bayesian networks’), but it is certainly not needed to define or analyze the basic ideas.
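The basic idea really does fit in a few lines. As a hedged illustration (not taken from the article, with the standard normal chosen arbitrarily as the target distribution), here is a minimal Metropolis sampler:

```python
import math
import random

def metropolis(log_p, x0, n_steps, step=1.0):
    """Minimal Metropolis sampler: random-walk proposals, accepted
    with probability min(1, p(x') / p(x))."""
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0, step)
        # Compare log-densities to avoid numerical underflow.
        if math.log(random.random()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)
    return samples

random.seed(0)
# Target: standard normal, known only up to a normalizing constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
sample_mean = sum(samples) / len(samples)
# Second moment; approximately the variance, since the target mean is 0.
sample_var = sum(s * s for s in samples) / len(samples)
```

The point of the example is that nothing here requires jargon: a proposal, an accept/reject coin flip, and a running list of states.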

Part 4a: Modelling – predicting the amount of rain
In the fourth and last part of this series, we will build several predictive models and evaluate their accuracies. In Part 4a, our dependent variable will be continuous, and we will be predicting the daily amount of rain. Then, in Part 4b, we will deal with the case of a binary outcome, which means we will assign probabilities to the occurrence of rain on a given day.
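To make the continuous-outcome setup concrete, here is a hedged sketch (the humidity and rainfall numbers below are invented, not the series’ data) of fitting a single-predictor regression for daily rainfall:

```python
# Hypothetical data: relative humidity (%) vs. daily rainfall (mm).
humidity = [60, 65, 70, 75, 80, 85, 90]
rain_mm  = [0.0, 0.5, 1.2, 2.0, 3.1, 4.0, 5.2]

n = len(humidity)
mean_x = sum(humidity) / n
mean_y = sum(rain_mm) / n

# Ordinary least squares for a single predictor.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(humidity, rain_mm))
         / sum((x - mean_x) ** 2 for x in humidity))
intercept = mean_y - slope * mean_x

# Predicted rainfall at 88% humidity.
predicted = intercept + slope * 88
```

A binary outcome (Part 4b) would instead call for a classifier such as logistic regression, whose output is a probability of rain rather than an amount.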

Predicting Mobile Phone Prices
Recently a colleague of mine showed me a nauseating interactive scatterplot that plots mobile phones along two dimensions of the user’s choice from a list of possible dimensions. Although the interactive visualization was offensive to my tastes, the JSON data behind it was intriguing. It was easy enough to get the data (see this link if you want an up-to-date copy; be sure to take out the ‘data=’ from the start of the file. I pulled this data around noon on March 23rd.) so that I could start asking a simple question: which of the factors provided in the dataset were the most predictive of a mobile phone’s full price?
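As a sketch of what ‘most predictive’ could mean at its simplest, here is a hedged example (the phone attributes and prices below are made up, not the scraped JSON) ranking features by their correlation with price:

```python
import math

# Hypothetical mini-dataset: a few phone attributes and prices.
phones = [
    {"ram_gb": 1, "screen_in": 4.0, "camera_mp": 5,  "price": 120},
    {"ram_gb": 2, "screen_in": 4.7, "camera_mp": 8,  "price": 250},
    {"ram_gb": 3, "screen_in": 5.0, "camera_mp": 13, "price": 420},
    {"ram_gb": 4, "screen_in": 5.5, "camera_mp": 16, "price": 610},
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

prices = [p["price"] for p in phones]
correlations = {
    feat: pearson([p[feat] for p in phones], prices)
    for feat in ("ram_gb", "screen_in", "camera_mp")
}
# Rank by absolute correlation with price.
most_predictive = max(correlations, key=lambda f: abs(correlations[f]))
```

Correlation is only the crudest screen, of course; the article's actual analysis may use a proper model, but this is the shape of the question.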

Using Decision Trees in Evidence Based Medicine
For our analysis, we start with a data set containing records for a number of patients, all of whom suffered from the same illness. Each of these patients responded well to one of five medications. We will use a decision tree to understand what factors in each patient’s history led them to respond well to one specific medication over the others. We will then use our findings to generate a set of evidence-based rules or policies that doctors can follow to treat this illness in future patients. As part of our analysis, we will also explore how to interpret decision trees.
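To show the kind of criterion a decision tree uses to pick which patient factor to split on first, here is a hedged sketch with invented patient records (and only two medications' worth of structure), computing information gain by hand:

```python
import math
from collections import Counter

# Hypothetical patient records: attributes and the drug each responded to.
patients = [
    {"bp": "high",   "cholesterol": "high",   "drug": "A"},
    {"bp": "high",   "cholesterol": "normal", "drug": "A"},
    {"bp": "low",    "cholesterol": "high",   "drug": "B"},
    {"bp": "low",    "cholesterol": "normal", "drug": "B"},
    {"bp": "normal", "cholesterol": "high",   "drug": "C"},
    {"bp": "normal", "cholesterol": "normal", "drug": "C"},
]

def entropy(rows):
    """Shannon entropy of the drug labels in a set of rows."""
    counts = Counter(r["drug"] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr):
    """Entropy reduction from splitting on one attribute --
    the criterion a decision tree uses to choose its root test."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

best = max(("bp", "cholesterol"), key=lambda a: info_gain(patients, a))
```

In this toy data, blood pressure perfectly separates the drugs while cholesterol tells us nothing, so the tree's root test would be on `bp`. Real tools (rpart in R, scikit-learn in Python) automate this recursively.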

Test Driven Analysis
Rich’s presentation focused on the challenge of how to ensure that the new system (R) would provide the same answers as the legacy system (SAS).
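One concrete way to frame that challenge as tests is to compare the two systems' outputs value by value within a tolerance. A minimal sketch (the numbers are invented and the function name is my own, not from the presentation):

```python
import math

def outputs_match(legacy, new, rel_tol=1e-6):
    """Compare two systems' numeric outputs position by position,
    returning any pairs that differ beyond a relative tolerance."""
    mismatches = []
    for i, (a, b) in enumerate(zip(legacy, new)):
        if not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((i, a, b))
    return mismatches

# Hypothetical regression check: legacy SAS output vs. the new R pipeline.
legacy_sas = [10.5, 3.14159, 0.002]
new_r      = [10.5, 3.14159, 0.0021]
diffs = outputs_match(legacy_sas, new_r)
```

Running such checks over every report the legacy system produces turns "do the answers agree?" into an automated, repeatable test suite.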

Lambda Complexity: Why Fast Data Needs New Thinking
‘Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.’ I’m going to argue that while the Unix philosophy works for batch workflows, it is a poor fit for stream processing.

Data Profiling Services: Putting Business Data to Work
Connecting savvy business people to the data they need requires the right technology tools. One tool that is still not easily made available to business users is data profiling. Data profiling is the process of analyzing data to determine things like its structure and meaning. Users set up a variety of methods and processes to reveal patterns of data usage, metadata matches, value completeness, degree of data quality, and so on.
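As an illustration of the kinds of checks a profiling pass runs, here is a hedged sketch (the column values and the 9/A pattern notation are invented for the example) computing value completeness and the dominant structural pattern of a column:

```python
from collections import Counter

# Hypothetical raw column from a business dataset, with gaps.
column = ["2021-03-01", "2021-03-02", None, "2021-03-04", "", "2021-03-05"]

def profile(values):
    """Tiny data-profiling pass: completeness ratio and the most common
    structural pattern (digits mapped to '9', letters to 'A')."""
    present = [v for v in values if v not in (None, "")]
    completeness = len(present) / len(values)

    def pattern(s):
        return "".join("9" if ch.isdigit() else "A" if ch.isalpha() else ch
                       for ch in s)

    top_pattern, _ = Counter(pattern(v) for v in present).most_common(1)[0]
    return {"completeness": completeness, "pattern": top_pattern}

result = profile(column)
```

Here the pass would report that the column is two-thirds complete and that its dominant pattern looks like an ISO date, which is exactly the structure-and-meaning question profiling tries to answer.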