How to place titles in lattice plots
I like the Economist theme in the latticeExtra package. It produces nice-looking charts that mimic the design of the weekly newspaper, such as in this example:
MarI/O – Machine Learning for Video Games
MarI/O is a program made of neural networks and genetic algorithms that kicks butt at Super Mario World.
ROC Curves in Python and R
Ever heard people at your office talking about AUC, ROC, or TPR but been too shy to ask what the heck they’re talking about? Well, lucky for you, we’re going to be diving into the wonderful world of binary classification evaluation today. In particular, we’ll be discussing ROC curves. ROC curves are a great technique that has been around for a while and is still one of the tried-and-true industry standards. So sit back, relax, and get ready to dive into a world of three-letter acronyms!
A step by step (screenshots) tutorial for upgrading R on Windows
If you are running R on Windows you can easily upgrade to the latest version of R using the installr package.
Connect R to Bloomberg with the RBlpapi package
For anyone who works with financial data and has access to a Bloomberg terminal, there is a new R package to interface to Bloomberg data services: RBlpapi. (If you had searched for an R connection to Bloomberg you wouldn’t have found this one — Bloomberg is happy to have software that connects to its public API, but not to use its name, apparently.)
Shiny App for the Wittgenstein Centre Population Projections
A few weeks ago a new version of the Wittgenstein Centre Data Explorer was launched. The data explorer is intended to disseminate the results of a recent global population projection exercise which uniquely incorporates level of education (as well as age and sex) and the scientific input of more than 500 population experts around the world. Included are the projected populations used in the 5th assessment report of the Intergovernmental Panel on Climate Change (IPCC).
Analyzing Yelp Reviews with Bayesian Statistics
It’s not obvious which measure of predictive power is best, but correlation is the simplest. For the subset of data that I looked at, just using the moving average gave a correlation of 0.357, whereas a simple Bayesian model taking regression to the mean into account gave a correlation of 0.361. If we take the average of the two models’ predictions, the correlation with the number of stars rises to 0.364. Since averaging the two models’ predictions is a naive thing to do, it’s clear that one would get a further improvement by improving the off-the-shelf model. One also sees a small boost by taking into account the category of the restaurant, and a larger boost by looking at trends in ratings over time for each restaurant, and then taking into account the amount of variation that one would expect to see in the trends in ratings over time across restaurants by chance.
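To make the comparison concrete, here is a minimal Python sketch of the idea, not the author's actual model. It scores three predictors by their correlation with the next star rating: a plain per-restaurant average (standing in for the moving average), a shrunk average that pulls each restaurant toward the global mean (one simple way to model regression to the mean), and the naive blend of the two. The toy ratings, the `PRIOR_WEIGHT` value, and the helper names `pearson` and `shrunk_mean` are all assumptions made up for illustration, so the printed numbers will not match the figures quoted above.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def shrunk_mean(ratings, prior_mean, prior_weight):
    """Pull a restaurant's average toward the global mean.

    prior_weight acts like a number of pseudo-reviews at prior_mean, so
    restaurants with few reviews are shrunk the most.
    """
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


# Toy data, made up for illustration: past star ratings per restaurant,
# and the next rating each restaurant actually received.
history = [[5, 4], [2], [3, 4, 4, 5], [1, 2, 2], [5], [3, 3, 4]]
next_stars = [4, 3, 4, 2, 4, 3]

global_mean = mean(r for h in history for r in h)
PRIOR_WEIGHT = 2.0  # assumed strength of the prior, in pseudo-reviews

avg_pred = [mean(h) for h in history]
bayes_pred = [shrunk_mean(h, global_mean, PRIOR_WEIGHT) for h in history]
blend_pred = [(a + b) / 2 for a, b in zip(avg_pred, bayes_pred)]

for name, pred in [("average", avg_pred), ("shrunk", bayes_pred), ("blend", blend_pred)]:
    print(f"{name:8s} r = {pearson(pred, next_stars):.3f}")
```

On real review data, the prior weight would normally be estimated from the data rather than fixed by hand.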