A new R package for detecting unusual time series
The anomalous package provides tools for detecting unusual time series in a large collection of time series. This is joint work with Earo Wang (an honours student at Monash) and Nikolay Laptev (from Yahoo Labs). Yahoo is interested in detecting unusual patterns in server metrics. The basic idea is to measure a range of features of each time series (such as strength of seasonality, an index of spikiness, first-order autocorrelation, etc.). Then a principal component decomposition of the feature matrix is calculated, and outliers are identified in the two-dimensional space of the first two principal component scores.
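The pipeline described above (features, then principal components, then outliers in two dimensions) can be sketched roughly as follows. This is a minimal Python illustration, not the anomalous package's code or API: the three feature definitions are simplified stand-ins, the function names are hypothetical, and the outlier rule used here (distance from the median of the component scores) is an assumption, since the excerpt does not say which rule the package applies.

```python
import math
import random
import statistics


def lag1_autocorrelation(x):
    """First-order autocorrelation of a series."""
    mean = statistics.fmean(x)
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    return num / denom


def spikiness(x):
    """Illustrative spike index: largest deviation from the median, in SD units."""
    sd = statistics.pstdev(x)
    if sd == 0:
        return 0.0
    med = statistics.median(x)
    return max(abs(v - med) for v in x) / sd


def seasonal_strength(x, period):
    """Crude seasonality proxy: share of variance explained by per-phase means."""
    total = statistics.pvariance(x)
    if total == 0:
        return 0.0
    phase_means = [statistics.fmean(x[p::period]) for p in range(period)]
    resid = [v - phase_means[i % period] for i, v in enumerate(x)]
    return max(0.0, 1.0 - statistics.pvariance(resid) / total)


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def top_components(rows, k=2, iters=500):
    """Leading k eigenvectors of the covariance matrix (power iteration with deflation)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0:
                break
            v = [c / norm for c in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove the component just found before looking for the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def rank_unusual(series_list, period):
    """Return series indices ordered from most to least unusual."""
    feats = [[lag1_autocorrelation(s), spikiness(s), seasonal_strength(s, period)]
             for s in series_list]
    z = standardize(feats)
    comps = top_components(z, k=2)
    scores = [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in z]
    centre = [statistics.median(col) for col in zip(*scores)]
    dist = [math.dist(s, centre) for s in scores]
    return sorted(range(len(dist)), key=dist.__getitem__, reverse=True)
```

Calling `rank_unusual(collection, period=12)` on a list of equal-length monthly series returns the indices sorted by how far each series sits from the bulk of the collection in the space of the first two component scores. A density-based rule could replace the median-distance ranking without changing the rest of the sketch.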
How to Evaluate Machine Learning Models, Part 4: Hyperparameter Tuning
In the realm of machine learning, hyperparameter tuning is a ‘meta’ learning task. It happens to be one of my favorite subjects because it can seem like black magic, yet its secrets are not impenetrable. In this post, I’ll walk through what hyperparameter tuning is, why it’s hard, and what kinds of smart tuning methods are being developed to do something about it.
R: the Excel Connection
As companies increasingly look beyond the scope of what is logistically possible in Excel, more and more of them are approaching Mango looking for help with connecting to Excel from R. With over 6,500 packages now on CRAN, it should come as no surprise that quite a few packages have been written to connect to Excel from R. So which is the best? Unfortunately, it really does depend on what you want to do, but here’s a quick guide to some of the main packages available from CRAN.
Modeling Contagion Using Airline Networks in R
I first became interested in networks when reading Matthew O. Jackson’s 2010 paper describing networks in economics. At some point during the 2014 Ebola outbreak, I became interested in how the disease could actually come to the U.S. I was caught up with work and classes at the time, but decided to use airline flight data to at least explore the question.
Lessons learned in high-performance R
On this blog, I’ve had a long-running investigation/demonstration of how to make an ‘embarrassingly parallel’ but computationally intractable (on commodity hardware, at least) R problem more performant by using parallel computation and Rcpp.