Why Business Strategy Always Trumps Data Analytics
There’s a lot of hype in the business community about data, analytics, cloud technologies, data science, and eCommerce: a cornucopia of jargon that emphasizes data (the raw material) and its analytics spinoff (the service). Or is it emphasizing business? Which came first, the chicken or the egg?
Sequential Bayesian inference for time series
Hidden Markov models are very flexible tools for modeling time series: the observations are assumed to be noisy measurements of a Markov process. The Markov process can represent the complex dynamics of the underlying phenomenon (in the article's example, a predator-prey model for plankton population growth). The noise in the measurements accounts for the error of the measuring devices, the fact that the underlying process is only partially observed, etc.
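To make the idea concrete, here is a minimal sketch of sequential Bayesian inference with a bootstrap particle filter. It is not the article's method or model: instead of the predator-prey dynamics it assumes a simple linear Gaussian hidden process, and the function names (simulate, bootstrap_filter) and all parameter values (phi, sigma_x, sigma_y) are hypothetical choices made for illustration.

```python
import math
import random


def simulate(T, phi=0.9, sigma_x=1.0, sigma_y=0.5, seed=0):
    """Simulate an assumed toy model: x_t = phi * x_{t-1} + noise, y_t = x_t + noise."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma_x)
    states, obs = [], []
    for _ in range(T):
        x = phi * x + rng.gauss(0.0, sigma_x)    # hidden Markov process
        states.append(x)
        obs.append(x + rng.gauss(0.0, sigma_y))  # noisy measurement
    return states, obs


def bootstrap_filter(obs, n_particles=500, phi=0.9, sigma_x=1.0, sigma_y=0.5, seed=1):
    """Return the filtered mean of the hidden state after each observation."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, sigma_x) for _ in range(n_particles)]
    means = []
    for y in obs:
        # 1. Propagate every particle through the state dynamics.
        particles = [phi * p + rng.gauss(0.0, sigma_x) for p in particles]
        # 2. Weight each particle by the (unnormalised) Gaussian likelihood of y.
        weights = [math.exp(-0.5 * ((y - p) / sigma_y) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # guard against numerical underflow of all weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        # 3. The weighted average estimates E[x_t | y_1, ..., y_t].
        means.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # 4. Resample in proportion to the weights (multinomial resampling).
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


if __name__ == "__main__":
    states, obs = simulate(100)
    means = bootstrap_filter(obs)
    rmse = math.sqrt(sum((m - s) ** 2 for m, s in zip(means, states)) / len(states))
    print(f"RMSE of filtered mean vs. hidden state: {rmse:.3f}")
```

The filter only needs to simulate the state dynamics and evaluate the measurement likelihood, which is why the same recipe can in principle be applied to nonlinear models where no closed-form solution exists.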
Getting smart with Machine Learning – AdaBoost and Gradient Boost
In this article, we introduce some of the best practices used to enhance the power of these engines and achieve higher predictive accuracy with an additional booster.
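As a rough illustration of what boosting does, the sketch below implements textbook AdaBoost with one-dimensional decision stumps. It is an assumption-laden toy, not the code from the article: the helper names (stump_predict, fit_stump, adaboost, predict) and the ten-point dataset are hypothetical, and gradient boosting is not covered.

```python
import math


def stump_predict(x, threshold, polarity):
    """A decision stump: predict `polarity` above the threshold, `-polarity` otherwise."""
    return polarity if x > threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    srt = sorted(set(xs))
    thresholds = [srt[0] - 1.0] + [(a + b) / 2 for a, b in zip(srt, srt[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds=3):
    """Fit an ensemble of weighted stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data that no single stump can separate.
    xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
    for rounds in (1, 3):
        model = adaboost(xs, ys, n_rounds=rounds)
        errors = sum(predict(model, x) != y for x, y in zip(xs, ys))
        print(f"{rounds} round(s): {errors} training errors")
```

Each round refits a weak learner on reweighted data, so later stumps concentrate on the points that earlier stumps got wrong.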
Online Experiments for Computational Social Science
This tutorial teaches attendees how to design, plan, implement, and analyze online experiments. First, we review basic concepts in causal inference and motivate the need for experiments. Then we discuss basic statistical tools to help plan experiments: exploratory analysis, power calculations, and the use of simulation in R. We then discuss statistical methods to estimate causal quantities of interest and construct appropriate confidence intervals. Particular attention is given to scalable methods suitable for ‘big data’, including working with weighted data and clustered bootstrapping. We then discuss how to design and implement online experiments using PlanOut, an open-source toolkit for advanced online experimentation used at Facebook. We show how basic ‘A/B tests’, within-subjects designs, and more sophisticated experiments can be implemented. We demonstrate how experimental designs from the social computing literature can be implemented, and also review in detail two very large field experiments conducted at Facebook using PlanOut. Finally, we discuss issues with logging and common errors in the deployment and analysis of experiments. Attendees will be given code examples and will participate in the planning, implementation, and analysis of a Web application using Python, PlanOut, and R.
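One of the techniques named above, clustered bootstrapping, can be sketched in a few lines. This is not material from the tutorial and does not use PlanOut or R: it is a generic percentile bootstrap that resamples whole users rather than individual observations, and the function name clustered_bootstrap_ci, the data layout (a dict mapping user id to a list of outcomes), and the simulated data are all assumptions made for illustration.

```python
import random
import statistics


def clustered_bootstrap_ci(outcomes_by_user, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean outcome, resampling by user.

    Observations from the same user are correlated, so entire users (clusters)
    are drawn with replacement instead of single observations.
    """
    rng = random.Random(seed)
    users = list(outcomes_by_user)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(users, k=len(users))  # resample clusters
        values = [v for u in sample for v in outcomes_by_user[u]]
        estimates.append(statistics.fmean(values))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: 50 users, 5 correlated observations each.
    rng = random.Random(42)
    data = {}
    for user in range(50):
        user_effect = rng.gauss(0.0, 1.0)
        data[user] = [user_effect + rng.gauss(0.0, 0.5) for _ in range(5)]
    lower, upper = clustered_bootstrap_ci(data)
    print(f"95% CI for the mean outcome: ({lower:.3f}, {upper:.3f})")
```

Resampling individual rows instead would treat the five observations per user as independent and typically produce intervals that are too narrow.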
TidyR Challenge: Data.Table Solution
Arun Srinivasan is the man! Once he saw that his data.table solution to the TidyR Challenge had an issue, he fixed it! His solution is below, along with a quick equivalence test against my original solution. Also check out this Stack Overflow question for a more engaging discussion of the strengths and weaknesses of both dplyr/tidyr and data.table.
Fast parallel computing with Intel Phi coprocessors
We know that R is a great system for performing statistical analysis. The price is quite nice too 😉. As a graduate student, I need a cheap replacement for Matlab and/or Maple. Well, R can do that too. I’m running a large program that benefits from parallel processing, and RRO 8.0.2 with the MKL works exceedingly well.