The JASP Statistics Project
JASP aims to be a complete statistical package for both Bayesian and Frequentist statistical methods that is easy to use and familiar to users of SPSS.

Introducing shinyStan
As a project for Andrew’s Statistical Communication and Graphics graduate course at Columbia, a few of us (Michael Andreae, Yuanjun Gao, Dongying Song, and I) had the goal of giving RStan’s print and plot functions a makeover. We ended up getting a bit carried away and instead we designed a graphical user interface for interactively exploring virtually any Bayesian model fit using a Markov chain Monte Carlo algorithm. The result is shinyStan, a package for R and an app powered by Shiny. The full version of shinyStan v1.0.0 can be downloaded as an R package from the Stan Development Team GitHub page here, and we have a demo up online here. If you’re not an R user, we’re working on a full online version of shinyStan too.

All Machine Learning Models Have Flaws
This classic post examines what is right and wrong with different models of machine learning, including Bayesian learning, Graphical Models, Convex Loss Optimization, Statistical Learning, and more.

R Notebooks in the cloud
We recently added a feature to Domino that lets you spin up an interactive R session on any class of hardware you choose, with a single click, enabling more powerful interactive, exploratory work in R without any infrastructure or setup hassle. This post describes how and why we built our “R Notebook” feature.

Understanding Variance, Co-Variance, and Correlation
What is Variance? One possible answer is s², but this is just a mechanical calculation (and leads to the next obvious question: what is s?). Another answer might be "the measure of the width of a distribution", which is a pretty reasonable explanation for distributions like the Normal distribution. But what about multi-modal distributions, which have not just one peak but several, possibly of different widths? Following this, we might want to say "how much a random variable varies", but now we're back where we started!
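The post works through these definitions conceptually; as a rough sketch of the quantities being discussed, here is a small Python example (the simulated data and variable names are my own, not from the post) that computes variance, covariance, and correlation directly from their definitions:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=2.0, size=10_000)   # true variance = 4
y = 0.5 * x + rng.normal(size=10_000)             # partly driven by x

# Variance: the average squared deviation from the mean.
var_x = np.mean((x - x.mean()) ** 2)

# Covariance: the average product of paired deviations from the means.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: covariance rescaled by the standard deviations,
# which forces the result into [-1, 1].
corr_xy = cov_xy / (np.std(x) * np.std(y))

print(var_x, cov_xy, corr_xy)
```

Because the same ddof is used throughout, the hand-rolled correlation agrees with `np.corrcoef(x, y)` up to floating-point error.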

ML Pitfalls: Measuring Performance (Part 1)
Unfortunately, analysis lives and dies by self-reported metrics. Is feature A better than feature B? Is this classifier better than another? How much confidence can I have in this financial report? From development to consumption, almost every decision regarding analytics inherently asks "How good is this model?" "How good" can mean many things, and it varies across domains and problem sets. But it is the developer's responsibility to provide a fair measurement in the first place, and that task is surprisingly easy to mess up. So before you run all sorts of fancy software on your biggest computing cluster, make sure your "How good" is accurate.
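One concrete way this measurement goes wrong: evaluating a model on its own training data can make it look perfect while it is in fact useless. The sketch below is my own illustration (not code from the post), using a 1-nearest-neighbour classifier on random labels, so there is genuinely nothing to learn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random features with random labels: any honest metric should
# hover around chance (50%).
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

def one_nn_predict(X_train, y_train, X_query):
    """Predict by copying the label of the nearest training point."""
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

# Pitfall: scoring on the training data. 1-NN memorises every point
# (each point's nearest neighbour is itself), so training accuracy
# is a perfect, and perfectly meaningless, 100%.
train_acc = (one_nn_predict(X, y, X) == y).mean()

# Honest measurement: hold out half the data before "fitting".
X_tr, X_te, y_tr, y_te = X[:100], X[100:], y[:100], y[100:]
test_acc = (one_nn_predict(X_tr, y_tr, X_te) == y_te).mean()

print(train_acc, test_acc)  # ~1.0 on training data vs ~0.5 held out
```

The gap between the two numbers is exactly the kind of self-reported flattery the post warns about.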

Large-Scale Machine Learning for Drug Discovery
Discovering new treatments for human diseases is an immensely complicated challenge. Even after extensive research to develop a biological understanding of a disease, an effective therapeutic that can improve patients' quality of life must still be found. This process often takes years of research, requiring the creation and testing of millions of drug-like compounds in an effort to find just a few viable drug treatment candidates. These high-throughput screens are often automated in sophisticated labs and are expensive to perform.

A Linear Congruential Generator (LCG) in R
In my simulation classes, we talk about how to generate random numbers. One of the techniques we cover is the Linear Congruential Generator (LCG). Starting from a seed, the LCG produces the first number in the sequence, then uses that value to generate the second one. The second value is used to generate the third, the third to generate the fourth, and so on.
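The post implements this in R; as an illustration of the recurrence described above, here is a small Python sketch. The Park-Miller constants (m = 2^31 - 1, a = 16807, c = 0) are a classic choice I've assumed here, not necessarily the parameters the post uses:

```python
def lcg(seed, n, m=2**31 - 1, a=16807, c=0):
    """Generate n pseudo-uniform draws in (0, 1) with an LCG.

    Each state is derived from the previous one:
        x_{k+1} = (a * x_k + c) mod m
    """
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m   # next integer state from the current one
        out.append(x / m)     # scale the state into (0, 1)
    return out

draws = lcg(seed=123, n=5)
print(draws)
```

The same seed always reproduces the same sequence, which is exactly what makes LCGs useful for repeatable simulations.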