Many rules of statistics are wrong

There are two kinds of people who violate the rules of statistical inference: people who don’t know them and people who don’t agree with them.

How Could Classification Trees Be So Fast on Categorical Variables?

I think that over the past months, I have been saying incorrect things about classification with categorical covariates, because I never took the time to look at it carefully. Consider some simulated dataset, with a logistic regression,

Create or machine-learn fuzzy logic rules for use with an on-line inference engine

The new DocAndys SaaS service supports user-created, embeddable Fuzzy Logic Expert Systems. Use the rule language Darl to hand-create rule sets or machine-learn them from data, then use them via REST interfaces.

Extract Google Trends Data with Python

A Discrete Time Markov Chain (DTMC) SIR Model in R

There are many different techniques that can be used to model physical, social, economic, and conceptual systems. The purpose of this post is to show how the Kermack-McKendrick (1927) formulation of the SIR Model for studying disease epidemics (where S stands for Susceptible, I stands for Infected, and R for Recovered) can be easily implemented in R as a discrete time Markov Chain using the markovchain package.
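
To give a feel for the idea, here is a minimal sketch in plain Python rather than R's markovchain package. It treats a single individual as a three-state chain (S, I, R) and pushes the state distribution forward one step at a time. The transition probabilities are made-up illustrative values, not taken from the post, and the constant S-to-I probability is a simplifying assumption: in the full SIR model the infection rate depends on how many individuals are currently infected.

```python
# States of the chain, in the order used for rows and columns of P.
STATES = ("S", "I", "R")

# Hypothetical one-step transition probabilities (assumed for illustration).
# Row = current state, column = next state. Each row sums to 1.
P = [
    [0.9, 0.1, 0.0],  # S: stay susceptible, or become infected
    [0.0, 0.8, 0.2],  # I: stay infected, or recover
    [0.0, 0.0, 1.0],  # R: absorbing state, recovered individuals stay recovered
]


def step(dist, matrix):
    """Advance the distribution one time step (row vector times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def simulate(initial, matrix, n_steps):
    """Return the list of state distributions at times 0..n_steps."""
    history = [list(initial)]
    for _ in range(n_steps):
        history.append(step(history[-1], matrix))
    return history


# Start with the individual susceptible with probability 1.
history = simulate([1.0, 0.0, 0.0], P, 50)

for t in (0, 1, 10, 50):
    s, i, r = history[t]
    print(f"t={t:2d}  S={s:.4f}  I={i:.4f}  R={r:.4f}")
```

Because R is absorbing in this sketch, the probability mass drifts from S through I into R as the number of steps grows.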

Build Online Image Classification Service with Shiny and MXNetR

Early this week, Google announced its Cloud Vision API, which can detect the content of an image. With the power of R and MXNet, you can try something very similar on your own laptop: an image classification shiny app. Thanks to the powerful shiny framework, it is implemented with no more than 150 lines of R code.

The Method of Boosting

One of the techniques that has caused the most excitement in the machine learning community is boosting, which in essence is a process of iteratively refining, e.g. by reweighting, estimated regression and classification functions (though it has primarily been applied to the latter) in order to improve predictive ability. Much has been made of the remark by the late statistician Leo Breiman that boosting is “the best off-the-shelf classifier in the world,” his term off-the-shelf meaning that the given method can be used by nonspecialist users without special tweaking. Many analysts have indeed reported good results from the method.
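
As a rough illustration of the reweighting idea, the sketch below implements an AdaBoost-style loop with one-dimensional decision stumps in plain Python. This is one common variant of boosting, chosen here as an assumption; the post itself may discuss other forms. The toy data set and the helper names (`best_stump`, `adaboost`, `predict`) are hypothetical.

```python
import math


def stump_predict(x, thr, polarity):
    """Predict `polarity` when x >= thr, and the opposite label otherwise."""
    return polarity if x >= thr else -polarity


def best_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, thr, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, n_rounds):
    """Fit a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform observation weights
    ensemble = []
    for _ in range(n_rounds):
        err, thr, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this stump
        ensemble.append((alpha, thr, polarity))
        # Reweight: misclassified points gain weight, correct ones lose it.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, thr, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, 3)
print([predict(ensemble, x) for x in xs])
```

Each round fits the weak learner to the current weights, so later stumps concentrate on the points that earlier stumps got wrong; the final classifier is the weighted vote of all rounds.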

Deploying Your Very Own Shiny Server

In this tutorial, I’m going to walk you through the process of:
1. Setting up an Ubuntu 14.04 + NGINX server at DigitalOcean
2. Installing and configuring R
3. Installing and configuring Shiny and the open-source edition of Shiny Server
4. Installing a free SSL certificate from Let’s Encrypt
5. Securing the Shiny Server using the SSL cert and reverse proxy through NGINX
6. Setting appropriate permissions on the files to be served
7. Creating and launching the app Nicole created in her recent post