More 3D Graphics (rgl) for Classification with Local Logistic Regression and Kernel Density Estimates (from The Elements of Statistical Learning)
As part of my course on statistical learning, we created 3D graphics to build intuition for the methods used to relax the assumption of linearity (in the predictors) in regression and classification.
Finding the dramatic arc of novels with sentiment analysis
Sentiment analysis has been widely used to infer the mood of customers in emails, tweets and other short communications. The underlying assumption is that the sentiment is a fixed value: the email is either angry or happy, positive or negative. But in a longer work such as a novel, we naturally expect the sentiment to vary over time. Can we apply sentiment analysis over the course of a long text, and thereby see the dramatic arc of a story as it flows from comedy to tragedy and maybe back again? That’s what Matthew Jockers has done with his R package “syuzhet”.
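To make the idea concrete, here is a minimal Python sketch of scoring a text sentence by sentence and then smoothing the scores into an arc. It is not the syuzhet package or its API: the tiny `LEXICON`, the naive sentence splitter, and the helper names (`sentence_score`, `rolling_mean`, `sentiment_arc`) are all assumptions made up for illustration.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# real sentiment lexicons contain thousands of scored words.
LEXICON = {"happy": 1.0, "joy": 1.0, "love": 1.0, "laughed": 1.0,
           "sad": -1.0, "grief": -1.0, "died": -1.0, "wept": -1.0}

def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (not a real sentence tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def sentence_score(sentence):
    """Sum the lexicon values of the words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(w, 0.0) for w in words)

def rolling_mean(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def sentiment_arc(text, window=3):
    """Return (raw per-sentence scores, smoothed scores)."""
    scores = [sentence_score(s) for s in split_sentences(text)]
    return scores, rolling_mean(scores, window)

story = ("They were happy and full of joy. She laughed at everything. "
         "Then her father died. She wept with grief. "
         "In time she found love again.")
raw, arc = sentiment_arc(story)
print(raw)  # per-sentence scores
print(arc)  # smoothed arc
```

For a real novel the window would span many sentences, so that the smoothed curve shows the broad rise and fall of the story rather than sentence-level noise.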
Probabilistic Programming for Advancing Machine Learning
Machine learning is at the heart of modern approaches to artificial intelligence. The field posits that teaching computers how to learn can be significantly more effective than programming them explicitly. Unfortunately, building effective machine learning applications still requires Herculean efforts on the part of highly trained experts in machine learning. Probabilistic programming is a new programming paradigm for managing uncertain information. The goal of the Probabilistic Programming for Advancing Machine Learning (PPAML) program is to facilitate the construction of machine learning applications by using probabilistic programming to:
1. Dramatically increase the number of people who can successfully build machine learning applications;
2. Make machine learning experts radically more effective; and
3. Enable new applications that are inconceivable today.
The PPAML program started in November 2013 and is scheduled to run for 46 months, with three phases of activity through 2017.
Data science: It’s greater than the sum of the parts
This is the best statement I have ever read about data science. Data science is more than its components added together (statistics, machine learning, computer science, data plumbing, core data science, domain expertise, business acumen, hacking).