A Gentle Introduction to Artificial Neural Networks
Though many phenomena in the world can be adequately modeled using linear regression or classification, most interesting phenomena are nonlinear in nature. To deal with nonlinear phenomena, a diversity of nonlinear models have been developed. For example, parametric models assume that data follow some parametric class of nonlinear function (e.g. polynomial, power, or exponential), then fine-tune the shape of the parametric function to fit observed data. However, this approach is only helpful if the data are fit well by the available catalog of parametric functions. Another approach, kernel-based methods, transforms data nonlinearly into an abstract space that measures distances between observations, then predicts new values or classes based on these distances. However, kernel methods generally involve constructing a kernel matrix whose size grows with the number of training observations, and can thus be prohibitive for large data sets. Another class of models, the ones that are the focus of this post, is artificial neural networks (ANNs). ANNs are nonlinear models motivated by the physiological architecture of the nervous system. They involve a cascade of simple nonlinear computations that, when aggregated, can implement robust and complex nonlinear functions. In fact, depending on how they are constructed, ANNs can approximate any nonlinear function, making them a quite powerful class of models. (Note that this property is not reserved for ANNs; kernel methods are also considered ‘universal approximators’. However, it turns out that neural networks with multiple layers are more efficient at approximating arbitrary functions than other methods. I refer the interested reader to more in-depth discussion on the topic.)
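To make the "cascade of simple nonlinear computations" concrete, here is a minimal sketch of a two-layer network that computes XOR, a function no single linear model can represent. The weights are hand-picked for illustration rather than learned, and the names (relu, dense, xor_net) are assumptions of this sketch, not taken from the original post.

```python
def relu(z):
    """Rectified linear unit: one simple nonlinear computation."""
    return max(0.0, z)


def identity(z):
    """Linear output activation."""
    return z


def dense(inputs, weights, biases, activation):
    """One layer: a weighted sum per unit, followed by an activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) parameters, chosen only for illustration.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def xor_net(x1, x2):
    """Cascade two layers: ReLU hidden units, then a linear readout."""
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, identity)[0]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

Each hidden unit is trivial on its own; it is the composition of the two layers that produces the nonlinear input-output map.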

Math of Ideas: A Word is Worth a Thousand Vectors
Word vectors give us a simple and flexible platform for understanding text. Here are a few diverse examples that should help build your confidence in developing and deploying NLP systems and show what problems they can solve.
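As a rough illustration of how word vectors are typically compared, the sketch below ranks words by cosine similarity. The three-dimensional vectors are made up for this example (real embeddings are learned from text and have many more dimensions), and the helper names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration; not output from any real model.
TOY_VECTORS = {
    "beer": [0.9, 0.1, 0.0],
    "ale": [0.8, 0.2, 0.1],
    "tennis": [0.0, 0.1, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest in cosine similarity."""
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    print(most_similar("beer", TOY_VECTORS))
```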

Topic Modeling in Multi-Aspect Reviews
The purpose of this project is to investigate topic modeling in multi-aspect reviews. More specifically, I wanted to investigate a way to find the words in reviews which were associated with the different categories being rated. Since I, like seemingly all data scientists, love beer, I was thrilled to find a dataset containing about 1.5 million beer reviews from the beeradvocate website. Below is a summary of my workflow and findings in playing around with this dataset.

5 Unusual Ways Businesses Are Using Big Data
1. Parking Lot Analytics
2. Dating Driven By Data
3. Data at the Australian Open
4. Dynamic Ticket Pricing
5. Ski Resorts and Big Data

Statistics journals network
Xian blogged recently on the upcoming RSS read paper: Statistical Modelling of Citation Exchange Between Statistics Journals, by Cristiano Varin, Manuela Cattelan and David Firth, following the last JRSS B read paper by one of us! The data that are used in the paper (and can be downloaded here) are quite fascinating for us academics, fascinated as we are by academic rankings, for better or for worse (irony intended). They consist of cross-citation counts C = (C_{ij}) for 47 statistics journals (see the list and abbreviations on page 5): C_{ij} is the number of citations from articles published in journal j in 2010 to papers published in journal i in the 2001-2010 decade. The choice of the list of journals is discussed in the paper. Major journals missing include Bayesian Analysis (published from 2006) and The Annals of Applied Statistics (published from 2007).
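The index convention for C_{ij} is easy to mix up, so here is a small sketch of it. The journal labels and counts are invented for illustration and are not from the paper's data; only the convention (rows are cited journals, columns are citing journals) follows the definition above.

```python
# Hypothetical journals and made-up counts, for illustration only.
JOURNALS = ["A", "B", "C"]

# C[i][j] = citations from journal j (citing) to journal i (cited).
C = [
    [10, 4, 1],
    [6, 8, 2],
    [3, 5, 7],
]


def citations_received(matrix, i):
    """Row sum: all citations made to journal i."""
    return sum(matrix[i])


def citations_given(matrix, j):
    """Column sum: all citations made by journal j."""
    return sum(row[j] for row in matrix)


if __name__ == "__main__":
    for k, name in enumerate(JOURNALS):
        print(name, citations_received(C, k), citations_given(C, k))
```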

Now you can start every R tutorial for free @DataCamp
Want to learn more on topics such as R programming, dplyr, data.table, ggvis, and R Markdown? You can now start every DataCamp course for free, and discover how fun it is to learn R in the comfort of your own browser.

Guidelines for reporting confidence intervals
1. Report credible intervals instead.
2. Do not use procedures whose Bayesian properties are not known.
3. Warn readers if the confidence procedure does not correspond to a Bayesian procedure.
4. Never report a confidence interval without noting the procedure and the corresponding statistics.
5. Consider reporting likelihoods or posteriors instead.

The Data Science Ecosystem
Data science initiatives have been popping up at an increasing pace in the last couple of years all around the world. …

Data Science 101: Preventing Overfitting in Neural Networks
Overfitting is a major problem for Predictive Analytics and especially for Neural Networks. Here is an overview of key methods to avoid overfitting, including regularization (L2 and L1), Max norm constraints and Dropout.
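The four methods named above can each be sketched in a few lines. This is a minimal pure-Python illustration under common textbook definitions, not the linked article's implementation. The function names and the lam, c and p_drop parameters are assumptions of this sketch, and the dropout shown is the "inverted" variant that rescales surviving units.

```python
import random


def l2_penalty(weights, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l1_penalty(weights, lam):
    """L1 regularization term: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def max_norm(weights, c):
    """Rescale the weight vector so its Euclidean norm is at most c."""
    norm = sum(w * w for w in weights) ** 0.5
    if norm <= c:
        return list(weights)
    return [w * c / norm for w in weights]


def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop (< 1)
    and scale the survivors by 1 / (1 - p_drop)."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    w = [3.0, 4.0]
    print(l2_penalty(w, 0.5), l1_penalty(w, 0.5), max_norm(w, 1.0))
    print(dropout([1.0] * 8, 0.5, random.Random(0)))
```

The penalties are added to the training loss, the max-norm projection is applied after each weight update, and dropout is applied to activations during training only.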

Grades of evidence – A cheat sheet
There are at least three traditions in statistics which work with some kind of likelihood ratio (LR): the ‘Bayes factor camp’, the ‘AIC camp’, and the ‘likelihood camp’. In my experience, unfortunately, most people do not have an intuitive understanding of LRs. When I give talks about Bayes factors, the most predictable question is ‘And how much is a BF of 3.4? Is that something I can put confidence in?’.

Understanding Bayes: A Look at the Likelihood
Much of the discussion in psychology surrounding Bayesian inference focuses on priors. Should we embrace priors, or should we be skeptical? When are Bayesian methods sensitive to specification of the prior, and when do the data effectively overwhelm it? Should we use context-specific prior distributions or should we use general defaults? These are all great questions and great discussions to be having. One thing that often gets left out of the discussion is the importance of the likelihood. The likelihood is the workhorse of Bayesian inference. In order to understand Bayesian parameter estimation you need to understand the likelihood. In order to understand Bayesian model comparison (Bayes factors) you need to understand the likelihood and likelihood ratios.
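As a small worked example of a likelihood and a likelihood ratio, the sketch below uses a binomial model (say, successes in a fixed number of trials). The binomial setting, the data (6 successes in 10 trials) and the two candidate values of theta are assumptions chosen for illustration, not taken from the original post. Comparing two point hypotheses like this is the simplest case; a Bayes factor in general averages the likelihood over a prior.

```python
import math


def binomial_likelihood(theta, successes, trials):
    """Probability of the observed data, viewed as a function of theta."""
    failures = trials - successes
    return math.comb(trials, successes) * theta**successes * (1 - theta) ** failures


def likelihood_ratio(theta_a, theta_b, successes, trials):
    """How much better theta_a predicts the data than theta_b."""
    return binomial_likelihood(theta_a, successes, trials) / binomial_likelihood(
        theta_b, successes, trials
    )


if __name__ == "__main__":
    # Illustrative data: 6 successes in 10 trials.
    for theta in (0.3, 0.5, 0.6):
        print(theta, binomial_likelihood(theta, 6, 10))
    print(likelihood_ratio(0.6, 0.5, 6, 10))
```

Note that the binomial coefficient cancels in the ratio, so only the parts of the likelihood that depend on theta affect the comparison.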