Intro to Deep Learning with Theano and OpenDeep by Markus Beissinger
Deep learning currently provides state-of-the-art performance in computer vision, natural language processing, and many other machine learning tasks. In this talk, we will learn when deep learning is useful (and when it isn’t!), how to implement some simple neural networks in Python using Theano, and how to build more powerful systems using the OpenDeep package.
Using system and web fonts in R plots
The forthcoming R Journal has an interesting article on the showtext package by Yixuan Qiu. The package allows me to use system and web fonts directly in R plots, reminding me a little of the approach taken by XeLaTeX. But ‘unlike other methods to embed fonts into graphics, showtext converts text into raster images or polygons, and then adds them to the plot canvas. This method produces platform-independent image files that do not rely on the fonts that create them.’
May 2015: Scripts of the Week
Every day, the team at Kaggle HQ shares scripts that wow us in our company chat tool. Our ‘script of the week’ was created to make sure the larger community doesn’t miss out on this great content. Every Friday we share our script of the week on the forums and Twitter. We’ll also be aggregating these scripts at the end of the month to post in the blog. Have a question or want to leave feedback for a script’s creator? You can now comment directly on the script’s page below the output!
Heteroscedasticity in Regression — It Matters!
R’s main linear and nonlinear regression functions, lm() and nls(), report standard errors for parameter estimates under the assumption of homoscedasticity, a fancy word for a situation that rarely occurs in practice. The assumption is that the (conditional) variance of the response variable is the same at any set of values of the predictor variables.
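To make the distinction concrete, here is a minimal sketch in plain Python (not R, and not the code from the linked post) for a single-predictor regression. It computes the slope's standard error twice: once with the classical formula, which assumes one common error variance, and once with the White (HC0) heteroscedasticity-consistent formula. The function name `slope_standard_errors` and the simulated data are assumptions made up for illustration.

```python
import math
import random


def slope_standard_errors(x, y):
    """Fit y = a + b*x by least squares.

    Returns (slope, classical_se, robust_se), where classical_se assumes a
    constant error variance and robust_se is the White/HC0 estimate.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical: one pooled residual variance, divided by Sxx.
    classical_var = sum(e ** 2 for e in resid) / (n - 2) / sxx
    # HC0: each squared residual is weighted by its own squared leverage term.
    robust_var = sum(((xi - x_bar) * e) ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(robust_var)


# Simulated data (an assumption for illustration): the noise standard
# deviation grows with x, so the homoscedasticity assumption is violated.
random.seed(0)
x_sim = [random.uniform(1.0, 10.0) for _ in range(2000)]
y_sim = [2.0 + 0.5 * xi + random.gauss(0.0, 0.3 * xi) for xi in x_sim]

slope_sim, classical_sim, robust_sim = slope_standard_errors(x_sim, y_sim)
print(f"slope={slope_sim:.4f} classical SE={classical_sim:.4f} HC0 SE={robust_sim:.4f}")
```

When the error variance really is constant, the two estimates tend to agree; when it changes with the predictors, they can diverge, and the classical figure is the one that rests on an assumption the data do not support.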
R in a 64 bit world
32-bit data structures (pointers, integer representations, single-precision floating point) have been past their ‘best before date’ for quite some time. R itself moved to a 64-bit memory model some time ago, but still has only 32-bit integers. This is going to get more and more awkward going forward. What is R doing to work around this limitation?
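The size of the limitation is easy to check. The sketch below uses Python's standard `struct` module (an illustration only, not R code and not taken from the linked post) to show where a signed 32-bit integer runs out, and how far a 64-bit double can carry whole numbers exactly. The helper names `fits_in_int32` and `survives_as_double` are hypothetical, and the excerpt does not say that doubles are R's workaround; they appear here only as a point of comparison.

```python
import struct

INT32_MAX = 2 ** 31 - 1       # largest signed 32-bit integer: 2147483647
DOUBLE_EXACT_MAX = 2 ** 53    # doubles represent every integer up to here exactly


def fits_in_int32(value):
    """Return True if value can be stored in a signed 32-bit integer."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False


def survives_as_double(value):
    """Return True if an integer round-trips through a 64-bit float unchanged."""
    return int(float(value)) == value


print(fits_in_int32(INT32_MAX))                   # True
print(fits_in_int32(INT32_MAX + 1))               # False
print(survives_as_double(DOUBLE_EXACT_MAX))       # True
print(survives_as_double(DOUBLE_EXACT_MAX + 1))   # False
```

A count or index just past two billion no longer fits in a 32-bit integer, while a double still holds it exactly, with a great deal of headroom before precision is lost.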
Why has R, despite quirks, been so successful?
I think R sometimes gets a bit of an unfair rap for its quirks, but in fact these design decisions – made in the interest of making R extensible rather than fast – have enabled some truly important innovations in statistical computing:
• The fact that R has lazy evaluation allowed for the development of the formula syntax, so useful for statistical modeling of all kinds.
• The fact that R supports missing values as a core data value allowed R to handle real-world, messy data sources without resorting to dangerous hacks (like using zeroes to represent missing data).
• R’s package system – a simple method of encapsulating user-contributed functions for R – enabled the CRAN system to flourish. The pass-by-value system and naming notation for function arguments also made it easy for R programmers to write functions that others could readily use.
• R’s graphics system was designed to be extensible, which allowed the ggplot2 system to be built on top of the ‘grid’ framework (and to influence the look of statistical graphics everywhere).
• R is dynamically typed and allows functions to ‘reach outside’ of scope, and everything is an object – including expressions in the R language itself. These language-level programming features allowed for the development of the reactive programming framework underlying Shiny.
• The fact that every action in R is a function – including operators – allowed for the development of new syntax models, like the %>% pipe operator in magrittr.
• R gives programmers the ability to control the REPL, which allowed for the development of IDEs like ESS and RStudio.
• ‘for’ loops can be slow in R, which … well, I can’t really think of an upside for that one, except that it encouraged the development of high-performance extension frameworks like Rcpp.