Python Excel Tutorial: The Definitive Guide

You will probably already know that Excel is a spreadsheet application developed by Microsoft. You can use this easily accessible tool to organize, analyze and store your data in tables. What’s more, this software is widely used in many different application fields all over the world. And, whether you like it or not, this applies to data science. You’ll need to deal with these spreadsheets at some point, but you won’t always want to continue working in it either. That’s why Python developers have implemented ways to read, write and manipulate not only these files, but also many other types of files.


Learning to learn by gradient descent by gradient descent

One of the things that strikes me when I read these NIPS papers is just how short some of them are – between the introduction and the evaluation sections you might find only one or two pages! A general form is to start out with a basic mathematical model of the problem domain, expressed in terms of functions. Selected functions are then learned, by reaching into the machine learning toolbox and combining existing building blocks in potentially novel ways. When looked at this way, we could really call machine learning ‘function learning‘. Thinking in terms of functions like this is a bridge back to the familiar (for me at least). We have function composition. For example, given a function f mapping images to feature representations, and a function g acting as a classifier mapping image feature representations to objects, we can build a systems that classifies objects in images with g \circ f. Each function in the system model could be learned or just implemented directly with some algorithm. For example, feature mappings (or encodings) were traditionally implemented by hand, but increasingly are learned…


Data Science: Identifying Variables That Might Be Better Predictors

I love the simplicity of the data science concepts as taught by the book “Moneyball.” Everyone wants to jump right into the real meaty, highly technical data science books. But I recommend to my students to start with the book “Moneyball.” The book does a great job of making the power of data science come to life (and the movie doesn’t count, as my wife saw it and “Brad Pitt is so cute!” was her only takeaway…ugh).


The real meaning of spurious correlations

Like many data nerds, I’m a big fan of Tyler Vigen’s Spurious Correlations, a humourous illustration of the old adage “correlation does not equal causation”. Technically, I suppose it should be called “spurious interpretations” since the correlations themselves are quite real, but then good marketing is everything.


fst: Fast serialization of R data frames

If you want to get data out of R and into another application or system, simply copying the data as it resides in memory generally isn’t an option. Instead you have to serialize the data (into a file, usually), which the other application can then deserialize to recreate the original data.


Where Do Z-Score Tables Come From? (+ how to make them in R)

Every student learns how to look up areas under the normal curve using Z-Score tables in their first statistics class. But what is less commonly covered, especially in courses where calculus is not a prerequisite, is where those Z-Score tables come from: by evaluating the integral of the equation for the bell-shaped normal curve, usually from -Inf to the z-score of interest. This is the same thing that the R command pnorm does when you provide it with a z-score. Here is the slide presentation I put together to explain the use and origin of the Z-Score table, and how it relates to pnorm and qnorm (the command that lets you input an area to find the z-score at which the area to the left is swiped out). It’s free to use under Creative Commons, and is part of the course materials that is available for use with this 2015 book.


Multipanel Graphics in R (part 1)

In many situations, we require that several plots are placed in the same figure as subplots. R has various ways of doing it. Base Graphics has three different ways to draw subplots, i.e. mfrow, layout and split.screen, with increasing degree of complexity, and, at the same time, with increased control over the plot elements. This example introduces the mfrow, mfcol and layout functions in Base Graphics. We use the familiar iris dataset for the illustrations. Answers to the exercises are available here.If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.