Bayes meets Fourier

By ‘Bayes filter’, I don’t mean spam filtering using a Bayesian classifier, but rather recursive Bayesian estimation, which is used in robotics and other domains to estimate the state of a system that evolves over time, for example, the position of a moving robot. My interest in this topic was prompted by Roger Labbe’s book, Kalman and Bayesian Filters in Python, which I am reading with my book club. The Python code for this article is in this IPython notebook. If you get sick of the math, you might want to jump into the code.
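
To make “recursive Bayesian estimation” concrete, here is a minimal sketch of one predict/update cycle of a discrete Bayes filter for a robot in a 10-cell circular hallway. This is my own illustration, not the notebook’s code; the motion and sensor models are made-up assumptions.

```python
import numpy as np

belief = np.zeros(10)
belief[0] = 1.0  # prior: the robot is known to start at cell 0

# Predict: the robot tried to move one cell right, with some slippage.
# This step is a convolution of the belief with the motion model -- the
# convolution is where the Fourier connection in the title comes in.
motion_model = {0: 0.1, 1: 0.8, 2: 0.1}  # cells actually moved -> probability
predicted = sum(p * np.roll(belief, shift) for shift, p in motion_model.items())

# Update: a noisy sensor reports the robot is at cell 1.
likelihood = np.full(10, 0.2)  # assumed chance of this reading elsewhere
likelihood[1] = 0.9            # assumed hit rate at the reported cell
posterior = predicted * likelihood
posterior /= posterior.sum()   # Bayes' rule: normalize the product
print(posterior.round(3))
```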

Are You Holding the Map Upside Down?

These days, it seems as if it is not only true that “there’s an app for that,” but there is a map for that, too, with apps like Google Maps, FlightAware, and MapMyRun becoming more and more popular. Today’s maps include 3D, street view, business reviews, nearby friends, and much more. Heatmaps have also evolved into a must-have tool for customer experience analysis, which we discussed in a previous blog post, as they let you visualize the mouse movements, clicks, hovers, and scroll patterns of website visitors. Customer experience professionals rely on heatmaps the way 1990s Domino’s Pizza drivers relied on a good old-fashioned Thomas Guide or a printable MapQuest page. But an error in analyzing a heatmap can be far more disastrous than flipping to the wrong page of a map book.

A Tour of Machine Learning Algorithms

In this post, we take a tour of the most popular machine learning algorithms. It is useful to tour the main algorithms in the field to get a feel for what methods are available. There are so many algorithms that it can feel overwhelming when names are thrown around and you are expected to just know what they are and where they fit. In this post I want to give you two ways to think about and categorize the algorithms you may come across in the field.
• The first is a grouping of algorithms by the learning style.
• The second is a grouping of algorithms by similarity in form or function (like grouping similar animals together).
Both approaches are useful, but we will focus on the grouping of algorithms by similarity and go on a tour of a variety of different algorithm types. After reading this post, you will have a much better understanding of the most popular machine learning algorithms for supervised learning and how they are related.
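
To make the first grouping concrete, here is a small sketch (my own example, not from the post) contrasting a supervised learner, which is given labels, with an unsupervised one, which must find structure on its own. The dataset and model choices are mine, for contrast only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: learn a mapping from features to known labels.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print("Supervised predictions:", clf.predict(X[:5]))

# Unsupervised: discover groups with no labels at all.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments:  ", km.labels_[:5])
```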

A Map to Perfection: Using D3.js to Make Beautiful Web Maps

Data Driven Documents, or D3.js, is “a JavaScript library for manipulating documents based on data”. Or to put it more simply, D3.js is a data visualization library. It was developed by Mike Bostock with the idea of bridging the gap between the static display of data and interactive, animated data visualizations. D3 is a powerful library with a ton of uses. In this tutorial, I’ll discuss one particularly compelling application of D3: map making. We’ll go through the common challenges of building a useful and informative web map and show how, in each case, D3.js gives capable JavaScript developers everything they need to make maps look and feel beautiful.

Using kNN Classifier to Predict Whether the Price of Stock Will Increase

In this article, I will show you how to use the k-Nearest Neighbors algorithm (kNN for short) to predict whether the price of Apple stock will increase or decrease. I obtained the data from Yahoo Finance.
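
As a rough sketch of this kind of pipeline (my own illustration, not the author’s code; the file name, features, and parameters are all assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Assume a CSV downloaded from Yahoo Finance with a 'Close' column.
prices = pd.read_csv("AAPL.csv")["Close"]
returns = prices.pct_change()

# Features: the previous two daily returns; label: 1 if today's return is positive.
X = np.column_stack([returns.shift(1), returns.shift(2)])
y = (returns > 0).astype(int)
mask = ~np.isnan(X).any(axis=1)
X, y = X[mask], y[mask]

# Chronological split (no shuffling, to avoid look-ahead bias).
split = int(0.8 * len(X))
knn = KNeighborsClassifier(n_neighbors=5).fit(X[:split], y[:split])
print("Test accuracy:", accuracy_score(y[split:], knn.predict(X[split:])))
```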

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences while the decoder can generate novel descriptions from scratch. Using LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore, we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic, e.g. image of a blue car – ‘blue’ + ‘red’ is near images of red cars. Sample captions generated for 800 images are made available for comparison.
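
To make the vector arithmetic concrete, here is a toy sketch in which the regularity holds by construction; the embeddings are made up and merely stand in for the paper’s learned joint space.

```python
import numpy as np

rng = np.random.default_rng(0)
car, blue, red = (rng.normal(size=300) for _ in range(3))

# Image embeddings built compositionally, so the regularity holds by design.
images = {"blue_car.jpg": car + blue, "red_car.jpg": car + red}
words = {"blue": blue, "red": red}

# image of a blue car - 'blue' + 'red' ...
query = images["blue_car.jpg"] - words["blue"] + words["red"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# ... lands nearest the red-car image.
print(max(images, key=lambda name: cosine(query, images[name])))
```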

scikit-learn video #9: Better evaluation of classification models

In this video, you’ll learn how to properly evaluate a classification model using a variety of common tools and metrics, as well as how to adjust the performance of a classifier to best match your business objectives.
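
A brief sketch of the kind of workflow the video covers (my own example, not the video’s code): score a classifier with a confusion matrix and AUC, then move the decision threshold away from the default 0.5 to trade precision against recall.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score, classification_report

# A synthetic, imbalanced binary problem (85% negatives).
X, y = make_classification(n_samples=1000, weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

print(confusion_matrix(y_test, model.predict(X_test)))
print("AUC:", roc_auc_score(y_test, proba))

# Lower the threshold to catch more positives (higher recall) at the
# cost of more false alarms (lower precision).
print(classification_report(y_test, (proba > 0.3).astype(int)))
```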

Why Deep Learning Works II: the Renormalization Group

Deep Learning is amazing. But why is Deep Learning so successful? Is Deep Learning just old-school Neural Networks on modern hardware? Is it just that we have so much data now that the methods work better? Is Deep Learning just really good at finding features? Researchers are working hard to sort this out.

Convex Relaxations of Transductive Learning

Why are SVMs interesting? Is it just a better way to do Logistic Regression? Is it the Kernel Trick? And does this even matter now that Deep Learning is everywhere? To the beginning student of machine learning, SVMs are the first example of a Convex Optimization method. To the advanced practitioner, SVMs are the starting point for creating powerful Convex Relaxations of hard problems.
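
As a quick illustration of the Kernel Trick mentioned above (my own example, not from the paper): on data that is only circularly separable, a linear model flounders while an RBF-kernel SVM separates the classes in an implicit feature space.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original space.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

print("Logistic regression:", LogisticRegression().fit(X, y).score(X, y))
print("RBF-kernel SVM:     ", SVC(kernel="rbf").fit(X, y).score(X, y))
```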

What Does the AVERAGE Brand Logo Look Like?

PNG images are essentially a grid of values that represent the colors to display. Since each cell in the grid is made up of numbers, I got curious about what it might mean to aggregate multiple PNGs. What would it look like to average two or more images? Take the median? The mode? A random pixel? To find out, I pulled the top 100 brands’ logos from Best Global Brands, then used the (layered) channel values as inputs to aggregate in various ways. Averaging these logos yields a gray blob that looks roughly, well, saturnine.
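
For the curious, a minimal sketch of that kind of aggregation (my own code, not the author’s; the directory of logo files and the target size are assumptions):

```python
import glob
import numpy as np
from PIL import Image

SIZE = (200, 200)

# Resize every logo to a common grid so the pixel arrays stack cleanly.
stack = np.stack([
    np.asarray(Image.open(path).convert("RGBA").resize(SIZE), dtype=float)
    for path in glob.glob("logos/*.png")
])

mean_logo = stack.mean(axis=0)        # the "average" logo
median_logo = np.median(stack, axis=0)  # a sharper, outlier-resistant variant

Image.fromarray(mean_logo.astype(np.uint8)).save("average_logo.png")
```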

Power analysis for mixed models

This is a quick note that may be useful for some people. I was interested in knowing how many years of monitoring we need to detect a trend. This is a long-term monitoring project, so we already have 7 years of data to play with. For a simple design, you can use the pwr library in R to answer your question, but for nested designs (i.e. random factors) things get hairy. In this and this paper they suggest building your own simulation, and both have quite complex supplementary material with R code. I didn’t spend enough time to make sense of them. Thanks to @frod_san, I also found two packages that do it for you. The first, PAMM, is broken (lme4 keeps evolving while the package doesn’t, so even its own examples don’t work*). The second (SimR) is not published yet, but it is amazingly simple. All its code is on GitHub and the authors are fast at fixing any bug you may report (they fixed a small bug I found in no time).
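
For readers who prefer Python, here is a minimal sketch of the same simulation-based idea using statsmodels rather than the R packages above. Every design value, effect size, and variance here is an assumption for illustration.

```python
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

warnings.filterwarnings("ignore")  # MixedLM is chatty about convergence
rng = np.random.default_rng(42)

def simulated_power(n_years, n_sites=10, slope=0.05, site_sd=0.5,
                    noise_sd=1.0, n_sims=100, alpha=0.05):
    """Fraction of simulated datasets in which the year trend is detected."""
    hits = 0
    for _ in range(n_sims):
        years = np.tile(np.arange(n_years), n_sites)
        sites = np.repeat(np.arange(n_sites), n_years)
        y = (slope * years                                # the trend to detect
             + rng.normal(0, site_sd, n_sites)[sites]     # random site effects
             + rng.normal(0, noise_sd, years.size))       # residual noise
        data = pd.DataFrame({"y": y, "year": years, "site": sites})
        fit = smf.mixedlm("y ~ year", data, groups=data["site"]).fit()
        hits += fit.pvalues["year"] < alpha
    return hits / n_sims

# Power with the 7 years already collected vs. a longer series.
print(simulated_power(7), simulated_power(12))
```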