From Pixels to Torques

In this research, we introduce a deep dynamical convolutional model that can learn complex non-linear dynamics and make long-term predictions. Compared to state-of-the-art reinforcement learning methods for continuous state and action space problems, our approach is solid and efficient: it is model-based, scales to high-dimensional state spaces, learns quickly, and is a major step towards fully autonomous learning from pixels to torques.


Dato Truly Native? Winner’s Interview: 2nd place, mortehu

In Dato’s Truly Native? competition, Kagglers were given the HTML of webpages on StumbleUpon and challenged to distinguish paid content (in the form of native advertising) from unpaid content. Morten Hustveit finished in second place out of 340 data scientists on 274 teams. His previous work researching and building a text classifier program for HTML documents gave him a unique competitive edge.


Don’t use stats::aggregate()

When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: do not use stats::aggregate().


The Traveling Vampire Problem

Let’s say you are a vampire and you would like to figure out the shortest route to visit the supple necks of N maidens. But, there is only so much time in any night!
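The problem described above is a variant of the traveling salesman problem with a time limit. As a minimal sketch, and not the linked article's method, the brute-force approach below tries every visiting order for a handful of stops and checks the best route against a budget. The coordinates, the `night_budget` value, and the function name `shortest_route` are all hypothetical, and the route is assumed to be open (no return trip to the start).

```python
from itertools import permutations
from math import dist, inf


def shortest_route(start, stops):
    """Try every visiting order; return (length, order) of the shortest open route."""
    best_length, best_order = inf, ()
    for order in permutations(stops):
        length = 0.0
        here = start
        for stop in order:
            length += dist(here, stop)  # straight-line distance between points
            here = stop
        if length < best_length:
            best_length, best_order = length, order
    return best_length, best_order


# Hypothetical coordinates, for illustration only.
lair = (0, 0)
maidens = [(0, 3), (4, 3), (4, 0)]

length, order = shortest_route(lair, maidens)

# Hypothetical limit on how far one can travel in a single night.
night_budget = 12.0
feasible = length <= night_budget
```

Brute force checks N! orderings, so it is only practical for small N; larger instances would need a heuristic or a dedicated solver.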


Using Sankey Diagrams to Visualize Better Ad Placement

How do you use data to decide where to advertise? At RadiumOne, we spend a lot of time on that question. Considering that there are a vast number of websites that sell advertising space, and that different ads will perform better with different audiences, it can get to be pretty complicated. We recently developed a new visualization tool to help make sense of advertising performance data. As you can see from the chart above, we can easily track the performance of our ad testing platform as it shuffles advertising domains between three buckets: Test, Performant, and Holding. This tool came out of our participation in Insight’s Data Visualization Lab.


Two-Way ANOVA with Repeated Measures

When I was studying psychology as an undergraduate, one of my biggest frustrations with R was the lack of quality support for repeated measures ANOVAs. They’re a pretty common thing to run into in much psychological research, and having to wade through incomplete and often contradictory advice for conducting them was (and still is) a pain, to put it mildly.


Semantic Technology Is Not Only For Data Geeks

You can’t bring up semantics without someone inserting an apology for the geekiness of the discussion. If you are a data person like me, geek away! But for everyone else, it was a topic best left alone. Well, like every geek, the semantic geeks now have their day, and may just rule the data world.


A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion

We present a probabilistic suggestion model that is able to account for sequences of previous queries of arbitrary lengths. Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.


How Big Data is used in Recommendation Systems to change our lives

Recommendation systems have impacted or even redefined our lives in many ways. They work in well-defined, logical phases: data collection, ratings, and filtering.
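To make the three phases concrete, here is a minimal sketch of user-based collaborative filtering, which is one common filtering technique and not necessarily the one the linked article describes. The users, items, ratings, and the function names `cosine` and `recommend` are hypothetical.

```python
from math import sqrt

# Phases 1 and 2 (data collection, ratings): hypothetical explicit ratings from 1 to 5.
ratings = {
    "ana": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Phase 3 (filtering): predict ratings for unseen items, best first."""
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        for item, rating in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average of the other users' ratings.
    predictions = {
        item: scores[item] / weights[item]
        for item in scores
        if weights[item] > 0
    }
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)


recommendations = recommend("ana", ratings)
```

Here the only item "ana" has not rated is "D", and its predicted rating leans towards the rating given by "ben", whose tastes are more similar to hers than those of "cat".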


Demo: R in SQL Server 2016

At the PASS Summit in Seattle this week, Microsoft’s Jason Wilcox and Gopi Kumar demonstrated a SQL Server 2016 application that embeds R to predict what time you need to leave to catch a flight, given traffic, check-in time, and the likelihood of a flight leaving early or being delayed.


Visualizing Chess Data With ggplot

There are nice visualizations of chess data: piece movement, piece survivability, square usage by player. Sadly, the authors do not always share the code or data needed to replicate the final result. So I wrote some code to show how to make some of these great visualizations entirely in R. Just for fun.