Predicting Daily Activities from Egocentric Images Using Deep Learning

We present a method to analyze images taken from a passive egocentric wearable camera, along with contextual information such as time and day of week, to learn and predict the everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a six-month period spanning 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) together with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person’s activity across the 19 activity classes. We also demonstrate promising results for two additional users by fine-tuning the classifier with one day of training data.
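
For readers unfamiliar with the term, the sketch below (a hypothetical PyTorch module, not the authors' implementation) illustrates what a late fusion ensemble of image and contextual features can look like: a pretrained CNN supplies image features, a small network encodes hour-of-day and day-of-week, and the two branches are combined only at the final classification layer.

```python
# Minimal sketch of "late fusion" of CNN image features with contextual
# features (hour of day, day of week). Illustrative only, not the paper's code.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 19  # activity classes, as in the paper


class LateFusionClassifier(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES, context_dim=31):
        super().__init__()
        # Image branch: a pretrained CNN used as a feature extractor
        # (torchvision >= 0.13 weights API).
        cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn_features = nn.Sequential(*list(cnn.children())[:-1])  # drop fc layer
        cnn_dim = cnn.fc.in_features  # 512 for resnet18
        # Context branch: e.g. 24-dim one-hot hour + 7-dim one-hot weekday = 31 dims.
        self.context_net = nn.Sequential(nn.Linear(context_dim, 32), nn.ReLU())
        # Late fusion: concatenate both branches and classify.
        self.classifier = nn.Linear(cnn_dim + 32, num_classes)

    def forward(self, image, context):
        img_feat = self.cnn_features(image).flatten(1)   # (B, 512)
        ctx_feat = self.context_net(context)             # (B, 32)
        fused = torch.cat([img_feat, ctx_feat], dim=1)   # fuse at the last stage
        return self.classifier(fused)


# Usage: logits = LateFusionClassifier()(images, context_features)
```

The design choice worth noting is that each branch is learned (or pretrained) on its own, and the contextual signal only enters at the last stage, which is what distinguishes late fusion from simply appending time features to raw pixels.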

A Step-by-Step Plan for Getting Your Company Started with Predictive Analytics – Part 1

Over 80% of companies are not yet using advanced analytics. Here’s a step-by-step plan for implementing a brand-new predictive analytics program that gets the biggest bang for your buck from the most cost-effective investments.

Learning Social Network Embeddings for Predicting Information Diffusion

We introduce a new approach to the problem of predicting information diffusion: the goal is to learn a mapping of the observed temporal dynamics onto a continuous space. Nodes participating in diffusion cascades are projected into a latent representation space in such a way that information diffusion can be modeled efficiently using a heat diffusion process. This amounts to learning a diffusion kernel for which the proximity of nodes in the projection space reflects the proximity of their infection times in cascades.
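
In rough notation (ours, not necessarily the paper's), the heat-diffusion idea can be summarized as follows:

```latex
% Schematic of the idea: each node u is mapped to a latent position z_u,
% and proximity under a heat-diffusion kernel is made to reflect proximity
% of infection times within a cascade.
\begin{align}
  K_t(u, v) &= \exp\!\left(-\frac{\lVert z_u - z_v \rVert^2}{4t}\right)
  && \text{(heat kernel in the latent space)} \\
  |t_v - t_u| \ \text{small in a cascade}
  &\;\Longleftrightarrow\; K_t(u, v)\ \text{large}
  && \text{(infection-time proximity matches kernel proximity)}
\end{align}
```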

Grasp-and-Lift EEG Detection Winners’ Interview: 3rd place, Team HEDJ

This is our 3rd-place solution to the Grasp-and-Lift EEG Detection competition on Kaggle. The aim of the competition was to identify when a hand is grasping, lifting, or replacing an object using EEG data recorded from healthy subjects as they performed these activities. Better understanding the relationship between EEG signals and hand movements is critical to developing a brain-computer interface (BCI) that would give patients with neurological disabilities the ability to move through the world with greater autonomy.

Data Driven Digest for October 2: Traffic and Public Transit

We’ve assembled three very innovative ways of showing and analyzing transportation data. Enjoy!

How to actually learn data science

It’s an exciting time for data science. The field is new but growing quickly, and there’s huge demand for data scientists: average compensation in San Francisco is well north of $100,000 a year. Where there’s money, there are also people trying to earn it. The data science skills gap means that many people are learning, or trying to learn, data science. The first step is usually asking “how do I learn data science?”, and the response tends to be a long list of courses to take and books to read, starting with linear algebra or statistics.

I went through this myself a few years ago when I was learning. I had no programming background, but knew that I wanted to work with data. I can’t fully explain how immensely unmotivating it is to be given a huge list of resources without any context. It’s akin to a teacher handing you a stack of textbooks and saying “read all of these”. I struggled with this approach when I was in school. If I had started learning data science this way, I never would have kept going. Some people learn best with a list of books, but I learn best by building and trying things. I learn when I’m motivated, and when I know why I’m learning something. Best of all, when you learn this way, you come out with immediately useful skills.

From my conversations with new learners over the years, I know many share these views. That’s why I don’t think your first goal should be to learn linear algebra or statistics. If you want to learn data science, your first goal should be to learn to love data. Interested in finding out how? Read on to see how to actually learn data science.

Undirected Graphs When the Causality Is Mutual

Structural equation models impose causal order on a set of observations. We start with a measurement model: a list of theoretical constructs and a table assigning what is observed (manifest) to what is hidden (latent). Although it is possible to think of this assignment as formative rather than reflective, the default is a causal connection in which the latent variables are responsible for the observed scores. Next, we draw arrows specifying the cause-and-effect relationships among the latent variables. All of this is shown in great detail, with a customer satisfaction example, in the very well-written vignette for the R package semPLS, which uses partial least squares (PLS) to fit structural equation models (SEM).
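
For reference, the standard (LISREL-style) notation separates a reflective measurement model from the structural relations among the latents; this is generic SEM notation, not material from the semPLS vignette itself:

```latex
% Reflective measurement model plus structural (inner) model in standard notation.
\begin{align}
  x &= \Lambda_x\, \xi + \delta, \qquad
  y = \Lambda_y\, \eta + \varepsilon
  && \text{(measurement model: latents cause the observed scores)} \\
  \eta &= B\,\eta + \Gamma\,\xi + \zeta
  && \text{(structural model: the arrows among the latent variables)}
\end{align}
```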

Using differential privacy to reuse training data

Win-Vector LLC’s Nina Zumel wrote a great article explaining differential privacy and demonstrating how to use it to enhance forward stepwise logistic regression. This allowed her to reproduce results similar to the recent Science paper “The reusable holdout: Preserving validity in adaptive data analysis”. The technique essentially protects and reuses test data, allowing the series of adaptive decisions driving forward stepwise logistic regression to remain valid with respect to unseen future data. Without the differential privacy precaution, these steps are not always sufficiently independent of each other to ensure good model generalization performance. Through differential privacy one gets safe reuse of test data across many adaptive queries, yielding more accurate estimates of out-of-sample performance, more robust choices, and, as a result, a better model. In this note I will discuss a specific related application: using differential privacy to reuse training data (or, equivalently, to make training procedures more statistically efficient). I will also demonstrate similar effects using more familiar statistical techniques.
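
To make the holdout-reuse idea concrete, here is a minimal Python sketch in the spirit of the Thresholdout mechanism from the reusable-holdout paper; the function name, parameter values, and toy data are illustrative assumptions, not taken from the Win-Vector article:

```python
# Thresholdout-style answering of adaptive queries about a holdout set.
# Illustrative sketch only; parameters and data are made up for the example.
import numpy as np

rng = np.random.default_rng(0)


def thresholdout(train_stat, holdout_stat, threshold=0.04, sigma=0.01):
    """Answer an adaptive query with differential-privacy-style Laplace noise.

    train_stat, holdout_stat: the same statistic (e.g. a variable's correlation
    with the outcome) computed on the training and holdout sets.
    A noisy holdout value is revealed only when it disagrees with the training
    value; otherwise the training value is returned and the holdout stays protected.
    """
    noisy_threshold = threshold + rng.laplace(scale=2 * sigma)
    if abs(train_stat - holdout_stat) > noisy_threshold:
        return holdout_stat + rng.laplace(scale=sigma)  # reveal a noisy holdout answer
    return train_stat  # training answer suffices; holdout barely consumed


# Example: screen candidate variables adaptively without overfitting the holdout.
n, p = 1000, 20
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                      # pure-noise outcome
X_tr, X_ho, y_tr, y_ho = X[:500], X[500:], y[:500], y[500:]
for j in range(p):
    tr = np.corrcoef(X_tr[:, j], y_tr)[0, 1]
    ho = np.corrcoef(X_ho[:, j], y_ho)[0, 1]
    answer = thresholdout(tr, ho)           # use this, not the raw holdout correlation
```

The noisy comparison is what makes reuse safe: queries whose training and holdout answers already agree reveal essentially nothing about the holdout, so many adaptive queries can be answered before the holdout's validity is exhausted.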