Growing one Tree
Consider the following toy dataset, with spam/ham labels and two word features, ‘viagra’ and ‘lottery’.
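To make "growing one tree" concrete, here is a minimal Python sketch of how the root split could be chosen with the Gini impurity. The eight rows are invented for illustration and are not the post's dataset. `split_impurity` and `best_root_split` are hypothetical helper names, and the Gini criterion is one common choice, not necessarily the one used in the post.

```python
from collections import Counter

# HYPOTHETICAL toy rows, invented for illustration:
# (contains 'viagra', contains 'lottery', label)
DATA = [
    (1, 0, "spam"), (1, 1, "spam"), (0, 1, "spam"), (0, 0, "ham"),
    (0, 0, "ham"), (0, 1, "ham"), (1, 0, "spam"), (0, 0, "ham"),
]
FEATURES = {"viagra": 0, "lottery": 1}  # feature name -> column index


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, column):
    """Weighted Gini impurity of the two children after splitting on a 0/1 column."""
    n = len(rows)
    total = 0.0
    for value in (0, 1):
        child = [row[-1] for row in rows if row[column] == value]
        if child:  # an empty child contributes nothing
            total += len(child) / n * gini(child)
    return total


def best_root_split(rows):
    """Name of the word whose split gives the lowest weighted impurity."""
    return min(FEATURES, key=lambda name: split_impurity(rows, FEATURES[name]))


print("root impurity:", gini([row[-1] for row in DATA]))
for name, column in FEATURES.items():
    print(name, split_impurity(DATA, column))
print("root split:", best_root_split(DATA))
```

On these hypothetical rows the root impurity is 0.5; splitting on ‘viagra’ lowers it to 0.2, against about 0.467 for ‘lottery’, so ‘viagra’ would be chosen at the root. Growing the tree further means repeating the same search inside each child.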

R: Conditionally Updating Rows of a Data Frame
In a blog post I wrote a couple of days ago about cohort analysis, I had to assign a monthNumber to each row in a data frame, and started out with the following code: …

Classifying plankton with deep neural networks
The National Data Science Bowl, a data science competition where the goal was to classify images of plankton, has just ended. I participated with six other members of my research lab, the Reservoir Lab of Prof. Joni Dambre at Ghent University in Belgium. Our team finished 1st! In this post, we’ll explain our approach.

DIGITS: Deep Learning GPU Training System
The hottest area in machine learning today is Deep Learning, which uses Deep Neural Networks (DNNs) to teach computers to detect recognizable concepts in data. Researchers and industry practitioners are using DNNs in image and video classification, computer vision, speech recognition, natural language processing, and audio recognition, among other applications. The success of DNNs has been greatly accelerated by GPUs, which have become the platform of choice for training these large, complex networks, reducing training time from months to only a few days. The major deep learning software frameworks, including Caffe, Torch7, Theano, and CUDA-Convnet2, have incorporated GPU acceleration. Because of the increasing importance of DNNs in both industry and academia and the key role of GPUs, last year NVIDIA introduced cuDNN, a library of primitives for deep neural networks. Today at the GPU Technology Conference, NVIDIA CEO and co-founder Jen-Hsun Huang introduced DIGITS, the first interactive Deep Learning GPU Training System. DIGITS is a new system for developing, training, and visualizing deep neural networks. It puts the power of deep learning into an intuitive browser-based interface, so that data scientists and researchers can quickly design the best DNN for their data using real-time visualization of network behavior. DIGITS is open-source software, available on GitHub, so developers can extend or customize it or contribute to the project.

Vector Autoregressive Models
Consider here some VAR(1) model …
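As a reminder of what a VAR(1) is, the sketch below simulates a bivariate model X_t = c + A X_{t-1} + ε_t in plain Python and checks stationarity through the eigenvalues of A (all must lie strictly inside the unit circle). The matrix A, the intercept c, and the i.i.d. Gaussian noise are assumptions chosen for illustration, not the model discussed in the post.

```python
import cmath
import random


def var1_simulate(A, c, sigma, n, seed=0):
    """Simulate X_t = c + A X_{t-1} + eps_t for a 2-D VAR(1), starting at X_0 = 0.

    eps_t is assumed i.i.d. Gaussian with independent components of std. dev. sigma.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    path = [tuple(x)]
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        # The right-hand side uses the previous x; it is rebound only afterwards.
        x = [c[0] + A[0][0] * x[0] + A[0][1] * x[1] + e1,
             c[1] + A[1][0] * x[0] + A[1][1] * x[1] + e2]
        path.append(tuple(x))
    return path


def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) = 0 (possibly complex)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def is_stationary(A):
    """A VAR(1) is stationary when every eigenvalue of A has modulus < 1."""
    return all(abs(lam) < 1 for lam in eigenvalues_2x2(A))


def unconditional_mean(A, c):
    """Solve mu = c + A mu, i.e. mu = (I - A)^{-1} c, for the 2x2 case."""
    a, b = 1 - A[0][0], -A[0][1]
    d, e = -A[1][0], 1 - A[1][1]
    det = a * e - b * d
    return ((e * c[0] - b * c[1]) / det, (-d * c[0] + a * c[1]) / det)


# Hypothetical parameters chosen for illustration.
A = [[0.5, 0.1], [0.2, 0.3]]
c = [1.0, 1.0]
print(is_stationary(A), unconditional_mean(A, c))
path = var1_simulate(A, c, sigma=1.0, n=5000, seed=42)
```

When A is stationary, the sample mean of a long simulated path settles near (I - A)^{-1} c; replacing A with a matrix that has an eigenvalue of modulus 1 or more makes the path drift or explode instead.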

Forecast, Automatic Routines vs. Experience
This morning, in our Time Series course, we’ve been playing with some data I got from […]. Actually, we’ve been playing with an old version, downloaded 18 months ago (discussed in a previous post, in French)…

Seven Ways You Can Use A Linear, Polynomial, Gaussian, & Exponential Line Of Best Fit
A line of best fit lets you model, predict, forecast, and explain data. This post shows how you can use a line of best fit to explain college tuition, rats, turkeys, burritos, and the NHL draft.
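The linear, polynomial, and exponential lines of best fit all reduce to ordinary least squares. The following stdlib-only Python sketch (not the code from the post) solves the normal equations directly; the exponential curve y = a·exp(b·x) is fitted by regressing log y on x, which minimizes squared error on the log scale rather than on y itself and requires every y to be positive. The Gaussian fit is not covered here. Function names are hypothetical.

```python
import math


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


def polyfit(xs, ys, deg):
    """Least-squares coefficients [b0, b1, ..., b_deg] of y ~ sum_j b_j x^j."""
    m = deg + 1
    # Normal equations (X^T X) b = X^T y with X[i][j] = xs[i] ** j.
    XtX = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    Xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve(XtX, Xty)


def exp_fit(xs, ys):
    """Fit y = a * exp(b x) via the linear model log y = log a + b x (ys > 0)."""
    b0, b1 = polyfit(xs, [math.log(y) for y in ys], 1)
    return math.exp(b0), b1


xs = [0, 1, 2, 3, 4]
print(polyfit(xs, [1 + 2 * x for x in xs], 1))               # linear
print(polyfit(xs, [3 - x + 0.5 * x * x for x in xs], 2))     # quadratic
print(exp_fit(xs, [2 * math.exp(0.3 * x) for x in xs]))      # exponential
```

Forming the normal equations squares the condition number of the problem, so for higher polynomial degrees a least-squares routine such as `numpy.polyfit` is the more robust choice.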

Model Segmentation with Cubist
Cubist is a tree-based model with an OLS regression attached to each terminal node, and is somewhat similar to the mob() function in the party package. Below is a demonstration of the cubist() model with the classic Boston housing data.
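The idea of a tree with a regression model in each terminal node can be illustrated with a one-split "model stump": try every cut point on a single predictor, fit a separate OLS line on each side, and keep the cut with the smallest total squared error. This is a simplified Python sketch of that idea, not Cubist itself, which also turns the tree into rules and offers committees and nearest-neighbour corrections. The helper names and the `min_leaf` parameter are hypothetical.

```python
def ols(xs, ys):
    """Simple linear regression y ~ a + b x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant x: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(xs, ys, coef):
    """Sum of squared residuals of the line coef = (a, b)."""
    a, b = coef
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_model_stump(xs, ys, min_leaf=3):
    """One split on x, with an OLS line in each leaf; returns (threshold, left, right)."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if sx[i - 1] == sx[i]:  # cannot cut between equal x values
            continue
        left, right = ols(sx[:i], sy[:i]), ols(sx[i:], sy[i:])
        total = sse(sx[:i], sy[:i], left) + sse(sx[i:], sy[i:], right)
        if best is None or total < best[0]:
            best = (total, (sx[i - 1] + sx[i]) / 2, left, right)
    if best is None:
        raise ValueError("not enough distinct points for a split")
    return best[1:]


def predict(model, x):
    """Route x to a leaf, then apply that leaf's linear model."""
    threshold, left, right = model
    a, b = left if x <= threshold else right
    return a + b * x


# Hypothetical piecewise-linear data with a break between x = 4 and x = 5.
xs = list(range(10))
ys = [x if x < 5 else 20 - 2 * x for x in xs]
model = fit_model_stump(xs, ys)
print(model[0], predict(model, 2), predict(model, 7))
```

A full model tree would apply the same search recursively in each leaf and over every predictor, which is where a plain regression tree with constant leaves and a model tree start to differ noticeably.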

Growing some Trees
Consider here the dataset used in a previous post about visualising a classification (with more than two features)…
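One common way to grow several trees is bagging: fit each tree on a bootstrap resample of the rows and let the trees vote. This excerpt does not say whether the post uses bagging, random forests, or something else, so treat the following Python sketch as an assumption. It uses depth-one trees (stumps) chosen by weighted Gini impurity to stay short, and the feature matrix in the check is hypothetical rather than the dataset from the previous post.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def fit_stump(X, y):
    """Best single split (feature, threshold, left_label, right_label) by weighted Gini."""
    best, n = None, len(y)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # every row identical: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def bagging(X, y, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, row):
    """Majority vote over the ensemble."""
    return Counter(predict_stump(s, row) for s in stumps).most_common(1)[0][0]


# Hypothetical data: three features, class decided by the second one.
X = [[i % 3, i, (7 * i) % 5] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
stumps = bagging(X, y)
print(predict_bagged(stumps, [0, 2, 0]), predict_bagged(stumps, [0, 18, 0]))
```

A random forest would add one more ingredient: at each split, only a random subset of the features is considered, which decorrelates the trees further than resampling the rows alone.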