A novel cost-sensitive framework for customer churn predictive modeling
In this paper, we present a new cost-sensitive framework for customer churn predictive modeling. First, we propose a new finance-based measure for evaluating the effectiveness of a churn campaign, taking into account the available portfolio of offers, their individual financial costs, and the probability of offer acceptance given the customer profile. Then, using a real-world churn dataset, we compare different cost-insensitive and cost-sensitive classification algorithms and measure their effectiveness in terms of both predictive power and cost optimization. The results show that the cost-sensitive approach yields an increase in cost savings of up to 26.4%.
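The idea of a finance-based measure can be sketched with a small, hypothetical example-dependent cost function (the function names, cost scheme, and baseline choice below are illustrative assumptions, not the paper's exact formulation): a targeted customer incurs the offer cost, a true churner who declines the offer still costs their lifetime value, and savings are reported relative to the cheaper trivial campaign (target nobody or target everybody).

```python
import numpy as np

def campaign_cost(y_true, y_pred, clv, offer_cost, accept_prob):
    # Expected campaign cost per customer, summed over the base:
    # - predicted churner: pay the offer; if a true churner declines, lose CLV
    # - missed churner: lose CLV outright
    cost = np.where(
        y_pred == 1,
        offer_cost + (1 - accept_prob) * y_true * clv,
        y_true * clv,
    )
    return cost.sum()

def cost_savings(y_true, y_pred, clv, offer_cost, accept_prob):
    # Savings relative to the cheaper of "target nobody" / "target everybody".
    base = min(
        campaign_cost(y_true, np.zeros_like(y_pred), clv, offer_cost, accept_prob),
        campaign_cost(y_true, np.ones_like(y_pred), clv, offer_cost, accept_prob),
    )
    return 1 - campaign_cost(y_true, y_pred, clv, offer_cost, accept_prob) / base
```

A classifier can then be tuned to maximize this savings measure directly rather than accuracy or AUC, which is the essence of the cost-sensitive approach.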

Bumping or Escaping Local Minima
Yes, this is about machine learning and not some weird fetish. This post is totally safe for work, promise. With that out of the way: What is bumping? Bumping is a simple algorithm that can help your classifier escape from a local minimum. Huh? Read on; after a few imports you will see what I mean.
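The core loop of bumping is short: refit the classifier on bootstrap resamples and keep whichever fitted model scores best on the original training set (including the original fit, so you can never do worse on training error). Here is a minimal sketch in Python using a hand-rolled decision stump as a stand-in base learner; all function names are illustrative, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    # Exhaustively pick (feature, threshold, sign) minimizing training error.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = (sign * (X[:, j] - t) > 0).astype(int)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]

def predict_stump(model, X):
    j, t, sign = model
    return (sign * (X[:, j] - t) > 0).astype(int)

def bump(X, y, n_bumps=20):
    # Bumping: refit on bootstrap resamples of the data, then keep the
    # candidate with the lowest error on the ORIGINAL training set.
    candidates = [fit_stump(X, y)]  # include the un-perturbed fit
    for _ in range(n_bumps):
        idx = rng.integers(0, len(y), len(y))
        candidates.append(fit_stump(X[idx], y[idx]))
    errors = [np.mean(predict_stump(m, X) != y) for m in candidates]
    return candidates[int(np.argmin(errors))]
```

With a greedy learner that can get stuck (a decision tree on XOR-like data is the classic case), the perturbed resamples occasionally nudge the fit out of the bad local minimum, and the selection step keeps that better model.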

R 3.2.1 is released
R 3.2.1 (codename ‘World-Famous Astronaut’) was released yesterday.

Building self-service tools to monitor high-volume time-series data
One of the main sources of real-time data processing tools is IT operations. In fact, a previous post I wrote on the re-emergence of real-time was, to a large extent, prompted by my discussions with engineers and entrepreneurs building monitoring tools for IT operations. In many ways, data centers are perfect laboratories in that they are controlled environments managed by teams willing to instrument devices and software, and monitor fine-grained metrics.

Building Tools for Hyperparameter Search
In this post, I describe our thinking process while building a tool for data scientists. The goal: to make hyperparameter tuning straightforward in GraphLab Create.

Popular Deep Learning Tools – a review
Deep Learning is currently the hottest trend in AI and Machine Learning. We review popular software for Deep Learning, including Caffe, Cuda-convnet, Deeplearning4j, Pylearn2, Theano, and Torch.

Webinar: Intro to SparkR
Are you interested in combining the power of R and Spark? An ‘Intro to SparkR’ webinar will take place on July 15, 2015 at 10 am California time. Everyone is welcome to attend.

Fishing for packages in CRAN
It is incredibly challenging to keep up to date with R packages. As of today (6/16/15), there are 6,789 packages listed on CRAN. Of course, the CRAN Task Views are probably the best resource for finding what’s out there. A tremendous amount of work goes into maintaining and curating these pages, and we should all be grateful for the expertise, dedication and efforts of the task view maintainers. But R continues to grow at a tremendous rate. (Have a look at the growth curve in Bob Muenchen’s 5/22/15 post R Now Contains 150 Times as Many Commands as SAS.) CRANberries, a site that tracks new packages and package updates, indicates that over the last few months the list of R packages has been growing by about 100 packages per month. How can anybody hope to keep current?

Connecting R to Everything with IFTTT
Hopefully, this little example has demonstrated how IFTTT’s Maker Channel can be used to connect R to a whole lot of online services. Have at it!
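The post itself works in R, but the mechanics carry over to any language: the Maker Channel is triggered by a plain HTTP POST to an event-specific URL. As a hedged sketch (the event name, key, and helper are placeholders you would replace with your own), building such a request with the Python standard library might look like this:

```python
import json
from urllib.request import Request

def maker_trigger(event, key, value1=None):
    # Build a POST to IFTTT's Maker Channel trigger endpoint for `event`.
    # `key` is your personal Maker Channel key; up to three values
    # (value1..value3) can be passed through to the connected recipe.
    url = f"https://maker.ifttt.com/trigger/{event}/with/key/{key}"
    body = json.dumps({"value1": value1}).encode()
    return Request(url, data=body, headers={"Content-Type": "application/json"})

# To actually fire the trigger (hypothetical event/key):
# from urllib.request import urlopen
# urlopen(maker_trigger("r_job_done", "YOUR_KEY", value1="model finished"))
```

Point a recipe at that event and the POST can send you a push notification, log a row to a spreadsheet, flash a light, and so on.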

Tips & Tricks 9: Shape Changes and Hypothetical Shapes
Today we will use some relatively simple R code to create a shape based on position in a Principal Component (PC) shape space and visualise this shape as a change from the mean using plotRefToTarget().
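The underlying arithmetic is compact: a hypothetical shape at a chosen position along a PC axis is just the mean shape deformed by that score times the PC's eigenvector. A language-agnostic sketch in Python/NumPy (the post itself uses R; names here are illustrative):

```python
import numpy as np

def shape_from_pc(mean_shape, eigvec, score):
    # Hypothetical shape at `score` units along one PC axis:
    # deform the mean landmark configuration by score * eigenvector.
    # mean_shape: (p, k) array of p landmarks in k dimensions;
    # eigvec: flattened (p*k,) PC loading vector from the shape PCA.
    return mean_shape + score * eigvec.reshape(mean_shape.shape)
```

The resulting configuration is what plotRefToTarget() visualises as a deformation from the mean.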

Probabilities in Google Analytics Content Experiments
Have you ever taken a look at the ‘probability of outperforming’ metric in Google Analytics’ Content Experiments and wondered how it was calculated? Have you ever scratched your head because the numbers didn’t make sense to you? I certainly have. It’s hard to see experiment results like the ones depicted below and not wonder what’s going on underneath the hood.
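One common way to compute an "outperforming" probability is Bayesian: put a Beta posterior on each variation's conversion rate and estimate P(rate_B > rate_A) by Monte Carlo. This is a sketch of that general approach under uniform Beta(1, 1) priors, not necessarily Google's exact method (Content Experiments use a multi-armed bandit framework):

```python
import numpy as np

rng = np.random.default_rng(42)

def prob_outperforming(conv_a, n_a, conv_b, n_b, draws=100_000):
    # P(rate_B > rate_A) under independent Beta posteriors with
    # uniform Beta(1, 1) priors, estimated by Monte Carlo sampling.
    a = rng.beta(conv_a + 1, n_a - conv_a + 1, draws)
    b = rng.beta(conv_b + 1, n_b - conv_b + 1, draws)
    return float(np.mean(b > a))
```

For example, 100 conversions out of 1,000 visits against 10 out of 1,000 gives a probability of outperforming near 1, while two identical variations hover around 0.5, which is a useful sanity check when the reported numbers look odd.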