New Hybrid Rare-Event Sampling Technique for Fraud Detection
The proposed hybrid sampling methodology may prove useful when building and validating machine learning models for applications where the target event is rare, such as fraud detection.

How to spot first stories on Twitter using Storm
Every day, thousands of posts share information about news, events, automatic updates (weather, songs) and personal information. The published information can be retrieved and analyzed for news detection. The immediate spread of events on Twitter, combined with its large number of users, makes it suitable for first story extraction. To that end, this project deals with distributed real-time first story detection (FSD) on Twitter, built on top of Storm. Specifically, I try to identify the first document in a stream of documents that discusses a specific event. Let’s have a look at the implementation of the methods used.

Comprehensive Guide for Data Exploration in R
So far we have covered detailed tutorials on data exploration using SAS and Python. What is the one piece missing to complete this series? I am sure you guessed it right. In this article I will give a detailed tutorial on data exploration using R. For the reader's ease, I will follow a format very similar to the one we used in the Python tutorial, simply because of the sheer resemblance between the two languages.

R: Markov Chain Wikipedia Example
Over the weekend I’ve been reading about Markov Chains and I thought it’d be an interesting exercise for me to translate Wikipedia’s example into R code.

Thoughts following the 2015 “Text By The Bay” Conference
Five Takeaways on the State of Natural Language Processing:
1. word2vec and doc2vec appear to be pervasive
2. Production-grade NLP is Spreading in Industry
3. Open tools are being used but probably not compensated in the way they should be
4. ‘RNNs for X’
5. A Big Problem: Massive Gender Imbalance