Analytics without actions – why bother?

Modern applications are ingesting vast quantities of streaming event data from multiple sources in real time. The businesses behind those applications hope to use that data to benefit both the organization and its customers: better service, delightful user experiences, and more personalized interactions.

Using Decision Trees to Predict Infant Birth Weights

In this article, I will show you how to use decision trees to predict whether the birth weights of infants will be low or not. We will use the birthwt data from the MASS library.
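
To make the idea concrete, here is a minimal sketch of a one-split decision tree (a "stump") chosen by Gini impurity. It is written in plain Python rather than R, it is not the article's code, and the six rows are made-up values for illustration only, not the birthwt data; the column meanings and helper names are hypothetical.

```python
from collections import Counter

# Made-up toy rows (NOT the MASS birthwt data):
# (mother_weight_lbs, smoker, low_birth_weight)
rows = [
    (100, 1, 1), (105, 1, 1), (110, 0, 1),
    (130, 1, 0), (140, 0, 0), (150, 0, 0),
]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, n_features):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    for f in range(n_features):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[-1] for r in rows if r[f] <= t]
            right = [r[-1] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, n_features):
    """Fit a one-level tree: one split, one majority-class prediction per side."""
    f, t, _ = best_split(rows, n_features)
    return {
        "feature": f,
        "threshold": t,
        "left": majority([r[-1] for r in rows if r[f] <= t]),
        "right": majority([r[-1] for r in rows if r[f] > t]),
    }

def predict(stump, x):
    """Predict the class for one feature tuple x."""
    side = "left" if x[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]

stump = fit_stump(rows, n_features=2)
```

On these toy rows the stump splits on the first column at 120, so `predict(stump, (115, 0))` returns 1 (low) and `predict(stump, (160, 1))` returns 0. A full decision tree would apply the same split search recursively to each side.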

A Guide on Google’s Author Rank

One of the most significant shifts in the SEO world last year, seen by some as bigger than the Panda and Penguin algorithm updates, is Author Rank, a new challenge for web marketers and SEO experts to prove the quality and authority of their content in their respective niches. Author Rank is designed not only to make search results more transparent and to influence page ranking, but also to combat plagiarism on the web.

24 Uses of Statistical Modeling (Part I)

1. Spatial Models
2. Time Series
3. Survival Analysis
4. Market Segmentation
5. Recommendation Systems
6. Association Rule Learning
7. Attribution Modeling
8. Scoring
9. Predictive Modeling
10. Clustering
11. Supervised Classification
12. Extreme Value Theory

Exploring the VW scandal with graph analysis

The German car manufacturer admitted to cheating on emissions tests. Which companies are impacted directly or indirectly by these revelations? James Phare from Data To Value used graph analysis and open source information to unravel the impact of the VW scandal on its customers, partners and shareholders.

Best of the Visualisation Web… November 2015

At the end of each month I pull together a collection of links to some of the most relevant, interesting or thought-provoking web content I’ve come across during the previous month. Here’s the latest collection from November 2015.

Securely storing your secrets in R code

Last month I wrote about How to store and use webservice keys and authentication details, a summary of the options mentioned in a Twitter discussion started by Jennifer Bryan. All of the options in that article still stored the secrets in plain text somewhere on your system, but in such a way as to minimize the risk of accidentally publishing them. Since then, I’ve had several comments (via Twitter as well as the blog comments) about alternative options for storing your keys truly securely:

Update: Google TensorFlow Deep Learning Is Improving

The recent open sourcing of Google’s TensorFlow was a significant event for machine learning. While the original release was lacking in some ways, development continues and improvements are already being made.

Improving Semi-Supervised Learning with Auxiliary Deep Generative Models

Deep generative models based upon continuous variational distributions parameterized by deep networks give state-of-the-art performance. In this paper we propose a framework for extending the latent representation with extra auxiliary variables in order to make the variational distribution more expressive for semi-supervised learning. By utilizing the stochasticity of the auxiliary variable, we demonstrate how to train discriminative classifiers resulting in state-of-the-art performance within semi-supervised learning, exemplified by a 0.96% error on MNIST using 100 labeled data points. Furthermore, we observe empirically that using auxiliary variables increases convergence speed, suggesting that less expressive variational distributions lead not only to looser bounds but also to slower model training.
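
As a rough sketch of what the abstract describes (unsupervised case only, with the class labels omitted; the symbols x, z, a, p_theta and q_phi are notation assumed here, not given in the blurb): adding an auxiliary variable a lets the inference model factor as q(a, z | x) = q(z | a, x) q(a | x), and the variational lower bound on the log-likelihood becomes

```latex
\log p_\theta(x)
  \;\ge\;
  \mathbb{E}_{q_\phi(a, z \mid x)}
  \left[
    \log \frac{p_\theta(a \mid z, x)\, p_\theta(x \mid z)\, p(z)}
              {q_\phi(a \mid x)\, q_\phi(z \mid a, x)}
  \right]
```

Because the marginal q(z | x) is obtained by integrating q(z | a, x) q(a | x) over a, it can be a richer, non-Gaussian distribution even when each factor is Gaussian, which is the sense in which the variational distribution becomes more expressive.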

Be A Star Data Scientist: Certifications For Overall Excellence

The necessary skills for data scientists vary widely, depending on the field and how up to date any given company’s set-up is. But in this era of increased demand for IT excellence, there are certain certifications that will help you stand out from the crowd and differentiate yourself.

1. Certified Analytics Professional
2. Cisco Certified Network Associate (CCNA)
3. Microsoft SQL Certification

5 Data Cleansing Tools

1. Drake
2. Talend
3. Winpure
4. SQL Power DQguru
5. Data Cleansing Suite