Best way to learn kNN Algorithm using R Programming

In this article, I’ll show you how to apply the kNN (k-nearest neighbors) algorithm using R programming. But before we go ahead on that journey, you should read the following articles:
• Basics of machine learning from my previous article
• Common machine learning algorithms
• Introduction to kNN – simplified
We’ll also discuss a case study that describes the step-by-step process of using kNN to build models.
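The article itself implements kNN in R; as a language-agnostic sketch of the underlying idea (store the training points, find the k nearest, take a majority vote), here is a minimal plain-Python version. The toy data and the choice of k = 3 are illustrative assumptions, not from the article:

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, sorted ascending.
    dists = sorted(
        (math.dist(point, query), label) for point, label in zip(train, labels)
    )
    # Majority vote among the k closest labels.
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy 2-D data: two well-separated clusters.
train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train, labels, (1.1, 1.0)))  # -> a
print(knn_predict(train, labels, (5.1, 5.0)))  # -> b
```

The same vote-among-neighbors logic is what R packages such as `class::knn` perform under the hood, just vectorized and with tie-breaking options.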

Machine Learning Libraries in Go Language

Go, an open-source language from Google, was initially created by a group of engineers who were frustrated with C++. Ever since its creation, the language has gained traction for its simplicity, ranking highly in the RedMonk and TIOBE programming popularity indexes.

Let me guess where you’re from

On the website you can enter a name, and an algorithm will print five countries the name seems to come from. Try it out!


auto-sklearn is an automated machine learning toolkit and a drop-in replacement for scikit-learn estimators.

10 IT Security Books for Big Data Scientists

1. The Realities of Securing Big Data
2. Data-Driven Security: Analysis, Visualization and Dashboards
3. Information Security Analytics: Finding Security Insights, Patterns, and Anomalies in Big Data
4. Hadoop Security: Protecting Your Big Data Platform
5. Big Data: A Primer
6. Application of Big Data for National Security: A Practitioner’s Guide to Emerging Technologies
7. Privacy and Big Data
8. Network Security with NetFlow and IPFIX: Big Data Analytics for Information Security
9. To the Cloud: Big Data in a Turbulent World
10. Compromised Data: From Social Media to Big Data

Spaghetti plots with ggplot2 and ggvis

This post was motivated by this article that discusses the graphics and statistical analysis for a two treatment, two period, two sequence (2x2x2) crossover drug interaction study of a new drug versus the standard. I wanted to write about implementing those graphics and the statistical analysis in R. This post is devoted to the different ways of generating the spaghetti plot in R, and the statistical analysis part will follow in the next post.
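The post builds these plots with ggplot2 and ggvis in R; for readers outside R, a spaghetti plot is simply one line per subject over time. Here is a rough matplotlib equivalent of the basic idea (the simulated data, subject names, and output file name are all my assumptions, not from the post):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, renders straight to file
import matplotlib.pyplot as plt
import random

# Simulated repeated-measures data: one response series per subject.
random.seed(1)
periods = [1, 2, 3, 4]
subjects = {f"subj{i}": [random.gauss(10 + p, 1) for p in periods]
            for i in range(8)}

fig, ax = plt.subplots()
for name, values in subjects.items():
    ax.plot(periods, values, marker="o", alpha=0.6)  # one "strand" per subject
ax.set_xlabel("Period")
ax.set_ylabel("Response")
ax.set_title("Spaghetti plot: one line per subject")
fig.savefig("spaghetti.png")
```

In ggplot2 the same effect comes from mapping the subject identifier to the `group` aesthetic so each subject gets its own line.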

Top 3 Challenges Retailers Overcome With Business Analytics

Challenge #1: Consumers expect to have access to information about your products and services at any given time.
Challenge #2: Retailers need to understand the consumer’s “path to purchase”.
Challenge #3: Information-empowered consumers are more demanding.

If you ask different questions you get different answers – one more way science isn’t broken, it’s just really hard

If you haven’t already read the amazing piece by Christie Aschwanden on why science isn’t broken, you should do so immediately. It does an excellent job of capturing the nuance of statistics as applied to real data sets, and of showing how that nuance can be misconstrued as science being “broken”, without falling for the easy “everything is wrong” meme.

The World We Live In #5: Calories And Kilograms

I recently finished reading The Signal and the Noise, a book by Nate Silver, creator of the famous FiveThirtyEight blog. The book is a very good read for all data science professionals, and a must in particular for those whose work involves predicting the future. It praises the Bayesian way of thinking as the best way to make and revise predictions, and criticizes rigid ways of thinking, with many examples of disastrous predictions. I especially enjoyed the chapter dedicated to chess and how Deep Blue finally defeated Kasparov. In a nutshell: I strongly recommend it. One of the plots in Silver’s book presents a case of a false negative, showing the relationship between obesity and calorie consumption across the world’s countries. The plot suggests there is no evidence of a connection between the two variables. Since this seemed very strange to me, I decided to reproduce the plot myself.

Kickin’ it with elastic net regression

With the kind of data that I usually work with, overfitting regression models can be a huge problem if I’m not careful. Ridge regression is a really effective technique for thwarting overfitting. It does this by penalizing the L2 norm (Euclidean distance) of the coefficient vector, which results in “shrinking” the beta coefficients. The aggressiveness of the penalty is controlled by a parameter lambda. Lasso regression is a related regularization method. Instead of using the L2 norm, though, it penalizes the L1 norm (Manhattan distance) of the coefficient vector. Because it uses the L1 norm, some of the coefficients will shrink to exactly zero as lambda increases. A similar effect would be achieved in Bayesian linear regression using a Laplacian prior (strongly peaked at zero) on each of the beta coefficients.
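The contrast between the two penalties can be seen in a tiny closed-form example. For a single standardized feature, both ridge and lasso have exact solutions, and the lasso’s soft-thresholding drives the coefficient to exactly zero once lambda is large enough. This standalone Python sketch is my illustration of that behavior, not code from the post:

```python
import math

def ridge_beta(xty, xtx, lam):
    """Ridge estimate for one feature: minimizes (1/2)*(rss) + (lam/2)*b^2.
    The L2 penalty shrinks the coefficient but never sets it exactly to zero."""
    return xty / (xtx + lam)

def lasso_beta(xty, xtx, lam):
    """Lasso estimate for one feature: minimizes (1/2)*(rss) + lam*|b|.
    The solution is soft-thresholding: shrink |x'y| by lam, clipping at zero."""
    return math.copysign(max(abs(xty) - lam, 0.0), xty) / xtx

# Sufficient statistics x'y = 3, x'x = 1; sweep the penalty strength lambda.
for lam in (0.0, 1.0, 3.0, 5.0):
    print(f"lam={lam}: ridge={ridge_beta(3.0, 1.0, lam):.3f}, "
          f"lasso={lasso_beta(3.0, 1.0, lam):.3f}")
```

Note how ridge only ever shrinks the coefficient toward zero, while the lasso coefficient becomes exactly zero once lambda reaches |x'y| – the sparsity property the paragraph above describes.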

Track Hurricane Danny (Interactively) with R + leaflet

Danny became the first hurricane of the 2015 season, so it’s a good time to revisit how one might track storms with R. We’ll pull track data from Unisys and look just at Danny, but it should be easy to extrapolate from the code. For this visualization we’ll use leaflet, since it’s all the rage and makes the plots interactive without any real work (thanks to the very real work by the HTML Widgets folks and the Leaflet.js folks).