Collaborative Filtering Recommender Systems – Item Based approach

As part of my series on implementing recommendation engines, my previous post on recommender systems in R explained how to implement user-based collaborative filtering. This post covers a basic implementation of item-based collaborative filtering recommender systems in R.
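
To make the item-based idea concrete, here is a minimal base-R sketch. It is not the code from the linked post: the toy ratings matrix, the convention that 0 means "not rated", the choice of cosine similarity, and the helper names `item_cosine` and `predict_rating` are all assumptions made for illustration.

```r
# Toy user x item ratings; 0 is assumed to mean "not rated".
ratings <- matrix(
  c(5, 3, 0,
    4, 0, 2,
    1, 1, 5,
    0, 2, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), c("A", "B", "C"))
)

# Cosine similarity between item columns (hypothetical helper).
item_cosine <- function(m) {
  norms <- sqrt(colSums(m^2))
  crossprod(m) / outer(norms, norms)
}

# Predict a user's rating for one item as the similarity-weighted
# average of that user's ratings on the other items they have rated.
predict_rating <- function(m, sim, user, item) {
  rated <- m[user, ] > 0
  rated[item] <- FALSE
  w <- sim[item, rated]
  sum(w * m[user, rated]) / sum(abs(w))
}

sim <- item_cosine(ratings)
pred <- predict_rating(ratings, sim, user = "u2", item = "B")
```

Because similarities are computed between items rather than between users, the similarity matrix can be precomputed once and reused for every user, which is the usual argument for the item-based variant.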

Predictive Analytics in the Supply Chain

Predictive analytics is increasingly important to supply chain management, making the process more accurate, more reliable, and less costly. To be at the top of your game as a supply chain manager, you need to understand and use advanced predictive analytics.

Big Data, Big Smiles

Having got your attention, I would like to introduce you to a pragmatic, real-world and business-centric approach to Big Data and Big Data Analytics. When I say that this is the best approach to Big Data you are ever likely to find in the whole universe and in your entire life, I am still significantly understating the magnificent utility, timeliness and the here-and-now facets of the approach. Now with the introduction done and dusted, and the virtues of the BIG SMILES approach exalted, it should come as no surprise that this eminently sensible, highly rational, thoroughly reasonable, methodical and no-nonsense technique has been applied successfully in more than 500 business-oriented situations. Best of all, this amazing Big Data approach is free of charge and with no strings attached – you don’t even have to buy my book. Now, isn’t that amazing? Let’s start with the basics. The BIG in BIG SMILES stands for Business Insight Gains, which is the focus of SMILES.

Rotation Forest – A Classifier Ensemble Based on Feature Extraction

The claim to fame of most of these ensemble methods is that, compared with other methods, they need no data preparation, can obtain very good results, and can be handed to software engineers as a black-box tool. By design, bagging lends itself nicely to parallelization, so these methods can easily be applied to a very large dataset in a cluster environment. Decision tree algorithms split the input data into regions at each level of the tree and thus perform implicit feature selection. Feature selection is one of the most important tasks in building a good model, so this implicit selection gives trees an advantage over other techniques, and bagging with decision trees inherits that advantage.

TreeMap World Population visualisation

This example is inspired by the examples of the treemap package. You’ll learn how to
• convert a data.frame to a data.tree structure
• navigate a tree and locate specific nodes
• use Aggregate and Cumulate
• manipulate an existing tree, e.g. by using the Prune method
• use data.tree in connection with the treemap package

How to store and use webservice keys and authentication details with R

I am frequently asked how to safely store login details and passwords for use in R without exposing them in your script. Yesterday Jennifer Bryan asked this question on Twitter, and a small storm of views and tweets erupted.
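
One commonly used option, not necessarily the one recommended in the linked post, is to keep secrets in environment variables (for example in a `~/.Renviron` file, which R reads at startup) and look them up at run time. The sketch below uses only base R; the helper name `get_secret` and the variable name `MY_SERVICE_KEY` are hypothetical.

```r
# Hypothetical helper: fetch a secret from an environment variable and
# fail loudly if it is missing, instead of hard-coding it in the script.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set", call. = FALSE)
  }
  value
}

# For demonstration only: normally the value would come from ~/.Renviron
# or the shell, never from the script itself.
Sys.setenv(MY_SERVICE_KEY = "dummy-value-for-demo")
key <- get_secret("MY_SERVICE_KEY")
```

The script then contains only the name of the variable, so it can be shared or committed without revealing the secret itself.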

Deaths from assault over time in 40 relatively rich countries

Simple but effective graphics showing relative country standings with regard to deaths from assault, and trends over time. Most countries in this group are seeing a decline in deaths from assault, from peaks in the 1970s, ’80s or ’90s. Data are downloaded from OECD.Stat via their API and SDMX and this post shows how to do this, and also demos screen scraping with {rvest} – not that it needs any demo-ing, it’s so user-friendly.

Count The Mondays in a Time Interval with Lubridate

Recently, while quantifying the inpatient workload volume of routine tests as a function of the day of the week, I needed to count the number of Mondays, Tuesdays, etc. in a time interval so that I could calculate the average volume for each weekday in that interval.
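
As a rough illustration of the task, the sketch below counts each weekday in a date range using `lubridate::wday()`. It is not the linked post's code: the helper name `count_weekdays`, the inclusive treatment of both endpoints, and the example dates are assumptions.

```r
library(lubridate)

# Hypothetical helper: count how often each weekday occurs between two
# dates, with both endpoints included.
count_weekdays <- function(start, end) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  # week_start = 1 numbers Monday as 1 and Sunday as 7, whatever the locale
  idx <- wday(days, week_start = 1)
  counts <- tabulate(idx, nbins = 7)
  names(counts) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
  counts
}

counts <- count_weekdays("2024-01-01", "2024-01-31")
# January 2024 starts on a Monday, so Mon, Tue and Wed occur 5 times
# and the remaining weekdays 4 times.
```

Dividing the total test volume recorded on each weekday by these counts gives the per-weekday average described above.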

Automate the Boring Stuff: GGPlot2

Most of my interaction with the ggplot2 package involves running code interactively to visualize data during exploratory data analysis. This is often a manual and quite laborious process. I recently sought to improve it by creating a series of user-defined functions that contain my most commonly used ggplot calls. These functions can then be sourced and the appropriate arguments supplied to generate the desired visualization. While this is a fairly simple task, calling ggplot2 functions within a user-defined function requires some understanding of R’s evaluation procedures. The key thing to remember is that the generic aes mapping argument uses non-standard evaluation to specify variable names within ggplot. When programming, it is suggested that we use standard evaluation instead, via aes_string, to map the properties of a geom. The post gives examples of how aes_string can be used within a function to create graphics.
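
A minimal sketch of that pattern is shown here, assuming ggplot2 is installed; the wrapper name `scatter_by_name` is hypothetical and is not taken from the post. Note that recent ggplot2 releases mark `aes_string()` as deprecated in favour of tidy evaluation, for example `aes(.data[[x_col]], .data[[y_col]])`, so this may emit a deprecation warning.

```r
library(ggplot2)

# Hypothetical wrapper: column names arrive as character strings, so the
# mapping is built with aes_string(), which parses them into expressions.
scatter_by_name <- function(data, x_col, y_col) {
  ggplot(data, aes_string(x = x_col, y = y_col)) +
    geom_point()
}

# Build a scatter plot of two mtcars columns chosen by name.
p <- scatter_by_name(mtcars, "wt", "mpg")
```

Because the column names are ordinary strings, the same function can be called in a loop or from `lapply()` over a vector of variable names, which is the kind of automation the post is after.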