Interactive Charts using htmlwidgets

This is the deck from my presentation to the Inland Northwest R User Group this past Friday (November 6, 2015). The introduction of htmlwidgets has opened up a wide range of options for R users, who no longer need to pick up JavaScript to create great charts. I would like to thank the many people in the community for their great work, including RStudio, Ramnath Vaidyanathan and Kenton Russell for htmlwidgets, and the many widget developers.


Bringing Julia from beta to 1.0 to support data-intensive, scientific computing

As data-driven research becomes more mainstream, the need for efficient and powerful scientific computing tools increases. To help meet this need, the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative has granted Julia Computing $600,000 over the next two years to enable the Julia Language team to move its core open-source computing language and libraries into their first production version.


Visual Data Mining with Item Explorer

Item Explorer is an open-source visual data mining tool based on d3.js. It lets the user interactively explore combinatorial questions, such as analyzing frequent item sets.
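
For readers unfamiliar with the term, a frequent item set is a group of items that appears together in at least some minimum number of transactions. The following is a minimal Python sketch of that idea, not Item Explorer's own code: the function name `frequent_itemsets`, its parameters, and the basket data are all made up for illustration, and it counts item sets by brute force rather than with an optimized algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return item sets (as sorted tuples) found in at least min_support transactions.

    Hypothetical helper for illustration: brute-force counting of every
    item set up to max_size items, with no pruning.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate and fix the ordering
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Made-up shopping baskets, one list per transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

frequent = frequent_itemsets(baskets, min_support=3)
pairs_too = frequent_itemsets(baskets, min_support=2)
```

With a threshold of three transactions only the single items qualify; lowering it to two also admits every pair, which is the kind of trade-off a visual tool lets you explore interactively.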


10 Free Hadoop Tutorials

1. tutorialspoint
2. coreservlets.com
3. Yahoo Developer Network
4. Hortonworks
5. HadoopTutorials
6. guru99.com
7. Skilledup
8. Hadoop-Skills
9. MapR Academy
10. Udemy by Jigar Vora


Confidence Intervals for Mean Growth Forecast

One important aspect of real GDP growth is its volatility. We will see that all forecasting methods produce forecasts that are much smoother than the actual series. For example, averaging is a smoother that weights all past observations equally, so the resulting forecast is a single point estimate that is the same for every horizon length.
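
To make the "same for every horizon" point concrete, here is a minimal Python sketch of a mean forecast with an approximate 95% confidence interval for the mean. It is an illustration under stated assumptions, not the post's method: the function name `mean_forecast`, the normal-approximation multiplier `z=1.96`, and the growth figures are all made up, and the interval describes uncertainty about the mean itself rather than about individual future observations.

```python
import math
import statistics


def mean_forecast(series, horizon, z=1.96):
    """Forecast every future period with the historical mean.

    Returns one (point, lower, upper) tuple per horizon step. The interval
    is mean +/- z * standard error, a normal approximation assumed here
    for illustration only.
    """
    n = len(series)
    mean = statistics.fmean(series)
    std_err = statistics.stdev(series) / math.sqrt(n)  # sample std dev / sqrt(n)
    lower, upper = mean - z * std_err, mean + z * std_err
    # The same estimate is repeated for every step of the horizon.
    return [(mean, lower, upper) for _ in range(horizon)]


# Made-up growth rates in percent, not real GDP data.
growth = [2.0, 4.0, 1.0, 3.0, 5.0]
forecasts = mean_forecast(growth, horizon=3)
```

Every tuple in `forecasts` is identical, which is exactly why a mean forecast looks so much smoother than the volatile series it is computed from.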


USFD: Twitter NER with Drift Compensation and Linked Data

This paper describes a pilot NER system for Twitter, comprising the USFD system entry to the W-NUT 2015 NER shared task. The goal is to correctly label entities in a tweet dataset, using an inventory of ten types. We employ structured learning, drawing on gazetteers taken from Linked Data, and on unsupervised clustering features, and attempting to compensate for stylistic and topic drift – a key challenge in social media text.


Launch Apache Spark on AWS EC2 and Initialize SparkR Using RStudio

In this blog post, we will learn how to launch a Spark standalone cluster on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) for the analysis of big data. This is a continuation of our previous post, which showed how to download Apache Spark and start SparkR locally on Windows with RStudio.


Hierarchical Loss Reserving with Stan

I continue with the growth curve model for loss reserving from last week’s post. Today, following the ideas of James Guszcza, I will add a hierarchical component to the model by treating the ultimate loss cost of an accident year as a random effect. Initially, I will use the nlme R package, just as James did in his paper, and then move on to Stan/RStan, which will allow me to estimate the full distribution of future claims payments.