Easy Bayesian Bootstrap in R
A while back I wrote about how the classical non-parametric bootstrap can be seen as a special case of the Bayesian bootstrap. Well, one difference between the two methods is that, while it is straightforward to roll a classical bootstrap in R, there is no easy way to do a Bayesian bootstrap. This post, in an attempt to change that, introduces a bayes_boot function that should make it pretty easy to do the Bayesian bootstrap for any statistic in R. If you just want a function you can copy-n-paste into R go to The bayes_boot function below. Otherwise here is a quick example of how to use the function, followed by some details on the implementation.
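To make the idea concrete before getting to the function itself, here is a minimal sketch of a Bayesian bootstrap for a weighted statistic. It is not the bayes_boot function from this post: the helper name `bb_sketch`, its arguments and the toy data are made up for illustration. It assumes the statistic accepts a weight vector as its second argument, and it draws flat Dirichlet weights by normalising Exponential(1) draws.

```r
# Hypothetical helper for illustration only; not the post's bayes_boot function.
# Assumes `statistic` takes the data first and a weight vector second.
bb_sketch <- function(x, statistic, n_draws = 4000) {
  n <- length(x)
  vapply(seq_len(n_draws), function(i) {
    w <- rexp(n)       # Exponential(1) draws ...
    w <- w / sum(w)    # ... normalised to one Dirichlet(1, ..., 1) weight vector
    statistic(x, w)    # statistic evaluated on the reweighted data
  }, numeric(1))
}

set.seed(1)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8)   # toy data, made up
post_mean <- bb_sketch(x, weighted.mean)     # posterior draws for the mean
quantile(post_mean, c(0.025, 0.5, 0.975))    # posterior median and 95% interval
```

Passing weights to the statistic only works when a weighted version exists, as with `weighted.mean`. For other statistics, a common workaround is to resample the data with probabilities given by the weights.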
Scatter Plots with Marginal Densities – An Example for Doing Exploratory Data Analysis with Tableau and R
One of the first stages in most data analysis projects is about exploring the data at hand. During this stage the analyst tries to get familiar with the dataset by looking at summary statistics, feature distributions and relationships between different attributes – just to name the key tasks. It is a really important procedure before the start of hypothesis testing and statistical modeling, as it gives important insight into what can be done with the data and where we should expect problems. For example, a discrete target attribute whose labels are extremely unevenly distributed (rare events) should guide our choice of modeling and data preparation technique. Or if we detect an independent feature that is highly correlated with the target, then this indicates a good candidate for feature selection. Visualizations are the key technique used within exploratory data analysis, which in turn should give us good preconditions for using Tableau during this stage.
Javascript Chart Libraries
In this article we will give you a quick overview of open source JavaScript chart libraries (mostly D3 based). Just leave a comment if we missed one.
PageRank. A simple example in R
…
Casino Gambling Simulations in R
In this document I will explain why this tactic is flawed via probability theory as well as simulations. As a separate goal, this document will also help explain simulation and lazy plotting patterns in R.
How to properly present a Data Mining project?
1. Start with big picture
2. Provide process summary
3. Show the main outcome (if you are presenting to business people, translate the outcome into $$)
Don’t invert that matrix
There is hardly ever a good reason to invert a matrix.
‘Don’t invert that matrix’ – why and how
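As a rough illustration of the point, here is a small sketch in R that solves a linear system two ways: by forming the inverse explicitly and by calling `solve(A, b)` directly. The matrix and right-hand side are random test data made up for this example, not taken from the linked posts.

```r
set.seed(42)
n <- 200
# Made-up test problem: a symmetric positive definite matrix and a random right-hand side.
A <- crossprod(matrix(rnorm(n * n), n, n)) + diag(n)
b <- rnorm(n)

x_inv   <- solve(A) %*% b   # forms the full inverse first, then multiplies
x_solve <- solve(A, b)      # solves A x = b directly, no inverse is formed

max(abs(x_inv - x_solve))       # the two answers agree closely on this well-behaved problem
max(abs(A %*% x_solve - b))     # residual of the direct solve
```

On a well-conditioned matrix like this one, both routes give essentially the same answer. The direct solve simply avoids the extra work of building the inverse, and it tends to be the safer choice when the matrix is poorly conditioned.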
RStudio and GitHub
Version control has become essential for me to keep track of projects, as well as for collaborating. It allows backup of scripts and easy collaboration on complex projects. RStudio works really well with Git, an open source distributed version control system, and GitHub, a web-based Git repository hosting service. I always forget how to set up a repository, so here’s a reminder. This example is done on RStudio Server, but the same procedure can be used for RStudio desktop. Git or similar needs to be installed first, which is straightforward to do.