The guide to quickly learn (AWS) Cloud Computing in R Programming
Cloud computing is becoming a natural extension for problems and data sets bigger than what laptops and desktops can process. However, for complete beginners, getting started on a cloud computing platform can look more difficult than it is. Below is an infographic displaying the concept of cloud computing, its importance, and its setup using R programming and RStudio. Since we call this a ‘quick mode’ for learning the concept, chances are you might miss out on conceptual explanations. Not to worry: you can check out the full guide on How to get started with Cloud Computing in R?
Python: Learning about deep learning through album cover classification
In the past month, I’ve spent some time on my album cover classification project. The goal of this project is for me to learn about deep learning by working on an actual problem. This post covers my progress so far, highlighting lessons that would be useful to others who are getting started with deep learning.
Why xkcd-style graphs are important
The rough, seemingly hand-drawn nature of the graph provides a visual hint as to the imprecision of the results.
Graphs in the world: Modeling systems as networks
Networks of all kinds drive the modern world. You can build a network from nearly any kind of data set, which is probably why network structures characterize some aspects of most phenomena. And yet, many people can’t see the networks underlying different systems. In this post, we’re going to survey a series of networks that model different systems in order to see the different ways networks help us understand the world around us. We’ll explore how to see, extract, and create value with networks. We’ll look at four examples where I used networks to model different phenomena, starting with startup ecosystems and ending in network-driven marketing.
R financial time series tips everyone should know about
There are many R time series tutorials floating around on the web; this post is not designed to be one of them. Instead, I want to introduce a list of the most useful tricks I came across when dealing with financial time series in R. Some of the functions presented here are incredibly powerful but unfortunately buried in the documentation, hence my desire to create a dedicated post. I only address daily or lower frequency time series. Dealing with higher frequency data requires specific tools: the data.table and highfrequency packages are among them.
Regression with Multicollinearity Yields Multiple Sets of Equally Good Coefficients
The multiple regression equation represents the linear combination of the predictors with the smallest mean-squared error. That linear combination is a factorization of the predictors with the factors equal to the regression weights. You may see the words ‘factorization’ and ‘decomposition’ interchanged, but do not be fooled. The QR decomposition or factorization is the default computational method for the linear model function lm() in R. We start our linear modeling by attempting to minimize least-squares error, and we find that a matrix computation accomplishes this task quickly and accurately. Regression is not unique in this respect, for matrix factorization is a computational approach that you will rediscover over and over again as you add R packages to your library (see Two Purposes for Matrix Factorization).
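As a rough illustration of the two ideas in this excerpt, the sketch below solves a least-squares problem through a QR factorization and then shows that, with an exactly collinear predictor, two different coefficient vectors produce identical fitted values. It is written in Python with NumPy rather than R, and the simulated data, the helper name `qr_least_squares`, and the choice of an exactly collinear third predictor are all assumptions made for this example, not details from the linked post.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])


def qr_least_squares(X, y):
    """Solve min ||y - X b||^2 via the reduced QR factorization X = QR.

    Hypothetical helper; assumes X has full column rank.
    """
    Q, R = np.linalg.qr(X)
    # Normal equations reduce to the triangular system R b = Q^T y.
    return np.linalg.solve(R, Q.T @ y)


beta_qr = qr_least_squares(X, y)
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

# Add a predictor that is an exact linear combination of the others.
X_c = np.column_stack([X, x1 + x2])

# Two different coefficient vectors for the collinear design:
# moving weight from x1 and x2 onto (x1 + x2) leaves the fit unchanged.
beta_a = np.append(beta_qr, 0.0)
beta_b = beta_a + 0.5 * np.array([0.0, -1.0, -1.0, 1.0])

same_fit = np.allclose(X_c @ beta_a, X_c @ beta_b)
print("QR matches lstsq:", np.allclose(beta_qr, beta_ref))
print("Different coefficients, same fit:", same_fit)
```

Exact collinearity is the extreme case; with predictors that are only highly correlated, the alternative coefficient sets would fit nearly, rather than exactly, as well.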
A dynamic programming solution to A/B test design
For this article we are assigning two different advertising messages to our potential customers. The first message, called ‘A’, we have been using a long time, and we have a very good estimate of the rate at which it generates sales (we are going to assume all sales are for exactly $1, so all we are trying to estimate are rates or probabilities). We have a new proposed advertising message, called ‘B’, and we wish to know: does B convert traffic to sales at a higher rate than A?
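One way such a setup can be framed as dynamic programming is sketched below. This is not necessarily the formulation the linked article uses: the uniform Beta(1, 1) prior on B's rate, the fixed number of customers (`horizon`), and the function name `optimal_expected_sales` are all assumptions introduced here. Because A's rate is treated as known, showing A teaches us nothing new, so the sketch only tracks B's successes and failures and decides at each step whether to keep testing B or commit to A for the remaining customers.

```python
from functools import lru_cache


def optimal_expected_sales(p_a, horizon):
    """Expected total sales under an optimal policy (hypothetical sketch).

    p_a     -- known conversion rate of message A (each sale is worth $1)
    horizon -- total number of customers (assumed fixed in advance)
    B's unknown rate gets a uniform Beta(1, 1) prior (an assumption).
    """

    @lru_cache(maxsize=None)
    def value(wins, losses):
        remaining = horizon - wins - losses
        if remaining == 0:
            return 0.0
        # Posterior mean of B's rate after the observed wins and losses.
        p_b = (wins + 1) / (wins + losses + 2)
        # Option 1: commit to A for every remaining customer.
        commit_a = remaining * p_a
        # Option 2: show B once more, then continue optimally.
        try_b = (p_b * (1.0 + value(wins + 1, losses))
                 + (1.0 - p_b) * value(wins, losses + 1))
        return max(commit_a, try_b)

    return value(0, 0)


if __name__ == "__main__":
    # Compare against always showing A, which earns horizon * p_a.
    print(optimal_expected_sales(0.3, 20), 20 * 0.3)
```

The recursion depth grows with `horizon`, so this memoized form suits small horizons; a bottom-up table would be the natural choice for larger ones.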