Joining Multiple Sources 101: Inner and Outer Joins
Mashing up multiple data sources to generate a single source of truth is an integral part of data analysis. It allows you to compare and cross-reference records stored in different formats and locations, and to perform queries and calculations. This article will run you through some basic concepts in data analysis that you should become familiar with when joining data stored in multiple tables.
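To make the inner/outer distinction concrete, here is a minimal sketch in plain Python. The `customers` and `orders` tables, their column names, and the two helper functions are hypothetical, invented for illustration; they are not taken from the article. An inner join keeps only rows whose key appears in both tables, while a left outer join keeps every row of the left table and fills the missing right-hand columns with `None`.

```python
# Hypothetical sample tables; names and values are made up for illustration.
customers = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Caro"},
]
orders = [
    {"customer_id": 1, "total": 40},
    {"customer_id": 1, "total": 15},
    {"customer_id": 4, "total": 99},  # no matching customer
]


def build_index(rows, key):
    """Group rows by the value of their join key."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def inner_join(left, right, left_key, right_key):
    """Keep only combinations whose key exists in both tables."""
    index = build_index(right, right_key)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def left_outer_join(left, right, left_key, right_key):
    """Keep every left row; unmatched rows get None in the right columns."""
    index = build_index(right, right_key)
    right_columns = sorted({col for row in right for col in row})
    empty = dict.fromkeys(right_columns)  # every right column set to None
    return [{**l, **r} for l in left for r in index.get(l[left_key], [empty])]


inner = inner_join(customers, orders, "id", "customer_id")
outer = left_outer_join(customers, orders, "id", "customer_id")

for row in outer:
    print(row)
```

In this sketch the inner join drops Ben and Caro because they have no orders, while the left outer join keeps them with an empty `total`. The order for customer 4 appears in neither result; a right or full outer join would be needed to keep it.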
Best Practices: Combine AdWords with Google Analytics for Better Insights, Bidding and Results
We’ve put together a new Best Practices guide, Better Together: AdWords and Google Analytics, to help you get deep insight into your performance. When you analyze performance with the combination of GA and AdWords you can find all sorts of actionable info:
• Which parts of your account drive actual on-site engagement
• Which keywords attract new users to your site
• What messaging and landing pages connect with the different users on your site
• How your business compares across your entire industry
More on Quadratic Programming in R
This post is another tour of quadratic programming algorithms and applications in R. First, we look at the quadratic program that lies at the heart of support vector machine (SVM) classification. Then we’ll look at a very different quadratic programming demo problem that models the energy of a circus tent. The key difference between these two problems is that the energy minimization problem has a positive definite system matrix, whereas the SVM problem has only a positive semi-definite one. This distinction has important implications when it comes to choosing a quadratic program solver, and we’ll do some solver benchmarking to further illustrate this issue.
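The definite versus semi-definite distinction can be checked numerically. The sketch below is written in Python rather than R and is not the post's own code; both matrices are hypothetical toy examples. It relies on the fact that a symmetric matrix has a Cholesky factorization with strictly positive pivots only when it is positive definite. The SVM-style matrix is built as y_i y_j x_i x_j from three one-dimensional points, so it has rank one and is merely semi-definite; the energy-style matrix is a small second-difference stencil, assumed here as a stand-in for a discretized energy.

```python
import math


def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorization; fail on any non-positive pivot."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    return False  # singular or indefinite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Hypothetical SVM-style data: three 1-D points with labels +1 / -1.
xs = [1.0, 2.0, 3.0]
ys = [1.0, -1.0, 1.0]
svm_matrix = [
    [ys[i] * ys[j] * xs[i] * xs[j] for j in range(3)] for i in range(3)
]

# Hypothetical energy-style matrix: a 1-D second-difference stencil.
energy_matrix = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

svm_is_pd = is_positive_definite(svm_matrix)
energy_is_pd = is_positive_definite(energy_matrix)
print("SVM matrix positive definite:", svm_is_pd)
print("Energy matrix positive definite:", energy_is_pd)
```

A solver that factorizes the system matrix this way would stall on the SVM-style matrix, which is one reason the choice of solver depends on which kind of problem is being solved.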
Make your R plots interactive
As a part of my daily job, I draw scatterplots, lots of them. And because there are thousands of genes expressed in any mouse or human tissue, my typical plot looks something like this (code). (Actually, it is a comparison of variance that can be attributed to “sex” factor in mRNA vs. protein expression.)
Code as Magic, and the Vernacular of Data Wrangling Verbs
Someone who I think has a great take on conceptualising the data wrangling process – in part arising from his prolific tool building approach in the R language – is Hadley Wickham. His recent work for RStudio is built around an approach to working with data that he’s captured as follows (e.g. “dplyr” tutorial at useR 2014, Pipelines for Data Analysis):