The seven people you need on your data team
1. The Handyman
2. The Open Source Guru
3. The Data Modeler
4. The Deep Diver
5. The Storyteller
6. The Snoop
7. The Privacy Wonk

How to assess quality and correctness of classification models? Part 4 – ROC Curve
The ROC curve is one of the methods for visualizing classification quality. It plots TPR (True Positive Rate) against FPR (False Positive Rate) as the decision threshold varies.
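As a rough illustration of how the points on a ROC curve come about, here is a minimal Python sketch. It is not taken from the linked article: the function names `roc_points` and `auc` and the toy labels and scores are made up for this example. Each distinct score is treated as a threshold, and the (FPR, TPR) pair reached at that threshold is recorded.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct score threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Lower the threshold step by step: highest scores are classified positive first.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all items sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

The curve always starts at (0, 0) and ends at (1, 1). The closer it bends towards the top-left corner, the better the classifier separates the two classes.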

Journey through the layers of the mind
An artificial neural network can be thought of as analogous to a brain (immensely, immensely, immensely simplified; nothing like a brain, really). It consists of layers of neurons and connections between neurons. Information is stored in this network as ‘weights’ (strengths) of connections between neurons. Lower layers (i.e. those closer to the input, e.g. the ‘eyes’) store (and recognise) low-level abstract features (corners, edges, orientations etc.), and higher layers store (and recognise) higher-level features. This is analogous to how information is stored in the mammalian cerebral cortex (e.g. in our own brains).

10 Key Tips for Entry-Level Analytics Professionals
1. Complete an internship
2. Get experience with large, real-world data sets
3. Try Kaggle competitions
4. Get a SAS certification
5. Create a complete LinkedIn profile
6. Look into an advanced degree
7. Familiarize yourself with the industry
8. Research companies
9. Read job descriptions
10. Network, network, network!

The state of assertions in R
‘Assertion’ is computer-science jargon for a run-time check on your code. In R, this typically means function argument checks (‘did they pass a numeric vector rather than a character vector into your function?’), and data quality checks (‘does the date-of-birth column contain values in the past?’).
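The two kinds of check mentioned above can be sketched in a few lines. The linked post is about R; the sketch below uses Python purely for illustration, and the function names `mean_of` and `check_birth_dates` are hypothetical, not part of any package.

```python
from datetime import date


def mean_of(values):
    """Average a list of numbers, checking the argument first."""
    # Argument check: was a non-empty list of numbers passed, not strings?
    assert isinstance(values, list) and values, "values must be a non-empty list"
    assert all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in values), "values must be numeric"
    return sum(values) / len(values)


def check_birth_dates(dates, today=None):
    """Data quality check: every date of birth must lie in the past."""
    today = today or date.today()
    bad = [d for d in dates if d >= today]
    assert not bad, f"dates of birth not in the past: {bad}"
    return True


average = mean_of([1, 2, 3])
dates_ok = check_birth_dates([date(1990, 1, 1)], today=date(2020, 1, 1))
```

Both functions stop with an `AssertionError` as soon as a check fails, so bad input is caught where it enters the code and not several steps later.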

SparkR with Rstudio in Ubuntu 12.04
Welcome to the blog post! It’s been a long time since I wrote my last post. I was recently reading about big data with R and found the sparkR package. I first heard about it a few months back, when it was a separate project on GitHub. Databricks is actively working on the sparkR package, and they have officially announced its integration with Apache Spark. In this post, I will discuss how to configure sparkR with Rstudio in Ubuntu 12.04 and get started using it.

Building an Instructional Simulation App in Shiny
Shiny makes it easy for R users to develop responsive, R-powered web applications. As you probably know, either from your own initial forays into Shiny or from the Shiny Tutorial, creating simple apps is no problem, and probably you have some ideas for teaching apps that could be written using just the tools developed in the Tutorial. But some teaching apps appear to be quite complex. Consider, for example, this app which aims to introduce the student to the χ² Test for Goodness of Fit. The app takes the user through a simulation process, keeping track of the results of simulations as they accumulate, permitting the viewer to consider the results from several points of view, and allowing the viewer to start over, perhaps with new data. The aim of this tutorial is to take you step-by-step through the construction of a reasonably full-featured simulation app that lets students explore, through simulation, the coverage properties of the classical t-intervals for a population mean. After completing the tutorial you will be able to write your own simulation apps, hopefully having been spared some of the struggle that I went through when I first learned Shiny in the Spring of 2014.

The Nature of Heterogeneity in Coevolving Networks of Customers and Products
The output is straightforward once you understand what nonnegative matrix factorization (NMF) is trying to accomplish. All matrix factorizations, as the name implies, attempt to identify ‘simpler’ matrices or factors that will reproduce approximately the original data matrix when multiplied together. Simpler, in this case, means that we replace the many observed variables with a much smaller number of latent variables. The belief is that these latent variables will simultaneously account for both row cliques and column microgenres as they coevolve.
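To make the idea of factoring into ‘simpler’ matrices concrete, here is a small, dependency-free Python sketch of NMF. It is not the method or the data from the linked post: it uses the standard multiplicative update rules for the squared-error objective, and the customer-by-product matrix `V` is invented for illustration. The data matrix is approximated as W × H, where W has one row per customer, H has one column per product, and `rank` is the number of latent variables.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Square root of the summed squared differences between V and W*H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V by W*H with nonnegative W and H (multiplicative updates)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Strictly positive random start; the updates keep all entries nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Hypothetical data: rows are customers, columns are products.
# Two groups of customers each buy from their own group of products.
V = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 4, 3],
]
W, H = nmf(V, rank=2)
error = frobenius_error(V, W, H)
```

Here four observed product columns are replaced by two latent variables. In a block-structured matrix like this one, each latent variable would typically line up with one group of customers (rows of W) and the products they buy (columns of H) at the same time.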