What Can We Learn from the Apps on Your Smartphone? Topic Modeling and Matrix Factorization
An alternative approach might treat this as a form of topic modeling, with your phone as a document and app usage levels as word frequencies. Topics are the latent variables that generate the pattern of app co-occurrences (e.g., similar interests, needs or networks). The R package stm for Structural Topic Modeling may provide a gentle introduction for the social scientist, although Latent Dirichlet Allocation (LDA) remains one step beyond the statistical training of most. This, of course, will change as more researchers are motivated to learn the mathematics given the promise of easier-to-use R packages.

Using Neo4j Spatial with Mapbox / Leaflet.js to search for businesses by location
A common use case for database queries is to search for things that are close to other things or within some specified geospatial boundary. Geospatial indexes and queries are offered by NoSQL databases such as MongoDB and by relational databases such as PostgreSQL. But what about graph databases? In this article, I show how to create a web application, powered by the Neo4j graph database, that searches within a user-defined boundary.

Ontotext: Creating Actionable Insights from Life Sciences and Healthcare Data, May 14
Life sciences and healthcare organizations sit on mountains of structured and unstructured information. This webinar shows how semantic technology and text analytics help get value from this information, with a focus on Data Modeling, Data Mining, and Data Fusion.

Python: Equivalent to flatMap for Flattening an Array of Arrays
I found myself wanting to flatten an array of arrays while writing some Python code earlier this afternoon and, being lazy, my first attempt involved building the flattened array manually.
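The excerpt stops before the code, so the following is only a minimal sketch of what the manual approach and a flatMap-style alternative might look like, not the author's actual code. The names `flatten_manually` and `flat_map` and the sample list are hypothetical.

```python
from itertools import chain


def flatten_manually(nested):
    """Build the flattened list by hand with two nested loops."""
    flat = []
    for inner in nested:
        for item in inner:
            flat.append(item)
    return flat


def flat_map(func, items):
    """Apply func to each item (func returns an iterable) and flatten one level."""
    return list(chain.from_iterable(func(item) for item in items))


# Hypothetical sample data: a list of lists, including an empty one.
nested = [[1, 2], [3], [], [4, 5]]

manual = flatten_manually(nested)
chained = list(chain.from_iterable(nested))
comprehension = [item for inner in nested for item in inner]
expanded = flat_map(lambda x: [x, x * 10], [1, 2])
```

All three flattening variants give the same list here. `chain.from_iterable` is probably the closest standard-library analogue to flatMap, since it flattens exactly one level and works lazily until wrapped in `list`.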

How to Analyze Your Predictable Data: Anomaly Detection
There is predictable data as far as the eye can see. Millions of variables quietly tracing the path we thought, and perhaps hoped, they would. Because there are so many, noticing when one of these variables does something unexpected is a task that is unsolvable by diligence alone. In order to spot these rare unexpected observations, we need an often-overlooked statistical analysis: anomaly detection.
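The excerpt does not say which technique the article uses, so the sketch below picks one simple option as an illustration: flagging points whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds a threshold. The function name `mad_anomalies`, the sample readings, and the 3.5 cutoff are assumptions made for this example.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. It is less distorted by the outliers it is
    trying to find than a mean/standard-deviation z-score.
    """
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # More than half of the values are identical; the score is undefined.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical "predictable" series with one unexpected reading at index 6.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0, 9.9, 10.1]
flagged = mad_anomalies(readings)
```

With these readings only the value 25.0 is flagged. Real series with trend or seasonality would likely need the predictable component removed first, with a check like this applied to the residuals.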

Using Python on Azure Machine Learning Studio
AzureML is the machine learning service hosted on Microsoft’s cloud platform. Readers of Data Science Central will know that AzureML has hosted a few webinars about the platform. This tutorial will walk you through integrating Python with AzureML.

Most Viewed Big Data Videos on YouTube
The top Big Data videos on YouTube, from publishers such as Hortonworks and Kirk D. Borne, cover diverse topics including Hadoop, Big Data Trends, Deep Learning, and Big Data Leadership.

Regression Models: It’s Not Only About Interpretation
But my post was not complete: I was simply plotting the prediction obtained by some model. And it ‘looked like’ the regression was nice, but so were the random forest, the k-nearest neighbour and the boosting algorithms. What if we compare those models on new data? Here is the code to create all the models (I did include another one, some kind of benchmark, where no covariates are included), based on 1,000 simulated values.
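The excerpt ends before the code, and the original post's data and models are not shown here. As a rough illustration of comparing models on new data, the sketch below fits only two of the models mentioned, a simple linear regression and the no-covariate benchmark, on 1,000 simulated values, then scores both on a fresh sample. The data-generating process, the seed, and all function names are assumptions made for this example.

```python
import random


def simulate(n, rng):
    """Draw n (x, y) pairs from an assumed linear process with Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Ordinary least squares for one covariate; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_benchmark(xs, ys):
    """Benchmark with no covariates: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def mse(model, xs, ys):
    """Mean squared prediction error of model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)
train_x, train_y = simulate(1000, rng)  # data used to fit the models
new_x, new_y = simulate(1000, rng)      # fresh data used only for scoring

linear = fit_linear(train_x, train_y)
benchmark = fit_benchmark(train_x, train_y)

linear_mse = mse(linear, new_x, new_y)
benchmark_mse = mse(benchmark, new_x, new_y)
```

Because the simulated relationship is linear, the regression should beat the benchmark on the new sample. The same train-then-score pattern would extend to the other models by adding more fitting functions that return a prediction function.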