No more jumping between applications. Mode Studio combines a SQL editor, Python and R notebooks, and a visualization builder in one platform.
The data science community, Kaggle, recently announced the Google Analytics Customer Revenue Prediction competition. The competition uses data from the Google Merchandise store, and the challenge is to create a model that will predict the total revenue per customer. I’m using the Kaggle competition as an opportunity to show how data science can be used in digital marketing to answer a specific question, and take what is learned from the data and apply it to marketing strategies. In my approach to the competition, I am using the Business Science Problem Framework (BSPF), a methodology for applying data science to business problems. The BSPF was developed by Matt Dancho of Business Science. Business Science has an online university for learning data science on a pro level. I learned the BSPF methodology from their course, Data Science for Business with R.
Data Analysis Expressions are a collection of functions that can be used to perform a task and return one or more values. Although this sounds very similar to any other programming language, DAX is only a formula or a query language. DAX was developed around 2009 by Microsoft to be used with Microsoft’s PowerPivot, which at that time was available as an Excel (2010) add-in. It is extremely popular today as it is now the language of choice for Power BI and is supported by Tabular SSAS as well. Since it is widely being used in Power BI, I have concentrated on DAX for Power BI in this article but, can be applied to other applicable tools.
A couple of months ago, I wrote here on Medium an article on mapping the UK’s traffic accidents hot spots. I was mostly concerned about illustrating the use of the DBSCAN clustering algorithm on geographical data. In the article, I used geographical information published by the UK government on reported traffic accidents. My purpose was to run a density-based clustering process to find the areas where traffic accidents are reported most frequently. The end result was the creation of a set of geo-fences representing these accident hot spots. By collecting all points in a given cluster, you can get an idea on how the cluster would look like on a map, but you will lack an important piece of information: the cluster’s outer shape. In this case, we are talking about a closed polygon that can be represented on a map as a geo-fence. Any point inside the geo-fence can be assumed to belong to the cluster, which makes this shape an interesting piece of information: you can use it as a discriminator function. All newly sampled points that fall inside the polygon can be assumed to belong to the corresponding cluster. As I hinted in the article, you could use such polygons to assert your driving risk, by using them to classify your own sampled GPS location.
Clustering is one of the most well known techniques in Data Science. From customer segmentation to outlier detection, it has a broad range of uses, and different techniques that fit different use cases. In this blog post we will take a look at hierarchical clustering, which is the hierarchical application of clustering techniques.
Tensorflow is great. Really, you can do everything imaginable. You can turn zebras into horses with it. However, Tensorflow’s code examples generally tend to gloss over to get data into your model: they either sometimes naively assume that someone else did the hard work for you and serialized the data into Tensorflow’s native format, or showcase unreasonably slow methods that would have a GPU idling away with shockingly low performance. Also oftentimes the code is very hacky and difficult to follow. So I thought it might be useful to show a small, self contained example that handles both training and efficient data pipelining on a nontrivial example.