Visualizing Patterns in Complex Data

Dartmouth College researchers have published a free Python software package called HyperTools that allows users to turn complex data into 3D shapes or animations. The tool allows users to visualize patterns in their data and compare the characteristics of different datasets, which in turn could inform researchers on how to train their machine learning algorithms by illuminating differences between groups of data. Additionally, the Dartmouth researchers have published tutorials for HyperTools and a gallery of examples, such as how to plot the text of State of the Union addresses, to help users create visualizations.

Selection of Great Data Science Articles still Worth Reading

These articles are between 3 and 5 years old, but they are still valuable today. The methodology they use remains state-of-the-art. Some discuss immense data sets that are still available to the public and that led to the design of new machine learning techniques to handle them.

Machine Learning and Its Algorithms to Know – MLAlgos

The main idea of this post is to describe and illustrate machine learning and its algorithms (MLAlgos). I will also attempt to answer a few basic questions. These questions have been answered many times in the past, and the answers are widely available. Still, answering them again here from my own hands-on experience, rather than from the perspective of PhD or scholarly textbook material, may make a difference.

A “quick” introduction to PyMC3 and Bayesian models, Part I

In my last post, I talked about Bayesian Statistics and how they can help us quantify uncertainty in data analysis. In this post, I give a “brief”, practical introduction using a specific and hopefully relatable example drawn from real data.

Numerical Modality

The mode is one of the basic statistics, defined as the most common value in an array. When the values of the array are categorical, the mode is easy to detect by selecting the value with the most occurrences. Identifying the modes of a numerical array is harder: the values can be continuous, so counting occurrences by value is not enough, and the distribution of the values must be examined to identify the most probable ones. Moreover, numerical arrays can be multi-modal, in which case the problem becomes one of finding the local maxima of the distribution rather than the single global maximum that suffices when only one mode is present.
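As a rough illustration of looking for local maxima of the distribution, here is a minimal sketch that is not taken from the article itself. It assumes a Gaussian kernel density estimate with a hand-picked bandwidth, evaluated on a regular grid; the helper names `gaussian_kde` and `find_modes`, the bandwidth value, and the synthetic sample are all hypothetical choices made for the example.

```python
import math
import random


def gaussian_kde(values, bandwidth):
    """Return a function estimating the density of `values` with Gaussian kernels."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def find_modes(values, bandwidth, grid_size=200):
    """Return grid points where the estimated density has a local maximum."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(values, bandwidth)
    heights = [density(x) for x in grid]
    # A grid point is a mode candidate if it is higher than both neighbours.
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if heights[i - 1] < heights[i] > heights[i + 1]
    ]


# Synthetic bimodal sample (made-up data): two clusters near 0 and 5.
rng = random.Random(0)
sample = [rng.gauss(0.0, 0.5) for _ in range(300)] + [rng.gauss(5.0, 0.5) for _ in range(300)]
modes = find_modes(sample, bandwidth=0.5)
print(modes)
```

The result depends heavily on the bandwidth: a value that is too small tends to produce spurious modes, while one that is too large can merge genuine ones. In practice, library routines such as `scipy.stats.gaussian_kde` and `scipy.signal.find_peaks` would likely replace the hand-written helpers.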

Custom R charts coming to Excel

This week at the BUILD conference, Microsoft announced that Power BI custom visuals will soon be available as charts in Excel. You’ll be able to choose a range of data within an Excel workbook and pass those data to one of the built-in Power BI custom visuals, or to one you’ve created yourself using the API. Since you can create Power BI custom visuals using R, that means you’ll be able to design a custom R-based chart and make it available to people using Excel, even if they don’t know how to use R themselves. There are also many pre-defined custom visuals available, including some familiar R charts like decision trees, calendar heatmaps, and hexbin scatterplots.

Visualizing graphs with overlapping node groups

I recently came across some data about multilateral agreements, which needed to be visualized as network plots. This data had some peculiarities that made it more difficult to create a plot that was easy to understand. First, the nodes in the graph were organized in groups, but each node could belong to multiple groups or to no group at all. Second, there was one “super node” that was connected to all other nodes (while “normal” nodes were only connected within their group). This made it difficult to find a layout that showed both the connections between the nodes and the group memberships. However, by digging a little deeper into the R packages igraph and ggraph, it is possible to get satisfying results in such a scenario.
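To make the graph structure described here concrete, the following is a small sketch in plain Python rather than the R packages the article uses. The node names, group names, and the `HUB` super node are invented for illustration and do not come from the article's data.

```python
from itertools import combinations

# Hypothetical group memberships: "C" belongs to two groups, "F" to none.
groups = {
    "treaty_1": {"A", "B", "C"},
    "treaty_2": {"C", "D", "E"},
}
nodes = {"A", "B", "C", "D", "E", "F"}
super_node = "HUB"  # hypothetical node linked to every other node

edges = set()
# Normal nodes are only connected to members of the same group.
for members in groups.values():
    for u, v in combinations(sorted(members), 2):
        edges.add((u, v))
# The super node is connected to every other node.
for node in sorted(nodes):
    edges.add((super_node, node))

# Map each node to the groups it belongs to (possibly several, possibly none).
membership = {n: sorted(g for g, m in groups.items() if n in m) for n in nodes}
```

Keeping the edge set and the membership mapping separate mirrors the layout problem in the text: the edges drive node placement, while the overlapping memberships have to be drawn as an additional layer on top.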