Make your Data Talk!

From 0 to Hero in visualization using matplotlib and seaborn


An Overview of Human Pose Estimation with Deep Learning

An introduction to the techniques used in Human Pose Estimation based on Deep Learning.


An Overview of Outlier Detection Methods from PyOD – Part 1

PyOD is an outlier detection package developed with a comprehensive API to support multiple techniques. This post will showcase Part 1 of an overview of techniques that can be used to analyze anomalies in data.


Octoparse: A Revolutionary Web Scraping Software

Octoparse is the ultimate tool for data extraction (web crawling, data crawling and data scraping), which lets you turn the whole internet into a structured format. The newly launched Web Scraping Template makes it very easy even for people with no technical training.


shinyApp(), runApp(), shinyAppDir(), and a fourth option

This title might sound a little bit weird, so let’s begin with a little bit of context. It all started with this issue on the {golem} package, which reflects a discussion we previously had inside the team. Also, two weeks ago, I received a tweet on the very same subject, which can be summarised as follows: ‘should we use shinyApp() or runApp() when deploying to production?’


Blackman-Tukey Spectral Estimator in R

There are two definitions of the power spectral density (PSD). The two definitions are mathematically nearly identical, and each defines a function that describes how power is distributed over the frequency components in our data set. The periodogram PSD estimator is based on the first definition of the PSD (see periodogram post). The Blackman-Tukey spectral estimator (BTSE) is based on the second definition, which says that the PSD is the Fourier transform of the autocorrelation sequence (ACS).
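
As a rough illustration of the second definition, the sketch below estimates the ACS, applies a lag window, and takes its Fourier transform. The post itself works in R; this is a standard-library Python sketch, and the function names, the biased ACS estimate, the Bartlett (triangular) lag window, and the assumption of zero-mean real data are all choices made here for illustration, not details taken from the post.

```python
import math


def biased_acs(x, max_lag):
    """Biased autocorrelation estimate r[k] for lags k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def blackman_tukey_psd(x, max_lag, n_freqs=51):
    """Return (freqs, psd); freqs are in cycles per sample on [0, 0.5].

    Assumes x is real-valued and already zero-mean.
    """
    if not 0 < max_lag < len(x):
        raise ValueError("max_lag must satisfy 0 < max_lag < len(x)")
    r = biased_acs(x, max_lag)
    # Bartlett lag window: tapers the less reliable high-lag estimates.
    w = [1.0 - k / (max_lag + 1) for k in range(max_lag + 1)]
    freqs = [0.5 * i / (n_freqs - 1) for i in range(n_freqs)]
    psd = []
    for f in freqs:
        # The ACS of real data is symmetric, so the Fourier transform
        # of the windowed ACS reduces to a cosine sum over positive lags.
        s = w[0] * r[0] + 2.0 * sum(
            w[k] * r[k] * math.cos(2.0 * math.pi * f * k)
            for k in range(1, max_lag + 1)
        )
        psd.append(s)
    return freqs, psd


# Usage: a pure sinusoid at 0.1 cycles per sample.
signal = [math.sin(2.0 * math.pi * 0.1 * t) for t in range(200)]
freqs, psd = blackman_tukey_psd(signal, max_lag=40)
peak_freq = freqs[psd.index(max(psd))]
```

A smaller `max_lag` gives a smoother but lower-resolution estimate; a larger one does the opposite.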


Simulating the bias-variance tradeoff in R

In my last blog post, I elaborated on the Bagging algorithm and showed its prediction performance via simulation. Here, I want to go into the details of how to simulate the bias and variance of a nonparametric regression fitting method using R. These kinds of questions arise here at STATWORX when, for example, developing new machine learning algorithms or testing established ones that should generalize well to new, unseen data.
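
To make the idea concrete, here is a minimal sketch of such a simulation. The post uses R; this version is in standard-library Python, and everything specific in it is an assumption made for illustration: the true function (sin), the noise level, the evaluation point, and k-nearest-neighbour averaging as the nonparametric fitting method. It repeatedly draws a training set, predicts at one fixed point, and then reads the squared bias and the variance off the collected predictions.

```python
import math
import random
import statistics


def knn_predict(train, x0, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def simulate_bias_variance(k, x0=1.0, n_train=50, n_sims=300,
                           sigma=0.3, seed=0):
    """Monte Carlo estimate of squared bias and variance at x0."""
    rng = random.Random(seed)
    f = math.sin  # assumed true regression function
    preds = []
    for _ in range(n_sims):
        # Draw a fresh training set from y = f(x) + noise.
        train = []
        for _ in range(n_train):
            x = rng.uniform(0.0, math.pi)
            train.append((x, f(x) + rng.gauss(0.0, sigma)))
        preds.append(knn_predict(train, x0, k))
    bias_sq = (statistics.fmean(preds) - f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


# Usage: a flexible fit (k=1) versus a very rigid one (k=50).
for k in (1, 5, 25, 50):
    bias_sq, variance = simulate_bias_variance(k)
    print(f"k={k:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

Small k should show low bias and high variance, while large k should show the reverse, which is the tradeoff named in the title.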


Association Discovery – the Apriori Algorithm

Association discovery is commonly called Market Basket Analysis (MBA). MBA is widely used by grocery stores, banks, and telecommunications companies, among others. Its results are used to optimize store layouts, design product bundles, plan coupon offers, choose appropriate specials, and choose attached mailings in direct marketing. MBA helps us understand which items are likely to be purchased together. On-line transaction processing systems often provide the data sources for association discovery.
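
The following is a minimal sketch of the level-wise Apriori search for frequent itemsets, written in standard-library Python. The function name, the toy transactions, and the support threshold are hypothetical and are not taken from the post; rule generation (confidence, lift) is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in sets) / n

    items = sorted({item for t in sets for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


# Usage with hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=0.5)
```

The prune step is what gives the algorithm its name: an itemset can only be frequent if all of its subsets are, so most candidates are discarded without counting them.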


Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

This is the second post of the series, in which we will talk about a novel Hierarchical Reinforcement Learning (HRL) algorithm built upon HIerarchical Reinforcement learning with Off-policy correction (HIRO), which we discussed in the previous post. This post comprises two sections. In the first section, we first compare the architectures of representation learning for HRL and HIRO; then we start from Claim 4 in the paper and see how to learn good representations that lead to bounded sub-optimality and how the intrinsic reward for the low-level policy is defined; we provide the pseudocode for the algorithm at the end of this section. In the Discussion section, we bring some insight into the algorithm and connect the low-level policy to the probabilistic graphical model to build some intuition.


Building a Search Engine with BERT and TensorFlow

In this experiment, we will use a pre-trained BERT model checkpoint to build a general-purpose text feature extractor.


N-step TD Method

In previous posts, we explored together some general reinforcement learning methods, including SARSA, an on-policy method in which the Q value is updated based on the trajectory the agent takes, and the Monte Carlo method, which is generally used to evaluate a policy. In this post, we will
• recall SARSA and the Monte Carlo method
• explain why these two methods can be unified (in fact, they are the same method with different parameters)
• implement a random walk example and compare the effectiveness of different n-step methods
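
As a hedged sketch of what such an experiment might look like, the code below runs n-step TD prediction on a small random walk using only the Python standard library. The environment details are assumptions made for illustration, not taken from the post: five non-terminal states, a start in the middle, a reward of 1 only when exiting on the right, and all values initialised to 0.5. With n = 1 the update is one-step TD; once n is at least the episode length, the target is the full Monte Carlo return.

```python
import random


def n_step_td_random_walk(n, alpha=0.1, episodes=200, n_states=5,
                          gamma=1.0, seed=0):
    """Estimate state values of a random walk with n-step TD."""
    rng = random.Random(seed)
    # States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    V = [0.0] * (n_states + 2)
    for s in range(1, n_states + 1):
        V[s] = 0.5
    for _ in range(episodes):
        states = [(n_states + 1) // 2]
        rewards = [0.0]  # rewards[t] is the reward for entering states[t]
        T = float("inf")  # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                s_next = states[t] + rng.choice((-1, 1))
                states.append(s_next)
                rewards.append(1.0 if s_next == n_states + 1 else 0.0)
                if s_next in (0, n_states + 1):
                    T = t + 1
            tau = t - n + 1  # time step whose state is updated now
            if tau >= 0:
                end = min(tau + n, T)
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, end + 1))
                if tau + n < T:
                    # Bootstrap from the value estimate n steps ahead.
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V[1:n_states + 1]


# Usage: compare a few values of n; under these assumptions the true
# values are 1/6, 2/6, ..., 5/6.
for n in (1, 2, 4, 8):
    print(n, [round(v, 2) for v in n_step_td_random_walk(n)])
```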


A Beginner’s Guide to Rasa NLU for Intent Classification and Named-entity Recognition

The purpose of this article is to explore the new way to use Rasa NLU for intent classification and named-entity recognition. Since version 1.0.0, Rasa NLU and Rasa Core have been merged into a single framework. As a result, there are some minor changes to the training process and the functionality available. First and foremost, Rasa is an open source machine learning framework for automating text- and voice-based conversations. In other words, you can use Rasa to build contextual and layered conversations akin to those of an intelligent chatbot. In this tutorial, we will focus on the natural-language understanding part of the framework to capture the user’s intention.


Examining the Transformer Architecture

We’ve taken a look at transformer networks and at how and why they are so effective. Transformers are currently the state-of-the-art architecture, and they remain an active area of NLP research. You should also now have a general idea of what it takes to train a transformer network. For a deeper dive into training transformers, visit the official transformer implementation in the TensorFlow GitHub repo. We hope you’ve enjoyed this blog series. Now get out there and build something awesome!


Swarm intelligence: Inside the ant colony

‘The whole is greater than the sum of its parts’ is a well-known quote that, according to Gestalt psychology, sums up the idea that a system, the whole, is something more complex than and different from the aggregation of its basic elements. In other words, any attempt to understand a complex system by dissecting it into its basic parts would inevitably fail, leaving out something that the single parts cannot express on their own. Leaving Gestalt psychology aside, the opening quote will be useful for understanding the topic of this dissertation: the ant colony optimization algorithm (ACO). The ant society is a well-organized and hierarchical structure with specific roles and codified behavior.