Bayes’ Theorem in One Picture

Bayes’ Theorem is a way to calculate conditional probability. The formula itself is simple to compute, but fitting the right pieces into it can be challenging. The first challenge is defining your event (A) and your test (B); the second is rephrasing your question so that you can work backwards, expressing the probability you want, P(A|B), in terms of one you know, P(B|A). The following image shows a basic example involving website traffic.
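As a minimal sketch of the calculation, the snippet applies P(A|B) = P(B|A)P(A) / P(B), expanding P(B) with the law of total probability. The website-traffic numbers are hypothetical placeholders, not the figures from the original image, and the function name `bayes_posterior` is our own.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) using Bayes' theorem.

    P(B) is not given directly, so it is expanded with the law of
    total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, not taken from the original image:
# A = "visitor makes a purchase", B = "visitor viewed the pricing page".
p_a = 0.05              # 5% of all visitors purchase
p_b_given_a = 0.60      # 60% of purchasers viewed the pricing page
p_b_given_not_a = 0.20  # 20% of non-purchasers viewed it too

print(round(bayes_posterior(p_b_given_a, p_a, p_b_given_not_a), 4))  # → 0.1364
```

With these made-up inputs, a visitor who viewed the pricing page has roughly a 13.6% chance of purchasing, well above the 5% base rate; the theorem lets us get there from P(B|A), which is usually the easier quantity to measure.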

The Ultimate Technical Skill in Data Visualization for Data Scientists

Anyone working with data must be able to visualize it. With recent advances in user interface (UI) technologies, expectations of UIs have increased considerably. I argue that the frameworks data scientists commonly use to create rich web applications lead to a frustrating development experience, and that learning a few basic web development skills will drastically improve that experience.

Using Gitlab’s CI for Periodic Data Mining

One of the most time-consuming and difficult stages in a standard data science development pipeline is creating a dataset. If you have already been provided with a dataset, kudos to you! You have just saved yourself a good amount of time and effort. In many cases, though, that will not be so. As a matter of fact, the data mining stage can be one of the most demotivating periods in your project timeline, so it is always a plus to have simple techniques for mining the data you need. With that in mind, in this post I will describe how GitLab’s CI pipelines can be used for periodic data mining jobs without the need for storage buckets, VPSes, external servers and so forth. So without further ado, let’s dive into the tutorial.

Accessing Google Spreadsheet Data using Python

Since you are all familiar with importing, exporting and manipulating comma-separated values (CSV) files using Python, in this article I’m going to give you a step-by-step guide to accessing Google Spreadsheets in the cloud using Python. As the very first thing, go to the Google API Manager (simply google it) and go to

Beginning to Replicate Natural Conversation in Real Time

To start my new project, the first thing I of course have to do is review the current research and state-of-the-art models. I was recently interviewed about this new project with Wallscope, but in short (extremely short): I aim to take a step towards making conversational agents more natural to talk with. I have by no means exhausted the literature in this field; I have barely scratched the surface (please link relevant papers below if you know of any I must read). Here is an overview of some of this research and the journey towards more natural conversational agents.

tsbox 0.1: class-agnostic time series

The R ecosystem has a vast number of time series classes: ts, xts, zoo, tsibble, tibbletime or timeSeries. This plethora of standards causes confusion: because different packages rely on different classes, it is hard to use them in the same analysis. tsbox provides a set of tools that make it easy to switch between these classes. It also allows the user to treat time series as plain data frames, making it easier to use them with tools that assume rectangular data.

Animations with Matplotlib

Animations are an interesting way of demonstrating a phenomenon. As humans, we are often more enthralled by animated and interactive charts than by static ones. Animations make even more sense when depicting time series data, such as stock prices over the years, climate change over the past decade, or seasonality and trends, since we can then see how a particular parameter behaves over time.
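A minimal sketch of a time-series animation with Matplotlib’s `FuncAnimation`, which redraws the plot by calling an update function once per frame. The random-walk "price" series is synthetic and purely illustrative, and selecting the non-interactive Agg backend is an assumption made so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line to show a window instead

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Synthetic data (hypothetical): a random-walk "stock price" over 200 steps.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=200))
steps = np.arange(len(prices))

fig, ax = plt.subplots()
ax.set_xlim(0, len(prices))
ax.set_ylim(prices.min() - 5, prices.max() + 5)
ax.set_xlabel("time step")
ax.set_ylabel("price")
(line,) = ax.plot([], [], lw=2)


def init():
    # Start each run of the animation from an empty line.
    line.set_data([], [])
    return (line,)


def update(frame):
    # Reveal the series up to and including the current frame.
    line.set_data(steps[: frame + 1], prices[: frame + 1])
    return (line,)


anim = FuncAnimation(
    fig, update, frames=len(prices), init_func=init, interval=30, blit=True
)

# To write the animation to disk (requires Pillow):
# anim.save("prices.gif", writer="pillow")
```

Growing the line frame by frame, rather than plotting everything at once, is what lets the viewer follow how the value evolves over time.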

Historical Word Embeddings & Lexical Semantic Change

I have developed a GitHub guide that demonstrates a simple workflow for sampling Google n-gram data and building historical word embeddings with the aim of investigating lexical semantic change. Here, we build on this workflow and unpack some methods presented in Hamilton, Leskovec, and Jurafsky (2016) & Li et al. (2019) for aligning historical matrices/embeddings and visualizing semantic change.
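Hamilton, Leskovec, and Jurafsky (2016) align embeddings from different periods with orthogonal Procrustes: find the rotation R that best maps one period’s embedding matrix onto another’s, then compare each word’s vectors across periods. The NumPy sketch assumes both matrices share the same vocabulary in the same row order; the function names are our own, cosine distance is just one simple change measure, and this is not the code from the GitHub guide.

```python
import numpy as np


def procrustes_align(base: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Rotate `other` onto `base` with orthogonal Procrustes.

    Solves min_R ||other @ R - base||_F subject to R.T @ R = I.
    If other.T @ base = U S V^T, the optimal rotation is R = U V^T.
    Rows of both matrices must refer to the same words.
    """
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-word cosine distance between two aligned embedding matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / den


# Synthetic check: a randomly rotated copy of an embedding matrix
# should be rotated back onto the original exactly.
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 10))               # 50 "words", 10 dimensions
q, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # random orthogonal matrix
other = base @ q

aligned = procrustes_align(base, other)
print(np.abs(aligned - base).max())           # close to 0 (floating-point noise)
print(cosine_distance(aligned, base).max())   # close to 0: no "semantic change"
```

Restricting R to an orthogonal matrix preserves the distances within each period’s space, so any remaining difference between a word’s aligned vectors reflects a change in its relative position rather than an arbitrary rotation of the embedding space.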

Introducing graphlayouts with Game of Thrones

This post introduces the new R package graphlayouts, which has been available on CRAN for a few days. We will use network data from the Game of Thrones TV series (seemed timely at the time of writing) to illustrate the core layout algorithms of the package. Most of the algorithms use stress majorization as their basis, which I described in more detail in an older post. Here, I will only focus on the practical aspects of the package.
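graphlayouts itself is an R package, so the following is only an illustration of the underlying technique in Python/NumPy, not the package’s implementation. Stress majorization (the SMACOF iteration) places nodes so that Euclidean distances approximate shortest-path distances d_ij, using the common weighting w_ij = d_ij^-2. The sketch assumes a connected, unweighted graph given as an adjacency matrix, and all function names are hypothetical.

```python
import numpy as np


def shortest_paths(adj: np.ndarray) -> np.ndarray:
    """All-pairs shortest-path lengths (Floyd-Warshall) for an unweighted graph."""
    n = adj.shape[0]
    d = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(d, 0.0)
    for k in range(n):
        # Allow paths through node k.
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d


def stress_weights(d: np.ndarray) -> np.ndarray:
    """Weights w_ij = d_ij^-2 off the diagonal, 0 on it."""
    w = np.zeros_like(d)
    off = ~np.eye(d.shape[0], dtype=bool)
    w[off] = d[off] ** -2.0
    return w


def pairwise(x: np.ndarray) -> np.ndarray:
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def stress(x: np.ndarray, d: np.ndarray) -> float:
    """Sum over node pairs of w_ij * (||x_i - x_j|| - d_ij)^2."""
    w = stress_weights(d)
    return float(np.sum(np.triu(w * (pairwise(x) - d) ** 2, 1)))


def stress_majorization(d: np.ndarray, dim: int = 2, iters: int = 300, seed: int = 0):
    n = d.shape[0]
    w = stress_weights(d)
    off = ~np.eye(n, dtype=bool)
    # Weighted Laplacian L_w (singular, so use the pseudo-inverse).
    lw = -w.copy()
    np.fill_diagonal(lw, w.sum(axis=1))
    lw_pinv = np.linalg.pinv(lw)
    x = np.random.default_rng(seed).normal(size=(n, dim))  # random start
    for _ in range(iters):
        dist = pairwise(x)
        with np.errstate(divide="ignore", invalid="ignore"):
            b = np.where(off & (dist > 0), -w * d / dist, 0.0)
        np.fill_diagonal(b, -b.sum(axis=1))
        x = lw_pinv @ b @ x  # Guttman transform: never increases stress
    return x


# Usage: a 5-node path graph 0-1-2-3-4.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1
d = shortest_paths(adj)
layout = stress_majorization(d)
print(stress(layout, d))
```

Each Guttman transform minimizes a quadratic upper bound on the stress, which is why the iteration decreases stress monotonically; it can still stop in a local minimum, so layouts depend on the starting positions.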

Polyaxon v0.4.3: stable – Improved dashboard, setup, and documentation

Today, we are pleased to announce the v0.4.3 release, a stable version with improved functionality and documentation. This release also marks the anniversary of several clusters running non-stop for over a year, the adoption of the platform by several Fortune 500 companies, and millions of experiments scheduled and tracked by Polyaxon since the initial release (based on anonymous, opt-in metrics reporting). Polyaxon is now moving towards semantic versioning and, starting from the next release, v0.5, will provide versioned documentation and migration notes from one version to the next.

Use AI for Augmenting Yourself

Now just imagine that our virtual personalities can talk, think, choose and carry out actions characteristic of us: digital profiles born from our own online personality patterns. Imagine an augmented world with 3D intelligent profiles that share our personality and like everything we like, which we can command knowing they are trusted representatives of us, a second identity living in the virtual world. We will have endless possibilities to use our virtual profiles anywhere in the digital space. And the most wonderful thing is that there will be a huge use for this; our world will change once and for all.

How to easily automate R analysis, modeling and development work using CI/CD, with working examples

In this post, we will show a quick and simple way to automate R data analysis, as well as package checking, testing and installation, with GitLab CI/CD, and provide example files that can be used for testing packages and deploying blogdown-based websites.

XGBoost: Predicting Life Expectancy with Supervised Learning

Today we’ll use XGBoost’s boosted trees for regression on the official Human Development Index dataset. Who said Supervised Learning was all about classification?

Discriminating network for Classification

How I used a Siamese network to build a classifier with very few images.

(Robot) data scientists as a service

Automating data science with symbolic regression and probabilistic programming.

Avoiding Obvious Insights Using Analyze With Insight Miner

Analyze with Insight Miner delivers value for every business user with machine learning. Learn how it was created from Sisense data scientist Ayelet Arditi.

Introduction to Factor Analysis in Python

In this tutorial, you’ll learn the basics of factor analysis and how to implement it in Python.
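As a minimal sketch, assuming scikit-learn’s `FactorAnalysis` (version 0.24 or later for the `rotation` argument) rather than whichever library the tutorial itself uses, the snippet fits a two-factor model to synthetic data generated from known, made-up loadings.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data (hypothetical): 6 observed variables driven by 2 latent factors.
rng = np.random.default_rng(42)
n = 500
factors = rng.normal(size=(n, 2))
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.0], [0.7, 0.0],  # variables 1-3 load on factor 1
    [0.0, 0.9], [0.0, 0.8], [0.0, 0.7],  # variables 4-6 load on factor 2
])
X = factors @ true_loadings.T + 0.3 * rng.normal(size=(n, 6))

# Fit a 2-factor model; varimax rotation makes the loadings easier to interpret.
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
scores = fa.fit_transform(X)  # per-observation factor scores, shape (500, 2)

print(np.round(fa.components_.T, 2))   # estimated loadings, one row per variable
print(np.round(fa.noise_variance_, 2)) # estimated unique variance per variable
```

The estimated factors may come out in a different order or with flipped signs compared with the generating loadings; factor models are only identified up to such transformations, which is one reason a rotation is usually applied before interpreting them.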

‘The Futuristic World of Autonomous ERP – In Realistic Terms!’ by Duy Nguyen

If you have followed major ERP vendors’ technology roadmap announcements over the past two years, you will have noticed that the term ‘autonomous ERP’ is used quite often to promote self-executing business processes as a future capability. What I am not hearing is how to determine the level of trust organizations need before they allow autonomous ERP to take over. Let’s take a look at how the US Department of Transportation’s National Highway Traffic Safety Administration (NHTSA) defined five different levels of maturity for autonomous driving. Keep in mind that each level of autonomy describes the system, not the company using it.