Using Semantic Web technologies in the development of data warehouses: A systematic mapping

The exploration and use of Semantic Web technologies have attracted considerable attention from researchers examining data warehouse (DW) development. However, the impact of this research and the maturity level of its results are still unclear. The objective of this study is to examine recently published research articles that take into account the use of Semantic Web technologies in the DW arena with the intention of summarizing their results, classifying their contributions to the field according to publication type, evaluating the maturity level of the results, and identifying future research challenges. Three main conclusions were derived from this study: (a) there is a major technological gap that inhibits the wide adoption of Semantic Web technologies in the business domain; (b) there is limited evidence that the results of the analyzed studies are applicable and transferable to industrial use; and (c) interest in researching the relationship between DWs and the Semantic Web has decreased because new paradigms, such as linked open data, have attracted the interest of researchers.


PyCM is a multi-class confusion matrix library written in Python that accepts both input data vectors and a direct matrix, and serves as a tool for post-classification model evaluation, supporting most class-level and overall statistics parameters. PyCM is the Swiss Army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers.
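Setting PyCM's actual API aside, the bookkeeping such a library automates can be sketched in plain Python; the function names and toy labels below are illustrative and are not part of PyCM:

```python
def confusion_matrix(actual, predicted):
    """Build a multi-class confusion matrix as a nested dict:
    matrix[true_label][predicted_label] -> count."""
    labels = sorted(set(actual) | set(predicted))
    matrix = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(actual, predicted):
        matrix[t][p] += 1
    return matrix

def overall_accuracy(matrix):
    """Fraction of samples on the matrix diagonal (correct predictions)."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total

# Toy three-class example.
actual    = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "dog", "dog", "bird", "dog", "cat"]
m = confusion_matrix(actual, predicted)
acc = overall_accuracy(m)
```

A library like PyCM layers dozens of per-class and overall statistics (sensitivity, Kappa, and so on) on top of exactly this kind of count table.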

The Data Journalism Handbook

When you combine the sheer scale and range of digital information now available with a journalist’s ‘nose for news’ and her ability to tell a compelling story, a new world of possibility opens up. Explore the potential, limits, and applied uses of this new and fascinating field.

Introducing the new Data Journalism Handbook

Interested in data journalism and where it’s going? Today is the launch of the first part of the new Data Journalism Handbook. We announced that we would be supporting a new version of the Data Journalism Handbook with the European Journalism Centre last year. Since then the editors, Liliana Bounegru and Jonathan Gray – who edited the first version with Lucy Chambers – have been hard at work bringing together journalists and academics to present a state of the data journalism nation right now. The book will be published online today and in print next year. Today’s publication includes 21 chapters from writers around the world, from China to Cuba to the US and UK.

Extracting information from a picture, round 1

This week, I wanted to extract the information shown on the nice map below. I could not get access to the original dataset, per zip code… and I was wondering whether (assuming the map had a high enough resolution) it was actually possible to extract that information using a simple R function…

Extracting information from a picture, round 2

Yesterday, I published a post on extracting information from a picture, but it did not work as expected. I claimed that it was because of the original graph I had. More precisely, the map was based on some weird projection that I could not reconcile. So I decided to cheat a little bit by creating my own map.

Here are the most popular Python IDEs / Editors

We report on the most popular IDEs and editors, based on our poll. Jupyter is the favorite across all regions and employment types, but there is competition for the no. 2 and no. 3 spots.

A comprehensive list of Machine Learning Resources: Open Courses, Textbooks, Tutorials, Cheat Sheets and more

A thorough collection of useful resources covering statistics, classic machine learning, deep learning, probability, reinforcement learning, and more.

The Machine Learning Project Checklist

In an effort to further refine our internal models, this post presents an overview of Aurélien Géron’s Machine Learning Project Checklist, as seen in his bestselling book, ‘Hands-On Machine Learning with Scikit-Learn & TensorFlow.’
1. Frame the problem
2. Get the data
3. Explore the data
4. Prepare the data
5. Model the data
6. Fine-tune the models
7. Present the solution
8. Launch the ML system
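The steps above can be sketched end to end on a toy problem. The snippet below is a minimal plain-Python illustration (no ML libraries; the synthetic data and closed-form linear fit are assumptions for the sake of the example, not material from Géron's book):

```python
import random

random.seed(0)

# 2. Get the data (synthetic: y = 3x + 2 plus Gaussian noise).
data = [(x, 3 * x + 2 + random.gauss(0, 0.5)) for x in range(20)]

# 3./4. Explore and prepare the data: split into train and test sets.
train, test = data[:15], data[15:]

# 5. Model the data: closed-form least squares for y = a*x + b.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# 7. Present the solution: report test-set mean absolute error.
mae = sum(abs((a * x + b) - y) for x, y in test) / len(test)
```

Steps 1, 6, and 8 (framing, fine-tuning, launch) have no code of their own here; in a real project they dominate the effort.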

Automated Dashboard Visualizations with Ranking in R

In this article, you learn how to make automated dashboard visualizations with ranking in R. First, you need to install the rmarkdown package into your R library. Assuming that you have installed rmarkdown, you then create a new rmarkdown script in R.

Shinyfit: Advanced regression modelling in a shiny app

Many of our projects involve getting doctors, nurses, and medical students to collect data on the patients they are looking after. We want to involve many of them in data analysis, without the requirement for coding experience or access to statistical software. To achieve this we have built Shinyfit, a shiny app for linear, logistic, and Cox PH regression.

Network Centrality in R: An Introduction

Research involving networks has found its place in a lot of disciplines. From the social sciences to the natural sciences, the buzz-phrase ‘networks are everywhere’ is everywhere. One of the many tools to analyze networks is the family of centrality measures. In a nutshell, a measure of centrality is an index that assigns a numeric value to each node of the network. The higher the value, the more central the node (a more thorough introduction is given in the extended tutorial). Related ideas can already be found in Moreno’s work in the 1930s, but most of the relevant groundwork was done from the late 1940s to the late 1970s. This period includes the work of Bavelas and Leavitt on group communication experiments and, of course, the seminal paper ‘Centrality in Social Networks: Conceptual Clarification’ by Linton Freeman. If you work, or intend to work, with centrality and haven’t read it, I urge you to do so. Freeman does not mince matters and goes all-in on criticizing contemporary work of that time. He calls existing indices ‘nearly impossible to interpret’, ‘hideously complex’, ‘seriously flawed’, ‘unacceptable’, and ‘arbitrary and uninterpretable’. His rant culminates in the following statement.
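To make the idea concrete before diving into the R tutorial: the simplest such index, degree centrality, divides each node's degree by the maximum possible degree, n − 1. Here is a language-agnostic sketch in plain Python (the star graph and function name are hypothetical examples, not code from the tutorial):

```python
def degree_centrality(adj):
    """Normalized degree centrality: degree / (n - 1),
    for an undirected graph given as {node: set_of_neighbors}."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

# A small undirected "star" graph: A is connected to B, C, and D.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}
scores = degree_centrality(graph)
```

In the star graph the hub A gets the maximum score of 1.0 while each leaf gets 1/3, matching the intuition that higher values mean more central nodes.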

Maximum Likelihood Estimation: How it Works and Implementing in Python

Previously, I wrote an article about estimating distributions using nonparametric estimators, where I discussed the various methods of estimating statistical properties of data generated from an unknown distribution. This article covers a very powerful method of estimating the parameters of a probability distribution given the data: maximum likelihood estimation.
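For a taste of what the article covers: for a normal distribution the maximum likelihood estimates have a closed form (the sample mean, and the standard deviation computed with 1/n rather than 1/(n − 1)). The helper below is a minimal sketch under that assumption, not the article's own code:

```python
import math

def normal_mle(data):
    """Closed-form maximum likelihood estimates for a normal distribution:
    mu_hat is the sample mean; sigma_hat uses the 1/n (biased) variance,
    which is what maximizing the likelihood actually yields."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma

data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0]
mu_hat, sigma_hat = normal_mle(data)
```

For distributions without a closed-form solution, the same principle applies but the log-likelihood is maximized numerically instead.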