3 Theorems on the Economic Value of Data

• Economic Value of Data Theorem #1: It isn’t the data that’s valuable; it’s the relationships and patterns (insights) gleaned from the data that are valuable.
• Economic Value of Data Theorem #2: It is from the quantification of the relationships and patterns that we can make predictions about what is likely to happen.
• Economic Value of Data Theorem #3: Predictions drive monetization opportunities through improved (optimized) strategic and operational use cases.


Risk is for Real if not Artificial Intelligence

Over the next couple of years we should see a vast improvement over the current state of the art in machine learning for cyber-security, payment intelligence, and information-security intelligence. In the meantime, business quietly gravitates toward the subtasks that these systems already perform implicitly. A classic example of AI as a critical tool for improving customer experience is facial recognition technology, which is said to be 10 to 15 times more accurate than human beings at identifying people. Fueled by advances in computing power and connectivity, the fields of robotics and artificial intelligence have grown rapidly. With technology advancing at breakneck speed, new applications, machinery, and ultra-fast processes are demystifying robotics and AI in businesses, factories, and homes, covering everything from teleoperation to autonomy.


Trend Analysis of Fragmented Time Series for mHealth Apps: Hypothesis Testing Based Adaptive Spline Filtering Method with Importance Weighting

The growth of mobile devices has provided significant opportunities for developing healthcare apps based on these devices' ability to collect data. Unfortunately, the data collection is often intermittent. Missing data present significant challenges to trend analysis of time series. Straightforward approaches, such as supplementing missing data with constant or zero values or with linear trends, can severely degrade the quality of the trend analysis. In this paper, we present a robust adaptive approach to discover the trends in fragmented time series. The approach proposed in this paper is based on the HASF (Hypothesis-testing-based Adaptive Spline Filtering) trend analysis algorithm, which can accommodate non-uniform sampling and is therefore inherently robust to missing data. HASF adapts the nodes of the spline based on hypothesis testing and variance minimization, which adds to its robustness. Further improvement is obtained by filling gaps with data estimated in an earlier trend analysis, provided by HASF itself. Three variants for filling the gaps of missing data are considered, the best of which seems to consist of filling significantly large gaps with linear splines matched for continuity and smoothness with cubic splines covering data-dense regions. Small gaps are ignored and addressed by the underlying cubic spline fitting. Finally, the existing measurements are weighted according to their importance by simply transferring the importance of the missing data to their existing neighbors. The methods are illustrated and evaluated using heart rate datasets, blood pressure datasets, and noisy sine datasets.
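The paper's HASF algorithm itself is more involved; purely as an illustrative sketch (not HASF), the general idea of fitting a weighted cubic smoothing spline to non-uniform samples while bridging large gaps linearly might look like this in Python, where the gap threshold and the neighbor-weighting scheme are our own assumptions:

```python
# Illustrative sketch only -- not the HASF algorithm from the paper.
# Assumes non-uniformly sampled (t, y) pairs, e.g. heart-rate readings, with t sorted.
import numpy as np
from scipy.interpolate import UnivariateSpline, interp1d

def fit_trend(t, y, gap_threshold=60.0):
    """Fit a smooth trend to fragmented samples.

    Small gaps are handled by the cubic smoothing spline itself; gaps wider
    than `gap_threshold` (in the time units of t; the value here is arbitrary)
    are bridged linearly so the spline is not trusted across missing regions.
    """
    t, y = np.asarray(t, float), np.asarray(y, float)

    # Up-weight samples bordering a large gap, loosely mimicking the idea of
    # transferring the importance of the missing data to existing neighbors.
    w = np.ones_like(y)
    gaps = np.diff(t)
    big = gaps > gap_threshold
    w[:-1][big] += 1.0
    w[1:][big] += 1.0

    spline = UnivariateSpline(t, y, w=w, k=3)   # cubic smoothing spline
    linear = interp1d(t, y, kind="linear")      # straight-line bridge

    def trend(tq):
        tq = np.asarray(tq, float)
        out = spline(tq)
        idx = np.clip(np.searchsorted(t, tq) - 1, 0, len(gaps) - 1)
        out[big[idx]] = linear(tq[big[idx]])    # use the bridge inside large gaps
        return out

    return trend
```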


Convert A NumPy Array To A PyTorch Tensor

Convert a NumPy array into a PyTorch tensor so that the resulting tensor retains the array's specific data type.
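A minimal example of the dtype-preserving conversion (assuming NumPy and PyTorch are installed):

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0], dtype=np.float64)

# torch.from_numpy keeps the NumPy dtype (float64 here) and shares memory
# with the original array; torch.tensor(arr) would copy the data instead.
tensor = torch.from_numpy(arr)
print(tensor.dtype)   # torch.float64, matching the array's dtype
print(tensor)         # tensor([1., 2., 3.], dtype=torch.float64)

# Cast afterwards if a model expects a different dtype:
tensor32 = tensor.to(torch.float32)
```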


The three different types of machine learning

If you’re new to machine learning, it’s worth starting with the three core types: supervised learning, unsupervised learning, and reinforcement learning. In this tutorial, taken from the brand-new edition of Python Machine Learning, we’ll take a closer look at what they are and the types of problems each one is best suited to solve.
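As a quick sketch of the difference between the first two types (the tutorial covers all three in depth; reinforcement learning needs an interactive environment and is omitted here):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised learning: the model trains on features *and* known labels.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised learning: only the features are given; structure is inferred.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
```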


Deep convolutional generative adversarial networks with TensorFlow

The concept of generative adversarial networks (GANs) was introduced less than four years ago by Ian Goodfellow. Goodfellow uses the metaphor of an art critic and an artist to describe the two models, discriminators and generators, that make up GANs. An art critic (the discriminator) looks at an image and tries to determine whether it is real or a forgery. An artist (the generator) who wants to fool the art critic tries to make a forged image that looks as realistic as possible. These two models “battle” each other; the discriminator uses the output of the generator as training data, and the generator gets feedback from the discriminator. Each model becomes stronger in the process. In this way, GANs are able to generate new, complex data based on some amount of known input data, in this case images. It may sound scary to implement GANs, but it doesn’t have to be. In this tutorial, we will use TensorFlow to build a GAN that is able to generate images of human faces.
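As a minimal sketch of that adversarial loop in current TensorFlow/Keras style (not the tutorial's own code; the toy fully connected architecture and flattened 28x28 images are our simplifications, whereas the tutorial generates faces):

```python
import tensorflow as tf

LATENT_DIM = 100
IMG_SIZE = 28 * 28   # toy flattened images; the tutorial works with face images

# The "artist": turns random noise into a fake image.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(IMG_SIZE, activation="tanh"),
])

# The "art critic": outputs a logit saying how real an image looks.
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    noise = tf.random.normal([tf.shape(real_images)[0], LATENT_DIM])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)

        # Critic: learn to call real images 1 and forgeries 0.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # Artist: try to make the critic call the forgeries 1.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)

    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```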


Image Convolution in R using Magick

Release 1.4 of the magick package introduces a new feature called image convolution that was requested by Thomas L. Pedersen. In this post we explain what this is all about.
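The post itself works with R's magick; purely as a language-agnostic illustration of what applying a convolution kernel to an image means, here is a tiny Python/SciPy sketch (not the magick API):

```python
import numpy as np
from scipy.signal import convolve2d

# A toy 5x5 grayscale "image" with a single bright pixel in the middle.
image = np.zeros((5, 5))
image[2, 2] = 1.0

# A 3x3 blur kernel: each output pixel becomes the mean of its 3x3 neighborhood.
kernel = np.ones((3, 3)) / 9.0

blurred = convolve2d(image, kernel, mode="same", boundary="symm")
print(np.round(blurred, 3))
# The bright pixel is spread over its neighborhood -- conceptually the same
# operation the magick feature applies, channel by channel, to real photos.
```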


Promises and Closures in R

At the moment I am trying to improve my knowledge of functional programming in R. Luckily there are some explanations of the topic on the web (adv-r and Cartesian Faith). Beginning to (re)discover the usefulness of closures, I was reminded of some (at first sight) very strange behaviour. It is actually consistent with the scoping rules of R, but it took a while until I felt I had reached the same level of consistency.


Exploring, Clustering, and Mapping Toronto’s Crimes

I have had a lot of fun exploring US cities’ crime data via their Open Data portals, but Toronto’s crime data was simply not available until the summer of this year, when the Toronto Police launched a public safety data portal to increase transparency between the public and officers. I recently had the chance to explore Toronto’s crimes via the Toronto Police Service Public Safety Data Portal. I am particularly interested in the Major Crime Indicators (MCI) 2016 dataset, which contains a tabular list of 32,612 reports from 2016 (the only year for which the data were made available).
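For a flavour of the clustering step, a hedged sketch in Python (the file name and the "Lat"/"Long" column names are hypothetical; the real MCI 2016 extract may differ):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical file and column names, for illustration only.
mci = pd.read_csv("MCI_2016.csv")
coords = mci[["Lat", "Long"]].dropna()

# Group incidents into spatial clusters; k = 10 is an arbitrary choice here.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42).fit(coords)
mci.loc[coords.index, "cluster"] = kmeans.labels_

print(mci["cluster"].value_counts().head())
```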


Probabilistic Graphical Models Tutorial – Part 1

A lot of common problems in machine learning involve classification of isolated data points that are independent of each other. For instance, given an image, predict whether it contains a cat or a dog, or given an image of a handwritten character, predict which digit from 0 through 9 it is. It turns out that a lot of problems do not fit into this framework. For example, given the sentence “I like machine learning,” tag each word with its part of speech (noun, pronoun, verb, adjective, etc.). As even this simple example demonstrates, this task cannot be solved by treating each word independently: “learning” could be a noun or a verb depending on its context. This task matters for many more complex tasks on text, such as translating from one language to another, text-to-speech, and so on. It is not obvious how you would use a standard classification model to handle these problems. A powerful framework that can be used to learn models with such dependencies is probabilistic graphical models (PGMs). For this post, the Statsbot team asked a data scientist, Prasoon Goyal, to write a tutorial on this framework for us.
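To make the dependency concrete: a chain-structured model scores an entire tag sequence using transition probabilities between adjacent tags and emission probabilities of words given tags, instead of scoring each word in isolation. A toy example with invented numbers (not from the tutorial):

```python
# Toy chain scoring: P(tags, words) = prod_t P(tag_t | tag_{t-1}) * P(word_t | tag_t).
# All probabilities below are invented purely for illustration.
transition = {
    ("<s>", "PRON"): 0.5, ("PRON", "VERB"): 0.6,
    ("VERB", "NOUN"): 0.4, ("NOUN", "NOUN"): 0.3,
}
emission = {
    ("PRON", "I"): 0.4, ("VERB", "like"): 0.2,
    ("NOUN", "machine"): 0.05, ("NOUN", "learning"): 0.05,
}

def score(words, tags):
    prob, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        # Unseen transitions/emissions get a tiny fallback probability.
        prob *= transition.get((prev, tag), 1e-6) * emission.get((tag, word), 1e-6)
        prev = tag
    return prob

words = ["I", "like", "machine", "learning"]
print(score(words, ["PRON", "VERB", "NOUN", "NOUN"]))  # plausible tagging
print(score(words, ["PRON", "VERB", "NOUN", "VERB"]))  # "learning" as a verb scores lower here
```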


Machine Ethics and Artificial Moral Agents

This article is simply a stream of consciousness on questions and problems I have been thinking about and asking myself, and hopefully it will stimulate some discussion.