Data Science “Paint by the Numbers” with the Hypothesis Development Canvas

The Business Model Canvas forces organizations to ‘paint in’ your business’s value proposition, cost structures, revenue streams, supplier network and customer segments. Overnight, everyone can become Jack Welch! Now you are ready to take the next step from a Big Data MBA perspective by building off of the Business Model Canvas to flesh out the business use cases – or hypothesis – which is where we can become more effective at leveraging data and analytics to optimize our the business. That next step involves the newly created Hypothesis Development canvas.

Time series visualizations with wind turbine energy data in R

One of the sectors with a huge demand for data science/analysis is the energy sector. A branch of this sector where demand is high is the green wind energy turbine sector. In this analysis, you will learn to do a time series wind turbine analysis in R.

Causal Inference Book

My colleague Jamie Robins and I are working on a book that provides a cohesive presentation of concepts of, and methods for, causal inference. Much of this material is currently scattered across journals in several disciplines or confined to technical articles. We expect that the book will be of interest to anyone interested in causal inference, e.g., epidemiologists, statisticians, psychologists, economists, sociologists, political scientists, computer scientists… The book is divided in 3 parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from complex longitudinal data.

Self Learning AI-Agents Part II: Deep Q-Learning

A mathematical Guide through Deep-Q Learning. In this second part of the multi-part series on Deep Reinforcement Learning I will present you an effective approach to how AI agents can learn to behave in environments with discrete action spaces.

Introducing Tensor Shape Annotation Library: tsalib

tsalib is a library to allow labeling tensor variables with their shapes directly in the code, as first-class type annotations (Python 3). Works with arbitrary tensor libraries. Explicit shape annotations accelerate debugging of Deep Learning programs + improve developer productivity, code readability. Writing deep learning programs which manipulate tensors (e.g., using numpy, pytorch, tensorflow, keras ..) requires you to carefully keep track of shapes of tensor variables. Carrying around the shapes in your head gets increasingly hard as you write more complex programs. For example, when creating a new RNN cell or designing a new kind of attention mechanism or trying to do a surgery of non-trivial pre-trained architectures (resnet101, densenet). Unfortunately, there is no principled way of shape tracking inside code?-?most developers resort to writing adhoc comments embedded in code to keep track of tensor shapes.

Feature Selection Techniques in Machine Learning with Python

We all may have faced this problem of identifying the related features from a set of data and removing the irrelevant or less important features with do not contribute much to our target variable in order to achieve better accuracy for our model. Feature Selection is one of the core concepts in machine Learning which hugely impacts the performance of your model. The data features that you use to train your machine learning models have a huge influence on the performance you can achieve. Irrelevant or partially relevant features can negatively impact model performance. Feature selection and Data cleaning should be the first and most important step of your model designing. In this post, you will discover feature selection techniques that you can use in Machine Learning.

Why random forests outperform decision trees

Random forests consist of multiple single trees each based on a random sample of the training data. They are typically more accurate than single decision trees. The following figure shows the decision boundary becomes more accurate as more trees are added.

Demystifying Cross-Entropy

Some of us might have used the cross-entropy for calculating classification losses and wondered why we use the natural logarithm. Some might have seen the binary cross-entropy and wondered whether it is fundamentally different from the cross-entropy or not. If so, reading this article should help to demystify those questions. The word ‘cross-entropy’ has ‘cross’ and ‘entropy’ in it, and it helps to understand the ‘entropy’ part to understand the ‘cross’ part. So, let’s review the entropy formula.

Confused by The Confusion Matrix Part 2 (‘Accuracy’ is But One of Many Measures of Accuracy…)

In the previous post, I explained the concept of interpreting a confusion matrix by clarifying all the different terms that actually mean the same thing. Unfortunately, the confusion does not end there. If you finished reading the previous post, you should have probably come to the realisation that a simple Hit Rate or False Alarm Rate is not actually very informative, because each rate looks at a different portion of the confusion matrix. Because of that, different fields have come up with different measures of accuracy that try to incorporate information from different portions of the confusion matrix. This post attempts to cover the more commonly known ones.

Review: FSRCNN (Super Resolution)

This time, FSRCNN, by CUHK, is reviewed. In this paper, a real-time super resolution approach is proposed. Fast Super-Resolution Convolutional Neural Network (FSRCNN) has been published in 2016 ECCV with nearly 300 citations when I was writing this story. (SH Tsang @ Medium) FSRCNN has a relatively shallow network which makes us easier to learn about the effect of each component. It is even faster with better reconstructed image quality than the previous SRCNN as the figure below.

Leveraging your data in a Post-GDPR world

Search, social media, cloud computing, and business analytics have largely fulfilled their promises in developing our economies’ global players. To protect our own digital investments, it is of little wonder that we are pouring resources today into addressing GDPR. These efforts of regulatory compliance may none-the-less prove to be misguided, for your corporate data will never be worth more than the confidence your stakeholders have in your data practices. In the digital economy, competitive advantage is tied to understanding the primacy of trust rather than that of data. Let’s explore the nature of the challenge, to what extent GDPR is the answer, and how companies can harness ‘Trust by Design’ in preparing for a post-GDPR world.

Simulated Annealing For Clustering Problems: Part 1

In this post, I will try to explain how Simulated Annealing (AI algorithm), which is a probabilistic technique for approximating the global optimum of a given function can be used in clustering problems. First of all, I want to explain what Simulated Annealing is, and in the next part, we will see a code along article which is an implementation of this Research Paper. Simulated Annealing (SA) is widely used in search problems (ex: finding the best path between two cities) where the search space is discrete(different and individual cities). A wonderful explanation with an example can be found in this book written by Stuart Russel and Peter Norvig. AIMA.

On Generalized Bellman Equations and Temporal-Difference Learning

We consider off-policy temporal-difference (TD) learning in discounted Markov decision processes, where the goal is to evaluate a policy in a model-free way by using observations of a state process generated without executing the policy. To curb the high variance issue in off-policy TD learning, we propose a new scheme of setting the ? ? -parameters of TD, based on generalized Bellman equations. Our scheme is to set ? ? according to the eligibility trace iterates calculated in TD, thereby easily keeping these traces in a desired bounded range. Compared with prior work, this scheme is more direct and flexible, and allows much larger ? ? values for off-policy TD learning with bounded traces. As to its soundness, using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of ? ? and the unique invariant probability measure of the state-trace process. These results not only lead immediately to a characterization of the convergence behavior of least-squares based implementation of our scheme, but also prepare the ground for further analysis of gradient-based implementations.

Random Forests, Decision Trees, and Categorical Predictors: The ‘Absent Levels’ Problem

One advantage of decision tree based methods like random forests is their ability to natively handle categorical predictors without having to first transform them (e.g., by using feature engineering techniques). However, in this paper, we show how this capability can lead to an inherent ‘absent levels’ problem for decision tree based methods that has never been thoroughly discussed, and whose consequences have never been carefully explored. This problem occurs whenever there is an indeterminacy over how to handle an observation that has reached a categorical split which was determined when the observation in question’s level was absent during training. Although these incidents may appear to be innocuous, by using Leo Breiman and Adele Cutler’s random forests FORTRAN code and the randomForest R package (Liaw and Wiener, 2002) as motivating case studies, we examine how overlooking the absent levels problem can systematically bias a model. Furthermore, by using three real data examples, we illustrate how absent levels can dramatically alter a model’s performance in practice, and we empirically demonstrate how some simple heuristics can be used to help mitigate the effects of the absent levels problem until a more robust theoretical solution is found.

Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions

Kernel mean embeddings have become a popular tool in machine learning. They map probability measures to functions in a reproducing kernel Hilbert space. The distance between two mapped measures defines a semi-distance over the probability measures known as the maximum mean discrepancy (MMD). Its properties depend on the underlying kernel and have been linked to three fundamental concepts of the kernel literature: universal, characteristic and strictly positive definite kernels. The contributions of this paper are three-fold. First, by slightly extending the usual definitions of universal, characteristic and strictly positive definite kernels, we show that these three concepts are essentially equivalent. Second, we give the first complete characterization of those kernels whose associated MMD-distance metrizes the weak convergence of probability measures. Third, we show that kernel mean embeddings can be extended from probability measures to generalized measures called Schwartz-distributions and analyze a few properties of these distribution embeddings.

Markov Blanket and Markov Boundary of Multiple Variables

Markov blanket (Mb) and Markov boundary (MB) are two key concepts in Bayesian networks (BNs). In this paper, we study the problem of Mb and MB for multiple variables. First, we show that Mb possesses the additivity property under the local intersection assumption, that is, an Mb of multiple targets can be constructed by simply taking the union of Mbs of the individual targets and removing the targets themselves. MB is also proven to have additivity under the local intersection assumption. Second, we analyze the cases of violating additivity of Mb and MB and then put forward the notions of Markov blanket supplementary (MbS) and Markov boundary supplementary (MBS). The properties of MbS and MBS are studied in detail. Third, we build two MB discovery algorithms and prove their correctness under the local composition assumption. We also discuss the ways of practically doing conditional independence tests and analyze the complexities of the algorithms. Finally, we make a benchmarking study based on six synthetic BNs and then apply MB discovery to multi-class prediction based on a real data set. The experimental results reveal our algorithms have higher accuracies and lower complexities than existing algorithms.