BFDA: A MATLAB Toolbox for Bayesian Functional Data Analysis

We provide a MATLAB toolbox, BFDA, that implements a Bayesian hierarchical model to smooth multiple functional data samples under the assumptions of a shared underlying Gaussian process distribution, a Gaussian process prior for the mean function, and an Inverse-Wishart process prior for the covariance function. This model-based approach borrows strength from all functional data samples to increase smoothing accuracy, while simultaneously estimating the mean and covariance functions. BFDA also integrates an option to approximate Bayesian inference with cubic B-spline basis functions, which makes it efficient for high-dimensional functional data. Examples of using BFDA in various scenarios and of conducting follow-up functional regression are provided. The advantages of BFDA include: (1) it simultaneously smooths multiple functional data samples and estimates the mean and covariance functions nonparametrically; (2) it flexibly handles sparse and high-dimensional functional data with stationary or nonstationary covariance functions, without requiring common observation grids; (3) it provides accurately smoothed functional data for follow-up analysis.
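For readers who want the hierarchy at a glance, one plausible written-out form of the model sketched in the abstract is the following (the notation here is ours, not necessarily the toolbox's; $\mu_0$, $c$, $\delta$, and $\Psi$ are hyperparameters):

$$Y_i(t) = Z_i(t) + \epsilon_i(t), \qquad \epsilon_i(t) \sim N(0, \sigma_\epsilon^2),$$
$$Z_i \sim \mathrm{GP}(\mu, \Sigma), \qquad \mu \mid \Sigma \sim \mathrm{GP}(\mu_0, \Sigma/c), \qquad \Sigma \sim \mathrm{IWP}(\delta, \Psi),$$

where $Y_i$ is the $i$-th observed functional sample, $Z_i$ its smoothed underlying trajectory, and the shared $(\mu, \Sigma)$ is what lets the model borrow strength across samples.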


Distance Sampling in R

Estimating the abundance and spatial distribution of animal and plant populations is essential for conservation and management. We introduce the R package Distance, which implements distance sampling methods to estimate abundance. We describe how users can obtain estimates of abundance (and density) using the package, and we document the links it provides to other, more specialized R packages. We also demonstrate how Distance provides a migration pathway from previous software, allowing us to deliver cutting-edge methods to users more quickly.
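As background, the standard line-transect estimators that this family of software is built around take the form (textbook distance sampling notation, not a description of the package's internals):

$$\hat{P}_a = \frac{1}{w}\int_0^w \hat{g}(x)\,dx, \qquad \hat{D} = \frac{n}{2\,w\,L\,\hat{P}_a}, \qquad \hat{N} = \hat{D} \cdot A,$$

where $n$ is the number of detections, $\hat{g}(x)$ the detection function fitted to the observed perpendicular distances, $w$ the truncation distance, $L$ the total transect length, and $A$ the size of the study region.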


Directional Statistics and Filtering Using libDirectional

In this paper, we present libDirectional, a MATLAB library for directional statistics and directional estimation. It supports a variety of commonly used distributions on the unit circle, such as the von Mises, wrapped normal, and wrapped Cauchy distributions, as well as various distributions on higher-dimensional manifolds such as the unit hypersphere and the hypertorus. Based on these distributions, libDirectional provides several recursive filtering algorithms for estimation on these manifolds. The functionality is implemented in a clear, well-documented, and object-oriented structure that is both easy to use and easy to extend.
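libDirectional itself is a MATLAB library, but as a quick Python illustration of the kind of circular distribution it supports, here is the von Mises distribution via SciPy (a sketch of the concept, not of the library's API):

```python
import numpy as np
from scipy.stats import vonmises

mu, kappa = 0.5, 2.0   # mean direction and concentration parameter

# density on the unit circle: f(x) = exp(kappa*cos(x - mu)) / (2*pi*I0(kappa))
theta = np.linspace(-np.pi, np.pi, 5)
print(vonmises.pdf(theta, kappa, loc=mu))

# draw samples and recover the circular mean from the mean resultant vector
samples = vonmises.rvs(kappa, loc=mu, size=1000)
print(np.angle(np.mean(np.exp(1j * samples))))  # close to mu
```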


Likert-plots and grouped Likert-plots

I’m pleased to announce an update of my sjPlot package, a package for Data Visualization for Statistics in Social Science. Thanks to Alexander’s help, it is now possible to create grouped Likert-plots. This is what I want to show in this post… First, we load the required packages and sample data. We want to plot several items from an index, where all of these items are 4-point Likert scales. The items deal with how well family caregivers cope with their role of caring for an older relative.
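sjPlot is an R package, so the code in the post is R; purely as a hypothetical Python analogue of what a Likert-plot shows (diverging stacked bars of response percentages per item, on made-up data), one might sketch:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# made-up 4-point Likert responses for three coping items
rng = np.random.default_rng(1)
df = pd.DataFrame({f"item{i}": rng.integers(1, 5, 200) for i in range(1, 4)})

# percentage of answers per category, one row per item
pct = df.apply(lambda s: s.value_counts(normalize=True)
                          .reindex([1, 2, 3, 4], fill_value=0)).T * 100

# diverging stacked bars: categories 1-2 extend left of zero, 3-4 right
left = -(pct[1] + pct[2])
fig, ax = plt.subplots()
for cat, color in zip([1, 2, 3, 4],
                      ["#b2182b", "#ef8a62", "#67a9cf", "#2166ac"]):
    ax.barh(pct.index, pct[cat], left=left, color=color, label=str(cat))
    left = left + pct[cat]
ax.axvline(0, color="black", linewidth=0.8)
ax.set_xlabel("% of responses")
ax.legend(title="Likert category")
plt.show()
```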


Comprehensive Introduction to Turing Learning and GANs: Part 2

This is the second part of a three-part tutorial on creating deep generative models, specifically using generative adversarial networks (GANs). It is a natural extension of the previous topic, variational autoencoders (found here). We will see that GANs are largely superior to variational autoencoders, but are notoriously difficult to work with.
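As a reminder of the adversarial setup before diving in, here is a minimal, generic tf.keras sketch of a GAN (a stand-in example on flattened 28×28 images, not the tutorial's code):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 32

# generator: maps random noise to a flattened 28x28 "image"
generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

# discriminator: classifies 784-dim vectors as real (1) or fake (0)
discriminator = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# stacked model trains the generator to fool a frozen discriminator
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_batch):
    n = len(real_batch)
    noise = np.random.normal(size=(n, latent_dim))
    fake_batch = generator.predict(noise, verbose=0)
    # 1) update the discriminator on real and fake samples
    discriminator.trainable = True
    discriminator.train_on_batch(real_batch, np.ones((n, 1)))
    discriminator.train_on_batch(fake_batch, np.zeros((n, 1)))
    # 2) update the generator through the frozen discriminator
    discriminator.trainable = False
    gan.train_on_batch(noise, np.ones((n, 1)))
```

The tug-of-war between these two updates is what makes GANs both powerful and, as the tutorial discusses, hard to stabilize.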


Outlier Detection and Treatment: A Beginner’s Guide

One of the most important steps in data pre-processing is outlier detection and treatment. Machine learning algorithms are very sensitive to the range and distribution of data points, and outliers can deceive the training process, resulting in longer training times and less accurate models. Outliers are defined as samples that are significantly different from the remaining data: points that lie outside the overall pattern of the distribution. Statistical measures such as the mean, variance, and correlation are all highly susceptible to outliers.
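To make the last point concrete, here is a minimal Python sketch of one common detection rule, Tukey's fences (the 1.5 × IQR rule), on made-up data:

```python
import numpy as np

data = np.array([9.8, 9.9, 10.0, 10.1, 10.2, 10.4, 24.7])  # one obvious outlier

# Tukey's fences: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(data[(data < lower) | (data > upper)])  # -> [24.7]

# the mean is dragged toward the outlier, while the median barely moves
print(data.mean(), np.median(data))  # ~12.16 vs 10.1
```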


How to Generate Prediction Intervals with Scikit-Learn and Python

In this article, we’ll walk through one method of producing uncertainty intervals in Scikit-Learn. The full code is available on GitHub with an interactive version of the Jupyter Notebook on nbviewer. We’ll focus primarily on implementation, with a brief section and resources for understanding the theory at the end. Generating prediction intervals is another tool in the data science toolbox, one critical for earning the trust of non-data-scientists.
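For orientation, the method in question is, to the best of our reading, quantile regression with gradient boosting; a self-contained sketch of that idea on toy data (not the article's own dataset or code) looks like this:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# toy regression data standing in for the article's dataset
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# one model per quantile: lower bound, median prediction, upper bound
models = {
    alpha: GradientBoostingRegressor(loss="quantile", alpha=alpha,
                                     n_estimators=200).fit(X_train, y_train)
    for alpha in (0.05, 0.5, 0.95)
}

lower = models[0.05].predict(X_test)
upper = models[0.95].predict(X_test)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage of the 90% interval: {coverage:.2f}")
```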


Tutorial: Using Deep Learning and CNNs to Make a Hand Gesture Recognition Model

Machine Learning is useful for a wide variety of real-life problems. It is commonly used for tasks such as classification, recognition, detection, and prediction, and it is particularly effective for automating processes that rely on data. The basic idea is to use data to produce a model capable of returning an output: given a new input, the model returns an answer, or it produces predictions for known data. What exactly this ‘model’ is will become clearer in the next section. The goal of this project is to train a Machine Learning algorithm capable of classifying images of different hand gestures, such as a fist, a palm, a thumbs up, and others. This particular classification problem can be useful for gesture navigation, for example. The method I’ll be using is Deep Learning with the help of Convolutional Neural Networks, based on TensorFlow and Keras. Deep Learning is part of a broader family of machine learning methods; it is based on layers that process the input data, extract features from them, and produce a mathematical model. In this specific project, we’ll be aiming to classify different images of hand gestures, which means the computer will have to ‘learn’ the features of each gesture and classify them correctly. For example, given an image of a hand doing a thumbs-up gesture, the output of the model should be ‘the hand is doing a thumbs-up gesture’. Let’s begin.
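As a minimal sketch of the kind of classifier this tutorial builds (the input size, class count, and architecture here are placeholders, not the tutorial's exact model):

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 10            # e.g. fist, palm, thumbs up, ... (placeholder)
IMG_SHAPE = (120, 320, 1)   # grayscale input; depends on the actual dataset

model = keras.Sequential([
    keras.Input(shape=IMG_SHAPE),
    layers.Conv2D(32, (3, 3), activation="relu"),   # extract low-level features
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),   # extract higher-level features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per gesture
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5, validation_split=0.1)
```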


DeepPiCar – Part 1: How to Build a Deep Learning, Self Driving Robotic Car on a Shoestring Budget

Today, Tesla, Google, Uber, and GM are all trying to create their own self-driving cars that can run on real-world roads. Many analysts predict that within the next 5 years we will start to have fully autonomous cars running in our cities, and that within 30 years nearly ALL cars will be fully autonomous. Wouldn’t it be cool to build your very own self-driving car using some of the same techniques the big guys use? In this and the next few articles, I will guide you through how to build your own physical, deep-learning, self-driving robotic car from scratch. You will be able to make your car detect and follow lanes, and recognize and respond to traffic signs and people on the road, all in under a week. Here is a sneak peek at your final product.


Racist Data? Human Bias is Infecting AI Development

Machine learning algorithms process vast quantities of data and spot correlations, trends, and anomalies at levels far beyond even the brightest human mind. But just as human intelligence relies on accurate information, so too do machines. Algorithms need training data to learn from, and this training data is created, selected, collated, and annotated by humans. And therein lies the problem. Bias is a part of life, something that not a single person on the planet is free from. There are, of course, varying degrees of bias, from the tendency to be drawn towards the familiar through to the most potent forms of racism. This bias can, and often does, find its way into AI platforms. It happens completely under the radar and through no concerted effort from engineers. BDJ spoke to Jason Bloomberg, President of Intellyx, a leading industry analyst and author of ‘The Agile Architecture Revolution’, about the dangers posed by bias creeping into AI.


I’m an AI researcher, and here’s what scares me about AI

AI is being increasingly used to make important decisions. Many AI experts say that warnings about sentient robots are overblown, but other harms are not getting enough attention. I agree. I am an AI researcher, and I’m worried about some of the societal impacts that we’re already seeing. In particular, these 5 things scare me about AI:
1. Algorithms are often implemented without ways to address mistakes.
2. AI makes it easier to not feel responsible.
3. AI encodes & magnifies bias.
4. Optimizing metrics above all else leads to negative outcomes.
5. There is no accountability for big tech companies.
At the end, I’ll briefly share some positive ways that we can try to address these.


Random Forest vs AutoML (with python code)

Random Forest versus AutoML, you say. Hmmm… it’s obvious that the performance of AutoML will be better: it checks many models and then ensembles them. This is true, but I would like to show you other advantages of AutoML that will help you deal with dirty, real-life data.
1. I will show you how to prepare the data and train a Random Forest model on the Adult dataset with Python and scikit-learn.
2. On the same dataset, I will show you how to train a Random Forest with AutoML mljar-supervised, an open-source package developed by me 🙂 You will see how AutoML can make your life easier when dealing with real-life, dirty data (a condensed sketch follows below).
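Here is the condensed sketch referred to above, showing both approaches side by side (not the post's exact code; the AutoML mode and the manual preprocessing choices are our assumptions):

```python
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Adult census dataset from OpenML
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random Forest needs the "dirty" parts handled by hand:
# one-hot encode categoricals, fill missing values, align columns
X_train_rf = pd.get_dummies(X_train).fillna(0)
X_test_rf = (pd.get_dummies(X_test).fillna(0)
               .reindex(columns=X_train_rf.columns, fill_value=0))
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train_rf, y_train)
print("RF accuracy:", accuracy_score(y_test, rf.predict(X_test_rf)))

# mljar-supervised handles the preprocessing internally
# (pip install mljar-supervised; "Explain" is a quick exploratory mode)
from supervised.automl import AutoML
automl = AutoML(mode="Explain")
automl.fit(X_train, y_train)
print("AutoML accuracy:", accuracy_score(y_test, automl.predict(X_test)))
```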


Tukey, Design Thinking, and Better Questions

Roughly once a year, I read John Tukey’s paper ‘The Future of Data Analysis’, originally published in 1962 in the Annals of Mathematical Statistics. I’ve been doing this for the past 17 years, each time hoping to really understand what it was he was talking about. Thankfully, each time I read it I seem to get something new out of it. For example, in 2017 I wrote a whole talk around some of the basic ideas.