A comparison of machine learning classifiers for energy-efficient implementation of seizure detection

The closed-loop application of electrical stimulation via chronically implanted electrodes is a novel approach to stopping seizures in patients with focal-onset epilepsy. To this end, an energy-efficient seizure detector that can be implemented in an implantable device is of crucial importance. In this study, we first evaluated the performance of two machine learning algorithms (a Random Forest classifier and a support vector machine (SVM)) using selected time- and frequency-domain features that require only limited computational resources. The performance of the algorithms was further compared to the detection strategy implemented in an existing closed-loop neurostimulation device for the treatment of epilepsy. The results show a superior performance of the Random Forest classifier compared to the SVM classifier and the reference approach. Next, we implemented the feature extraction and classification process of the Random Forest classifier on a microcontroller to evaluate the energy efficiency of this seizure detector. In conclusion, the feature set in combination with the Random Forest classifier permits an energy-efficient hardware implementation that improves detection sensitivity and specificity compared to the presently available closed-loop intervention in epilepsy while preserving a low detection delay.
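The pipeline the abstract describes (extract cheap time- and frequency-domain features per signal window, then classify with a Random Forest) can be sketched as follows; the specific features, sampling rate, and synthetic data are illustrative assumptions, not the study's actual setup:

```python
# Sketch: classify 1-second signal windows with simple time- and
# frequency-domain features and a Random Forest. The feature choices and
# the synthetic stand-in data are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def extract_features(window, fs=256):
    """Cheap time- and frequency-domain features for one signal window."""
    line_length = np.sum(np.abs(np.diff(window)))       # time domain
    variance = np.var(window)                           # time domain
    spectrum = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    band_power = spectrum[(freqs >= 4) & (freqs < 30)].sum()  # frequency domain
    return [line_length, variance, band_power]

# Synthetic stand-in data: "seizure" windows get higher amplitude/variance.
normal = [rng.normal(0, 1, 256) for _ in range(100)]
seizure = [rng.normal(0, 4, 256) for _ in range(100)]
X = np.array([extract_features(w) for w in normal + seizure])
y = np.array([0] * 100 + [1] * 100)

# A small, shallow forest keeps the computational footprint modest.
clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

Keeping the forest small and shallow is one reason such a classifier can be ported to a microcontroller: inference is just a set of threshold comparisons per tree.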


Algojammer is an experimental, proof-of-concept code editor for writing algorithms in Python. It was mainly written to assist with solving the kind of algorithm problems that feature in competitions like Google Code Jam, Topcoder and HackerRank. Algojammer is heavily inspired by (stolen from) the work of Bret Victor, particularly Learnable Programming (2012) and Inventing On Principle (2012), although it only incorporates some of the ideas presented. A longer list of other influences and similar projects is given in Inspiration. The project is not finished, but I am not in a position to work on it at the moment, so I am putting it out there in the hope of inspiring others to contribute, or at least provoking some interesting discussion.

An Intuitive Guide to Interpret a Random Forest Model using fastai library (Machine Learning for Programmers – Part 2)

Machine Learning is a fast-evolving field, but a few things remain as they were years ago. One such thing is the ability to interpret and explain your machine learning models. If you build a model and cannot explain it to your business users, it is very unlikely that it will see the light of day. Can you imagine integrating a model into your product without understanding how it works? Or which features are impacting your final result? In addition to backing from stakeholders, we as data scientists benefit from interpreting our work and improving upon it. It’s a win-win situation all around! The first article of this fast.ai machine learning course saw an incredible response from our community. I’m delighted to share part 2 of this series, which primarily deals with how you can interpret a random forest model. We will understand the theory and also implement it in Python to solidify our grasp on this critical concept. As always, I encourage you to replicate the code on your own machine while you go through the article. Experiment with the code and see how different your results are from what I have covered here. This will help you understand the different facets of both the random forest algorithm and the importance of interpretability.
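As a taste of what the article covers, here is a minimal scikit-learn sketch of one interpretation tool, feature importances; the synthetic data and feature names are made up for illustration (the article itself uses fastai on real data):

```python
# A minimal sketch of random forest interpretation: fit a forest and read
# off its feature importances. Data and feature names are invented so the
# example is self-contained.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
# Target depends strongly on feature 0, weakly on feature 1, not on feature 2.
y = 5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(["strong", "weak", "noise"], model.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

The importances sum to 1, and here the "strong" feature should dominate — exactly the kind of sanity check that helps you explain a model to stakeholders.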

Practical Apache Spark in 10 minutes. Part 7 – GraphX and Neo4j

As seen throughout this series of blog posts, Apache Spark is a powerful tool with many useful libraries (like MLlib and GraphX) that deal with big data. Sometimes you have to work with data that has lots of relationships, and the usual SQL databases are not the best option then. That’s when NoSQL databases come into play: they are more flexible and better represent data with numerous connections. In the previous blog post, we considered the Apache Spark GraphX tool, but to illustrate its possibilities we used a small graph object. In the real world, graphs can be much bigger and more complicated. Therefore, in this post we will create a bigger graph, store it in a Neo4j database, and analyze it with the Apache Spark GraphX tool, namely the PageRank algorithm. To do this, we are going to load some data from CSV files which contain words with corresponding parts of speech (the nodes) and the edges between them (each edge has a ‘with’ property, meaning that the words are often seen in the same sentence). First, we will prepare the data using pandas. Then we will push the data to the Neo4j database. Finally, we will integrate Neo4j with GraphX to load the graph from the Neo4j database and analyze it with Spark GraphX.
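The pandas preparation step might look roughly like this; the file contents, column names, and cleaning rules below are assumptions for illustration, not the post's actual files:

```python
# Sketch of the pandas preparation step: read the nodes (words with parts
# of speech) and edges CSVs and clean them before pushing to Neo4j.
# StringIO objects stand in for the real CSV files.
import io
import pandas as pd

nodes_csv = io.StringIO("id,word,pos\n1,run,verb\n2,fast,adverb\n3,dog,noun\n")
edges_csv = io.StringIO("src,dst,with\n1,2,12\n1,3,7\n")

nodes = pd.read_csv(nodes_csv).dropna().drop_duplicates(subset="word")
edges = pd.read_csv(edges_csv)
# Keep only edges whose endpoints actually exist among the nodes.
edges = edges[edges["src"].isin(nodes["id"]) & edges["dst"].isin(nodes["id"])]
print(nodes.shape, edges.shape)
```

Cleaning before loading matters because dangling edges (endpoints missing from the node set) would later surface as errors or phantom vertices in Neo4j and GraphX.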

Making a Map in D3.js v.5

Our objective is to create a map centered on Spain and its neighbouring countries, and to add some reasonably big cities onto it. The visualisation will be done in D3.js, and the data transformation will be a joint effort between OGR2OGR and R. Prior to launching our D3 playground, we need to acquire a base map and a list of locations we want to plot. Datasets obtained from external sources are rarely ‘plot-ready’ and need to be transformed into the right format before use. I’ve collected these steps under the ‘preparation tasks’ section of this guide. Once we have the data groomed and ready, and a good idea of what we want to visualise, the script is rather straightforward.

Introduction to Deep Learning with Keras

In this article, we’ll build a simple neural network using Keras. We’ll assume you have prior knowledge of machine learning packages such as scikit-learn and other scientific packages such as pandas and NumPy.
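Before diving into Keras, it can help to see what a small dense network actually computes. Below is a NumPy sketch of the forward pass of a hypothetical two-layer model (the kind Keras would express as `Dense(4, activation='relu')` followed by `Dense(1, activation='sigmoid')`); the weights are random here, whereas Keras would learn them with `fit()`:

```python
# Forward pass of a tiny two-layer network, written out in NumPy.
# Shapes (3 input features, 4 hidden units, 1 output) are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = rng.normal(size=(8, 3))                      # batch of 8 samples, 3 features
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # first dense layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # output layer

hidden = relu(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)
print(output.shape)  # (8, 1): one probability-like value per sample
```

Keras wraps exactly this kind of matrix arithmetic, plus weight initialization, training loops, and backpropagation, behind a few lines of model-building code.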

Bootstrap Testing with MCHT

Now that we’ve seen MCHT basics, how to make MCHTest() objects self-contained, and maximized Monte Carlo (MMC) testing with MCHT, let’s now talk about bootstrap testing. Not much is different when we’re doing bootstrap testing; the main difference is that the replicates used to generate test statistics depend on the data we feed to the test, and thus are not completely independent of it. You can read more about bootstrap testing in .
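To make that distinction concrete, here is a bare-bones bootstrap test of a mean in plain Python/NumPy (MCHT itself is an R package); the data, null hypothesis, and replicate count are illustrative assumptions:

```python
# A bare-bones bootstrap test: unlike Monte Carlo replicates drawn from a
# null model, bootstrap replicates are resampled from the observed data
# itself (shifted so the null hypothesis holds).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.4, scale=1.0, size=30)    # observed sample
mu0 = 0.0                                      # H0: mean == 0

t_obs = x.mean() - mu0
centered = x - x.mean() + mu0                  # impose the null on the data
reps = np.array([
    rng.choice(centered, size=x.size, replace=True).mean() - mu0
    for _ in range(2000)
])
p_value = np.mean(np.abs(reps) >= abs(t_obs))  # two-sided bootstrap p-value
print(round(p_value, 3))
```

The key line is the resampling from `centered`: the replicate distribution is built from the data under test, which is exactly why bootstrap replicates are not fully independent of it.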

Is the answer to everything Gaussian?

As an applied statistician, you come into contact with many challenging problems in need of a statistical solution. Often, your client or colleague already has a working solution and just wants to clarify a small statistical detail with you. Equally often, your intuition suggests that the working solution is not statistically adequate, but how do you substantiate this? As a motivating example, we use the statistical process control methodology of Sarma et al. (2018) for monitoring a binomial proportion as part of a syndromic surveillance kit.
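One way to substantiate such an intuition is to compare the Gaussian 3-sigma control limit with the exact binomial tail probability it is supposed to approximate; the numbers below (n, baseline rate, 3-sigma limit) are illustrative assumptions, not those of Sarma et al. (2018):

```python
# How far off is the Gaussian approximation when monitoring a binomial
# proportion? Compare a classical 3-sigma limit with the exact binomial
# exceedance probability (stdlib only).
import math

n, p = 50, 0.05                      # 50 cases per period, 5% baseline rate
sigma = math.sqrt(p * (1 - p) / n)
ucl = p + 3 * sigma                  # classical 3-sigma upper control limit

# Exact probability of exceeding the limit under the binomial model.
k_min = math.ceil(ucl * n)
tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
           for k in range(k_min, n + 1))

print(f"UCL = {ucl:.3f}, exact exceedance prob = {tail:.4f} "
      "(the Gaussian limit promises ~0.0013)")
```

For small n and p, the exact false-alarm probability can be several times the nominal 0.0013 of a one-sided 3-sigma rule, which is precisely the kind of evidence that turns a vague intuition into a substantiated objection.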

Password Cracker – Generating Passwords with Recurrent Neural Networks (LSTMs)

In today’s article, I am going to show you how to exploit Recurrent Neural Networks (LSTMs) for password cracking. I will demonstrate how easy it is to derive and break certain types of passwords. Furthermore, you will find out how to generate a practically unlimited number of probable passwords, and, more importantly, you will learn how to pick the right passwords to increase the security of your data.
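As a toy stand-in for the LSTM, the generate-by-sampling idea can be shown with a first-order character Markov model trained on a few made-up example passwords; everything here is illustrative and far weaker than the article's actual model:

```python
# Toy password generator: learn which character tends to follow which from
# a handful of invented example passwords, then sample new candidates.
# An LSTM does the same generate-by-sampling, but with far longer context.
import random
from collections import defaultdict

examples = ["password1", "passw0rd", "pass1234", "p@ssword"]
transitions = defaultdict(list)
for pw in examples:
    for a, b in zip("^" + pw, pw + "$"):   # ^ = start marker, $ = end marker
        transitions[a].append(b)

def generate(rng):
    out, ch = [], "^"
    while True:
        ch = rng.choice(transitions[ch])   # sample the next character
        if ch == "$" or len(out) > 20:
            break
        out.append(ch)
    return "".join(out)

rng = random.Random(7)
print([generate(rng) for _ in range(5)])
```

Characters that often co-occur in the training passwords co-occur in the candidates too, which is why models trained on leaked password lists rank human-like passwords so highly.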

Scaling up with Distributed Tensorflow on Spark

As you may have experienced in the past, or probably will at some point, running out of memory is a very common issue in Data Science. Given the many facets of a business, it is not uncommon to end up with datasets of 10,000 or more features. We may choose to process such datasets with tree-based algorithms. Deep learning, however, engineers features more automatically and processes them in a model of your choice. A frequently occurring problem is figuring out how to train your favorite model in a reasonable amount of time while processing such a huge amount of data. Distributed deep learning, combining TensorFlow and Spark, offers a handy set of solutions to this issue.

Is That a Duplicate Quora Question?

I achieved near state-of-the-art accuracy by using a very deep neural net. The code is available here: https://…/is_that_a_duplicate_quora_question

Quora released its first ever dataset publicly on 24th Jan, 2017. This dataset consists of question pairs which are either duplicate or not. Duplicate questions mean the same thing.

Transfer learning using the fastai library

I am currently doing the fast.ai Live MOOC ‘Practical Deep Learning for Coders’, which will be publicly available in January 2019 on the fast.ai website. The following code is based on lesson 1 from that course. I will be using the fastai V1 library, which sits on top of PyTorch 1.0. The fastai library provides many useful functions that enable us to quickly and easily build neural networks and train our models. I am writing this blog as part of experimenting with the course example on a dataset that is different in structure and complexity, and to show how easy it is to use the fastai library.

How to spot Data Leakage thanks to Heat Maps (on your laptop!)

The dataset and source code needed to generate the pictures below are available in this GitHub repository. All the content of this article is reproducible on a laptop with a GPU that has 1 GB of memory.
In this article, you will learn:
• how to spot Data Leakage on an image classification task, and
• how to fix it (for this given image classification task)

Decoded Entity Embeddings of Categorical Variables in Neural Networks

Categorical variables are known to hide and mask lots of interesting information in a data set, and many times they might even be the most important variables in a model. A good data scientist should be capable of handling such variables effectively and efficiently. If you are a smart data scientist, you’d hunt down the categorical variables in the data set and dig out as much information as you can. The goal of this article is to present an advanced technique called ‘Entity Embeddings’ for dealing with categorical variables in neural networks. Neural networks with entity embeddings have been found to generate better results than tree-based methods when using the same set of features on structured data.
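Mechanically, an entity embedding is just a learned lookup table that maps each category level to a dense vector. A minimal NumPy sketch of the lookup, with random (untrained) vectors and made-up category levels:

```python
# Entity embedding mechanics: each category level gets a dense vector,
# retrieved by index. In a real network (e.g. a Keras Embedding layer)
# these vectors are trained with the rest of the model; here they are
# random, to show the lookup only.
import numpy as np

rng = np.random.default_rng(3)
levels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
level_to_idx = {lvl: i for i, lvl in enumerate(levels)}

embedding_dim = 4                                   # much smaller than one-hot
E = rng.normal(size=(len(levels), embedding_dim))   # the embedding matrix

batch = ["Sat", "Mon", "Sat"]
vectors = E[[level_to_idx[lvl] for lvl in batch]]   # lookup, no one-hot multiply
print(vectors.shape)  # (3, 4)
```

After training, levels that behave similarly (say, Saturday and Sunday in sales data) end up with nearby vectors, which is where the extra information over one-hot encoding comes from.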

Tips on Working with Datetime Index in pandas

As you may understand from the title, this is not a complete guide to Time Series or the Datetime data type in Python. So if you expect an in-depth explanation from A to Z, this is the wrong place. Seriously. There is a fantastic article on this topic, well explained, detailed and quite straightforward. Don’t waste your time on this one. For those who have reached this part, I will say that you will find something useful here for sure. Again, seriously. I found my notes on Time Series and decided to organize them into a little article with general tips, which are applicable, I guess, in 80 to 90% of the cases where you work with dates. So it’s worth sharing, isn’t it? I have a dataset with air pollutant measurements for every hour since 2016 in Madrid, so I will use it as an example.
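To give a flavour of the kinds of tips involved, here is a small pandas sketch on a toy hourly series; the real data is Madrid air-quality measurements, so the values and column name below are made up:

```python
# Three common DatetimeIndex tricks on a toy hourly series: resampling,
# partial-string indexing, and pulling components off the index.
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", periods=48, freq="h")
df = pd.DataFrame({"no2": np.random.default_rng(0).uniform(10, 80, 48)},
                  index=idx)

daily_mean = df.resample("D").mean()   # hourly -> daily aggregation
jan1 = df.loc["2016-01-01"]            # partial-string indexing: one whole day
df["hour"] = df.index.hour             # derive features from the index
print(daily_mean.shape, len(jan1))
```

Partial-string indexing alone (`df.loc["2016-01"]` for a month, `df.loc["2016-01-01"]` for a day) covers a surprising share of everyday date filtering.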

How to Engineer Your Way Out of Slow Models

So you just finished designing that great neural network architecture of yours. It has a blazing number of 300 fully connected layers interleaved with 200 convolutional layers with 20 channels each, where the result is fed as the seed of a glorious bidirectional stacked LSTM with a pinch of attention. After training you get an accuracy of 99.99%, and you’re ready to ship it to production. But then you realize the production constraints won’t allow you to run inference using this beast. You need the inference to be done in under 200 milliseconds. In other words, you need to chop off half of the layers, give up on using convolutions, and let’s not get started on the costly LSTM… If only you could make that amazing model faster!

SRGAN, a TensorFlow Implementation

We’ve all seen that moment in a crime thriller where the hero asks the tech guy to zoom and enhance an image: number plates become readable, pixelated faces become clear, and whatever evidence is needed to solve the case is found. And we’ve all scoffed, laughed and muttered something under our breath about how lost information can’t be restored. Not any more. Well, sort of. It turns out the information is only partly lost. In a similar way that as humans we might infer the detail of a blurry image based on what we know about the world, we can now successfully apply the same logic to images to recover ‘photorealistic’ details lost to resolution effects. This is the essence of Super Resolution: unlocking information at the sub-pixel scale through a complicated understanding of the translation of low- to high-resolution images.

Deep Learning Tutorial – Part I: Translate Pictures of Food into Recipes with Deep Learning

In this tutorial I will show how to train deep convolutional neural networks with Keras to classify images into food categories and to output a matching recipe. The dataset contains >400,000 food images and >300,000 recipes from chefkoch.de.

Demystifying Data Science and Machine Learning

This brief tutorial covers instructions for:
• Installing Anaconda for Windows
• Installing Jupyter Notebook, OpenCV, and other required packages
• Implementing image blending using OpenCV
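The image-blending step boils down to a pixel-wise weighted sum, which OpenCV exposes as `cv2.addWeighted(img1, alpha, img2, 1 - alpha, 0)`. A NumPy sketch of the same computation on tiny synthetic "images" (so it runs even without OpenCV installed):

```python
# Image blending as a pixel-wise weighted sum, the operation behind
# cv2.addWeighted. The 4x4 uniform "images" are synthetic stand-ins.
import numpy as np

img1 = np.full((4, 4, 3), 200, dtype=np.uint8)   # light image
img2 = np.full((4, 4, 3), 50, dtype=np.uint8)    # dark image
alpha = 0.7                                       # weight of the first image

blended = (alpha * img1 + (1 - alpha) * img2).round().astype(np.uint8)
print(blended[0, 0])  # [155 155 155]
```

With real photos, sweeping `alpha` from 0 to 1 produces a smooth cross-fade between the two images, which is exactly what the OpenCV blending tutorial demonstrates.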