Wolkenkit-Boards is a tool for collaboratively organizing notes. It allows you to mount public and private boards and attach notes and images to them. Its backend is powered by wolkenkit, a CQRS and event-sourcing framework for JavaScript and Node.js. wolkenkit uses an event-driven model based on DDD to set up an API for your business in no time. This way, wolkenkit bridges the language gap between your domain and technology.

Understanding Scoring Propensity: A Mixed Model Approach to Evaluating NBA Players

‘Who’s the best scorer in the NBA?’ is a question that comes up a lot during conversations with my friends. Names like Kevin Durant, James Harden, and Steph Curry always come up. It’s often difficult to settle on a single answer; the question becomes more nuanced when distinctions are made among scorers. How do we distinguish talent when taking into account the different situations in which players score?

Travis CI for R – Advanced guide: Continuous integration for building an R project in Travis CI

Travis CI is a common tool for building R packages. It is, in my opinion, the best platform for using R in continuous integration. Some of the most downloaded R packages are built on this platform, for example testthat, magick, and covr. I also built my package RTest on this platform. During the setup I ran into some trouble, and in this guide I am sharing the knowledge I gained.

TimeSeries Data Munging – Lagging Variables that are Distributed Across Multiple Groups

Modeling time series data can be challenging, so it makes sense that some data enthusiasts (including myself) put off learning this topic until they absolutely have to. Before you can apply machine learning models to time series data, you have to transform it to an ‘ingestible’ format for your models, and this often involves calculating lagged variables, which can measure auto-correlation i.e. how past values of a variable influence its future values, thus unlocking predictive value.
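As a rough illustration of lagging a variable within groups, here is a minimal sketch in plain Python. It is not taken from the linked post: the function name `add_group_lags`, the column names, and the sample data are all hypothetical. The idea is to sort each group by time and look back `k` rows inside that group only, so a lag never leaks from one group into another.

```python
from collections import defaultdict


def add_group_lags(rows, group_key, time_key, value_key, lags=(1,)):
    """Return copies of `rows` with lagged values of `value_key` per group.

    Hypothetical helper for illustration; each lag must be a positive integer.
    """
    # Bucket rows by group so that a lag never crosses a group boundary.
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    result = []
    for group_rows in by_group.values():
        # Lags only make sense on time-ordered observations.
        ordered = sorted(group_rows, key=lambda r: r[time_key])
        for i, row in enumerate(ordered):
            lagged = dict(row)
            for k in lags:
                # The first k observations of a group have no lag-k value.
                lagged[f"{value_key}_lag{k}"] = (
                    ordered[i - k][value_key] if i >= k else None
                )
            result.append(lagged)
    return result


# Made-up weekly sales for two stores, deliberately interleaved.
sales = [
    {"store": "A", "week": 1, "units": 10},
    {"store": "B", "week": 1, "units": 7},
    {"store": "A", "week": 2, "units": 12},
    {"store": "B", "week": 2, "units": 9},
    {"store": "A", "week": 3, "units": 15},
]
lagged_sales = add_group_lags(sales, "store", "week", "units", lags=(1, 2))
```

With a data-frame library the same idea is usually a grouped shift (in pandas, for instance, `groupby(...).shift(k)`); the loop above only spells out what that operation does.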

Speech recognition is hard – Part 1

Speech is the most natural form of communication for us; it’s second nature. And now, our machines have started to recognize our speech and they’re getting better and better at communicating with us. Current voice assistants and devices like Amazon Alexa and Google Home are getting more and more popular each month: they are changing how we shop, how we search, how we interact with our devices and even each other.

Speech and music classification using spectrogram based statistical descriptors and extreme learning machine

This article proposes a novel feature extraction approach for speech/music classification based on generalized Gaussian distribution descriptors extracted from the IIR-CQT spectrogram representation. The IIR-CQT spectrogram visual representation provides superior temporal resolution at high frequencies and better spectral resolution at low frequencies compared to conventional short-time Fourier transform analysis, which provides uniform frequency resolution. Multi-level decomposition of the spectrogram image is then performed using the Nonsubsampled Contourlet Transform (NSCT), which is a fully shift-invariant, multi-scale, and multi-direction expansion that can preserve the edges of the textural pattern of speech and music. The generalized Gaussian distribution (GGD) parameters are produced using maximum likelihood estimation (MLE) from the NSCT subbands to create the image feature descriptor. The chaos crow search algorithm is employed to choose the most relevant feature subset and to discard redundant features, and finally the extreme learning machine classifier categorizes each input audio segment as speech or music. The experimental results show that the proposed feature descriptor is effective and performs better than the existing approaches in speech/music classification. In addition, mismatched training and testing results are also presented.

Reinforcement Learning (Part 1) – The Mario Bros Example

Imagine a world where every computer system is customized specifically to your own personality. It learns the nuances of how you communicate and how you wish to be communicated with. Interacting with a computer system becomes more intuitive than ever and technological literacy skyrockets. These are the potential outcomes you could see in a future where reinforcement learning is the norm. In this article, we are going to break down reinforcement learning and dissect some of the components that come together to make up a reinforcement learning system.

Playing Blackjack using Model-free Reinforcement Learning in Google Colab!

A comparative study of algorithms like Monte-Carlo Control and Temporal-Difference Control used to solve games like Blackjack.
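
To make the contrast concrete, here is a minimal sketch of the two kinds of value update. It is not the notebook's code: the function names, the dictionary-based Q-table, and the choice of SARSA as the temporal-difference control variant are assumptions made for illustration. Monte Carlo control waits until an episode ends and moves the estimate toward the observed return, while temporal-difference control updates after every step by bootstrapping from the current estimate of the next state-action pair.

```python
def monte_carlo_update(q, state, action, episode_return, alpha=0.1):
    """Move Q(state, action) toward the full return observed after the episode."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (episode_return - old)


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=1.0, terminal=False):
    """Move Q(state, action) toward reward + gamma * Q(next_state, next_action).

    SARSA is used here as one example of on-policy TD control (an assumption).
    """
    old = q.get((state, action), 0.0)
    # A terminal transition has no future value to bootstrap from.
    bootstrap = 0.0 if terminal else q.get((next_state, next_action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * bootstrap - old)


# Hypothetical Blackjack-style states: (player sum, dealer's visible card).
q_mc = {}
monte_carlo_update(q_mc, (18, 10), "stick", episode_return=1.0, alpha=0.5)

q_td = {((20, 10), "stick"): 1.0}
sarsa_update(q_td, (15, 10), "hit", reward=0.0,
             next_state=(20, 10), next_action="stick", alpha=0.5)
```

Both functions nudge a stored estimate by a step size `alpha`; they differ only in the target they move toward.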

pcLasso: a new method for sparse regression

I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will give a brief overview of the method and some starter code. (For an in-depth description and elaboration of the method, please see our arXiv preprint. For more details on how to use the package, please see the package’s vignette.)

Mathematics for Data Science

Learning the theoretical background for data science or machine learning can be a daunting experience, as it involves multiple fields of mathematics and a long list of online resources. In this piece, my goal is to suggest resources for building the mathematical background necessary to get up and running in practical or research work in data science. These suggestions are derived from my own experience in the data science field and from following the latest resources suggested by the community.

Making computers understand the sentiment of tweets

Understanding whether a tweet is meant as positive or negative is something humans rarely have problems with. For computers, however, it is an entirely different story: complicated sentence structure, sarcasm, figurative language, etc. make it difficult for computers to judge the meaning and sentiment of a sentence. However, automatically assessing the sentiment of a tweet would allow for large-scale opinion-mining of the population on all sorts of issues and could help us understand why certain groups of the population hold certain opinions. On a more fundamental level, understanding the sentiment of text is a key part of natural language understanding and thus an essential task to solve if we want computers to be able to communicate efficiently with us. In this blog post, I will present the results of a small research project carried out as part of the SoBigData project at the University of Sheffield. We tested different approaches to processing text and analysed how much of the sentiment they are able to pick up. Read on for a full tour of the project and the results!

Machine Learning Security

As more and more systems leverage ML models in their decision-making processes, it will become increasingly important to consider how malicious actors might exploit these models, and how to design defenses against those attacks. The purpose of this post is to share some of my recent learnings on this topic.

Machine Learning – Perfection always starts with mistakes

There are a lot of articles out there explaining common mistakes new Data Scientists make, focusing mainly on practices but not on the Machine Learning (ML) process itself. This article is going to cover just that: What kind of mistakes a Data Scientist can make in the ML pipeline and a few ways to address them…

Logical Positivism and the Scientific Method in Genetic Algorithmics

The genetic algorithm owes its form to biomimicry, not derivation from first principles. So, unlike the workings of conventional optimization algorithms, which are typically apparent from the underlying mathematical derivations, the workings of the genetic algorithm require elucidation. Attempts to explain how genetic algorithms work can be divided in two: those based to a lesser or greater extent on the scientific method, and those that reject the scientific method in favor of logical positivism.