Three reasons to choose the right Machine Learning algorithm

There is no one-size-fits-all algorithm. Many different algorithms exist, and each must work with training data sets that differ in type, volume, and accuracy. The job of a data scientist is to choose the algorithm that fits the data and the underlying truths, drawing on experience and professional knowledge.

1. Not every algorithm works with every kind of data
2. Run-time complexity grows with the number of data points
3. Small amounts of data bring poor results

Elitist shuffle for recommendation systems

In today’s fast-paced user experience, users expect new recommended items to appear every time they open the application. But what do you do if your recommendation system runs only every hour or every day? I give a solution that you can plug and play without having to re-engineer your recommendation system.

The common practice for updating recommended items is to have the recommendation system re-score the available items once every period T. This means that for a whole period T, the end user sees the same content on the application’s entry screen. If T is even a few hours, let alone a day, the user can get bored of seeing the same content every time they open the application during that period. This can happen in many ways, but imagine that the user opens the application, doesn’t like the recommended items, and is too lazy or busy to scroll or search for something else. If the user opens the application again a few minutes later and finds exactly the same content as before, this might have a big negative impact on retention for this user. An obvious solution to this problem is to shuffle the content so that it remains relevant to the user while new content appears on the screen each time the user reopens the application.
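The idea of a shuffle that keeps relevant items near the top can be sketched in a few lines of Python. This is only a minimal illustration, not necessarily the method the linked article uses: it assumes each item already has a positive relevance score from the recommendation system, and it draws items one at a time with probability proportional to that score. The function name `relevance_biased_shuffle` and the `seed` parameter are hypothetical.

```python
import random


def relevance_biased_shuffle(items, scores, seed=None):
    """Return all items in a random order that favors high scores.

    Assumption: scores are positive numbers, one per item.
    Items are drawn without replacement, each draw weighted by score,
    so highly scored items tend to land near the front.
    """
    rng = random.Random(seed)
    pool = list(zip(items, scores))
    shuffled = []
    while pool:
        weights = [score for _, score in pool]
        # Pick one remaining position, weighted by its relevance score.
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        shuffled.append(pool.pop(idx)[0])
    return shuffled


if __name__ == "__main__":
    ranked_items = ["item_a", "item_b", "item_c", "item_d"]
    item_scores = [8.0, 4.0, 2.0, 1.0]
    # Each call (for example, each app open) can yield a different order.
    print(relevance_biased_shuffle(ranked_items, item_scores))
    print(relevance_biased_shuffle(ranked_items, item_scores))
```

Because the shuffle runs on scores that were already computed, it can be applied at display time without re-running the recommendation system itself.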

Python Import Statement and the Most Important Built-in Modules for Data Scientists

So far we have worked with the most essential concepts of Python: variables, data structures, built-in functions and methods, for loops, and if statements. These are all parts of the core semantics of the language. But this is far from everything that Python knows… actually, this is just the very beginning, and the exciting stuff is yet to come, because Python also has tons of modules and packages that we can import into our projects. What does this mean? In this article I’ll give you an intro: I’ll show you the Python import statement and the most important built-in modules that you have to know for data science!
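As a quick illustration of the import statement, here is a small sketch that uses three common import forms with standard-library modules. The choice of modules (`math`, `statistics`, `datetime`, `random`) is an assumption made for this example and may not match the list in the linked article; the variable names are illustrative.

```python
import math                    # import a whole module
import statistics as stats     # import a module under a shorter alias
from datetime import date      # import a single name from a module
from random import Random

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = stats.mean(values)      # arithmetic mean of the list
spread = stats.pstdev(values)        # population standard deviation
root = math.sqrt(16)                 # square root via the math module

# Number of days between two dates.
day_gap = (date(2018, 3, 1) - date(2018, 2, 1)).days

# A seeded generator makes the random sample reproducible.
rng = Random(42)
sample = rng.sample(values, 3)

print(mean_value, spread, root, day_gap, sample)
```

The alias form (`import statistics as stats`) keeps calls short, while `from ... import ...` brings only the names you need into your script.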

Early draft of the book: ‘Feature Engineering and Selection: A Practical Approach for Predictive Models’

Kjell and I are writing another book on predictive modeling, this time focused on all the things that you can do with predictors. It’s about 60% done and we’d love to get feedback. You can take a look and provide feedback at https://…/issues.

Update: Can we predict flu outcome with Machine Learning in R?

Since I migrated my blog from GitHub Pages to blogdown and Netlify, I wanted to start migrating (most of) my old posts too, and use that opportunity to update them and make sure the code still works. Here I am updating my very first machine learning post: Can we predict flu deaths with Machine Learning and R? Changes are marked as bold comments.

Talking Machines

Talking Machines is your window into the world of machine learning. Your hosts, Katherine Gorman and Neil Lawrence, bring you clear conversations with experts in the field, insightful discussions of industry news, and useful answers to your questions. Machine learning is changing the questions we can ask of the world around us. Here, we explore how to ask the best questions and what to do with the answers.

Causality in Machine Learning

Given recent advances and interest in machine learning, those of us with traditional statistical training have had occasion to ponder the similarities and differences between the fields. Many of the distinctions are due to culture and tooling, but there are also differences in thinking which run deeper. Take, for instance, how each field views the provenance of the training data when building predictive models. For most of ML, the training data is a given, often presumed to be representative of the data against which the prediction model will be deployed, but not much else. With a few notable exceptions, ML abstracts away from the data generating mechanism, and hence sees the data as raw material from which predictions are to be extracted. Indeed, machine learning generally lacks the vocabulary to capture the distinction between observational data and randomized data that statistics finds crucial. To contrast machine learning with statistics is not the object of this post (we can do such a post if there is sufficient interest). Rather, the focus of this post is on combining observational data with randomized data in model training, especially in a machine learning setting. The method we describe is applicable to prediction systems employed to make decisions when choosing between uncertain alternatives.