How do you know if your model is going to work? Part 4: Cross-validation techniques

When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this concluding Part 4 of our four-part mini-series “How do you know if your model is going to work?” we demonstrate cross-validation techniques.

How do you know if your model is going to work?

Our four-part article series, collected into one piece.

Parsing a large amount of characters into a POSIXct object

When trying to parse a large number of datetime character strings into POSIXct objects, it struck me that strftime and as.POSIXct were actually quite slow. The parsing functions from lubridate were a lot faster. The following benchmark shows this quite nicely.
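As a rough illustration of this kind of comparison (not the original post's benchmark), here is a minimal sketch that times base R's as.POSIXct against lubridate's ymd_hms on the same input. The test data, the vector size, and the timestamp format are assumptions made for this example, and the timings you see will depend on your machine and package versions.

```r
# A minimal timing sketch; the data below is made up for illustration.
library(lubridate)

# Hypothetical input: 100,000 timestamp strings, one second apart, in UTC.
n <- 1e5
start <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")
stamps <- format(start + (seq_len(n) - 1), "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Base R parsing with an explicit format string.
t_base <- system.time(
  base_parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

# lubridate parsing; ymd_hms() returns POSIXct in UTC by default.
t_lub <- system.time(
  lub_parsed <- ymd_hms(stamps)
)

# Both approaches should produce the same instants.
stopifnot(all(base_parsed == lub_parsed))

# Elapsed wall-clock seconds for each approach.
print(c(base = t_base[["elapsed"]], lubridate = t_lub[["elapsed"]]))
```

For more stable numbers, the same two calls could be repeated several times, or wrapped in a dedicated benchmarking package, rather than timed once with system.time.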

13 Amazing Applications / Uses of Data Science Today

One of the questions people commonly ask me is: Is Big Data / Data Science really just buzz, or a once-in-a-lifetime opportunity? Different people have different answers and viewpoints. I don’t want to get into this debate here; I am taking a safer approach instead. I will describe a few applications that are already affecting the layman’s life. You can read them for yourself and decide whether this is buzz or an opportunity.

Predicting winners of the Rugby World Cup

For the sake of brevity, not all of the relevant data and code are displayed in this post; they can be found here. And you can visit the final working web application here.

An excerpt from the book: “The Master Algorithm”

Pedro Domingos’ new book, The Master Algorithm, is a readable overview of machine learning. The author discerns and describes five main schools of thought in the field: symbolists, connectionists, evolutionaries, Bayesians and analogizers. Here’s a piece about how Bayesians fit their models, that is, infer parameter values. Even though the context is Bayes nets, the described method is applicable to almost any model.

Spark SQL for Real Time Analytics – Part Two

Apache Spark is the hottest topic in Big Data. Part 2 of this series covers the basic concepts of stream processing and how Spark handles it, both for real-time analytics and for the next frontier, the Internet of Things.

Applications of R at EARL 2015

• AstraZeneca
• Allianz
• KPMG
• Allstate
• Douwe Egberts
• Atass Sports

Minimum, Median