A Kaggler’s Guide to Model Stacking in Practice

Stacking (also called meta ensembling) is a model ensembling technique used to combine information from multiple predictive models to generate a new model. Often the stacked model (also called the 2nd-level model) will outperform each of the individual models due to its smoothing nature and its ability to highlight each base model where it performs best and discredit each base model where it performs poorly. For this reason, stacking is most effective when the base models are significantly different. Here I provide a simple example and guide on how stacking is most often implemented in practice.
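To make the idea concrete, here is a minimal pure-Python sketch of out-of-fold stacking for a one-dimensional regression problem. Everything in it is an illustrative assumption rather than the post's own setup. The two base models are `fit_linear` (an ordinary least-squares line) and `fit_knn` (a k-nearest-neighbours regressor). Folds are assigned by row index modulo `n_folds`. The meta-model is a single blend weight `w` found by grid search, whereas in practice it is often a linear or logistic regression from a library. The key step is that each training row's 2nd-level feature comes from a base model that never saw that row, so the meta-model does not simply reward whichever base model overfits hardest.

```python
import random

def fit_linear(xs, ys):
    """Base model 1 (illustrative): ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=3):
    """Base model 2 (illustrative): mean target of the k nearest training points."""
    pts = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

BASE_MODELS = [fit_linear, fit_knn]

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def out_of_fold(xs, ys, fit, n_folds=5):
    """Predict every training row with a model trained on the other folds only."""
    oof = [0.0] * len(xs)
    for f in range(n_folds):
        held = [i for i in range(len(xs)) if i % n_folds == f]
        train = [i for i in range(len(xs)) if i % n_folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held:
            oof[i] = model(xs[i])
    return oof

def fit_blend(p1, p2, ys, steps=100):
    """Meta-model (hypothetical): choose w in [0, 1] minimising MSE of w*p1 + (1-w)*p2."""
    best_w, best_err = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        err = mse([w * a + (1 - w) * b for a, b in zip(p1, p2)], ys)
        if err < best_err:
            best_w, best_err = w, err
    return best_w

def stack(train_x, train_y, test_x):
    # Out-of-fold base-model predictions become the meta-model's training features.
    oof = [out_of_fold(train_x, train_y, fit) for fit in BASE_MODELS]
    w = fit_blend(oof[0], oof[1], train_y)
    # For test rows, refit each base model on the full training set, then blend.
    full = [fit(train_x, train_y) for fit in BASE_MODELS]
    test_pred = [w * full[0](x) + (1 - w) * full[1](x) for x in test_x]
    return w, oof, test_pred

# Usage on synthetic data: a quadratic target plus Gaussian noise.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(60)]
ys = [x * x + random.gauss(0, 0.5) for x in xs]
w, oof, test_pred = stack(xs, ys, [0.0, 1.5])
print(f"blend weight on the linear model: {w:.2f}")
```

Because the blend grid includes w = 0 and w = 1, the stacked out-of-fold error can never exceed that of the better single base model. The test rows here use base models refit on all training data; averaging the per-fold models' test predictions is a common alternative.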

R For Beginners: Some Simple R Code to do Common Statistical Procedures, Part Two

This posting contains an embedded Word document.

Why you should master R (even if it might eventually become obsolete)

It’s a critical question. If you study data science but forget everything you learn, you’ll be in big trouble when you go in for an interview. Or you’ll be in big trouble if you actually get a data science job but have forgotten the essential skills. Let me be very clear: you need to know your essential toolkit inside and out. You need to remember your tools, and you need to be able to execute quickly and on command if you want to be a top performer. However, the hard truth that you actually need to “memorize some syntax” stirred up several comments. One comment stood out because it raised a critical question: why memorize your toolkit if tools become obsolete?

More on Orthogonal Regression

Some time ago I wrote a post about orthogonal regression. This is where we fit a regression line so that we minimize the sum of the squares of the orthogonal (rather than vertical) distances from the data points to the regression line.
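For a straight line, the orthogonal-regression fit passes through the centroid of the data and points along the principal axis of the sample covariance matrix, so its slope has a closed form. The perpendicular distance from a point (x, y) to the line y = a + b·x is |y − a − b·x| / √(1 + b²). Below is a minimal pure-Python sketch assuming 2-D data with nonzero covariance between x and y. The function names `orthogonal_fit` and `perpendicular_sse` are hypothetical, not taken from the original post.

```python
import math

def orthogonal_fit(xs, ys):
    """Fit y = a + b*x by minimising squared perpendicular distances (2-D total least squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("zero covariance: best-fit line is axis-aligned")
    # Slope of the leading eigenvector of the covariance matrix [[sxx, sxy], [sxy, syy]].
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx  # the fitted line passes through the centroid
    return a, b

def perpendicular_sse(xs, ys, a, b):
    """Sum of squared orthogonal (not vertical) distances from the points to y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (1 + b * b)

print(orthogonal_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # prints (1.0, 2.0)
```

Unlike ordinary least squares, this treats x and y symmetrically, which is why it suits data where both variables carry measurement error of similar size.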

Laying the Foundation for a Data Team

After a hugely popular post by Oliver, our Head of Engineering, ‘Building a Modern Bank Backend’ (which you should definitely check out), I wanted to write about how we’re building our data team. At Monzo, we want to build the best bank account in the world. Today, more than ever before, data is central to creating wonderful customer experiences and efficient internal operations. Wouldn’t it be magical if we could predict the issue somebody is facing even before they contact us? Or are there better ways to assess someone’s loan eligibility than just relying on out-of-date credit scores? We want to build a data team that can support the whole company in making data-driven decisions and products, and that helps make us operationally 10x more efficient than banks of the past. At the same time, we want to keep the team as lean and highly leveraged as possible. Here is our story on that journey so far.