Keras Hyperparameter Tuning in Google Colab using Hyperas

Hyperparameter Tuning is one of the most computationally expensive tasks when creating deep learning networks. Luckily, you can use Google Colab to speed up the process significantly. In this post, I will show you how you can tune the hyperparameters of your existing keras models using Hyperas and run everything in a Google Colab Notebook.

Achieving the Machine Learning Dream: Interpretability and Performance in a Single Model

Machine learning is a discipline full of frictions and tradeoffs but none more important like the balance between accuracy and interpretability. In principle, highly accurate machine learning models such as deep neural networks tend to be really hard to interpret while simpler models like decision trees fall short in many sophisticated scenarios. Conventional machine learning wisdom tell us that accuracy and interpretability are opposite forces in the architecture of a model but its that always the case? Can we build models that are both highly performant and simple to understand? Researchers from IBM recently published a paper that proposes a statistical method for improving the performance of simpler machine learning models using the knowledge from more sophisticated models.

How to Build a Market Simulator Using Markov Chains and Python

In this article, I aim to introduce you (regardless of your technical ability) to Markov chains and use it to simulate customer behavior. This isn’t going to be a traditional ‘how to’ tutorial with code snippets every two lines. My primary aim in writing this is to provide you with a conceptual framework that you can flexibly use, so you do not necessarily have to code along to learn something new. Technical details will pop up here and there, but I will provide as much intuition for them as possible.

Art of Vector Representation of Words

Expressing power of notations used to represent a vocabulary of a language has been a great deal of interest in the field of linguistics. Languages in practice have semantic ambiguity. ‘John kissed his wife, and so did Sam’. Sam kissed John’s wife or his own? These ambiguities must be handled in order to represent information in true form. So, how do we develop a machine level understanding for language modelling task. Classical count based or similarity parameter based methods have existed for quite a long time in Computer Science. But, now with the advent of deep learning models like RNNs this expressing power of language modelling is of great interest for creating efficient system to store information and finding relations between vocabulary terms. This will act as fundamental block in encoder-decoder models like seq-to-seq model. In this article we will explore different representation system of words and their power of expression, from the classical NLP to modern Neural Network approaches. In the end we will conclude with hands on coding exercises and explanations.

Explainable Artificial Intelligence (Part 2) – Model Interpretation Strategies

This article in a continuation in my series of articles aimed at ‘Explainable Artificial Intelligence (XAI)’. If you haven’t checked out the first article, I would definitely recommend you to take a quick glance at ‘Part I – The Importance of Human Interpretable Machine Learning’ which covers the what and why of human interpretable machine learning and the need and importance of model interpretation along with its scope and criteria. In this article, we will be picking up from where we left off and expand further into the criteria of machine learning model interpretation methods and explore techniques for interpretation based on scope. The aim of this article is to give you a good understanding of existing, traditional model interpretation methods, their limitations and challenges. We will also cover the classic model accuracy vs. model interpretability trade-off and finally take a look at the major strategies for model interpretation.

Local Outlier Factor for Anomaly Detection

Local Outlier Factor (LOF) is a score that tells how likely a certain data point is an outlier/anomaly.
• LOF = 1: no outlier
• LOF >> 1: outlier
First, I introduce a parameter k which is the number of neighbors the LOF calculation is considering. The LOF is a calculation that looks at the neighbors of a certain point to find out its density and compare this to the density of other points later on. Using a right number k isn’t straight forward. While a small k has a more local focus, i.e. looks only at nearby points, it is more erroneous when having much noise in the data. A large k, however, can miss local outliers.

Machine Learning for the Average Person: What are the types of machine learning?

At a high-level, machine learning is simply the study of teaching a computer program or algorithm how to progressively improve upon a set task that it is given. On the research-side of things, machine learning can be viewed through the lens of theoretical and mathematical modeling of how this process works. However, more practically it is the study of how to build applications that exhibit this iterative improvement. There are many ways to frame this idea, but largely there are three major recognized categories: supervised learning, unsupervised learning, and reinforcement learning.

Neural Collaborative Filtering

In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neu- ral networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in rec- ommendation | collaborative ltering | on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, they primarily used it to model auxil- iary information, such as textual descriptions of items and acoustic features of musics. When it comes to model the key factor in collaborative ltering | the interaction between user and item features, they still resorted to matrix factor- ization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network- based Collaborative Filtering. NCF is generic and can ex- press and generalize matrix factorization under its frame- work. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user{item interaction function. Extensive experiments on two real-world datasets show signi cant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks o ers better recommendation performance.

TagOverflow – Correlating Tags in Stackoverflow

In this post I want to show how to use the Jaccard procedure in the Neo4j-Graph-Algorithms library to achieve that. We imported the whole dump of StackOverflow into Neo4j, ran the algorithms and then visualized the results using Neo4j Browser and the large graph visualization tool Graphistry. In September we had the opportunity of running the GraphConnect ‘Buzzword-Bingo’ GraphHack in the offices of StackOverflow in New York which was really cool. I was impressed that the folks there really went through with what Joel Spolsky published many years ago as a better office layout for software companies (and probably other companies too). Lots of open space and rooms to discuss but a private, hexagon, glass-walled office for every team member, an individualized thinking and working place.

Data Retention: Handling Data with Many Missing Values and Less Than 1000 Observations

The data used in the current project contains a number of diagnostic measures of type 2 diabetes in women of the Pima Indian heritage, and whether or not the individual has type 2 diabetes. The dataset was obtained from Kaggle at (https://…/pima-indians-diabetes-database ). There is a total of 768 observations and 9 variables.

How to Make Data-Driven Decisions with Contextual Bandits

In 2018, there is a huge amount of pressure in virtually every industry to make data-driven decisions. In the era of powerful machine learning algorithms, this is more than a good idea; this is survival. But how do we apply ‘predictive analytics’ to make decisions? So what if we can predict customer churn or the likelihood of conversion? Does a good prediction really bring value to your organization?

Become Data-Driven or Perish: Why your company needs a Data Strategy and not just more Data People

For the last fourteen years, I have worked with Data in one way or another. I started out as a Management Information Systems Manager (ABN Amro), a great title but I basically downloaded PDF reports and manually entered them into a spreadsheet to produce a Daily Financial Report, and through the years I have worked as a Business Intelligence Manager (ING Bank, Rabobank, Delta Lloyd), Data Analyst (Microsoft), Data Scientist (Adyen, De Bijenkorf) and now as the Head of Data at a Dutch-based Payments Technology Startup (Dimebox). For all my experience, the thing that stuck with me the most are the times that my work actually made an impact. Of course, when you work in a company that has over 15,000 employees, your impact is very small compared to when you are in a 30-person startup, but knowing that the work that you did helps to make a decision, is probably one of the best parts of working with data. But as data and especially Data Science and Analytics has become the latest hot topic, it surprises me how many companies, rush in trying to attract Data Scientist, Data Engineers, Machine Learning Engineers, Artificial Intelligence Engineers, but never stop to think about their Data Strategy.

Interpreting Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (graphical and quantitative) to better understand data. It is easy to get lost in the visualizations of EDA and to also lose track of the purpose of EDA. EDA aims to make the downstream analysis easier. To put EDA in context, the Data Science steps are: Obtain data, Clean and load data; Exploratory Data Analysis; Model building; Model evaluation; Data visualization and presentation The Objectives of EDA are to discover underlying patterns, spot anomalies, frame the hypothesis and check assumptions with the aim to find a good fitting model (if one exists). At a more granular level, EDA involves understanding the relationship between variables including determining relationships among the explanatory variables; assessing the relationships between explanatory and outcome variables (direction and rough size estimates); the presence of outliers; a ranking of the important explanatory variables; conclusions as to whether individual explanatory variables are statistically significant. In this post, we present a systematic approach to EDA (based on the sources listed below) to present EDA techniques in a concise manner.

Common mistakes when carrying out machine learning and data science

We examine typical mistakes in Data Science process, including wrong data visualization, incorrect processing of missing values, wrong transformation of categorical variables, and more. Learn what to avoid!

Assessing progress in automation technologies

In this post, I share slides and notes from a keynote Roger Chen and I gave at the Artificial Intelligence conference in London in October 2018. We presented an overview of the state of automation technologies: we tried to highlight the state of the key building block technologies and we described how these tools might evolve in the near future.

Animating Your Data Visualizations Like a Boss Using R

When telling a data-driven story, few things grab attention more than motion. Our eyes are naturally drawn to bright colors and movement. A good visualization captures the interest of the audience and makes an impression. Fortunately, making such displays has become quite simple with a few easy to use R packages. All of the visualization examples in this article can be forklifted and run from the MatrixDS project here. I do assume you have some familiarity with basic plotting.

Predicting the task duration based on a range

In one of the previous entries we have established a statistical model for predicting the actual project time and cost based on the estimates. We discussed that we can fit the estimates (both for the Agile and Waterfall projects) to a Log-Normal distribution, which guarantees the positive support. Using statistical approach to estimation allows us to give prediction with a required confidence level, and also project monetary benefits, costs and risk, as we discussed in another post.

ML Intro 6: Reinforcement Learning for non-Differentiable Functions

This post will follows this series, but represents a significant transition point. We are now done with the problem of teaching supervised learning basics, and are expanding to handle non-differentiable problems (Motivating deep reinforcement learning).

Bayesian Nonparametric Models in NIMBLE, Part 2: Nonparametric Random Effects

NIMBLE is a hierarchical modeling package that uses nearly the same language for model specification as the popular MCMC packages WinBUGS, OpenBUGS and JAGS, while making the modeling language extensible – you can add distributions and functions – and also allowing customization of the algorithms used to estimate the parameters of the model. Recently, we added support for Markov chain Monte Carlo (MCMC) inference for Bayesian nonparametric (BNP) mixture models to NIMBLE. In particular, starting with version 0.6-11, NIMBLE provides functionality for fitting models involving Dirichlet process priors using either the Chinese Restaurant Process (CRP) or a truncated stick-breaking (SB) representation of the Dirichlet process prior. We will illustrate NIMBLE’s BNP capabilities using two examples. In a previous post, we showed how to use nonparametric mixture models with different kernels for density estimation. In this post, we will take a parametric generalized linear mixed model and show how to switch to a nonparametric representation of the random effects that avoids the assumption of normally-distributed random effects.