Automated Machine Learning Project Implementation Complexities

To demonstrate how implementation complexity varies along the AutoML highway, let's look at how three software projects approach building just such an AutoML 'solution': Keras Tuner, AutoKeras, and automl-gs.

Graph Neural Ordinary Differential Equations

Extending graph neural networks into a continuous-depth domain. Multi-agent systems are prevalent across a variety of scientific fields: from physics to robotics, game theory, finance and molecular biology, among others. Often, closed-form analytic formulations are not available, and forecasting or decision-making tasks have to rely on noisy, irregularly sampled observations. This class of systems offers a crystal-clear example of relational inductive biases.

Introducing inductive biases in statistics or machine learning is a well-known approach to improving sample efficiency and generalization performance. From the choice of objective function to the design of ad-hoc deep learning architectures suited to the specific problem at hand, biases are common and effective. Relational inductive biases are a special class of biases concerned with relationships between entities. Graphical models, probabilistic or otherwise, are a traditional class of models specialized in imposing relational biases in the form of prior structures on entities. These graph structures are useful in different ways: they reduce computational complexity by introducing conditional independence assumptions, and they increase sample efficiency by encoding prior knowledge in graph form.

Graph neural networks (GNNs) are the deep learning counterpart to graphical models. They are usually employed when the target problem structure can be encoded as a graph, or when prior knowledge about relationships among input entities can itself be described as a graph. GNNs have shown remarkable results in application areas such as node classification, graph classification and forecasting, as well as generative tasks.
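The article's continuous-depth model is far more elaborate, but the message-passing primitive underneath any GNN can be sketched in a few lines of numpy. This is an illustrative sketch only: the row-normalized propagation rule and the toy path graph below are my assumptions for demonstration, not the paper's architecture.

```python
import numpy as np

# One graph-convolution step: H' = ReLU(D^-1 A_hat H W),
# where A_hat = A + I adds self-loops so each node keeps its own features.
def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalise by degree
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)  # aggregate, transform, ReLU

# Toy graph: 3 nodes in a path (0-1-2), 2 input features, 2 output channels
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3, 2)                     # one-hot-ish node features
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))          # learnable weights (random here)
H_next = gcn_layer(A, H, W)
print(H_next.shape)  # (3, 2)
```

Stacking such layers deepens the receptive field hop by hop; the neural-ODE view replaces the discrete stack with a continuous depth parameter.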

Probabilistic Matrix Factorization

'You are given a design matrix X: how would you predict Y?' This question is at the heart of supervised learning and regression-based tasks, and it assumes perfect knowledge of the design matrix X. In X, we store records in the rows and the features describing those records in the columns. But what happens when this matrix is corrupted? What happens when some observations are missing?
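The core idea behind matrix factorization with missing entries can be sketched quickly: fit X ≈ UVᵀ by gradient descent, but compute the error only on observed entries. This sketch omits the Gaussian priors that make the method "probabilistic"; the toy matrix, learning rate, and rank are illustrative assumptions.

```python
import numpy as np

# Fit X ~ U V^T using only the observed entries (mask M = 1 where observed).
def factorize(X, M, k=1, lr=0.02, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = 0.1 * rng.normal(size=(n, k))
    V = 0.1 * rng.normal(size=(m, k))
    for _ in range(steps):
        E = M * (U @ V.T - X)                  # error on observed entries only
        U, V = U - lr * E @ V, V - lr * E.T @ U  # gradient step on both factors
    return U, V

# Toy rank-1 matrix with two entries "missing"
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])   # [[1,2],[2,4],[3,6]]
M = np.ones_like(X)
M[0, 1] = M[2, 0] = 0.0                     # hide X[0,1] and X[2,0]
U, V = factorize(X, M)
print(np.round(U @ V.T, 1))                 # missing entries filled in near 2 and 3
```

Because the observed entries pin down the rank-1 structure, the reconstruction recovers the hidden values; the full probabilistic version adds priors on U and V to regularize exactly this step.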

Generalized Linear Models – Introduction

This article explores the background of linear models, their limitations, the idea behind generalized linear models, and what makes them 'generalized.'
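The "generalized" part boils down to two choices: a non-Gaussian response distribution and a link function connecting the linear predictor to its mean. A minimal sketch, assuming Poisson regression with a log link fitted by plain gradient ascent (real libraries use IRLS); the simulated data and true coefficients are mine for illustration.

```python
import numpy as np

# A GLM: eta = X @ beta, mean mu = inverse_link(eta).
# Here: Poisson response with log link, so mu = exp(eta).
def fit_poisson_glm(X, y, lr=0.05, steps=3000):
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        mu = np.exp(X @ beta)                 # inverse link
        beta += lr * X.T @ (y - mu) / len(y)  # score of the Poisson log-likelihood
    return beta

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])     # intercept + one covariate
y = rng.poisson(np.exp(0.5 + 1.0 * x))        # true beta = (0.5, 1.0)
beta = fit_poisson_glm(X, y)
print(np.round(beta, 2))
```

Swapping the distribution and link (Bernoulli + logit, Gaussian + identity) recovers logistic and ordinary linear regression from the same template, which is the whole point of the framework.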

Deep Multi-Input Models Transfer Learning for Image and Word Tag Recognition

A multi-model deep learning approach to image and text understanding. With advances in deep learning such as the convolutional neural network (ConvNet) [1], computer vision has again become a hot research topic. One of the main goals of computer vision today is to use machine learning (especially deep learning) to train computers to gain human-level understanding of digital images, text, or video.

Simple tutorial for distilling BERT

BERT, and transformers in general, represent a completely new step in NLP. Introduced by Google in 2018, BERT has since shown state-of-the-art results on a range of language understanding tasks. Check out the paper and results at https://…/bert-pre-training-of-deep-bidirectional. But there is a little fly in the ointment: it is hard to use BERT in production. BERT-base contains 110M parameters, and the larger variant, BERT-large, contains 340M. Such large neural networks are problematic in practice: the sheer number of parameters makes BERT very difficult to deploy in resource-constrained systems such as mobile devices, and its slow inference makes it unsuitable for real-time systems. That's why finding ways to make it faster is so important. When I first encountered transformers, it was very tempting to try them on routine tasks, and text classification was one of them. But how do you overcome the limitations above? In this post, I want to show a simple but effective way to train a task-specific classification model that performs on the same level as the BERT-based model.
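The standard recipe for this kind of compression is knowledge distillation: train a small student to match the teacher's temperature-softened output distribution alongside the hard labels. A minimal numpy sketch of the loss, not the post's exact training setup; the logits, temperature, and mixing weight below are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                                  # temperature-scale the logits
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft term: cross-entropy against the teacher's softened distribution,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    p_teacher = softmax(teacher_logits, T)
    soft = -(p_teacher * np.log(softmax(student_logits, T))).sum(axis=-1).mean() * T ** 2
    # Hard term: ordinary cross-entropy against the true labels.
    p_student = softmax(student_logits)
    hard = -np.log(p_student[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1 - alpha) * hard

teacher = np.array([[4.0, 1.0, 0.0]])   # e.g. logits from the big BERT model
student = np.array([[3.0, 1.5, 0.2]])   # logits from a small task-specific model
loss = distillation_loss(student, teacher, labels=np.array([0]))
print(loss)
```

The soft targets carry the teacher's "dark knowledge" about class similarities, which is what lets a much smaller model approach BERT-level accuracy on a single task.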

On Semantic Search

It took me a long time to realise that search is the biggest problem in NLP. Just look at Google, Amazon and Bing: these are multi-billion-dollar businesses made possible only by their powerful search engines. My initial thoughts on search centered around unsupervised ML, but after participating in the Microsoft Hackathon 2018 for Bing, I came to know the various ways a search engine can be built.

An Introduction to Graph Theory

Before diving into graph theory, we need to understand what data structures and networks are in the context of machine learning. Networks are a useful data structure for mapping a range of applications, from driving directions to social networks.
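The basic objects are simple to write down: a graph is just nodes plus edges, commonly stored as an adjacency list. A small sketch, with breadth-first search as the primitive behind "driving directions"-style shortest-path queries (the node names are made up for illustration).

```python
from collections import defaultdict, deque

class Graph:
    """Undirected graph stored as an adjacency list."""
    def __init__(self):
        self.adj = defaultdict(list)

    def add_edge(self, u, v):
        self.adj[u].append(v)      # undirected: record the edge both ways
        self.adj[v].append(u)

    def shortest_hops(self, start):
        # Breadth-first search: distance in edges from `start` to every
        # reachable node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in self.adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

g = Graph()
for u, v in [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]:
    g.add_edge(u, v)
print(g.shortest_hops("A"))  # {'A': 0, 'B': 1, 'D': 1, 'C': 2}
```

Weighted edges and Dijkstra's algorithm are the natural next step once hop counts stop being a good enough notion of distance.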

A Doomed Marriage of ML and Agile

Mistake #1: We changed the target variable mid-way because of a misguided definition of the core business question. This led to data leakage that ruined our model performance. We had to re-code ~35% of our codebase and spend ~2 weeks re-trying different data transformations and model architectures. It was a nightmare.
Mistake #2: We prioritized low impact features because our sub-questions weren’t well thought out. Well, some of these features were innovative, but not useful in retrospect. This wasted valuable development time, which is extremely important in a time-boxed consulting setting.
Mistake #3: We changed our evaluation metrics. We had to re-design our model architecture and re-run the hyper-parameter search. This required new test cases. We had to run many regression tests manually.

Digging Deeper into Metric Learning with Loss Functions

Recent advances in deep learning have made it possible to learn a similarity measure for a set of images using a deep metric learning network, which maps visually similar images to nearby locations in an embedding manifold and visually dissimilar images far apart from each other. Deep features learned this way are highly discriminative, with compact intra-product variance and well-separated inter-product differences, which are key to building better visual search engines. In addition, learning such discriminative features enables the network to generalize well to unseen product images, which end up forming new clusters in the embedding space.
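One of the classic loss functions behind this behaviour is the triplet loss: pull an anchor toward a positive (same product) and push it away from a negative (different product) by at least a margin. A minimal numpy sketch; the 2-D toy embeddings and margin are illustrative assumptions, since in practice the inputs are L2-normalised outputs of a deep network.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared Euclidean distances in the embedding space
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    # Zero loss once the negative is at least `margin` farther than the positive
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

anchor   = np.array([[0.0, 1.0]])
positive = np.array([[0.1, 0.9]])    # visually similar -> already nearby
negative = np.array([[1.0, 0.0]])    # dissimilar -> already far away
print(triplet_loss(anchor, positive, negative))  # satisfied triplet -> 0.0
```

Training minimises this over many mined triplets, which is what produces the compact intra-product clusters and wide inter-product gaps described above; contrastive and margin-based softmax losses follow the same pull/push logic.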