Applying AutoML to Transformer Architectures

Since it was introduced a few years ago, Google’s Transformer architecture has been applied to challenges ranging from generating fantasy fiction to writing musical harmonies. Importantly, the Transformer’s high performance has demonstrated that feedforward neural networks can be as effective as recurrent neural networks when applied to sequence tasks, such as language modeling and translation. While the Transformer and other feedforward models used for sequence problems are rising in popularity, their architectures are almost exclusively manually designed, in contrast to the computer vision domain, where AutoML approaches have found state-of-the-art models that outperform those designed by hand. Naturally, we wondered if the application of AutoML in the sequence domain could be equally successful.


Interactive Network Visualization with R

Networks are everywhere. We have social networks like Facebook, competitive product networks, and various networks within an organisation. At STATWORX, a common task is to unveil hidden structures and clusters in a network and visualize them for our customers. In the past, we used the tool Gephi to visualize our network analysis results. Impressed by its outstandingly pretty and interactive visualizations, we set out to find a way to produce visualizations of the same quality directly in R and present them to our customers in an R Shiny app.


An End to End Introduction to GANs

I bet most of us have seen a lot of AI-generated human faces in recent times, be it in papers or blogs. We have reached a stage where it is becoming increasingly difficult to distinguish between actual human faces and faces that are generated by Artificial Intelligence. In this post, I will help the reader understand how they can create and build such applications on their own. I will try to keep this post as intuitive as possible for beginners while not dumbing it down too much.


Anomaly Detection with LSTM in Keras

I read definitions of ‘anomaly’ in every kind of context, everywhere. In this chaos, the only certainty is that the definition varies, i.e. what counts as an anomaly depends entirely on the domain of interest. Detecting this kind of behavior is useful in every business, and how difficult these observations are to detect depends on the field of application. If you are working on an anomaly detection problem that involves human activity (like predicting sales or demand), you can take advantage of fundamental assumptions about human behavior and plan a more efficient solution. This is exactly what we do in this post. We try to predict taxi demand in NYC during a critical time period. We formulate simple but important assumptions about human behavior, which will allow us to find an easy way to forecast anomalies. All the dirty work is done by a trusty LSTM, developed in Keras, which makes predictions and detects anomalies at the same time!


TD3: Learning To Run With AI

This article looks at one of the most powerful state-of-the-art algorithms in Reinforcement Learning (RL): Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al., 2018). By the end of this article you should have a solid understanding of what makes TD3 perform so well, be able to implement the algorithm yourself, and be able to use TD3 to train an agent to run successfully in the HalfCheetah environment.


The Data Fabric, Containers, Kubernetes, Knowledge-Graphs, and more…

In the last article we talked about the building blocks of a knowledge-graph. Now we will go a step further and learn the basic concepts, technologies and languages we need to understand to actually build it.


Advanced Ensemble Classifiers

Ensemble is a Latin-derived word which means ‘union of parts’. Commonly used classifiers are prone to making errors. While these errors are inevitable, they can be reduced through the proper construction of a learning classifier. Ensemble learning is a way of generating various base classifiers from which a new classifier is derived that performs better than any constituent classifier. These base classifiers may differ in the algorithm used, hyperparameters, representation, or training set. The key objective of ensemble methods is to reduce bias and variance.
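
To make the idea concrete, here is a minimal sketch of one of the simplest ways to combine base classifiers: majority voting. The three prediction lists are made-up toy data (not results from the article), chosen so that each base classifier errs on different samples. That is the assumption under which voting helps.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one voted label per sample."""
    combined = []
    for sample_votes in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, truth):
    """Fraction of samples where the prediction matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical ground truth and base-classifier outputs (illustration only).
truth = [1, 0, 1, 1, 0, 1]
clf_a = [0, 1, 1, 1, 0, 1]  # wrong on samples 0 and 1
clf_b = [1, 0, 0, 0, 0, 1]  # wrong on samples 2 and 3
clf_c = [1, 0, 1, 1, 1, 0]  # wrong on samples 4 and 5

ensemble = majority_vote([clf_a, clf_b, clf_c])

for name, preds in [("a", clf_a), ("b", clf_b), ("c", clf_c), ("vote", ensemble)]:
    print(name, round(accuracy(preds, truth), 2))
```

In this toy setup every base classifier is right on four of six samples, while the vote recovers all six, because no two classifiers are wrong on the same sample. When base classifiers make correlated errors, the gain shrinks or disappears.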


A Game of Words: Vectorization, Tagging, and Sentiment Analysis

Full disclosure: I haven’t watched or read Game of Thrones, but I am hoping to learn a lot about it by analyzing the text. If you would like more background about the basic text processing, you can read my other article. The text from all 5 books can be found on Kaggle. In this article I will be taking the cleaned text and using it to explain the following concepts:
• Vectorization: Bag-of-Words, TF-IDF, and Skip-Thought Vectors
• After Vectorization
• POS tagging
• Named Entity Recognition (NER)
• Chunking and Chinking
• Sentiment Analysis
• Other NLP packages
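
As a rough illustration of the first bullet, the sketch below builds Bag-of-Words and TF-IDF vectors by hand. The three-sentence corpus is a hypothetical stand-in, not the Kaggle text, and the TF-IDF weighting is the plain textbook form (term frequency times log(N / document frequency)). Libraries such as scikit-learn use smoothed variants by default, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus used only for illustration.
docs = [
    "winter is coming",
    "the north remembers",
    "winter is here",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})


def bag_of_words(tokens):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def tf_idf(tokens):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)  # always >= 1 for vocab words
        vector.append(tf * math.log(n_docs / df))
    return vector


print(vocab)
print(bag_of_words(tokenized[0]))
print([round(x, 3) for x in tf_idf(tokenized[0])])
```

Words that appear in several documents ("winter", "is") get a lower TF-IDF weight than a word unique to one document ("coming"), which is the point of the IDF term.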


Data Science as Software: from Notebooks to Tools [Part 3]

This is the final part of the series on how to move from Jupyter Notebooks to software solutions in Data Science. Part 1 covered the basics of setting up the working environment and data exploration. Part 2 dove deep into data pre-processing and modelling. Part 3 deals with moving on from Jupyter, front-end development, and your daily work in the code. The overall agenda of the series is the following:
• Setting up your working environment [Part 1]
• Important modules for data exploration [Part 1]
• Machine Learning Part 1: Data pre-processing [Part 2]
• Machine Learning Part 2: Models [Part 2]
• Moving on from Jupyter [Part 3]
• Shiny stuff: when do we get a front end? [Part 3]
• Your daily work in the code: keeping standards [Part 3]


Machine Learning: Recurrent Neural Networks And Long Short Term Memory (LSTM) Python Keras Example

Recurrent neural networks have a wide array of applications. These include time series analysis, document classification, speech and voice recognition. In contrast to feedforward artificial neural networks, the predictions made by recurrent neural networks are dependent on previous predictions. To elaborate, imagine we decided to follow an exercise routine where, every day, we alternate between lifting weights, swimming and yoga. We could then build a recurrent neural network to predict today’s workout given what we did yesterday. For example, if we lifted weights yesterday then we’d go swimming today. More often than not, the problems you’ll be tackling in the real world will be a function of the current state as well as other inputs. For instance, suppose we signed up for hockey once a week. If we’re playing hockey on the same day that we’re supposed to lift weights then we might decide to skip the gym. Thus, our model now has to differentiate between the case when we attended a yoga class yesterday and we’re not playing hockey as well as the case when we attended a yoga class yesterday and we’re playing hockey today in which case we’d jump directly to swimming.
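
The workout example can be written down as a small hand-coded rule, shown here only to make explicit the mapping a recurrent network would have to learn from data. This is not an RNN, and the function name and the routine order (weights, then swimming, then yoga) are assumptions read off the description above.

```python
# Assumed rotation, inferred from "if we lifted weights yesterday then
# we'd go swimming today".
ROUTINE = ["weights", "swimming", "yoga"]


def next_workout(yesterday, hockey_today=False):
    """Return today's workout given yesterday's workout and today's hockey flag."""
    planned = ROUTINE[(ROUTINE.index(yesterday) + 1) % len(ROUTINE)]
    if planned == "weights" and hockey_today:
        # Hockey falls on a weights day: skip the gym and move on to the
        # next activity in the rotation.
        planned = ROUTINE[(ROUTINE.index(planned) + 1) % len(ROUTINE)]
    return planned


print(next_workout("weights"))                  # swimming
print(next_workout("yoga"))                     # weights
print(next_workout("yoga", hockey_today=True))  # swimming
```

The output depends on both the previous state (yesterday's workout) and an extra input (the hockey flag), which is exactly the situation the paragraph describes.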


Learning Like Babies

Convolutional Neural Nets (CNNs), which have achieved the best performance in image classification, were inspired by the mammalian visual cortex. Despite the dramatic progress in automated computer vision systems, most of the success of image classification architectures comes from labeled data. The problem is that most real-world data is not labeled. According to Yann LeCun, father of CNNs and professor at NYU, the next ‘big thing’ in artificial intelligence is semi-supervised learning: a type of machine learning task that makes use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. That is why a large research effort has recently focused on unsupervised learning that does not rely on large amounts of expensive supervision.


Rethinking the Data Science Life Cycle for the Enterprise

While the technology and tools used by data scientists have grown dramatically, the data science lifecycle has stagnated. In fact, little has changed between the earliest versions of CRISP-DM created over 20 years ago and the more recent lifecycles offered by leading vendors such as Google, Microsoft, and DataRobot. Most versions of the data science lifecycle still address the same set of tasks: understanding the business problem, understanding domain data, acquiring and engineering data, model development and training, and model deployment and monitoring (see Figure 1). But enterprise needs have evolved as data science has become embedded in most companies. Today, model reproducibility, traceability, and verifiability have become fundamental requirements for data science in large enterprises. Unfortunately, these requirements are omitted or significantly underplayed in leading AI/ML lifecycles.


Deep Learning for Sentiment Analysis

Sentiment Analysis is a classic example of machine learning, which (in a nutshell) refers to: ‘A way of “learning” that enables algorithms to evolve. This “learning” means feeding the algorithm with a massive amount of data so that it can adjust itself and continually improve.’ Sentiment analysis is the automated process of understanding an opinion about a given subject from written or spoken language. In a world where we generate 2.5 quintillion bytes of data every day, sentiment analysis has become a key tool for making sense of that data. This has allowed companies to get key insights and automate all kinds of processes.


Regression with Regularization Techniques

The article assumes that you have some brief idea about the regression techniques that could predict the required variable from a stratified and equitable distribution of records in a dataset that are implemented by a statistical approach. Just kidding! All you need is adequate math to be able to understand basic graphs. Before entering the topic, a little brush up…


K-Medoids Clustering Using ATS: Unleashing the Power of Templates

k-medoids clustering is a classical machine learning algorithm for clustering. It is a sort of generalization of the k-means algorithm. The only difference is that cluster centers must be elements of the dataset. This yields an algorithm that can use any type of distance function, whereas k-means only provably converges using the L2-norm.
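
For readers who want to see the idea in code, below is a minimal Python sketch of the simple alternating variant of k-medoids (assign points, then re-pick each medoid). It is not the article's ATS implementation and not the classic PAM swap procedure. The function names, the naive "first k points" initialization, and the toy points are all assumptions made for illustration, and the sketch assumes the points are distinct.

```python
def assign(points, medoids, distance):
    """Map each medoid index to the indices of the points closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: distance(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, distance, max_iter=100):
    """Alternating k-medoids; returns (medoid indices, clusters of indices)."""
    medoids = list(range(k))  # naive initialization: the first k points
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # can only happen with duplicate points
                new_medoids.append(m)
                continue
            # The new medoid is the member with the smallest total distance
            # to the other members of its cluster.
            best = min(
                members,
                key=lambda c: sum(distance(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, distance)


def manhattan(a, b):
    """L1 distance, used here to show that a non-L2 metric works."""
    return sum(abs(x - y) for x, y in zip(a, b))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, clusters = k_medoids(points, 2, manhattan)
print(medoids, clusters)
```

Because the centers are always actual data points, the only thing the algorithm needs from the data is the `distance` callable, so swapping in another metric requires no other change.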


How to stop training a neural-network using callback?

Often, when training a very deep neural network, we want to stop training once the training accuracy reaches a certain desired threshold. That way, we get what we want (optimal model weights) and avoid wasting resources (time and computation power). In this brief tutorial, let’s learn how to achieve this in TensorFlow and Keras, using the callback approach, in 4 simple steps.
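
The callback pattern itself can be sketched without any framework. The class, the stand-in training loop, and the accuracy curve below are hypothetical and exist only to show the control flow. In Keras, the equivalent is to subclass `tf.keras.callbacks.Callback`, override `on_epoch_end(self, epoch, logs=None)`, set `self.model.stop_training = True` once the threshold is reached, and pass the instance through the `callbacks` argument of `model.fit`. Note that the name of the accuracy key in `logs` can differ between Keras versions.

```python
class StopAtAccuracy:
    """Callback that requests a stop once accuracy reaches a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.stop_training = False

    def on_epoch_end(self, epoch, logs):
        # "accuracy" is the assumed metric key for this sketch.
        if logs.get("accuracy", 0.0) >= self.threshold:
            self.stop_training = True


def train(accuracy_per_epoch, callback):
    """Stand-in training loop that replays a fixed accuracy curve."""
    epochs_run = 0
    for epoch, acc in enumerate(accuracy_per_epoch):
        epochs_run += 1  # a real loop would update model weights here
        callback.on_epoch_end(epoch, {"accuracy": acc})
        if callback.stop_training:
            break
    return epochs_run


# Hypothetical accuracy curve for illustration only.
curve = [0.62, 0.81, 0.93, 0.97, 0.99]
epochs_run = train(curve, StopAtAccuracy(0.95))
print(epochs_run)
```

With a threshold of 0.95 the loop stops after the fourth epoch instead of running all five, which is the resource saving the tutorial is after.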