A Must-Read NLP Tutorial on Neural Machine Translation – The Technique Powering Google Translate

The beauty of language transcends boundaries and cultures. Learning a language other than our mother tongue is a huge advantage. But the path to bilingualism, or multilingualism, can often be a long, never-ending one. There are so many little nuances that we get lost in the sea of words. Things have, however, become so much easier with online translation services (I’m looking at you, Google Translate!). I have always wanted to learn a language other than English. I tried my hand at learning German (or Deutsch) back in 2014. It was both fun and challenging. I eventually had to quit, but I harboured a desire to start again.


A gentle introduction to Apache Arrow with Apache Spark and Pandas

This time I am going to try to explain how we can use Apache Arrow in conjunction with Apache Spark and Python. First, let me share some basic concepts about this open source project.


31 Statistical Concepts Explained in Simple English – Part 8

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, Tensorflow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more. To keep receiving these articles, sign up on DSC.


Universal Method to Sort Complex Information Found

The nearest neighbor problem asks where a new point fits into an existing data set. A few researchers set out to prove that there was no universal way to solve it. Instead, they found such a way.
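To make the problem statement concrete, here is a minimal sketch of the brute-force baseline: scan every existing point and keep the closest one. This is not the universal method the article refers to; the function name, the Euclidean distance choice, and the sample points are all assumptions made for illustration.

```python
import math

def nearest_neighbor(query, points):
    """Return the point in `points` closest to `query` (Euclidean distance).

    Brute-force linear scan: every existing point is compared with the query.
    """
    return min(points, key=lambda p: math.dist(p, query))

# Hypothetical existing data set and a new point to place in it.
dataset = [(0.0, 0.0), (2.0, 1.0), (5.0, 5.0), (1.0, 4.0)]
closest = nearest_neighbor((1.8, 1.5), dataset)
print(closest)
```

The scan costs one distance computation per stored point, which is why faster, more general approaches to the problem are of interest.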


The Face of (Dis)Agreement – Intraclass Correlations

I was recently introduced to Google Dataset Search, an extension that searches for open access datasets. There I stumbled upon this dataset on children’s and adults’ ratings of facial expressions. The data comes from a published article by Vesker et al. (2018). Briefly, this study involved having adults and 9-year-old children rate a series of 48 faces on two dimensions of emotion, valence (positive vs. negative) and arousal (activated vs. deactivated) (see my previous post for more info on valence and arousal). The authors made some interesting observations about differences in children’s and adults’ ratings of these facial expressions.


Review: DRRN – Deep Recursive Residual Network (Super Resolution)

In this story, DRRN (Deep Recursive Residual Network) is reviewed. With Global Residual Learning (GRL) and multi-path mode Local Residual Learning (LRL), plus recursive learning to control the number of model parameters while increasing the depth, up to 52 layers can be achieved. DRRN significantly outperforms state-of-the-art approaches such as SRCNN, FSRCNN, ESPCN, VDSR, DRCN, and RED-Net. It was published at CVPR 2017 and has more than 100 citations. (SH Tsang @ Medium)


Recommender system using Bayesian personalized ranking

In this post, I will discuss Bayesian personalized ranking (BPR), one of the well-known learning-to-rank algorithms used in recommender systems. Before going into the details of the BPR algorithm, I will give an overview of how recommender systems work in general and of my project on a music recommendation system. This will help those of you who are reading about recommender systems for the first time and serve as a refresher for the others.


Neural Networks: The theoretical understanding

This article is for students or professionals who want to learn neural networks and are still looking for a perfect place to start learning. If you want to learn neural networks but feel it’s rocket science, this article is for you. In this article, we focus on the theory side of neural networks; we won’t be talking much about the mathematics behind it. Basic knowledge of computer science will make it easier to understand.


Multiple Data (Time Series) Streams Clustering

Nowadays, data streams occur in many real scenarios. For example, they are generated by sensors, web traffic, satellites, and other interesting sources. We have to process them quickly and extract as much knowledge from them as we can. Data streams have their own specific characteristics for processing and data mining. For example, they can be very fast, and we cannot hold the whole history of a data stream in memory, so we have to process it incrementally or in (e.g. sliding) windows. In this post, I will cover one data stream clustering method that I developed. In R, you can do data stream clustering with the stream package, but it only provides methods for clustering a single stream (not multiple streams). I want to show you clustering of multiple data streams, that is, streams coming from multiple sources (e.g. sensors).
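The idea of processing a stream incrementally in a sliding window can be sketched in a few lines of Python. This is only an illustration of windowed processing, not the clustering method described in the post; the class name, window size, and sample readings are hypothetical.

```python
from collections import deque

class SlidingWindowMean:
    """Running mean over the last `size` values of a stream.

    Only the current window is kept in memory, never the full history.
    """

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        # When the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)

# Hypothetical sensor readings arriving one at a time.
stream = [3.0, 5.0, 4.0, 10.0, 6.0]
tracker = SlidingWindowMean(size=3)
means = [tracker.update(x) for x in stream]
print(means)
```

Each update touches a constant amount of state, so the cost per incoming value does not grow with the length of the stream.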


Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is a frequentist approach for estimating the parameters of a model given some observed data. The general approach for using MLE is:
1. Observe some data.
2. Write down a model for how we believe the data was generated.
3. Set the parameters of our model to values which maximize the likelihood of the parameters given the data.
We’re going to cover the basics of creating models, understanding likelihood, and using the Maximum Likelihood Estimation process for fitting our parameters.
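The three steps above can be sketched for a very simple case. The example below assumes a Bernoulli (coin flip) model and uses made-up observations; the grid search is chosen for readability, and a numerical optimizer would be the usual choice in practice.

```python
import math

# Step 1: observe some data (hypothetical coin flips; 1 = heads, 0 = tails).
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Step 2: write down a model. Here each flip is assumed to be an
# independent Bernoulli(p) draw, where p is the probability of heads.
def log_likelihood(p, data):
    """Log-likelihood of the parameter p given the observed flips."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)

# Step 3: set p to the value that maximizes the (log-)likelihood.
# The grid skips p = 0 and p = 1, where the logarithm is undefined.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, flips))

# For the Bernoulli model the maximizer is known in closed form:
# it is the sample mean of the observations.
p_closed_form = sum(flips) / len(flips)

print(p_hat, p_closed_form)
```

Working with the log-likelihood rather than the likelihood itself gives the same maximizer while turning a product of probabilities into a sum, which is numerically better behaved.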


Learning aggregate functions

This article is inspired by the Kaggle competition https://…/elo-merchant-category-recommendation . While I did not participate in the competition, I used the data to explore another problem that often arises when working with realistic data. Most machine learning algorithms work well with tabular data, but in reality a lot of data is relational. In this data set we are trying to draw conclusions about the users (identified by card id) based on their transaction history.


February Edition: Data Visualization

Data visualization is an essential step in any data science process. It’s the final bridge between the data scientist and end users. It communicates, validates, confronts and educates. And when done correctly, it opens up the insights from a data science project to a wider audience. Great data visualization is more than painting a pretty picture with numbers. In fact, that’s often only a small part of it. We also need to consider other factors such as the type of audience seeing the visuals, the level of data literacy, the need for interactivity and the overall story that multiple graphs are telling. These 8 articles make up our top picks for posts that provide helpful tools for data visualization, new insights on where the practice is headed, and interesting examples on how charts can be used effectively to tell a story.


Feature Selection Using Regularisation

Regularisation consists of adding a penalty on the parameters of a machine learning model to reduce the freedom of the model, in other words, to avoid overfitting. In linear model regularisation, the penalty is applied to the coefficients that multiply each of the predictors. Among the different types of regularisation, Lasso (L1) has the property that it can shrink some of the coefficients to exactly zero. The corresponding features can then be removed from the model. In this post I will demonstrate how to select features using Lasso regularisation on a classification problem. For classification I will use the Paribas claims dataset from Kaggle.
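The shrink-to-zero property can be seen in a small, self-contained sketch. Note the swap: the post works on a classification problem with the Paribas data, whereas this example uses plain linear regression on made-up toy data, solved by cyclic coordinate descent. The function names, the penalty value `lam = 0.5`, and the data are all assumptions for illustration.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b

# Toy data: y = 3.0 * x0 + 0.2 * x1 + 0.0 * x2, so x0 is a strong predictor,
# x1 a weak one, and x2 is irrelevant.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [3.2, 2.8, -2.8, -3.2]

coef = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print(coef, selected)
```

With this penalty the weak and irrelevant coefficients are set to exactly zero and only the first feature is kept, while the surviving coefficient is shrunk from 3.0 to about 2.5. A smaller penalty would let the weak feature back in, so the penalty strength controls how aggressive the selection is.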