A Friendly Introduction to Recommender Systems

With the ever-growing volume, complexity and availability of online information, recommender systems have become a key solution to information overload. On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize and efficiently deliver relevant information. Recommender systems do so by searching through large volumes of dynamically generated data to provide users with personalized content and services. In this article, we will explore the motivation behind recommendation systems and provide an overview of the characteristics and potential of various prediction techniques.

Hardware for Machine Learning

If you’re trying to create value by using Machine Learning, you need to be using the best hardware for the task. With CPUs, GPUs, ASICs, and TPUs, things can get kind of confusing. While for most of computing history there was only one type of processor, the wide growth of Deep Learning has led to two new entrants into the field: GPUs and ASICs. This post will walk through the different types of compute chips, where they’re available, and which ones are the best to boost your performance.

Visualizing Outliers

Visualizing data that looks like it came straight out of a Statistics 101 textbook is nice and all, at least for teaching and learning purposes. You gotta learn to stand before you can run a marathon. Once you’re ready for the real data though, which is fuzzier and more irregular, you run into data points that don’t quite fit in with the rest. The outliers. There are various ways to incorporate outliers into your visualization, but you have to understand them first.

Architecting Data Lakes

Author Ben Sharma explains the steps necessary to deploy data lakes with robust, metadata-driven data management platforms. You’ll learn best practices for building, maintaining, and deriving value from a data lake in your production environment. Included is a detailed checklist to help you construct a data lake in a controlled yet flexible way.

Own your work end-to-end.

Flotilla is a self-service batch job execution framework that dramatically simplifies the process of defining and executing containerized jobs. Focus on the work you’re doing rather than how to do it.

An Ode to Testing, my first review

To give you an idea of where I am in my R developer germination, I’d just started reading about testing when I received an email from @rOpenSci inviting me to review the weathercan package. Many of us in the R community feel like imposters when it comes to software development. In fact, as a statistician, it was a surprise to me when I was recently called a developer. In terms of formal computer science training, I took one subject in first year, with the appropriate initialism OOF. Ostensibly, this was to school me in Object Oriented Fundamentals, but mostly educated me in just how much one person can pontificate about doubles and floats. I am almost always befuddled by regexes on the rare occasions I come across them.

Will GDPR Make Machine Learning Illegal?

Does GDPR require Machine Learning algorithms to explain their output? Probably not, but experts disagree and there is enough ambiguity to keep lawyers busy.

Introduction to Optimization with Genetic Algorithm

Selecting the optimal parameter values for machine learning tasks is challenging. Some results may be bad not because the data is noisy or the learning algorithm is weak, but because the parameter values were poorly chosen. This article gives a brief introduction to evolutionary algorithms (EAs) and describes the genetic algorithm (GA), which is one of the simplest random-based EAs.
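
To give a flavour of the idea, here is a minimal sketch of a genetic algorithm in Python. It is not taken from the article: the objective function, the function names (`fitness`, `run_ga`) and every setting (population size, mutation rate, and so on) are assumptions chosen purely for illustration. It shows the usual loop of selection, crossover and mutation, with the best individual carried over unchanged each generation.

```python
import random


def fitness(genes):
    # Hypothetical objective (an assumption for this sketch): higher is
    # better, with the peak at every gene equal to 0.5.
    return -sum((g - 0.5) ** 2 for g in genes)


def run_ga(fitness_fn, n_genes=3, pop_size=20, generations=40,
           mutation_rate=0.1, seed=0):
    """Return (best individual, best fitness per generation).

    n_genes must be at least 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    # Start from a random population of candidate parameter vectors.
    population = [[rng.uniform(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half as parents.
        population.sort(key=fitness_fn, reverse=True)
        history.append(fitness_fn(population[0]))
        parents = population[:pop_size // 2]
        # Elitism: copy the current best into the next generation as-is.
        children = [population[0][:]]
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            # Single-point crossover between two distinct parents.
            cut = rng.randint(1, n_genes - 1)
            child = mum[:cut] + dad[cut:]
            # Mutation: occasionally nudge a gene with Gaussian noise.
            child = [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate
                     else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness_fn)
    history.append(fitness_fn(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga(fitness)
    print("best parameters:", best)
    print("best fitness:", history[-1])
```

Because the best individual is always kept, the recorded best fitness never gets worse from one generation to the next. In a real tuning task, `fitness` would be replaced by something like a validation score for a model trained with the candidate parameters.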

K-Means Clustering in R Tutorial

Learn all about clustering and, more specifically, k-means in this R Tutorial, where you’ll focus on a case study with Uber data.

Email Classification into relevant labels using Neural Networks

In the real world, many online shopping websites and service providers have a single email address where customers can send their queries, concerns and so on. At the back end, a service provider receives millions of emails every week, so how can it identify which email belongs to which department? This paper presents an artificial neural network (ANN) model to solve this problem, with experiments carried out on datasets of users' personal Gmail emails. The problem can be generalised as a typical text classification or categorization task.

Artificial Neural Networks: Part 1

Last week, I gave a one-hour seminar covering one of the machine learning tools which I have used extensively in my research: neural networks. Preparation of the seminar was very useful for me since it required me to make sure that I really understood how the networks function, and I (think I) finally got my head around back-propagation — more on that later. In this post, and depending on length, the next (few), I intend to reinterpret my seminar into something which might be of use to you, dear reader. Here goes! A neural network is a method in the field of machine learning. This field aims to build predictive models to help solve complex tasks by exposing a flexible system to a large amount of data. The system is then allowed to learn by itself how to best form its predictions.