In this post, I explain the maths of Deep Learning in a simplified manner. To keep the explanation simple, we cover the workings of the MLP (Multilayer Perceptron) model. I have drawn upon a number of references, which are indicated in the relevant sections of the post.
It’s been said that Data Scientist is the ‘sexiest job title of the 21st century.’ One main reason is that a humongous amount of data is now available: we are producing data at a rate never seen before. Alongside this dramatic access to data, sophisticated algorithms such as decision trees and random forests are readily available. With so much data at hand, the most intricate part is selecting the correct algorithm for the problem. Each model has its own pros and cons and should be chosen based on the type of problem at hand and the data available.
In the Deep Learning scene, datasets are increasing in size and models in complexity. Accelerating their training is a major challenge, one that entails greater demands on computational resources, demands that supercomputing can satisfy. In this post, we are going to explore how to distribute the training of a single Deep Neural Network (DNN) over many GPUs and servers in order to reduce the training time. We will use the TensorFlow Estimator API together with Horovod as the distribution method.
Clustering is an unsupervised machine learning methodology that aims to partition data into distinct groups, or clusters. It comes in a few different forms, including hierarchical, density-based, and similarity-based clustering, and each has a few different algorithms associated with it. One of the hardest parts of any machine learning algorithm is feature engineering, which can be especially difficult with clustering, as there is no easy way to figure out what best segments your data into separate but similar groups.
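As a concrete instance of the similarity-based family, here is a minimal pure-Python k-means sketch (an illustrative algorithm choice of mine, not one named in the linked post), assuming 2-D points and squared Euclidean distance:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the nearest centroid under squared Euclidean distance.
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters

# Two well-separated blobs; k-means will typically give each its own centroid.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centroids, clusters = kmeans(points, k=2)
```

Note that even this toy example shows the feature-engineering pain the paragraph mentions: the result depends entirely on which coordinates you feed in and how they are scaled.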
Indeed, true personalization understands customers at a deeper level: their real-time intent, purchasing history, preferences, and complex shopping journeys. It then utilizes these insights to tailor congruent, 1:1 interactions across channels. So far, most companies rely on machine learning to take all this customer data and build predictive models on it, operating not just on what has been programmed in rules but also adapting to visitor behavior on the fly, based on what has been learned.
An introduction to likelihood-free inference and a distillation of the paper Recurrent Machines for Likelihood-Free Inference, published at the NeurIPS 2018 Workshop on Meta-Learning.
The more I delve into data science, the more convinced I am that companies and data science practitioners must have a clear view on how to cut through the machine learning and AI hype in order to implement an effective data science strategy that drives business value. This article aims to establish a framework for conceptualizing and implementing effective data science projects.
Not a week goes by in sunny Silicon Valley without a new Artificial Intelligence company making the headlines, with shiny new promises and/or mind-blowing funding rounds. The frenzy is not limited to the Valley: a cloud of somewhat overlapping concepts (big data, data science, machine learning, Artificial Intelligence, deep learning) has become mainstream in recent years; serious business people went as far as claiming that data scientist is ‘the Sexiest Job of the 21st Century’, partially vindicating my failed attempt to become an NBA superstar (for some clear photographic evidence, compare this life with the one below).
TensorFlow is the dominant Deep Learning framework for Data Scientists, and Jupyter Notebook is their go-to tool. What if you could use TensorFlow from anywhere without the hassle of setting up the environment? Better yet, what if you could use a GPU to train your Deep Learning models for free? Google Colaboratory (Colab) is the answer! It is a very exciting technology that allows Data Scientists to focus on building Machine Learning models instead of the logistics! In this article, we’ll not only walk through the basics of using Colab, but also help you get started with TensorFlow with easy-to-understand examples.
Tabular methods refer to problems in which the state and action spaces are small enough for the approximate value functions to be represented as arrays or tables.
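To make "small enough for a table" concrete, here is a sketch of tabular Q-learning on a toy three-state corridor task (my own illustrative example, not from the linked post): the entire value function fits in a plain Python dict keyed by (state, action).

```python
import random

# Tiny corridor MDP: states 0..2, actions left (-1) / right (+1),
# reward 1 for reaching the rightmost state, which ends the episode.
N_STATES, ACTIONS = 3, (-1, +1)

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

# The whole action-value function as a table: one entry per (state, action).
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng = random.Random(0)
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                # episodes
    s = 0
    for _ in range(50):             # step limit per episode
        # Epsilon-greedy action selection over the table.
        if rng.random() < eps:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Standard tabular Q-learning update.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2
        if done:
            break
```

With larger state or action spaces the table becomes infeasible, which is exactly where the function-approximation methods that follow tabular ones come in.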
PySpark is a great language for performing exploratory data analysis at scale, building machine learning pipelines, and creating ETLs for a data platform. If you’re already familiar with Python and libraries such as Pandas, then PySpark is a great language to learn in order to create more scalable analyses and pipelines. The goal of this post is to show how to get up and running with PySpark and to perform common tasks.
I tried my hand at using the R package randomForest to create two regression models for tree height and basal area, based on some lidar and field-collected data from the Finger Lakes National Forest, NY.
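The post itself works in R's randomForest; as a rough Python analog, here is a sketch using scikit-learn's RandomForestRegressor on synthetic stand-in data (the features and response below are fabricated for illustration, not the post's lidar dataset).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Fabricated stand-ins: two "lidar-derived" predictors and a noisy
# "field-measured" response that depends on them roughly linearly.
X = rng.uniform(0, 30, size=(200, 2))
y = 1.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 2, size=200)

# Fit an ensemble of 100 trees, mirroring the regression-forest workflow.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

preds = model.predict(X[:5])
```

In the real workflow you would of course fit separate models for tree height and basal area and evaluate on held-out plots rather than in-sample.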
If you haven’t committed to a comprehensive AI strategy, your competitors might already have an unfair advantage. At the New York Times DealBook conference, Intel emphasized it was urgent that every company put an artificial intelligence (AI) strategy in place. The reason, in a word, is data. The data deluge continues to accelerate, with data points from the Internet of Things (IoT) alone expected to bring another 20 billion new sources of information within the next two years.