Does Synthetic Data Hold The Secret To Artificial Intelligence?

Could synthetic data be the solution to rapidly training artificial intelligence (AI) algorithms? There are advantages and disadvantages to synthetic data; however, many technology experts believe that synthetic data is the key to democratizing machine learning and to accelerating the testing and adoption of artificial intelligence algorithms in our daily lives.


Best Deals in Deep Learning Cloud Providers

I wanted to figure out where I should train my deep learning models online for the lowest cost and least hassle. I wasn’t able to find a good comparison of GPU cloud service providers, so I decided to make my own. Feel free to skip to the pretty charts if you know all about GPUs and TPUs and just want the results. I’m not looking at serving models in this article, but I might in the future. Follow me to make sure you don’t miss out.


Forecasting for data-driven decision making

Accurate forecasts are key components of successful data-driven businesses. We may forecast the need for internal resources (e.g. call center staffing), key metrics that drive other business decisions (e.g. electricity demand to decide on constructing a new power plant), or customer demand for products we distribute (retail demand forecasting). The time horizons of our forecasts may differ widely, from years in advance to only a few minutes ahead.


Analyzing Experiment Outcomes: Beyond Average Treatment Effects

At Uber, we test most new features and products with the help of experiments in order to understand and quantify their impact on our marketplace. The analysis of experimental results traditionally focuses on calculating average treatment effects (ATEs). Since averages reduce an entire distribution to a single number, however, any heterogeneity in treatment effects will go unnoticed. Instead, we have found that calculating quantile treatment effects (QTEs) allows us to effectively and efficiently characterize the full distribution of treatment effects and thus capture the inherent heterogeneity in treatment effects when thousands of riders and drivers interact within Uber’s marketplace. Besides providing a more nuanced picture of the effect of a new algorithm, this analysis is relevant to our business because people remember negative experiences more strongly than positive ones (see Baumeister et al. (2001)). In this article, we describe what QTEs are, how exactly they provide additional insights beyond ATEs, why they are relevant for a business such as Uber’s, and how we calculate them.
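
As a rough illustration of the idea, a QTE can be estimated as the difference between the empirical quantiles of an outcome in the treatment and control groups. The sketch below is my own minimal example with simulated data, not Uber's implementation (the article describes their approach, including proper inference):

```python
import numpy as np

def quantile_treatment_effects(treated, control, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Difference of empirical outcome quantiles between treatment and control."""
    treated, control = np.asarray(treated), np.asarray(control)
    return {q: np.quantile(treated, q) - np.quantile(control, q) for q in quantiles}

# Simulated outcomes: the treatment helps most users slightly but hurts a small group badly,
# a pattern an ATE would average away while the upper-quantile QTEs expose it.
rng = np.random.default_rng(0)
control = rng.exponential(scale=10.0, size=5000)
treated = rng.exponential(scale=10.0, size=5000) * 0.9
treated[:250] *= 2.0
print(quantile_treatment_effects(treated, control))
```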


Model evaluation, model selection, and algorithm selection in machine learning

This final article in the series Model evaluation, model selection, and algorithm selection in machine learning presents overviews of several statistical hypothesis testing approaches, with applications to machine learning model and algorithm comparisons. This includes statistical tests based on target predictions for independent test sets (the downsides of using a single test set for model comparisons were discussed in previous articles) as well as methods for algorithm comparisons by fitting and evaluating models via cross-validation. Lastly, this article will introduce nested cross-validation, which has become a common and recommended method of choice for algorithm comparisons on small to moderately-sized datasets. Then, at the end of this article, I provide a list of my personal suggestions concerning model evaluation, selection, and algorithm selection, summarizing the several techniques covered in this series of articles.
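
As a quick sketch of what nested cross-validation looks like in practice (using scikit-learn; the estimator and parameter grid below are purely illustrative), an inner loop selects hyperparameters while an outer loop estimates generalization performance:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: hyperparameter search; outer loop: unbiased performance estimate
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}
clf = GridSearchCV(SVC(), param_grid, cv=inner_cv)

nested_scores = cross_val_score(clf, X, y, cv=outer_cv)
print(nested_scores.mean(), nested_scores.std())
```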


Getting to Know Keras for New Data Scientists

For many new data scientists transitioning into AI and deep learning, the Keras framework is an efficient tool. Keras is a powerful and easy-to-use Python library for developing and evaluating deep learning models. In this article, we’ll lay out the welcome mat to the framework. You should walk away with a handful of useful features to keep in mind as you get up to speed. In the words of the developers, ‘Keras is a high-level neural networks API, written in Python and developed with a focus on enabling fast experimentation.’ It has been open sourced since its initial release in March 2015. Its documentation can be found on keras.io with source code on GitHub.
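
For a first taste of the workflow, here is a minimal (and entirely illustrative) Sequential model showing the define/compile/fit pattern that most Keras code follows:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary-classification data, purely for illustration
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype("float32")

# Define the model as a stack of layers
model = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(20,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Compile with an optimizer, loss, and metrics, then fit
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```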


Differences Between Machine Learning & Deep Learning

Starting off, you'll learn about artificial intelligence and then move on to machine learning and deep learning. You will further learn how machine learning differs from deep learning, and the various kinds of algorithms that fall under these two domains of learning. Finally, you will be introduced to some real-life applications where machine learning and deep learning are being applied.


Simplifying Sentiment Analysis in Python

The promise of machine learning has shown many stunning results in a wide variety of fields. Natural language processing is no exception, and it is one of those fields where machine learning has been able to approach general artificial intelligence (not entirely, but at least partially), achieving some brilliant results for genuinely complicated tasks. Now, NLP (natural language processing) is not a new field, and neither is machine learning. But the fusion of the two fields is quite contemporary and only promises further progress. This is one of those hybrid applications which everyone (with a budget smartphone) comes across daily. For example, take keyboard word suggestions or intelligent auto-completion; these are all byproducts of the amalgamation of NLP and machine learning, and quite naturally they have become inseparable parts of our lives.
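
The article builds a sentiment pipeline step by step; as a quick stand-alone illustration of the task (an assumption on my part, not necessarily the approach the article takes), NLTK's off-the-shelf VADER analyzer scores the polarity of a sentence in a couple of lines:

```python
# pip install nltk, then the lexicon must be downloaded once
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

# Returns a dict of neg/neu/pos scores plus a normalized 'compound' score in [-1, 1]
print(analyzer.polarity_scores("The movie was surprisingly good, I loved it."))
```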


Machine Learning in Excel With Python

Machine learning is an important topic in lots of industries right now. It's a fast-moving field with lots of active research, and it receives huge amounts of media attention. This post isn't intended to be an introduction to machine learning, or a comprehensive overview of the state of the art. Instead, it will show how models built using machine learning can be leveraged from within Excel.
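
One common way to bridge the two (a hedged sketch of mine; the post may use a different mechanism) is xlwings, which can expose a trained scikit-learn model as a user-defined function callable from a worksheet cell. The file name and feature arguments below are hypothetical:

```python
import pickle

import xlwings as xw

# Load a previously trained scikit-learn model (hypothetical path)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@xw.func
def predict(feature1, feature2, feature3):
    """Callable from Excel as =predict(A1, B1, C1)."""
    return float(model.predict([[feature1, feature2, feature3]])[0])
```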


More on Bias Corrected Standard Deviation Estimates

This note is just a quick follow-up to our last note on correcting the bias in estimated standard deviations for binomial experiments. For normal deviates there is, of course, a well-known scaling correction that returns an unbiased estimate for observed standard deviations.
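
For reference, the scaling correction I believe is being referred to is the classic c4 factor for normal samples: the usual sample standard deviation s (with Bessel's correction) satisfies E[s] = c4(n)·σ, so dividing by c4(n) removes the bias. A small sketch:

```python
import numpy as np
from scipy.special import gammaln

def c4(n):
    """c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2); E[s] = c4(n) * sigma for normal data."""
    return np.sqrt(2.0 / (n - 1)) * np.exp(gammaln(n / 2.0) - gammaln((n - 1) / 2.0))

x = np.random.normal(loc=0.0, scale=2.0, size=10)
s = np.std(x, ddof=1)        # Bessel-corrected sample standard deviation, still biased for sigma
s_unbiased = s / c4(len(x))  # bias-corrected estimate of sigma
```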


Searching for the optimal hyper-parameters of an ARIMA model in parallel: the tidy gridsearch approach

In this blog post, I’ll use the data that I cleaned in a previous blog post, which you can download here. If you want to follow along, download the monthly data. In the previous blog post, I used the auto.arima() function to very quickly get a ‘good-enough’ model to predict future monthly total passengers flying from LuxAirport. ‘Good-enough’ models can be all you need in a lot of situations, but perhaps you’d like to have a better model. I will show here how you can get a better model by searching through a grid of hyper-parameters.
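
The original post works in R (auto.arima() plus a tidyverse-style grid); as a rough analogue of the grid-search idea in Python with statsmodels (the parameter ranges and series below are illustrative), you fit a model per (p, d, q) combination and keep the one with the best information criterion:

```python
import itertools

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Stand-in for the monthly passenger series from the post
y = pd.Series(np.random.randn(120).cumsum())

results = []
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        fit = ARIMA(y, order=(p, d, q)).fit()
        results.append({"order": (p, d, q), "aic": fit.aic})
    except Exception:
        continue  # some orders fail to converge; skip them

best = min(results, key=lambda r: r["aic"])
print(best)
```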


A Mathematician’s Perspective on Topological Data Analysis and R

A few years ago, when I first became aware of Topological Data Analysis (TDA), I was really excited by the possibility that the elegant theorems of Algebraic Topology could provide some new insights into the practical problems of data analysis. But time has passed, and the sober assessment of Larry Wasserman seems to describe where things stand.


How to teach an AI to play Games: Deep Reinforcement Learning

If you are excited about Machine Learning, and you’re interested in how it can be applied to Gaming or Optimization, this article is for you. We’ll see the basics of Reinforcement Learning, and more specifically Deep Reinforcement Learning (Neural Networks + Q-Learning) applied to the game Snake. Let’s dive into it!
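
To make the "Neural Networks + Q-Learning" combination concrete, here is the tabular Q-learning update that the deep variant builds on (a minimal sketch with made-up state and action sizes; in Deep Q-Learning the table is replaced by a neural network that maps game states to Q-values):

```python
import numpy as np

n_states, n_actions = 16, 4          # illustrative sizes, not Snake's actual state space
Q = np.zeros((n_states, n_actions))  # Q-table: expected return per state-action pair
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_action(state):
    # Epsilon-greedy exploration: mostly exploit, sometimes explore
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(Q[state].argmax())

def update(state, action, reward, next_state):
    # Bellman update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])
```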


Neural Net from scratch (using Numpy)

This post is about building a shallow neural network (NN) from scratch for a classification problem using the numpy library in Python, and comparing its performance against scikit-learn's LogisticRegression. Building an NN from scratch helps in understanding how NNs work under the hood, which is essential for building effective models. Without further delay, let's dive into building our simple shallow NN model from scratch.
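
As a taste of what the post builds, here is a minimal one-hidden-layer network trained with full-batch gradient descent (my own sketch with illustrative data and hyperparameters, not the post's exact code):

```python
import numpy as np

# Toy data: label is 1 when the two features have the same sign
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer with 8 tanh units, sigmoid output
W1, b1 = rng.normal(scale=0.1, size=(2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(2000):
    # Forward pass
    H = np.tanh(X @ W1 + b1)
    p = sigmoid(H @ W2 + b2)
    # Backward pass (binary cross-entropy loss)
    dZ2 = (p - y) / len(X)
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0, keepdims=True)
    dH = dZ2 @ W2.T * (1 - H ** 2)
    dW1, db1 = X.T @ dH, dH.sum(axis=0, keepdims=True)
    # Gradient descent step (in-place updates)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * grad

print("training accuracy:", ((p > 0.5) == y).mean())
```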


Feel discouraged by sparse data in your hand? Give Factorization Machine a shot (1)

If you’re a data scientist in industry, have you ever had to face your customers and tell them that a project might not live up to their expectations because of data sparsity?


A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning

Conventional machine learning and deep learning algorithms have traditionally been designed to work in isolation. These algorithms are trained to solve specific tasks, and the models have to be rebuilt from scratch once the feature-space distribution changes. Transfer learning is the idea of overcoming this isolated learning paradigm and utilizing knowledge acquired for one task to solve related ones. In this article, we will provide comprehensive coverage of the concepts, scope, and real-world applications of transfer learning, and even showcase some hands-on examples. To be more specific, we will be covering the following.
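
One common transfer learning strategy is feature extraction with a pre-trained network; a minimal Keras sketch of that idea (the input shape, the new classifier head, and the training data are illustrative assumptions, not taken from the article) looks like this:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Pre-trained convolutional base, frozen so its ImageNet features are reused as-is
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(150, 150, 3))
base.trainable = False

# New classifier head trained on the target task (here: binary classification)
model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # hypothetical training data
```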


The ABC of Machine Learning

There has been a renewed interest in machine learning in the last few years. This revival seems to be driven by strong fundamentals: loads of data being emitted by sensors across the globe, with cheap storage and the lowest ever computational costs! The buzz around machine learning has sparked keen interest among techies to get their hands on ML. However, before diving into the ocean of ML, here are a few basic concepts that you should be familiar with. Keep this handy, as you will come across these terms frequently while learning ML.


Implementing Facebook Prophet efficiently

If you have ever worked with time series predictions, I am quite sure you are well aware of the strains and pains that come with them. One moment you think you have cracked the stock market, the next moment you are lying in the bath crying and cursing your inaccurate models (I really don’t recommend that you try to predict the stock market; you will most likely not reap the benefits you think you will). What I am trying to say is that time series predictions are difficult and always require a very specialized data scientist to implement them.
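
For context, the basic Prophet fit-and-forecast loop looks like the sketch below (the series is a stand-in, and the article's focus is on running this efficiently rather than on the basic API; in older versions the package is imported as fbprophet):

```python
import pandas as pd
from prophet import Prophet

# Prophet expects a dataframe with a 'ds' (date) column and a 'y' (value) column
df = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=365, freq="D"),
    "y": range(365),  # stand-in series
})

m = Prophet()
m.fit(df)

future = m.make_future_dataframe(periods=30)   # extend 30 days beyond the history
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```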


Dash: A Beginner’s Guide

As data scientists, one of the most integral aspects of our job is to relay and display data to ‘non-data scientists’ in formats that provide visually actionable data. In my opinion, one of the coolest parts of our job is interacting with data, especially when there is visual interaction. At times when we might want to build an interactive application, one of the options available to us is a framework called Dash. Dash is an open source Python framework for building web applications, created and maintained by the people at Plotly. Dash’s web graphics are completely interactive because the framework is built on top of Plotly.js, a JavaScript library written and maintained by Plotly. This means that after importing the Dash framework into a Python file, you can build a web application writing strictly Python, with no other languages necessary.
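
A minimal Dash app, to give a feel for the layout-plus-callback pattern (a sketch of mine using the iris sample data bundled with Plotly Express; the component names are illustrative):

```python
from dash import Dash, Input, Output, dcc, html
import plotly.express as px

df = px.data.iris()  # sample dataset bundled with Plotly Express

app = Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(id="species", options=sorted(df["species"].unique()), value="setosa"),
    dcc.Graph(id="scatter"),
])

@app.callback(Output("scatter", "figure"), Input("species", "value"))
def update_figure(species):
    # Re-draw the scatter plot whenever the dropdown selection changes
    subset = df[df["species"] == species]
    return px.scatter(subset, x="sepal_width", y="sepal_length")

if __name__ == "__main__":
    app.run_server(debug=True)
```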