Failure Modes in Machine Learning

In the last two years, more than 200 papers have been written on how Machine Learning (ML) can fail because of adversarial attacks on the algorithms and data; this number balloons if we incorporate non-adversarial failure modes. The spate of papers has made it difficult for ML practitioners, let alone engineers, lawyers and policymakers, to keep up with the attacks against and defenses of ML systems. However, as these systems become more pervasive, the need to understand how they fail, whether by the hand of an adversary or due to the inherent design of a system, will only become more pressing. The purpose of this document is to jointly tabulate both of these failure modes in a single place.


A neural network that transforms a design mock-up into a static website.

Search Optimization for Large Data Sets for GDPR

This post describes our approach to addressing the challenges of cost and scale in scanning for GDPR compliance in Adobe Experience Platform.

How to use a Machine Learning Model to Make Predictions on Streaming Data using PySpark

Picture this – every second, more than 8,500 Tweets are sent, more than 900 photos are uploaded on Instagram, more than 4,200 Skype calls are made, more than 78,000 Google Searches happen, and more than 2 million emails are sent (according to Internet Live Stats). We are generating data at an unprecedented pace and scale right now. What a great time to be working in the data science space! But with great data come equally complex challenges. Primarily – how do we collect data at this scale? How do we ensure that our machine learning pipeline continues to churn out results as soon as the data is generated and collected? These are significant challenges the industry is facing, and they are why the concept of Streaming Data is gaining more traction among organizations.

Text Generation with Python

This article is a little different from the others I have published. I am not writing it to share code but to share the first article almost entirely written with GPT-2. The introduction and the conclusion are written by me, not by the GPT-2 model. The rest of the article is generated by the model with some tricks. The topic of the article is text generation…

Fairness Indicators: Scalable Infrastructure for Fair ML Systems

While industry and academia continue to explore the benefits of using machine learning (ML) to make better products and tackle important problems, algorithms and the datasets on which they are trained also have the ability to reflect or reinforce unfair biases. For example, consistently flagging non-toxic text comments from certain groups as ‘spam’ or ‘high toxicity’ in a moderation system leads to exclusion of those groups from conversation. In 2018, we shared how Google uses AI to make products more useful, highlighting AI principles that will guide our work moving forward. The second principle, ‘Avoid creating or reinforcing unfair bias,’ outlines our commitment to reduce unjust biases and minimize their impacts on people.

Lessons Learned from Developing ML for Healthcare

In an effort to improve guidance for research at the intersection of ML and healthcare, we have written a pair of articles, published in Nature Materials and the Journal of the American Medical Association (JAMA). The first is for ML practitioners to better understand how to develop ML solutions for healthcare, and the other is for doctors who desire a better understanding of whether ML could help improve their clinical work.

AI, Analytics, Machine Learning, Data Science, Deep Learning Technology Main Developments in 2019 and Key Trends for 2020

We asked leading experts – what were the most important developments of 2019, and what are the key trends for 2020, in AI, Analytics, Machine Learning, Data Science, and Deep Learning? This blog focuses mainly on technology and deployment.

Interpretability: Cracking open the black box, Part 2

The second part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers post-hoc interpretation that is useful when the model is not transparent.

Towards a new Theory of Learning: Statistical Mechanics of Deep Neural Networks

Here, I am going to sketch out the ideas we are currently researching to develop a new theory of generalization for Deep Neural Networks. We have a lot of work to do, but I think we have made enough progress to present these ideas, informally, to flesh out the basics.

Dimensionality reduction method through autoencoders

We’ve already talked about dimensionality reduction long and hard in this blog, usually focusing on PCA. In my latest post I also introduced another way to reduce dimensions, based on autoencoders. At that time, however, I focused on how to use autoencoders as predictors, while now I’d like to consider them as a dimensionality reduction technique. As a reminder of how autoencoders work: the procedure starts by compressing the original data into a short code, ignoring noise. Then the algorithm uncompresses that code to generate an image as close as possible to the original input.
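
The compress-then-uncompress idea can be illustrated with a very small sketch. This is not the linked post's code: it assumes a purely linear autoencoder that squeezes 2-D points into a 1-D code, trained with plain gradient descent in standard-library Python. The names `reconstruct`, `mse` and `train`, the toy data, and the starting weights are all hypothetical choices made for illustration.

```python
def reconstruct(point, enc, dec):
    """Compress a 2-D point to a single number, then uncompress it."""
    code = sum(w * x for w, x in zip(enc, point))  # encoder: 2-D -> 1-D code
    return code, [d * code for d in dec]           # decoder: 1-D code -> 2-D


def mse(data, enc, dec):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for p in data:
        _, r = reconstruct(p, enc, dec)
        total += sum((ri - pi) ** 2 for ri, pi in zip(r, p))
    return total / len(data)


def train(data, epochs=2000, lr=0.05):
    """Fit encoder and decoder weights by batch gradient descent."""
    enc = [0.5, 0.5]  # arbitrary starting weights (assumption)
    dec = [0.5, 0.5]
    n = len(data)
    for _ in range(epochs):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for p in data:
            code, r = reconstruct(p, enc, dec)
            err = [ri - pi for ri, pi in zip(r, p)]
            # Error signal passed back through the decoder to the code.
            back = sum(e * d for e, d in zip(err, dec))
            for i in range(2):
                g_dec[i] += 2 * err[i] * code / n
                g_enc[i] += 2 * back * p[i] / n
        enc = [w - lr * g for w, g in zip(enc, g_enc)]
        dec = [w - lr * g for w, g in zip(dec, g_dec)]
    return enc, dec


# Toy data lying on the line y = 2x, so one number per point is enough.
data = [(i / 10, 2 * i / 10) for i in range(-10, 11)]
loss_before = mse(data, [0.5, 0.5], [0.5, 0.5])
enc, dec = train(data)
loss_after = mse(data, enc, dec)
```

Because the toy points lie on a single line, a one-number code loses almost nothing, and `loss_after` ends up far below `loss_before`. Real autoencoders use nonlinear layers and a deep learning library, but the encode, decode, and compare-to-input loop is the same.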

Outlier Detection (Part 2): Multivariate

Mahalanobis distance | Robust estimates (MCD): Example in R
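
As a rough illustration of the Mahalanobis distance named above, here is a minimal sketch in Python (the linked post works in R). It assumes 2-D data and uses the classical sample mean and covariance, not the robust MCD estimate the post covers. The helper names `mean_and_cov` and `mahalanobis` and the sample points are hypothetical.

```python
import math


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (n - 1 denominator)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis(point, mean, cov):
    """Distance from `point` to `mean`, scaled by the inverse covariance."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Strongly correlated toy data: y is roughly equal to x.
points = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9), (4, 4.1)]
mean, cov = mean_and_cov(points)

on_trend = mahalanobis((4, 4), mean, cov)   # follows the correlation
off_trend = mahalanobis((3, 1), mean, cov)  # breaks the correlation
```

Here `(3, 1)` is closer to the mean than `(4, 4)` in Euclidean terms, yet its Mahalanobis distance is much larger because it breaks the correlation between the two variables. That is what makes the measure useful for multivariate outlier detection. A robust variant would swap the classical mean and covariance for estimates that are less affected by the outliers themselves.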