Reproducible Machine Learning with Jupyter and Quilt

In this guest blog post, Aneesh Karve, Co-founder and CTO of Quilt, demonstrates how Quilt works in conjunction with Domino’s Reproducibility Engine to make Jupyter notebooks portable and reproducible for machine learning.
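As a rough illustration of the workflow the post describes, the sketch below shows Quilt's data-package pattern (the Quilt 2 API current at the time of writing); the package name uciml/iris is only an illustrative example, not something taken from the post itself.

    # Install the Quilt client and a versioned data package (shell commands):
    #   pip install quilt
    #   quilt install uciml/iris    # 'uciml/iris' is an illustrative package name
    import quilt
    quilt.install("uciml/iris")     # packages can also be installed from Python

    # Installed packages are imported like code, so a notebook pins its data
    # dependencies the same way it pins its library dependencies
    from quilt.data.uciml import iris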


5 Critical Questions You Need to Ask About Your Sensitive Data

Data privacy regulations, interconnectivity (virtual machines, cloud, IoT, BYOD), and cyber threats are changing the global digital landscape. With this transformation comes inherent risk, and adopting a data-centric mindset can reduce compliance risk and mitigate damage in the event of a cyber attack. When evaluating your organization's data strategy, it's important to ask five critical questions: What data is considered sensitive? Where is it? Who has access to it (and should they)? When is data being transferred? And how is it managed? Answering these basic questions is increasingly difficult due to the exponential growth of electronic data, shadow IT, data sprawl, and other digital challenges. Nevertheless, this inquiry is the indispensable starting point for gaining the insight into sensitive data needed to manage security and regulatory risk. Sensitive data management is not only the cornerstone of mitigating risk, but also a means to demonstrate business priorities, corporate ethics, and competitive differentiation. Before crafting any data management strategy, it is critical to first ask and answer these five questions.


Some Reinforcement Learning: The Greedy and Explore-Exploit Algorithms for the Multi-Armed Bandit Framework in Python

In this article, the multi-armed bandit problem and a few algorithms for solving it are discussed. The problem appeared as a lab assignment in the edX course DAT257x: Reinforcement Learning Explained by Microsoft, and the problem description is taken from the assignment itself.
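As a taste of the algorithms the article covers, here is a minimal epsilon-greedy sketch for a stationary bandit with Gaussian rewards; the arm means and hyperparameters below are made up for illustration and are not the assignment's exact setup.

    import numpy as np

    def epsilon_greedy(true_means, n_rounds=1000, epsilon=0.1, seed=0):
        """Epsilon-greedy agent on a stationary multi-armed bandit."""
        rng = np.random.default_rng(seed)
        n_arms = len(true_means)
        counts = np.zeros(n_arms)       # number of pulls per arm
        estimates = np.zeros(n_arms)    # running mean reward per arm
        total_reward = 0.0

        for _ in range(n_rounds):
            # Explore with probability epsilon, otherwise exploit the best estimate
            if rng.random() < epsilon:
                arm = int(rng.integers(n_arms))
            else:
                arm = int(np.argmax(estimates))

            reward = rng.normal(true_means[arm], 1.0)                  # noisy reward
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
            total_reward += reward

        return estimates, total_reward

    # Five arms with hidden mean rewards; the agent should learn to favour the fourth arm
    estimates, total = epsilon_greedy([0.1, 0.5, 0.2, 0.9, 0.3])

Setting epsilon to zero recovers the purely greedy strategy, which tends to lock onto whichever arm happens to pay off first.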


Implementing Autoencoders in Keras: Tutorial

In this tutorial, you’ll learn more about autoencoders and how to build convolutional and denoising autoencoders with the notMNIST dataset in Keras.
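To give a sense of what the tutorial builds, here is a minimal convolutional autoencoder in Keras; the 28x28 grayscale input shape matches notMNIST, but the layer sizes are arbitrary choices rather than the tutorial's exact architecture.

    from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
    from keras.models import Model

    inp = Input(shape=(28, 28, 1))    # 28x28 grayscale letter images

    # Encoder: compress the image into a smaller spatial code
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)

    # Decoder: reconstruct the image from the code
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

    autoencoder = Model(inp, decoded)
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

    # For the denoising variant, train on corrupted inputs against clean targets:
    # autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=128)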


What Does GDPR Mean for Machine Learning?

When the General Data Protection Regulation, or GDPR, takes effect May 25, it will bring sweeping changes to the regulations surrounding how major organizations collect data. The GDPR is the largest-scale modification to data privacy regulations in the past two decades, and it exists to protect the rights of individual consumers. The most pertinent question is how the GDPR will affect machine learning and modern artificial intelligence, both of which require constant access to large stores of data. Will it change the way enterprises use machine learning? Unfortunately, there seems to be a lot of confusion among lawyers, scholars, analysts and regulators, and it's warranted: the GDPR's text is vague and unclear. According to some interpretations of the regulation, all citizens and parties have a "right to explanation" regarding machine learning models and algorithms. In short, when people are affected by an automated decision based on their data, they have a right to know how and why the model made that decision or carried out a specific action. This concept is extremely controversial. Many argue against everyone having this right, while others think it's blatantly obvious and necessary.


Supervised vs. Unsupervised Learning

Within the field of machine learning, there are two main types of tasks: supervised and unsupervised. The main difference between the two is that supervised learning is done using a ground truth; in other words, we have prior knowledge of what the output values for our samples should be. Therefore, the goal of supervised learning is to learn a function that, given a sample of data and desired outputs, best approximates the relationship between input and output observable in the data. Unsupervised learning, on the other hand, does not have labeled outputs, so its goal is to infer the natural structure present within a set of data points.
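The distinction is easy to see in code. In this small sketch (scikit-learn, with the Iris data standing in for any dataset), the classifier is given the labels while the clustering algorithm only ever sees the features.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X, y = load_iris(return_X_y=True)

    # Supervised: ground-truth labels y are available, so we learn a mapping X -> y
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict(X[:5]))     # predicted classes for the first five samples

    # Unsupervised: labels are withheld; k-means infers structure from X alone
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(km.labels_[:5])         # cluster ids, not tied to y's label numbering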


Design Patterns in R

These notes are inspired by
• a talk by Stuart Sierra on Design Patterns in Functional Programming and
• some thoughts I found on F# for fun and profit,
and are a reflection on how I use different strategies to solve things in R. Design pattern seems to be a big word, especially because of its use in object-oriented programming, but in the end I think it is simply the correct label for recurring strategies for designing software.


Data Science and the Art of Producing Entertainment at Netflix

Netflix has released hundreds of Originals and plans to spend $8 billion over the next year on content. Creators of these stories pour their hearts and souls into turning ideas into joy for our viewers. The sublime art of doing this well is hard to describe, but it necessitates a careful orchestration of creative, business and technical decisions. Here we will focus on the latter two: business and technical decisions like planning budgets, finding locations, building sets, and scheduling guest actors that enable the creative act of connecting with viewers.