A Survey of Transfer Learning

Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is that the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning. Lastly, it lists software downloads for various transfer learning solutions and discusses possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.
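The core idea – reuse what a learner acquired on a data-rich source domain when only scarce data is available in the target domain – can be illustrated with a minimal NumPy sketch. The linear model, synthetic data, and regularization strength below are illustrative assumptions, not taken from the survey:

```python
import numpy as np

rng = np.random.default_rng(0)

# Source domain: plentiful labeled data for y = X @ w_true + noise
w_true = np.array([2.0, -1.0, 0.5])
X_src = rng.normal(size=(500, 3))
y_src = X_src @ w_true + 0.1 * rng.normal(size=500)

# "Pre-train" on the source domain with ordinary least squares
w_src, *_ = np.linalg.lstsq(X_src, y_src, rcond=None)

# Target domain: a related but shifted relationship, and only 10 samples
X_tgt = rng.normal(size=(10, 3))
y_tgt = X_tgt @ (w_true + 0.3) + 0.1 * rng.normal(size=10)

# Transfer: keep the source weights and fit only a small, regularized
# correction, so the scarce target data need not be learned from scratch
lam = 1.0
residual = y_tgt - X_tgt @ w_src
delta = np.linalg.solve(X_tgt.T @ X_tgt + lam * np.eye(3), X_tgt.T @ residual)
w_transfer = w_src + delta

err_src = np.linalg.norm(X_tgt @ w_src - y_tgt)
err_transfer = np.linalg.norm(X_tgt @ w_transfer - y_tgt)
print(err_transfer <= err_src)  # -> True
```

Regularizing the correction toward zero (i.e., toward the source solution) is what keeps the ten target samples from overfitting; the same pattern appears in deep networks as freezing early layers and fine-tuning only the last ones.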

RStudio Connect

RStudio Connect is a new publishing platform for the work your teams create in R. Share Shiny applications, R Markdown reports, Plumber APIs, dashboards, plots, and more in one convenient place. Use push-button publishing from the RStudio IDE, scheduled execution of reports, and flexible security policies to bring the power of data science to your entire enterprise.

AI Strategies – Incremental and Fundamental Improvements

Before starting to develop an AI strategy, make sure your team understands the limits of what is reasonable today, as well as incremental improvements that might otherwise be overlooked. Focus should be on your line-of-business (LOB) leaders, who understand the business. Make sure they are also able to recognize AI opportunities.

Demystifying Generative Adversarial Nets (GANs)

In this tutorial, you will learn what Generative Adversarial Networks (GANs) are without going into the details of the math. Afterwards, you will learn how to code a simple GAN that can generate digits!
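To make the adversarial idea concrete before diving into the tutorial, here is a deliberately tiny GAN in plain NumPy: an affine generator and a logistic-regression discriminator play the minimax game over a 1-D Gaussian instead of digit images. This is a toy sketch for intuition only; the architecture and hyperparameters are arbitrary choices, not the tutorial's code:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def real_batch(n):
    # Real data: samples from N(3, 1); the generator should learn to mimic them
    return rng.normal(3.0, 1.0, size=n)

a, b = 1.0, 0.0    # generator G(z) = a*z + b, starts at N(0, 1)
w, c = 0.1, 0.0    # discriminator D(x) = sigmoid(w*x + c)
lr, batch = 0.05, 64

for step in range(3000):
    # Discriminator update: push D(real) toward 1 and D(fake) toward 0
    x_real = real_batch(batch)
    x_fake = a * rng.normal(size=batch) + b
    g_real = -(1.0 - sigmoid(w * x_real + c))   # dBCE/dlogit, label 1
    g_fake = sigmoid(w * x_fake + c)            # dBCE/dlogit, label 0
    w -= lr * np.mean(g_real * x_real + g_fake * x_fake)
    c -= lr * np.mean(g_real + g_fake)

    # Generator update (non-saturating loss): push D(fake) toward 1
    z = rng.normal(size=batch)
    x_fake = a * z + b
    g_logit = -(1.0 - sigmoid(w * x_fake + c))  # d(-log D(fake))/dlogit
    a -= lr * np.mean(g_logit * w * z)
    b -= lr * np.mean(g_logit * w)

samples = a * rng.normal(size=1000) + b
print(round(float(samples.mean()), 2))  # should have drifted toward 3
```

The same two-step loop – update the critic, then update the generator against it – is exactly what the digit-generating GAN in the tutorial does, just with neural networks and automatic differentiation in place of hand-derived gradients.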

Top 10 Datasets for Deep Learning

1) MNIST Dataset
2) STL-10 Dataset
3) Fashion-MNIST
4) Yelp Open Dataset
5) IMDB Reviews
6) Netflix Prize Dataset
7) Million Song Dataset
8) Free Music Archive (FMA)
9) Google Audioset
10) Arcade Universe

7 Useful Suggestions from Andrew Ng’s “Machine Learning Yearning”

1. Optimizing and satisficing metrics
2. Choose dev/test sets quickly – don’t be afraid to change them if needed
3. Machine Learning is an iterative process: Don’t expect it to work first time
4. Build your first system quickly and then iterate
5. Evaluate multiple ideas in parallel
6. Consider if cleaning up mislabeled dev/test sets is worthwhile
7. Contemplate splitting dev sets into separate subsets
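Suggestion 1 deserves a concrete illustration: pick a single optimizing metric and turn every other requirement into a satisficing constraint that merely has to be "good enough". A minimal sketch (the model names and numbers below are made up):

```python
# Hypothetical model evaluation results: (name, accuracy, latency_ms)
candidates = [
    ("model_a", 0.92, 180),
    ("model_b", 0.95, 410),
    ("model_c", 0.90, 95),
    ("model_d", 0.94, 120),
]

MAX_LATENCY_MS = 150  # satisficing metric: a hard threshold, not optimized

# First keep only models that satisfy the latency constraint...
feasible = [m for m in candidates if m[2] <= MAX_LATENCY_MS]

# ...then optimize the single remaining metric (accuracy)
best = max(feasible, key=lambda m: m[1])
print(best[0])  # -> model_d
```

Note that the most accurate model overall (model_b) loses because it violates the constraint; collapsing N metrics into one optimizing metric plus N-1 thresholds keeps model selection unambiguous.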

Solving Data Science tasks with Greenplum DB

Until 2016, the terms “data science” and “data mining” usually implied the use of the Hadoop ecosystem. But the game changed about two years ago. Many enterprises have faced the fact that the Hadoop stack is too heavy to use in its entirety for enterprise tasks. In addition to the Hadoop distribution itself (with 20+ separate components), you must also take care of the necessary operational tasks, component version compatibility, etc. Moreover, to use your cluster for business tasks, you need developers familiar with the specifics of Hadoop development, and there are not many of these professionals on the market now. All of these are well-known problems of using Hadoop in the enterprise. Perhaps that is why Hadoop is no longer mentioned on the Cloudera landing page or at the Strata Big Data conference. Nowadays, Hadoop is not used as a single data machine anymore – only some of its components (like Kafka, Spark, or NiFi) are still used in new production environments.

AI for Real-World Systems

Bonsai’s deep reinforcement learning platform provides the tools to build intelligence into complex and dynamic systems. The Bonsai Platform provides access to the most advanced deep reinforcement learning algorithms and compute infrastructure for model development and training. Models can then be flexibly deployed on-premises, on-device, or in the cloud.

How to solve 90% of NLP problems: a step-by-step guide

Whether you are an established company or working to launch a new service, you can always leverage text data to validate, improve, and expand the functionalities of your product. The science of extracting meaning and learning from text data is an active topic of research called Natural Language Processing (NLP).
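A typical first step when learning from text, and one such a step-by-step guide would likely begin with, is turning raw documents into fixed-length count vectors (a bag of words) that a standard classifier can consume. A minimal sketch in plain Python (the example sentences are invented):

```python
from collections import Counter

docs = [
    "the service was great and the food was great",
    "the food was terrible",
]

# Build a shared vocabulary, then represent each document as word counts
vocab = sorted({word for doc in docs for word in doc.split()})
counts = [Counter(doc.split()) for doc in docs]
vectors = [[cnt[word] for word in vocab] for cnt in counts]

print(vocab)
# ['and', 'food', 'great', 'service', 'terrible', 'the', 'was']
print(vectors[0])  # -> [1, 1, 2, 1, 0, 2, 2]
```

Every document now maps to the same vector space regardless of its length, which is what lets downstream models (logistic regression, and eventually embeddings) compare and classify texts.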

Google Duplex: An AI System for Accomplishing Real World Tasks Over the Phone

A long-standing goal of human-computer interaction has been to enable people to have a natural conversation with computers, as they would with each other. In recent years, we have witnessed a revolution in the ability of computers to understand and to generate natural speech, especially with the application of deep neural networks (e.g., Google voice search, WaveNet). Still, even with today’s state of the art systems, it is often frustrating having to talk to stilted computerized voices that don’t understand natural language. In particular, automated phone systems are still struggling to recognize simple words and commands. They don’t engage in a conversation flow and force the caller to adjust to the system instead of the system adjusting to the caller. Today we announce Google Duplex, a new technology for conducting natural conversations to carry out “real world” tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine.

Deep Learning for Electronic Health Records

When patients get admitted to a hospital, they have many questions about what will happen next. When will I be able to go home? Will I get better? Will I have to come back to the hospital? Having precise answers to those questions helps doctors and nurses make care better, safer, and faster — if a patient’s health is deteriorating, doctors could be sent proactively to act before things get worse. Predicting what will happen next is a natural application of machine learning. We wondered if the same types of machine learning that predict traffic during your commute or the next word in a translation from English to Spanish could be used for clinical predictions.

Detecting Breast Cancer with Deep Learning

Breast cancer is the most common invasive cancer in women, and the second main cause of cancer death in women, after lung cancer. In this article I will build a WideResNet-based neural network to categorize slide images into two classes, one that contains breast cancer and one that doesn’t, using Deep Learning Studio (http://…/).

Torus for Docker-First Data Science

To help data science teams adopt Docker and apply DevOps best practices to streamline machine learning delivery pipelines, we open-sourced a toolkit based on the popular cookiecutter project structure.

Open-Source Machine Learning in Azure

The topic for my talk at the Microsoft Build conference yesterday was ‘Migrating Existing Open Source Machine Learning to Azure’. The idea behind the talk was to show how you can take the open-source tools and workflows you already use for machine learning and data science, and easily transition them to the Azure cloud to take advantage of its capacity and scale. The theme for the talk was ‘no surprises’, and other than the Azure-specific elements I tried to stick to standard OSS tools rather than Microsoft-specific things, to make the process as familiar as possible.

Data Augmentation | How to use Deep Learning when you have Limited Data — Part 2

We have all been there. You have a stellar concept that can be implemented using a machine learning model. Feeling ebullient, you open your web browser and search for relevant data. Chances are, you find a dataset that has around a few hundred images. You recall that most popular datasets have images in the order of tens of thousands (or more). You also recall someone mentioning that having a large dataset is crucial for good performance. Feeling disappointed, you wonder: can my “state-of-the-art” neural network perform well with the meagre amount of data I have? The answer is yes! But before we get into the magic of making that happen, we need to reflect upon some basic questions.
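Data augmentation is the "magic" in question: generate label-preserving variants of each image so a few hundred examples stretch further. A minimal NumPy sketch using flips and rotations (which transforms actually preserve the label depends on your task, e.g. a vertical flip would be wrong for digit recognition):

```python
import numpy as np

def augment(image):
    """Return simple label-preserving variants of a single image array."""
    return [
        image,
        np.fliplr(image),     # horizontal flip
        np.flipud(image),     # vertical flip
        np.rot90(image),      # 90-degree rotation
        np.rot90(image, 2),   # 180-degree rotation
    ]

# A toy "dataset" of 3 random 8x8 grayscale images becomes 15 images
images = np.random.default_rng(0).random((3, 8, 8))
augmented = [variant for img in images for variant in augment(img)]
print(len(augmented))  # -> 15
```

In practice, frameworks also apply random crops, small rotations, and color jitter on the fly during training rather than materializing the variants up front, so each epoch sees slightly different versions of every image.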