Reinforcement Learning Part 3 – Challenges & Considerations

In the first part of this series we described the basics of Reinforcement Learning (RL). In this article we describe how deep learning is augmenting RL and a variety of challenges and considerations that need to be addressed in each implementation.

How to Use the Machine Learning One-Click Install Image on DigitalOcean

Our blog post on the state of artificial intelligence talks about how prevalent AI and machine learning have become. Machine learning, or ML, is a subfield of AI focused on algorithms that learn models from data. ML has become critical not just in developing applications but also in analyzing data to make predictions that inform important business decisions. Because of this, the techniques used in ML are increasingly integrated into the sets of tools that developers use and into the software they write. This Machine Learning One-Click Application image includes a rich set of tools for data pre-processing, analysis, and visualization, as well as several state-of-the-art libraries to help you get started with machine learning and deep learning.

Using Python to work with time series data

The Python ecosystem contains a variety of packages that can be used to process time series. The following list is by no means exhaustive; feel free to submit a PR if something is missing.
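As a minimal, dependency-free illustration of the kind of operation these packages provide, here is a trailing moving average over a daily series using only the standard library (the dates and values are made-up sample data):

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily observations (e.g., sensor readings); values are made up.
series = {date(2018, 1, 1) + timedelta(days=i): float(v)
          for i, v in enumerate([3, 4, 6, 5, 7, 8, 6])}

def rolling_mean(ts, window):
    """Trailing moving average over a daily time series keyed by date."""
    days = sorted(ts)
    out = {}
    for i in range(window - 1, len(days)):
        win = days[i - window + 1:i + 1]
        out[days[i]] = mean(ts[d] for d in win)
    return out

smoothed = rolling_mean(series, window=3)
print(smoothed[date(2018, 1, 3)])  # mean of 3, 4, 6
```

Dedicated libraries such as pandas wrap this pattern (and resampling, interpolation, etc.) in far more convenient and efficient APIs.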

Machine Learning in Production

After days and nights of hard work, going from feature engineering to cross-validation, you finally managed to reach the prediction score you wanted. Is it over? Well, since you did a great job, you decided to create a microservice capable of making predictions on demand based on your trained model. Let’s figure out how to do it. This article will discuss different options and then present the solution we adopted at ContentSquare to build an architecture for a prediction server.
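The core of any such prediction server is small: serialize the trained model once, load it at startup, and call predict per request. A minimal standard-library sketch of that loop, where LinearModel and its coefficients are hypothetical stand-ins for a real trained estimator:

```python
import pickle

# Hypothetical stand-in for a trained model; in practice this would come
# from your training pipeline (e.g., a fitted scikit-learn estimator).
class LinearModel:
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

# Training side: serialize the fitted model once as an artifact.
with open("model.pkl", "wb") as f:
    pickle.dump(LinearModel(weights=[0.5, -1.0], bias=2.0), f)

# Serving side: the prediction server loads the artifact at startup...
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

def handle_request(payload):
    """What an HTTP endpoint (Flask, etc.) would do for each request."""
    return {"prediction": model.predict(payload["features"])}

print(handle_request({"features": [4.0, 1.0]}))  # {'prediction': 3.0}
```

A real service would wrap `handle_request` in a web framework and add input validation, but the load-once / predict-per-request split is the part that matters architecturally.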

Data Science 101 (Getting started in NLP): Tokenization tutorial

One common task in NLP (Natural Language Processing) is tokenization. ‘Tokens’ are usually individual words (at least in languages like English), and ‘tokenization’ is taking a text or set of texts and breaking it up into its individual words. These tokens are then used as input for other types of analysis or tasks, like parsing (automatically tagging the syntactic relationship between words).
In this tutorial you’ll learn how to:
• Read text into R
• Select only certain lines
• Tokenize text using the tidytext package
• Calculate token frequency (how often each token shows up in the dataset)
• Write reusable functions to do all of the above and make your work reproducible
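The tutorial itself works in R with tidytext; as a language-agnostic illustration of the tokenize-and-count steps above, here is a rough Python sketch (the regex tokenizer and the tiny corpus are simplified stand-ins, not the tutorial's actual data):

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and extract word-like runs -- a crude stand-in for
    tidytext's unnest_tokens()."""
    return re.findall(r"[a-z']+", text.lower())

corpus = ["The cat sat on the mat.", "The mat was flat."]

# Token frequency: how often each token shows up across the corpus.
freq = Counter(tok for line in corpus for tok in tokenize(line))
print(freq.most_common(2))  # [('the', 3), ('mat', 2)]
```

Wrapping the tokenize-then-count steps in functions like this is exactly the reusability point the tutorial makes, just expressed with tidytext's R verbs instead.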

PyTorch or TensorFlow?

This is a guide to the main differences I’ve found between PyTorch and TensorFlow. This post is intended to be useful for anyone considering starting a new project or making the switch from one deep learning framework to another. The focus is on programmability and flexibility when setting up the components of the training and deployment deep learning stack. I won’t go into performance (speed / memory usage) trade-offs. In short: PyTorch is better for rapid prototyping in research, for hobbyists, and for small-scale projects. TensorFlow is better for large-scale deployments, especially when cross-platform and embedded deployment is a consideration.

Layered Data Visualizations Using R, Plotly, and Displayr

If you have tried to communicate research results and data visualizations using R, there is a good chance you have come across one of its great limitations: R is painful when you need to create visualizations by layering multiple visual elements on top of each other. In other words, R can be painful if you want to assemble many visual elements, such as charts, images, headings, and backgrounds, into one visualization.

3-D animations with R

R is often used to visualize and animate 2-dimensional data. (Here are just a few examples.) But did you know you can create 3-dimensional animations as well?

The one function call you need to know as a data scientist: h2o.automl

As you all know, a large part of the work in predictive modeling is in preparing the data. But once you have done that, ideally you don’t want to spend too much effort trying many different machine learning models. That’s where AutoML from H2O comes in. With one function call you automate the process of training a large, diverse selection of candidate models. AutoML trains and cross-validates a Random Forest, an Extremely Randomized Forest, GLMs, Gradient Boosting Machines (GBMs), and Neural Nets. Then, as a “bonus”, it trains a Stacked Ensemble using all of the models. The function to use in the h2o R interface is h2o.automl. (There is also a Python interface.)
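Through the Python interface, the call looks roughly like the sketch below. The file name and response column are hypothetical, and running it requires a local H2O cluster (which h2o.init() starts), so treat this as a shape of the API rather than a copy-paste recipe:

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # starts or connects to a local H2O cluster (needs Java)

# "train.csv" and the "response" column are hypothetical placeholders.
train = h2o.import_file("train.csv")
predictors = [c for c in train.columns if c != "response"]

# One call trains and cross-validates the whole diverse model selection,
# plus the stacked ensembles, within the given budget.
aml = H2OAutoML(max_models=20, max_runtime_secs=600, seed=1)
aml.train(x=predictors, y="response", training_frame=train)

print(aml.leaderboard)   # models ranked by cross-validated metric
best = aml.leader        # the top model, ready for predict()/export
```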

Using GRAKN.AI to detect patterns in credit fraud data

The worlds of first-order logic and machine learning don’t usually collide. But with the increasing size of datasets around the web and, more importantly, the complex relationships that need to be represented, analysts need ways of applying machine learning techniques to discover patterns in their datasets. Sitting right in the middle of this space is GRAKN.AI, a powerful database technology that allows complex relationships to be represented, and that provides an elegant query language, Graql, for asking questions involving those relationships. This blog will assume some familiarity with the GRAKN.AI ecosystem, including the terms Ontology, Entities, Relations, and Roles. If you’re not entirely familiar with these terms, or if you just want some additional background reading, check out our “What is an Ontology?” post, along with this Intro to GRAKN.AI. As always, for the most up-to-date info, check our documentation.