Combining CNNs and RNNs – Crazy or Genius?

There are some interesting use cases where combining CNNs and RNN/LSTMs seems to make sense and a number of researchers pursuing this. However, the latest trends in CNNs may make this obsolete.

ebook: Using Deep Learning to Solve Real-World Problems

Read this eBook to learn: How deep learning enables image classification, sentiment analysis, and other advanced analysis techniques and get a a starter workflow for building and training deep learning models.

Multivariate ordinal categorical data generation

An economist contacted me about the ability of simstudy to generate correlated ordinal categorical outcomes. He is trying to generate data as an aide to teaching cost-effectiveness analysis, and is hoping to simulate responses to a quality-of-life survey instrument, the EQ-5D. The particular instrument has five questions related to mobility, self-care, activities, pain, and anxiety. Each item has three possible responses: (1) no problems, (2) some problems, and (3) a lot of problems. Although the instrument has been designed so that each item is orthogonal (independent) from the others, it is impossible to avoid correlation. So, in generating (and analyzing) these kinds of data, it is important to take this into consideration.

BooST (Boosting Smooth Trees) a new Machine Learning Model for Partial Effect Estimation in Nonlinear Regressions

We are happy to introduce our new machine learning method called Boosting Smooth Trees (BooST) (full article here). This model was a joint work with professors Marcelo Medeiros and Álvaro Veiga. The BooST uses a different type of regression tree that allows us to estimate the derivatives of very general nonlinear models. In other words, the model is differentiable and it has an analytical solution. The consequence is that now we can estimate partial effects of a characteristic on the response variable, which provide us much more interpretation than traditional importance measures. The idea behind the BooST is to replace traditional Classification and Regression Trees (CART), which are not differentiable, by Smooth logistic trees. We show that with this adaptation the BooST is a consistent estimator of the model´s derivatives under some assumptions.

Microsoft R Open 3.5.1 now available

Microsoft R Open 3.5.1 has been released, combining the latest R language engine with multi-processor performance and tools for managing R packages reproducibly. You can download Microsoft R Open 3.5.1 for Windows, Mac and Linux from MRAN now. Microsoft R Open is 100% compatible with all R scripts and packages, and works with all your favorite R interfaces and development environments. This update brings a number of minor fixes to the R language engine from the R core team. It also makes available a host of new R packages contributed by the community, including packages for downloading financial data, connecting with analytics systems, applying machine learning algorithms and statistical models, and many more. New R packages are released every day, and you can access packages released after the 1 August 2018 CRAN snapshot used by MRO 3.5.1 using the checkpoint package.

PCA revisited: using principal components for classification of faces

In this post I´m going to apply PCA to a toy problem: the classification of faces. Again I´ll be working on the Olivetti faces dataset. Please visit the previous post PCA revisited to read how to download it. The goal of this post is to fit a simple classification model to predict, given an image, the label to which it belongs. I´m goint to fit two support vector machine models and then compare their accuracy.

New Course: Analyzing Survey Data in R

You’ve taken a survey (or 1000) before, right? Have you ever wondered what goes into designing a survey and how survey responses are turned into actionable insights? Of course you have! In Analyzing Survey Data in R, you will work with surveys from A to Z, starting with common survey design structures, such as clustering and stratification, and will continue through to visualizing and analyzing survey results. You will model survey data from the National Health and Nutrition Examination Survey using R’s survey and tidyverse packages. Following the course, you will be able to successfully interpret survey results and finally find the answers to life’s burning questions!

Hyperparameter Optimization in Machine Learning Models

This tutorial covers what a parameter and a hyperparameter are in a machine learning model along with why it is vital in order to enhance your model´s performance.

AutoKeras: The Killer of Google’s AutoML

Google AI has finally released the beta version of AutoML, a service that some are saying will change the way we do deep learning entirely. Google´s AutoML is a new cloud software suite of Machine Learning tools. It´s based on Google´s state-of-the-art research in image recognition called Neural Architecture Search (NAS). NAS is basically an algorithm that, given your specific dataset, searches for the most optimal neural network to perform a certain task on that dataset. AutoML is then a suite of machine learning tools that will allow one to easily train high-performance deep networks, without requiring the user to have any knowledge of deep learning or AI; all you need is labelled data! Google will use NAS to then find the best network for your specific dataset and task. They´ve already shown how their methods can achieve performance that is far better than that of hand-designed networks.

How to Set Up a Free Data Science Environment on Google Cloud

In this post, we’ll walk through how to set up a data science environment on Google Cloud Platform (GCP). Google buys hundreds of thousands of individual computers, manages them in data centers that are located across the world using custom software, and offers these computers for rent. Because of the economy of scale that cloud hosting companies provide, individuals or teams can affordably access powerful computers with large amount of CPU and memory on demand.

Demystifying Data Science Terminology

The language used by data scientists can be confusing to anyone encountering it for the first time. Ever changing best practices and constantly evolving technologies and methodologies have given rise to a range of nuanced terms used throughout casual data conversation. Unfamiliarity with these terms often leads to disconnected expectations across different parts of a business when undertaking projects involving data and analytics. To make the most out of any data science project, it is important that participants have a shared vocabulary and an understanding of key terms at a level that is required of their role.