modelDown: a website generator for your predictive models

So what about a website generator for predictive models? Imagine that you can take a set of predictive models (generated with caret, mlr, glm, xgboost, randomForest, or anything else) and automagically generate a website with exploration and documentation for these models: documentation with archivist hooks to the models, with tables and graphs for model performance explainers, conditional model response explainers, or explainers for particular predictions.

TensorFlow Hub

TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Modules contain variables that have been pre-trained for a task using a large dataset. By reusing a module on a related task, you can:
• train a model with a smaller dataset,
• improve generalization, or
• significantly speed up training.
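The core idea behind reusing a pre-trained module can be sketched without the library itself. Below is a minimal, self-contained Python/NumPy analogy (not the real TensorFlow Hub API, whose modules are loaded from tfhub.dev): a frozen "module" acts as a fixed feature extractor, and only a small head is trained on the new task's limited data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained module: a fixed, frozen feature extractor.
W_frozen = rng.normal(size=(4, 8))

def module(x):
    # Weights are reused across tasks and never updated here.
    return np.tanh(x @ W_frozen)

# Small labeled dataset for the new, related task.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Train only a tiny logistic-regression "head" on top of the frozen module.
feats = module(X)
w = np.zeros(8)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.5 * (feats.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((feats @ w + b) > 0) == y)
print(f"head-only accuracy: {acc:.2f}")
```

In the real library, the frozen extractor would be a module such as `hub.KerasLayer(...)` pointing at a tfhub.dev URL; the training loop for the head is the part you own.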

TensorFlow Day is coming to OSCON

Machine learning (ML) is everywhere in computing and the popular press right now, and the rapid rate of innovation is being driven by open source software. TensorFlow is one of the most popular open source ML frameworks, and the subject of TensorFlow Day at OSCON this year (July 17 in Portland, Oregon). As an open source endeavor, TensorFlow is quite unusual: what's available on GitHub is really the same code that is used daily in production at Google. And thanks to being open source, it's now used by a universe of users, from academia to industry, and in places as unexpected as high schools and the arts.

Choosing the right programming language for machine learning algorithms with Apache Spark

Key considerations when deciding on the correct programming language to use

Reinforcement Learning Notebooks

A collection of Reinforcement Learning algorithms from Sutton and Barto’s book and other research papers implemented in Python.
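As a taste of what such a collection contains, here is a minimal sketch of the simple epsilon-greedy bandit algorithm from Chapter 2 of Sutton and Barto, with hypothetical arm reward probabilities chosen for illustration:

```python
import random

random.seed(42)

# Epsilon-greedy action-value estimation for a 3-armed Bernoulli bandit.
true_means = [0.2, 0.5, 0.8]   # hypothetical arm reward probabilities
Q = [0.0, 0.0, 0.0]            # action-value estimates
N = [0, 0, 0]                  # pull counts per arm
epsilon = 0.1

for step in range(5000):
    if random.random() < epsilon:
        a = random.randrange(3)                 # explore a random arm
    else:
        a = max(range(3), key=lambda i: Q[i])   # exploit the current best
    reward = 1.0 if random.random() < true_means[a] else 0.0
    N[a] += 1
    Q[a] += (reward - Q[a]) / N[a]              # incremental sample average

best = max(range(3), key=lambda i: Q[i])
print("estimated best arm:", best, "Q:", [round(q, 2) for q in Q])
```

After enough pulls, the value estimates converge toward the true arm means and the agent settles on the highest-paying arm.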

Variance-Covariance Matrix: Stock Price Analysis in R (corpcor, covmat)

The purpose of a variance-covariance matrix is to show the variance of each variable along the diagonal, while the off-diagonal entries show the covariance between every pair of variables.
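The same idea, sketched in Python with NumPy rather than the R packages mentioned in the post, using simulated returns for three hypothetical stocks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated daily returns for three hypothetical stocks (one per column).
returns = rng.normal(loc=0.0, scale=[0.01, 0.02, 0.03], size=(250, 3))

# rowvar=False: rows are observations, columns are variables.
cov = np.cov(returns, rowvar=False)

# Diagonal entries are each stock's variance; cov[i, j] is the
# covariance between stocks i and j, and the matrix is symmetric.
variances = np.diag(cov)
print(np.round(cov, 6))
```

The diagonal of `cov` matches the per-column sample variances, and `cov` equals its own transpose, as expected.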

Comparison of top data science libraries for Python, R and Scala [Infographic]

Data science is a promising and exciting field that is developing rapidly. The range of data science use cases is continuously expanding, and the toolkit to implement these applications is growing fast, so data scientists should be aware of the best solutions for particular tasks. While many languages can be useful for a data scientist, these three remain the most popular for implementing data science and machine learning solutions. In this post, we have prepared an infographic showing the top 20 libraries in each programming language that benefit the work of data scientists and data engineers. This selection shows how the languages relate to each other, as well as which libraries have similar application areas. Although there are many specific fields of application for different data science packages, we focus on those that are well suited for machine learning, visualization, mathematics and engineering, data manipulation and analysis, and reproducible research.

Modern Graph Query Language – GSQL

This post introduces the prospect of fulfilling the need for a modern graph query language with GSQL.

Inside the Mind of a Neural Network with Interactive Code in Tensorflow (Histogram, Activation, Interior/Integral Gradients)

I have been wanting to understand the inner workings of my models for a long time, and starting today I wish to learn about the topics related to this subject. In this post I want to cover three topics: histograms of weights, visualizing the activation of neurons, and interior/integral gradients.
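Of these, integrated gradients is the easiest to sketch outside TensorFlow. Below is a minimal NumPy version for a toy model whose gradient is known in closed form (the post itself works with real networks; this is only the attribution recipe):

```python
import numpy as np

# Integrated gradients for a toy differentiable model f(x) = sum(w * x**2).
w = np.array([1.0, -2.0, 3.0])

def f(x):
    return np.sum(w * x ** 2)

def grad_f(x):
    return 2 * w * x

def integrated_gradients(x, baseline, steps=200):
    # Riemann-sum approximation of the path integral from baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([1.0, 1.0, 2.0])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
print("attributions:", attr)
print("sum:", attr.sum(), "f(x) - f(baseline):", f(x) - f(baseline))
```

The completeness property holds: the per-feature attributions sum to the difference between the model's output at the input and at the baseline.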

Configuring Azure and RStudio for text analysis

I just finished teaching Computer-Assisted Content Analysis at the IQMR summer school at Syracuse. With three lectures and three labs, the problem every year is getting the right R packages onto people's machines. In particular, anything that involves compilation – and when you're using quanteda, readtext, and stm, that's a lot of things – is going to be trouble. Over the years, R and the various operating systems it has to live in have got a lot better about this, but ultimately the best solution is… not to do it at all. That is, to run everything on somebody else's computers, excuse me, 'in the cloud'. When students access an appropriately provisioned RStudio Server through their browsers, they're good to go from lab one.

Time Series Deep Learning, Part 2: Predicting Sunspot Frequency with Keras LSTM in R

One of the ways Deep Learning can be used in business is to improve the accuracy of time series forecasts (prediction). We recently showed how a Long Short-Term Memory (LSTM) model developed with the Keras library in R could be used to take advantage of autocorrelation to predict the next 10 years of monthly sunspots (a solar phenomenon that's tracked by NASA). In this article, we teamed up with RStudio to take another look at the sunspots data set, this time implementing some really advanced Deep Learning functionality available with TensorFlow for R. Sigrid Keydana, TF Developer Advocate at RStudio, put together an amazing Deep Learning tutorial using keras for implementing Keras in R and tfruns, a suite of tools for tracking, visualizing, and managing TensorFlow training runs and experiments from R. Sounds amazing, right? It is!
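The preprocessing step that makes an LSTM applicable here is framing the univariate series as supervised learning: each sample is a window of past values and the target is the next value. A minimal sketch in Python/NumPy (the tutorial itself does this in R), using a toy sine wave as a stand-in for the sunspot series:

```python
import numpy as np

def make_windows(series, lags):
    # Each row of X holds `lags` consecutive past values;
    # y holds the value immediately following each window.
    X = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
    y = series[lags:]
    return X, y

# Toy stand-in for the monthly sunspot series: a 12-month cycle.
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12)

X, y = make_windows(series, lags=12)
print(X.shape, y.shape)
```

`X` would then be reshaped to `(samples, timesteps, features)` before being fed to an LSTM layer.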

Interacting with AWS from R

If there is one realisation in life, it is the fact that you will never have enough CPU or RAM available for your analytics. Luckily for us, cloud computing is becoming cheaper each year. One of the more established providers of cloud services is AWS. If you don't know yet, they provide a free, yes free, option. Their t2.micro instance is a 1 CPU, 500MB machine, which doesn't sound like much, but I am running an RStudio and Docker instance on one of these for a small project.

Code for Workshop: Introduction to Machine Learning with R

These are the slides from my workshop Introduction to Machine Learning with R, which I gave at the University of Heidelberg, Germany, on June 28th, 2018. The entire code accompanying the workshop can be found below the video.

Addendum: Text-to-Speech with the googleLanguageR package

After posting my short blog post about Text-to-speech with R, I got two very useful tips. One was to use the googleLanguageR package, which uses the Google Cloud Text-to-Speech API. And indeed, it was very easy to use and the resulting audio sounded much better than what I tried before! Here's a short example of how to use the package for TTS …

Best Practices for ML Engineering

This document is intended to help those with a basic knowledge of machine learning get the benefit of Google's best practices in machine learning. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine-learned model, then you have the necessary background to read this document.

Automated Machine Learning vs Automated Data Science

Automated machine learning is continually gaining increased exposure, yet there still seems to be some confusion as to what automated machine learning actually is. Is it the same thing as automated data science? Let's start by looking at what data science and machine learning are, as they are defined independently of one another.

Pushing Ordinary Least Squares to the limit with Xy()

Simulation is mostly about answering particular research questions. Whenever the word simulation appears somewhere in a discussion, everyone knows that this means additional effort. At STATWORX we use simulations as a first step to prove concepts we are developing. Sometimes such a simulation is simple; in other cases it is a lot of work. However, research questions are always fairly unique, which complicates the construction of a general simulation framework. With the Xy() function I try to establish such a simulation framework for supervised learning tasks by trying to find the right balance between capability and simplicity.
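The basic pattern such a simulation framework automates can be shown in a few lines. This is not the Xy() function itself (which is an R package), but a minimal Python/NumPy sketch of the same idea: simulate a supervised-learning task with known coefficients, then check that ordinary least squares recovers them.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a linear regression task with known ground-truth coefficients.
n, p = 1000, 3
true_beta = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Fit OLS (least squares via lstsq, which is numerically stable).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", np.round(beta_hat, 3))
```

Because the true coefficients are known by construction, the simulation directly answers the research question: how well does the estimator recover them under a given noise level?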