Automatic machine learning for data scientists

JustML provides automatic machine learning model selection, training and deployment in the cloud.

Data Exchange and Marketplace, a New Business Model in the Making.

The Internet of Things (IoT), also known as the Internet of Objects, refers to the networked interconnection of everyday physical objects (20 billion by 2020, according to Gartner). Such devices will be an integral part of next-generation computing. They will also produce astronomical volumes of data, catapulting us into the world of zettabytes and yottabytes. Data is the new oil: for some it is a byproduct of doing operations, while for others the same data can be a catalyst to capture new insights, build AI models and drive innovation.

R 3.5.0 is released

The build system rolled up R-3.5.0.tar.gz (codename ‘Joy in Playing’) this morning. The list below details the changes in this release. You can get the source code from http://…/R-3.5.0.tar.gz or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.

An Introduction to Greta

I was surprised by greta. I had assumed that the tensorflow and reticulate packages would eventually enable R developers to look beyond deep learning applications and exploit the TensorFlow platform to create all manner of production-grade statistical applications. But I wasn’t thinking Bayesian. After all, Stan is probably everything a Bayesian modeler could want. Stan is a powerful, production-level probability distribution modeling engine with a slick R interface, deep documentation, and a dedicated development team. But greta lets users write TensorFlow-based Bayesian models directly in R! What could be more charming? greta removes the barrier of learning an intermediate modeling language while still promising to deliver high-performance MCMC models that run anywhere TensorFlow can go.

Absolute and Weighted Frequency of Words in Text

In this tutorial, you’ll learn about absolute and weighted word frequency in text mining and how to calculate it with defaultdict and pandas DataFrames.
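
As a rough illustration of the idea, the sketch below counts words with defaultdict and collects the results in a pandas DataFrame. It assumes that "weighted frequency" means each occurrence of a word counts for the numeric weight attached to the text it appears in; the function name word_frequencies, the sample titles and the view counts are all hypothetical and are not taken from the tutorial.

```python
from collections import defaultdict

import pandas as pd


def word_frequencies(texts, weights):
    """Return absolute and weighted word frequencies as a DataFrame.

    Assumption: every occurrence of a word adds 1 to its absolute
    frequency and adds the weight of its text to its weighted frequency.
    """
    abs_freq = defaultdict(int)
    wtd_freq = defaultdict(float)
    for text, weight in zip(texts, weights):
        # Naive tokenisation: lowercase and split on whitespace.
        for word in text.lower().split():
            abs_freq[word] += 1
            wtd_freq[word] += weight
    df = pd.DataFrame({
        "abs_freq": pd.Series(dict(abs_freq)),
        "wtd_freq": pd.Series(dict(wtd_freq)),
    })
    # Most heavily weighted words first.
    return df.sort_values("wtd_freq", ascending=False)


# Hypothetical sample data: three short titles and their view counts.
titles = ["blue sky", "blue sea", "red sky"]
views = [100, 10, 1]

freq = word_frequencies(titles, views)
print(freq)
```

With this sample, "blue" and "sky" both appear twice, but "blue" ranks higher by weighted frequency because both of its occurrences come from the more heavily viewed titles.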

Qualitative before Quantitative: How Qualitative Methods Support Better Data Science

Have you ever been embarrassed by the first iteration of one of your machine learning projects, where you didn’t include obvious and important features? In the practical hustle and bustle of trying to build models, we can often forget about the observation step in the scientific method and jump straight to hypothesis testing.

Swiftapply – automatically efficient pandas apply operations

Easily apply any function to a pandas dataframe in the fastest available manner. Time is precious. There is absolutely no reason to be wasting it waiting for your function to be applied to your pandas series (1 column) or dataframe (>1 columns). Don’t get me wrong, pandas is an amazing tool for Python users, and a majority of the time pandas operations are very quick. Here, I wish to take the pandas apply function under close inspection. This function is incredibly useful, because it lets you easily apply any function that you’ve specified to your pandas series or dataframe. But there is a cost: the apply function essentially acts as a for loop, and a slow one at that. This means that the apply function calls your function once per element in Python, processing the data at O(n) complexity with a high per-element overhead.
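
The cost described above can be seen by comparing apply with an equivalent vectorized expression. The sketch below uses only plain pandas and NumPy and does not show Swiftapply's own interface; the function add_tax and the sample prices are hypothetical. Timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd


def add_tax(price):
    """Hypothetical element-wise function: add 20% to a price."""
    return price * 1.2


# Hypothetical sample data: 100,000 prices.
prices = pd.Series(np.arange(1, 100_001, dtype="float64"))

# apply calls add_tax once per element, much like a Python for loop.
start = time.perf_counter()
looped = prices.apply(add_tax)
apply_seconds = time.perf_counter() - start

# The vectorized form performs the same arithmetic on the whole column at once.
start = time.perf_counter()
vectorized = prices * 1.2
vectorized_seconds = time.perf_counter() - start

print(f"apply:      {apply_seconds:.4f} s")
print(f"vectorized: {vectorized_seconds:.4f} s")
print("same result:", np.allclose(looped, vectorized))
```

Both approaches touch every element once, so the difference lies in the per-element overhead of calling a Python function rather than in the asymptotic complexity.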