An Introduction to Bayesian Reasoning

You might be using Bayesian techniques in your data science without knowing it! And if you’re not, they could enhance the power of your analysis. This blog post, part 1 of 2, will demonstrate how Bayesians employ probability distributions to add information when fitting models and to reason about the uncertainty of the model’s fit.
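
As a rough illustration of how a prior distribution adds information to a fit, here is a minimal sketch of a Beta-Binomial update. It is not taken from the linked post: the prior parameters and the observed counts are hypothetical, and the function names are made up for this example.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Return the posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


# Hypothetical prior: a weak belief that the success rate is around 0.5.
prior_a, prior_b = 2.0, 2.0

# Hypothetical data: 7 successes and 3 failures.
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes=7, failures=3)

posterior_mean = beta_mean(post_a, post_b)
prior_var = beta_variance(prior_a, prior_b)
posterior_var = beta_variance(post_a, post_b)

print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")
print(f"variance shrinks from {prior_var:.4f} to {posterior_var:.4f}")
```

The posterior mean sits between the prior mean (0.5) and the observed rate (0.7), and the smaller posterior variance is one way to express reduced uncertainty about the fit.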

What the heck is Word Embedding

Word embedding is really all about improving the ability of networks to learn from text data by representing that data as lower-dimensional vectors. These vectors are called embeddings. The technique reduces the dimensionality of text data, but these models can also learn some interesting traits about the words in a vocabulary.
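
To make the idea of words as low-dimensional vectors concrete, here is a minimal sketch of an embedding lookup with a cosine-similarity comparison. The three-word vocabulary and the vector values are hypothetical and hand-picked for illustration. In practice the vectors are learned from text.

```python
import math

# Hypothetical vocabulary: each word maps to a row of the embedding table.
vocab = {"king": 0, "queen": 1, "apple": 2}

# Hypothetical 2-dimensional embeddings. A one-hot encoding of this
# vocabulary would need 3 dimensions, so this is already a reduction.
embeddings = [
    [0.90, 0.80],   # king
    [0.85, 0.90],   # queen
    [0.10, -0.70],  # apple
]


def embed(word):
    """Look up the embedding vector for a word."""
    return embeddings[vocab[word]]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim_royal = cosine_similarity(embed("king"), embed("queen"))
sim_fruit = cosine_similarity(embed("king"), embed("apple"))

print(f"king vs queen: {sim_royal:.3f}")
print(f"king vs apple: {sim_fruit:.3f}")
```

With these made-up values, related words end up closer together than unrelated ones, which is the kind of trait a trained embedding model can pick up.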

Better Language Models and Their Implications

We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization – all without task-specific training.

Sustainable Industry: Rinse Over Run – Benchmark

We’re really excited to launch our latest competition! In addition to an interesting, new prize structure, the subject matter is at the intersection of sustainability and industry. Improvements to these kinds of processes can have upside for both a business and the planet. The presence of particles, bacteria, allergens, or other foreign material in a food or beverage product can put consumers at risk. Manufacturers put extra care into ensuring that equipment is properly cleaned between uses to avoid any contamination. At the same time, the cleaning processes require substantial resources in the form of time and cleaning supplies, which are often water and chemical mixtures (e.g. caustic soda or acid). Given these concerns, the cleaning stations measure turbidity during the cleaning process. Turbidity quantifies the suspended solids in the liquids that are coming out of the cleaning tank. The goal is to have those liquids be turbidity free, indicating that the equipment is fully clean. Depending on the expected level of turbidity, a cleaning station operator can either extend the final rinse (to eliminate remaining turbidity) or shorten it (saving time and water consumption). The goal of this competition is to predict turbidity in the last rinsing phase in order to help minimize the use of water, energy and time, while ensuring high cleaning standards.

An Introduction to Scala

Welcome to the first article in my multi-part series on Scala. This tutorial is designed to teach introductory Scala as a precursor to learning Apache Spark.

Introducing Uber’s Ludwig

Uber continues its spree of deep learning technology releases. Since last year, the Uber AI Labs team has open sourced different frameworks that enable many of the fundamental building blocks of deep learning solutions. The productivity of the Uber engineering team is nothing short of impressive: Pyro is a framework for probabilistic programming built on top of PyTorch, Horovod is a TensorFlow-based framework for distributed learning, Manifold focuses on visual debugging and interpretability and, of course, Michelangelo is a reference architecture for large scale machine learning solutions. The latest creation of Uber AI Labs is Ludwig, a toolbox for training deep learning models without writing any code. Training is one of the most developer intensive aspects of deep learning applications. Typically, data scientists spend numerous hours experimenting with different deep learning models to perform better on a specific training dataset. This process involves more than just training: it includes several other aspects, such as model comparison, evaluation, workload distribution and many others. Given its highly technical nature, the training of deep learning models is an activity typically constrained to data scientists and machine learning experts, and it involves a significant volume of code. While this problem can be generalized to any machine learning solution, it has gotten way worse in deep learning architectures, as they typically involve many layers and levels. Simplifying the training process is the number one factor that can streamline the experimentation phase in deep learning solutions.

The Most Amount of Rain over a 10 Day Period on Record

Townsville, Qld, has been inundated with torrential rain and has broken the record for the largest rainfall over a 10 day period. It has been devastating for the farmers and residents of Townsville. I looked at Townsville’s weather data to understand how significant this event was and whether there have been comparable events in the past.
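
One simple way to look for a record 10 day period in daily rainfall data is a rolling-window total. The sketch below is only an illustration of that idea, not the author's analysis: the daily values are hypothetical, not Townsville measurements, and the function name is made up for this example.

```python
def max_rolling_total(daily_mm, window=10):
    """Return (largest total, start index) over all windows of `window` days."""
    if len(daily_mm) < window:
        raise ValueError("need at least `window` days of data")

    # Total for the first window, then slide one day at a time.
    current = sum(daily_mm[:window])
    best_total, best_start = current, 0
    for end in range(window, len(daily_mm)):
        current += daily_mm[end] - daily_mm[end - window]
        if current > best_total:
            best_total, best_start = current, end - window + 1
    return best_total, best_start


# Hypothetical daily rainfall in millimetres: a dry spell, then a wet event.
daily_mm = [0, 5, 0, 0, 12, 3, 0, 0, 0, 1,
            80, 150, 200, 90, 60, 110, 70, 40, 30, 20,
            0, 0]

record_total, record_start = max_rolling_total(daily_mm, window=10)
print(f"wettest 10 days start at index {record_start}: {record_total} mm")
```

Running the same function over each year of a long record would show how an event compares with earlier ones.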

ExFaKT: a framework for explaining facts over knowledge graphs and text

Today’s paper choice focuses on the topical area of fact-checking: how do we know whether a candidate fact, which might for example be harvested from a news article or social media post, is likely to be true? For the first generation of knowledge graphs, fact checking was performed manually by human reviewers, but this clearly doesn’t scale to the volume of information published daily. Automated fact checking methods typically produce a numerical score (probability the fact is true), but these scores are hard to understand and justify without a corresponding explanation.

Introduction to gradient boosting on decision trees with Catboost

Today I would like to share my experience with an open source machine learning library, based on gradient boosting on decision trees, developed by the Russian search engine company Yandex.

Foundations of Machine Learning, part 5

This post is the ninth (and probably last) one of our series on the history and foundations of econometric and machine learning models. The first four were on econometric techniques.

Foundations of Machine Learning, part 4

This post is the eighth one of our series on the history and foundations of econometric and machine learning models. The first four were on econometric techniques.

Data-driven Introspection of my Android Mobile usage in R

This is an attempt to see how the data that are collected from us can also be used for our own betterment. When companies are so interested in collecting our personal data to show a push in quarterly revenues, why not use our own data science skills to get some useful insights that can help our lives?

Accelerating Time Series Analysis with Automated Machine Learning

This IDC Solution Spotlight examines how automated machine learning tools can augment the analysis, modeling, and prediction of time series data to deliver easily understood and actionable insights for businesses in a simple and agile fashion. Get the report now.

Do you know what I mean?

This deceptively simple game demonstrates one of the hardest problems in artificial intelligence. How should a person reason about other people’s reasoning? How should an agent model other agents? The way a person reasons about another person’s thinking is called the theory of mind (ToM). It’s an awareness that other people can also reason and act according to their reasoning. It’s the realization that others have goals and beliefs about the world and that these are dependent on the other agent’s mental state about the world. This mental state is often sufficient to predict behavior and thus is a nice thing to have. Note that these mental states can be formed via false beliefs about the world.