GraphQL, and the end of shared client-side model objects

Promise of GraphQL; the sky is the limit for tooling when a typed schema definition is the foundation being built upon. Tooling of this nature can make a premise that would’ve otherwise seemed prohibitively unwieldy – having a distinct client-side model type for every slight use case variation – not only achievable, but ideal.

The Launch of SingularityNet Beta and the Day Decentralized AI Became Real

Last week will be remembered like a big day for the nascent decentralized artificial intelligence(AI) market. SingularityNet, one of the pioneers of the decentralized AI trend and the platform powering the famous robot Sophia, announced the general availability of its platform on the Ethereum mainnet. The launch becomes the first general purpose platform for decentralized application and is likely to test the viability of this new approach.

Building Intuition for LSTMs

Simplified introduction to a class of neural networks used in sequential modelling.

The Empty Promise of Data Moats

Data has long been lauded as a competitive moat for companies, and that narrative’s been further hyped with the recent wave of AI startups. Network effects have been similarly promoted as a defensible force in building software businesses. So of course, we constantly hear about the combination of the two: ‘data network effects’ (heck, we’ve talked about them at length ourselves). But for enterprise startups – which is where we focus – we now wonder if there’s practical evidence of data network effects at all. Moreover, we suspect that even the more straightforward data scale effect has limited value as a defensive strategy for many companies. This isn’t just an academic question: It has important implications for where founders invest their time and resources. If you’re a startup that assumes the data you’re collecting equals a durable moat, then you might underinvest in the other areas that actually do increase the defensibility of your business long term (verticalization, go-to-market dominance, post-sales account control, the winning brand, etc).

Gamma Process Prior for Semiparametric Survival Analysis

Implementation of Mean Average Precision (mAP) with Non-Maximum Suppression (NMS)

You may think that the toughest part is over after writing your CNN object detection model. What about the metrics to measure how well your object detector is doing? The metric to measure objection detection is mAP. To implement the mAP calculation, the work starts from the predictions from the CNN object detection model.

On the importance of DSLs in ML and AI

Domain-Specific Languages make our life easier while developing AI/ML applications in many different ways. Choosing the right DSL for the job might matter more than the choice of the host language.

Graph Embedding for Deep Learning

There are alot of ways machine learning can be applied to graphs. One of the easiest is to turn graphs into a more digestible format for ML. Graph embedding is an approach that is used to transform nodes, edges, and their features into vector space (a lower dimension) whilst maximally preserving properties like graph structure and information. Graphs are tricky because they can vary in terms of their scale, specificity, and subject. A molecule can be represented as a small, sparse, and static graph, whereas a social network could be represented by a large, dense, and dynamic graph. Ultimately this makes it difficult to find a silver bullet embedding method. The approachess that will be covered each vary in performance on different datasets, but they are the most widely used in Deep Learning.

Automated Research and Beyond: The Evolution of Artificial Intelligence

You will soon be able to explain a research question to a machine and get an answer in return.

Python toolset for statistical comparison of machine learning models and human readers

The most common statistical methods for comparing machine learning models and human readers are p-value and confidence interval. Although receiving some criticism recently, p-value and confidence interval give more insight into results than a raw performance measure, if interpreted correctly, and are required by many journals. This post shows an example python code utilizing bootstrapping for computing confidence intervals and p-values comparing machine learning models and human readers.

Rules of Machine Learning: Best Practices for ML Engineering

This document is intended to help those with a basic knowledge of machine learning get the benefit of Google’s best practices in machine learning. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine­-learned model, then you have the necessary background to read this document.
Rule #1: Don’t be afraid to launch a product without machine learning.
Rule #2: First, design and implement metrics.
Rule #3: Choose machine learning over a complex heuristic.
Rule #4: Keep the first model simple and get the infrastructure right.
Rule #5: Test the infrastructure independently from the machine learning.
Rule #6: Be careful about dropped data when copying pipelines.
Rule #7: Turn heuristics into features, or handle them externally.
Rule #8: Know the freshness requirements of your system.
Rule #9: Detect problems before exporting models.
Rule #10: Watch for silent failures.
Rule #11: Give feature columns owners and documentation.
Rule #12: Don’t overthink which objective you choose to directly optimize.
Rule #13: Choose a simple, observable and attributable metric for your first objective.
Rule #14: Starting with an interpretable model makes debugging easier.
Rule #15: Separate Spam Filtering and Quality Ranking in a Policy Layer.
Rule #16: Plan to launch and iterate.
Rule #17: Start with directly observed and reported features as opposed to learned features.
Rule #18: Explore with features of content that generalize across contexts.
Rule #19: Use very specific features when you can.
Rule #20: Combine and modify existing features to create new features in human­-understandable ways.
Rule #21: The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have.
Rule #22: Clean up features you are no longer using.
Rule #23: You are not a typical end user.
Rule #24: Measure the delta between models.
Rule #25: When choosing models, utilitarian performance trumps predictive power.
Rule #26: Look for patterns in the measured errors, and create new features.
Rule #27: Try to quantify observed undesirable behavior.
Rule #28: Be aware that identical short-term behavior does not imply identical long-term behavior.
Rule #29: The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
Rule #30: Importance-weight sampled data, don’t arbitrarily drop it!
Rule #31: Beware that if you join data from a table at training and serving time, the data in the table may change.
Rule #32: Re-use code between your training pipeline and your serving pipeline whenever possible.
Rule #33: If you produce a model based on the data until January 5th, test the model on the data from January 6th and after.
Rule #34: In binary classification for filtering (such as spam detection or determining interesting emails), make small short-term sacrifices in performance for very clean data.
Rule #35: Beware of the inherent skew in ranking problems.
Rule #36: Avoid feedback loops with positional features.
Rule #37: Measure Training/Serving Skew.
Rule #38: Don’t waste time on new features if unaligned objectives have become the issue.
Rule #39: Launch decisions are a proxy for long-term product goals.
Rule #40: Keep ensembles simple.
Rule #41: When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.
Rule #42: Don’t expect diversity, personalization, or relevance to be as correlated with popularity as you think they are.
Rule #43: Your friends tend to be the same across different products. Your interests tend not to be.

Technical debt for data scientists

Technical debt is the process of avoiding work today by promising to do work tomorrow. A team might identify that there’s a small time window for a particular change to be implemented and the only way they can hit that window is to take shortcuts in the development process. They might soberly calculate that the benefits of getting something done now are worth the costs of fixing it later. This kind of technical debt is similar to taking out a mortgage or small business loan. You don’t have the money to realize an opportunity right now, so you borrow that money even though it’s going to cost more down the road. The lifetime cost of the investment goes up, but at least you get to make the investment. Too often however, data science technical debt is more like a payday loan. We take shortcuts in developing a solution without an understanding of the risks and costs of those shortcuts, and without a realistic plan for how we’re going to pay back the debt. Code is produced, but it’s not tested, documented, or robust to changes in the system. The result is that data science projects become expensive or impossible to maintain as time goes on.

Awesome decision tree research papers

A curated list of decision, classification and regression tree research papers with implementations.

ARIMA/SARIMA vs LSTM with Ensemble learning Insights for Time Series Data

There are five types of traditional time series models most commonly used in epidemic time series forecasting. AR models express the current value of the time series linearly in terms of its previous values and the current residual, whereas MA models express the current value of the time series linearly in terms of its current and previous residual series. ARMA models are a combination of AR and MA models, in which the current value of the time series is expressed linearly in terms of its previous values and in terms of current and previous residual series. The time series defined in AR, MA, and ARMA models are stationary processes, which means that the mean of the series of any of these models and the covariance among its observations do not change with time. For non-stationary time series, transformation of the series to a stationary series has to be performed first. ARIMA model generally fits the non-stationary time series based on the ARMA model, with a differencing process which effectively transforms the non-stationary data into a stationary one. SARIMA models, which combine seasonal differencing with an ARIMA model, are used for time series data modeling with periodic characteristics.

Shallow Neural Networks

When we hear the name Neural Network, we feel that it consist of many and many hidden layers but there is a type of neural network with a few numbers of hidden layers. Shallow neural networks consist of only 1 or 2 hidden layers. Understanding a shallow neural network gives us an insight into what exactly is going on inside a deep neural network. In this post, let us see what is a shallow neural network and its working in a mathematical context. The figure below shows a shallow neural network with 1 hidden layer, 1 input layer and 1 output layer.