Distilled News

Challenges in sentiment analysis: a case for word clouds (for now)

When I think about artificial intelligence, I fall into a tricky habit of conflating understanding with capability. I imagine there are ways we can tell how much a machine knows by what it can produce. The interesting thing to me, however, is that machines don’t actually need to understand anything to produce a lot computationally. Popular science blurs this idea with a lot of conversation around things like the Turing test. Movies like Ex Machina and Blade Runner give us a sense that perhaps machines can take on some understanding of the world around them. These ideas in our zeitgeist blur into how we perceive the capability of tools used in industry today. Through some exploratory data science, I will try to unpack one challenge still commonly faced in marketing: summarizing content without reading all of it.

Review: Shake-Shake Regularization (Image Classification)

In this story, Shake-Shake Regularization (Shake-Shake), by Xavier Gastaldi from London Business School, is briefly reviewed. The motivation of this paper is that, since data augmentation is applied to the input image, it might also be possible to apply data augmentation techniques to internal representations. Prior work found that adding noise to the gradient during training helps the training and generalization of complicated neural networks. Shake-Shake regularization can be seen as an extension of this concept, in which gradient noise is replaced by a form of gradient augmentation. This paper appeared in the 2017 ICLR Workshop with over 10 citations, and the long version on arXiv in 2017 has over 100 citations. (Sik-Ho Tsang @ Medium)
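The two-branch combination at the heart of the method can be sketched in a few lines. This is an illustrative toy version (the function name and the fixed alpha are mine, not the paper's); it shows only the forward-pass mixing, with the independently sampled backward-pass beta noted in a comment:

```python
import random

def shake_shake(x, branch1, branch2, alpha=None):
    """Combine two residual branches with a random convex weight.

    In the paper, the forward pass draws a random alpha and the backward
    pass uses an independent random beta, which acts as gradient
    augmentation; this sketch shows only the forward combination.
    """
    if alpha is None:
        alpha = random.random()  # drawn fresh for every forward pass
    return [xi + alpha * b1 + (1.0 - alpha) * b2
            for xi, b1, b2 in zip(x, branch1, branch2)]

# With alpha pinned to 0.5 the result is the plain average of the branches.
out = shake_shake([1.0, 2.0], [0.2, 0.4], [0.6, 0.0], alpha=0.5)
```

At test time the paper keeps the expected value, i.e. alpha = 0.5, which is why the pinned case above is the interesting one.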

Evolving Deep Neural Networks

Many of us have seen deep learning achieve huge success in a variety of fields in recent years, with much of it coming from its ability to automate the frequently tedious and difficult feature-engineering phase by learning ‘hierarchical feature extractors’ from data. Also, since architecture design (i.e. the process of creating the shape and functionality of a neural network) is a long and difficult process that has mainly been done manually, innovation is limited and most progress has come from old algorithms that perform remarkably well with today’s computing resources and data. Another issue is that deep neural networks are mainly optimized by gradient-following algorithms (e.g. SGD, RMSProp), which are a great way to constrain the search space but are susceptible to getting trapped in local optima, saddle points, and noisy gradients, especially in dense solution areas such as reinforcement learning. This article reviews how evolutionary algorithms have been proposed and tested as a competitive alternative to address these problems.

Random Forest vs Neural Network

Which is better: Random Forest or Neural Network? This is a common question, with a very easy answer: it depends 🙂 I will try to show you when it is good to use Random Forest and when to use a Neural Network. First of all, Random Forest (RF) and Neural Network (NN) are different types of algorithms. The RF is an ensemble of decision trees. Each decision tree in the ensemble processes the sample and predicts the output label (in the case of classification). The decision trees in the ensemble are independent; each can predict the final response. The Neural Network is a network of connected neurons. The neurons cannot operate without other neurons; they are connected. Usually they are grouped in layers, process data within each layer, and pass it forward to the next layer. The last layer of neurons makes the decision.

Full Stack Deep Learning Steps and Tools

In this article, we get to know the steps of Full Stack Deep Learning according to the FSDL course from March 2019. First, we need to set up and plan the project; in this step we define the goals, metrics, and baseline. Then we collect the data and label it with available tools. When building the codebase, there are tools that can maintain the quality of the project, as described above. Then we do modeling, with testing and debugging. After the model meets the requirements, we finally look at the steps and tools for deploying and monitoring the application on the desired interface.

Learning representations for supervised information fusion using tensor decompositions and deep learning methods

Machine learning is aimed at the automatic extraction of semantic-level information from potentially raw and unstructured data. A key challenge in building intelligent systems lies in the ability to extract and fuse information from multiple sources. In the present thesis, this challenge is addressed by using representation learning, which has been one of the most important innovations in machine learning in the last decade. Representation learning is the basis for modern approaches to natural language processing and artificial neural networks, in particular deep learning, which includes popular models such as convolutional neural networks (CNN) and recurrent neural networks (RNN). It has been shown that many approaches to tensor decomposition and multi-way models can also be related to representation learning. Tensor decompositions have been applied to a variety of tasks, e.g., knowledge graph modeling and electroencephalography (EEG) data analysis. In this thesis, we focus on machine learning models based on recent representation learning techniques, which can combine information from multiple channels by exploiting their inherent multi-channel data structure.

A novel approach to visualize the categorical data in R

Recently, I came across the ggalluvial package in R. This package is used in particular to visualize categorical data. As usual, I will use it with medical data from NHANES. ggalluvial is a great choice when visualizing more than two variables within the same plot.

Visualising Filters and Feature Maps for Deep Learning

Deep neural networks are one of the most powerful classes of machine learning models. With enough data, their accuracy in tasks such as Computer Vision and Natural Language Processing (NLP) is unmatched. The only drawback that many scientists will comment on is the fact that these networks are complete black boxes. We still have very little knowledge of how deep networks learn their target patterns so well, especially of how all the neurons work together to achieve the final result.

Machine Learning in Agriculture: Applications and Techniques

Recently we have discussed the emerging concept of smart farming, which makes agriculture more efficient and effective with the help of high-precision algorithms. The mechanism that drives it is Machine Learning – the scientific field that gives machines the ability to learn without being strictly programmed. It has emerged together with big data technologies and high-performance computing to create new opportunities to unravel, quantify, and understand data-intensive processes in agricultural operational environments.

Finding Bayesian Legos

Joe, a good family friend, dropped by earlier this week. As we often do, we discussed the weather (seems to be hotter than normal already here in the Pacific Northwest), the news (mostly about how we are both taking action to avoid the news), and our kids. Both of us have children that really enjoy playing with Legos®. And with Legos inevitably comes the intense pain of stepping on Legos, usually in the middle of the night or first thing in the morning on the way to make coffee. Stepping on lingering Legos seems to happen despite Joe and me both following after our children, picking up all the Legos we can find that the children left behind. Joe and I keep batting around ways to decrease the chance of stepping on Legos. After some time, I suggest we might be able to use probability and statistics to estimate the probability of there being Legos not removed in our sweeps after the kids. Joe says he’s on board: ‘Anything, my feet cannot take anymore!’ I fire up my favorite tools for estimating probabilities, and Joe and I get started on ways we might estimate the likelihood that there are Legos remaining after our sweeps to pick up the Legos our children missed.

Kubernetes, The Open and Scalable Approach to ML Pipelines

Still waiting for ML training to be over? Tired of running experiments manually? Not sure how to reproduce results? Wasting too much of your time on devops and data wrangling? It’s okay if you’re a hobbyist, but data science models are meant to be incorporated into real business applications. Businesses won’t invest in data science if they don’t see a positive ROI. This calls for the adoption of an ‘engineered’ approach; otherwise it is no more than a glorified science project with data. Engineers use microservices and automated CI/CD (continuous integration and deployment) in modern agile development. You write code, push it, and it gets tested automatically on some cluster at scale. If it passes the tests, it then goes into some type of beta/canary testing phase, and on to production from there. Kubernetes, a cloud-native cluster orchestrator, is the tool now widely used by developers and devops teams to build an agile application delivery environment.

R Packages worth a look

Interactive Document for Working with Association Rule Mining Analysis (ASSOCShiny)
An interactive document on the topic of association rule mining analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the p …

Poisson Lognormal Models (PLNmodels)
The Poisson-lognormal model and variants can be used for a variety of multivariate problems when count data are at play, including principal component …

Generate tables and plots to get summaries of data (glancedata)
Generate data frames for all the variables with some summaries and also some plots for numerical variables. Several functions from the ‘tidyverse’ and …

Cartography for Statistical Analysis (oceanis)
Creating maps for statistical analysis such as proportional circles, choropleth, typology and flows. Some functions use ‘shiny’ or ‘leaflet’ technolog …

Log Rotation and Conditional Backups (rotor)
Conditionally rotate or back-up files based on their size or the date of the last backup; inspired by the ‘Linux’ utility ‘logrotate’.

Easily Visualize Data from ‘ERDDAP’ Servers via the ‘rerddap’ Package (plotdap)
Easily visualize and animate ‘tabledap’ and ‘griddap’ objects obtained via the ‘rerddap’ package in a simple one-line command, using either base graphi …

Document worth reading: “Understanding Neural Networks via Feature Visualization: A survey”

A neuroscience method for understanding the brain is to find and study the preferred stimuli that highly activate an individual cell or groups of cells. Recent advances in machine learning enable a family of methods to synthesize preferred stimuli that cause a neuron in an artificial or biological brain to fire strongly. Those methods are known as Activation Maximization (AM) or Feature Visualization via Optimization. In this chapter, we (1) review existing AM techniques in the literature; (2) discuss a probabilistic interpretation for AM; and (3) review the applications of AM in debugging and explaining networks. Understanding Neural Networks via Feature Visualization: A survey

GraphQL, and the end of shared client-side model objects

The promise of GraphQL: the sky is the limit for tooling when a typed schema definition is the foundation being built upon. Tooling of this nature can make a premise that would’ve otherwise seemed prohibitively unwieldy (having a distinct client-side model type for every slight use-case variation) not only achievable, but ideal.

The Launch of SingularityNet Beta and the Day Decentralized AI Became Real

Last week will be remembered as a big day for the nascent decentralized artificial intelligence (AI) market. SingularityNet, one of the pioneers of the decentralized AI trend and the platform powering the famous robot Sophia, announced the general availability of its platform on the Ethereum mainnet. With the launch it becomes the first general-purpose platform for decentralized AI applications, and it is likely to test the viability of this new approach.

Building Intuition for LSTMs

Simplified introduction to a class of neural networks used in sequential modelling.

The Empty Promise of Data Moats

Data has long been lauded as a competitive moat for companies, and that narrative’s been further hyped with the recent wave of AI startups. Network effects have been similarly promoted as a defensible force in building software businesses. So of course, we constantly hear about the combination of the two: ‘data network effects’ (heck, we’ve talked about them at length ourselves). But for enterprise startups – which is where we focus – we now wonder if there’s practical evidence of data network effects at all. Moreover, we suspect that even the more straightforward data scale effect has limited value as a defensive strategy for many companies. This isn’t just an academic question: It has important implications for where founders invest their time and resources. If you’re a startup that assumes the data you’re collecting equals a durable moat, then you might underinvest in the other areas that actually do increase the defensibility of your business long term (verticalization, go-to-market dominance, post-sales account control, the winning brand, etc).

Gamma Process Prior for Semiparametric Survival Analysis

Implementation of Mean Average Precision (mAP) with Non-Maximum Suppression (NMS)

You may think that the toughest part is over after writing your CNN object detection model. But what about the metrics to measure how well your object detector is doing? The standard metric for object detection is mAP. To implement the mAP calculation, the work starts from the predictions of the CNN object detection model.
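Before matching predictions to ground truth for mAP, overlapping duplicate detections are removed with non-maximum suppression. A rough sketch of that step (the helper names and threshold below are illustrative, not any particular library's API):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it above iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first heavily and is dropped
```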

On the importance of DSLs in ML and AI

Domain-Specific Languages make our life easier while developing AI/ML applications in many different ways. Choosing the right DSL for the job might matter more than the choice of the host language.

Graph Embedding for Deep Learning

There are a lot of ways machine learning can be applied to graphs. One of the easiest is to turn graphs into a more digestible format for ML. Graph embedding is an approach used to transform nodes, edges, and their features into vector space (a lower dimension) whilst maximally preserving properties like graph structure and information. Graphs are tricky because they can vary in terms of their scale, specificity, and subject. A molecule can be represented as a small, sparse, static graph, whereas a social network could be represented by a large, dense, dynamic graph. Ultimately this makes it difficult to find a silver-bullet embedding method. The approaches that will be covered each vary in performance on different datasets, but they are the most widely used in deep learning.
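As one concrete member of this family, DeepWalk-style methods start by sampling truncated random walks over the graph and then feed the walks, treated as "sentences" of node IDs, to a word2vec-style model. A minimal sketch of the walk-generation step (the toy graph and parameter choices are mine):

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=0):
    """Generate truncated random walks over a graph given as an
    adjacency dict {node: [neighbors]}. DeepWalk-style embedding methods
    use these walks as the training corpus for a skip-gram model."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# A tiny toy graph: two triangles joined by one bridge edge (2 -> 3).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
walks = random_walks(graph)
```

Nodes that co-occur frequently in these walks end up close together in the learned vector space, which is how the graph structure survives the embedding.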

Automated Research and Beyond: The Evolution of Artificial Intelligence

You will soon be able to explain a research question to a machine and get an answer in return.

Python toolset for statistical comparison of machine learning models and human readers

The most common statistical tools for comparing machine learning models and human readers are the p-value and the confidence interval. Although they have received some criticism recently, p-values and confidence intervals give more insight into results than a raw performance measure, if interpreted correctly, and are required by many journals. This post shows example Python code that uses bootstrapping to compute confidence intervals and p-values for comparing machine learning models and human readers.
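A minimal sketch of the bootstrapping idea (not the post's actual code; the function name and per-case scores below are illustrative): resample the cases with replacement, recompute the mean score difference on each resample, and read the confidence interval off the percentiles of the resulting distribution:

```python
import random

def bootstrap_diff_ci(scores_a, scores_b, n_boot=10000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean difference of
    paired per-case scores of two models (or a model and a human reader)."""
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical paired per-case scores (1 = correct, 0 = wrong).
model = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
reader = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
lo, hi = bootstrap_diff_ci(model, reader)
# If the interval excludes 0, the difference is significant at level alpha.
```

A bootstrap p-value follows the same pattern: resample under the null (e.g. randomly swapping the paired labels) and count how often the resampled difference is at least as extreme as the observed one.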

Rules of Machine Learning: Best Practices for ML Engineering

This document is intended to help those with a basic knowledge of machine learning get the benefit of Google’s best practices in machine learning. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine-learned model, then you have the necessary background to read this document.
Rule #1: Don’t be afraid to launch a product without machine learning.
Rule #2: First, design and implement metrics.
Rule #3: Choose machine learning over a complex heuristic.
Rule #4: Keep the first model simple and get the infrastructure right.
Rule #5: Test the infrastructure independently from the machine learning.
Rule #6: Be careful about dropped data when copying pipelines.
Rule #7: Turn heuristics into features, or handle them externally.
Rule #8: Know the freshness requirements of your system.
Rule #9: Detect problems before exporting models.
Rule #10: Watch for silent failures.
Rule #11: Give feature columns owners and documentation.
Rule #12: Don’t overthink which objective you choose to directly optimize.
Rule #13: Choose a simple, observable and attributable metric for your first objective.
Rule #14: Starting with an interpretable model makes debugging easier.
Rule #15: Separate Spam Filtering and Quality Ranking in a Policy Layer.
Rule #16: Plan to launch and iterate.
Rule #17: Start with directly observed and reported features as opposed to learned features.
Rule #18: Explore with features of content that generalize across contexts.
Rule #19: Use very specific features when you can.
Rule #20: Combine and modify existing features to create new features in human-understandable ways.
Rule #21: The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have.
Rule #22: Clean up features you are no longer using.
Rule #23: You are not a typical end user.
Rule #24: Measure the delta between models.
Rule #25: When choosing models, utilitarian performance trumps predictive power.
Rule #26: Look for patterns in the measured errors, and create new features.
Rule #27: Try to quantify observed undesirable behavior.
Rule #28: Be aware that identical short-term behavior does not imply identical long-term behavior.
Rule #29: The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
Rule #30: Importance-weight sampled data, don’t arbitrarily drop it!
Rule #31: Beware that if you join data from a table at training and serving time, the data in the table may change.
Rule #32: Re-use code between your training pipeline and your serving pipeline whenever possible.
Rule #33: If you produce a model based on the data until January 5th, test the model on the data from January 6th and after.
Rule #34: In binary classification for filtering (such as spam detection or determining interesting emails), make small short-term sacrifices in performance for very clean data.
Rule #35: Beware of the inherent skew in ranking problems.
Rule #36: Avoid feedback loops with positional features.
Rule #37: Measure Training/Serving Skew.
Rule #38: Don’t waste time on new features if unaligned objectives have become the issue.
Rule #39: Launch decisions are a proxy for long-term product goals.
Rule #40: Keep ensembles simple.
Rule #41: When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.
Rule #42: Don’t expect diversity, personalization, or relevance to be as correlated with popularity as you think they are.
Rule #43: Your friends tend to be the same across different products. Your interests tend not to be.

Technical debt for data scientists

Technical debt is the process of avoiding work today by promising to do work tomorrow. A team might identify that there’s a small time window for a particular change to be implemented and the only way they can hit that window is to take shortcuts in the development process. They might soberly calculate that the benefits of getting something done now are worth the costs of fixing it later. This kind of technical debt is similar to taking out a mortgage or small business loan. You don’t have the money to realize an opportunity right now, so you borrow that money even though it’s going to cost more down the road. The lifetime cost of the investment goes up, but at least you get to make the investment. Too often however, data science technical debt is more like a payday loan. We take shortcuts in developing a solution without an understanding of the risks and costs of those shortcuts, and without a realistic plan for how we’re going to pay back the debt. Code is produced, but it’s not tested, documented, or robust to changes in the system. The result is that data science projects become expensive or impossible to maintain as time goes on.

Awesome decision tree research papers

A curated list of decision, classification and regression tree research papers with implementations.

ARIMA/SARIMA vs LSTM with Ensemble learning Insights for Time Series Data

There are five types of traditional time series models most commonly used in epidemic time series forecasting. AR models express the current value of the time series linearly in terms of its previous values and the current residual, whereas MA models express the current value of the time series linearly in terms of its current and previous residual series. ARMA models are a combination of AR and MA models, in which the current value of the time series is expressed linearly in terms of its previous values and in terms of the current and previous residual series. The time series defined in AR, MA, and ARMA models are stationary processes, which means that the mean of the series and the covariance among its observations do not change with time. For non-stationary time series, a transformation of the series to a stationary series has to be performed first. The ARIMA model generally fits non-stationary time series based on the ARMA model, with a differencing process that effectively transforms the non-stationary data into stationary data. SARIMA models, which combine seasonal differencing with an ARIMA model, are used for modeling time series data with periodic characteristics.
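The differencing step that turns ARMA into ARIMA can be illustrated in a couple of lines (a toy sketch, with a deterministic linear trend standing in for real data):

```python
def difference(series, lag=1):
    """First-order differencing: the 'I' in ARIMA. With lag=1 it removes a
    linear trend; with lag equal to the season length it is the seasonal
    differencing used by SARIMA."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# A series with a deterministic linear trend is non-stationary...
trend = [2 * t + 1 for t in range(6)]   # 1, 3, 5, 7, 9, 11
# ...but its first difference is constant, hence trivially stationary.
diffed = difference(trend)
```

In practice the differenced series is what the ARMA part is fitted to, and forecasts are integrated (cumulatively summed) back to the original scale.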

Shallow Neural Networks

When we hear the name Neural Network, we imagine many, many hidden layers, but there is a type of neural network with only a few. Shallow neural networks consist of only 1 or 2 hidden layers. Understanding a shallow neural network gives us insight into what exactly is going on inside a deep neural network. In this post, let us see what a shallow neural network is and how it works in a mathematical context. The figure below shows a shallow neural network with 1 hidden layer, 1 input layer and 1 output layer.
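The forward pass of such a 1-hidden-layer network follows directly from the equations a1 = sigmoid(W1 x + b1), y = sigmoid(W2 a1 + b2); a small sketch (the weights below are arbitrary illustrative values, not from the post):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, biases):
    """One fully connected layer: z_j = sum_i w_ji * x_i + b_j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def shallow_forward(x, W1, b1, W2, b2):
    """Forward pass of a shallow network: input -> 1 hidden layer -> output."""
    hidden = [sigmoid(z) for z in dense(x, W1, b1)]
    return [sigmoid(z) for z in dense(hidden, W2, b2)]

# 2 inputs, 3 hidden units, 1 output; all weights chosen arbitrarily.
y = shallow_forward([1.0, 0.5],
                    W1=[[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]],
                    b1=[0.0, 0.1, -0.2],
                    W2=[[0.6, -0.3, 0.9]],
                    b2=[0.05])
```

A deep network simply chains more `dense` + activation steps between input and output; nothing else changes structurally.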

If you did not already know

Kernel Convolution (kervolution) google
Convolutional neural networks (CNNs) have enabled the state-of-the-art performance in many computer vision tasks. However, little effort has been devoted to establishing convolution in non-linear space. Existing works mainly leverage the activation layers, which can only provide point-wise non-linearity. To solve this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems by leveraging the kernel trick. It generalizes convolution, enhances the model capacity, and captures higher-order interactions of features, via patch-wise kernel functions, but without introducing additional parameters. Extensive experiments show that kervolutional neural networks (KNN) achieve higher accuracy and faster convergence than baseline CNN. …
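The core idea, replacing the patch-wise inner product of convolution with a kernel function, can be sketched in 1-D (a toy illustration under my own naming, not the paper's implementation):

```python
def kervolve1d(signal, weights, kernel):
    """1-D 'kervolution': slide the filter over the signal, but replace the
    usual inner product at each position with a patch-wise kernel function."""
    k = len(weights)
    return [kernel(signal[i:i + k], weights)
            for i in range(len(signal) - k + 1)]

def linear_kernel(patch, w):
    # With the linear kernel this reduces to ordinary correlation/convolution.
    return sum(p * wi for p, wi in zip(patch, w))

def poly_kernel(patch, w, c=1.0, d=2):
    # A polynomial kernel adds non-linearity with no extra parameters.
    return (linear_kernel(patch, w) + c) ** d

signal, weights = [1.0, 2.0, 3.0, 4.0], [1.0, -1.0]
linear_out = kervolve1d(signal, weights, linear_kernel)
poly_out = kervolve1d(signal, weights, poly_kernel)
```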

Hyperbolic Attention Network google
We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact. …

Recurrent Predictive State Policy Network (RPSP) google
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz and Gordon, 2004; Sun et al., 2016) by modeling predictive state– a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) to initialize the recursive filter. Predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSP-network can be purely reactive, simplifying training while still allowing optimal behaviour. Moreover, we use the PSR interpretation during training as well, by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) is still differentiable and can be trained using gradient based methods. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We show the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite memory models, being the overall best performing method. …

Book Memo: “Markov Chains”

From Theory to Implementation and Experimentation
This unique guide to Markov chains approaches the subject along the four convergent lines of mathematics, implementation, simulation, and experimentation. It introduces readers to the art of stochastic modeling, shows how to design computer implementations, and provides extensive worked examples with case studies. Markov Chains: From Theory to Implementation and Experimentation begins with a general introduction to the history of probability theory in which the author uses quantifiable examples to illustrate how probability theory arrived at the concept of discrete-time and the Markov model from experiments involving independent variables. An introduction to simple stochastic matrices and transition probabilities is followed by a simulation of a two-state Markov chain. The notion of steady state is explored in connection with the long-run distribution behavior of the Markov chain. Predictions based on Markov chains with more than two states are examined, followed by a discussion of the notion of absorbing Markov chains. Also covered in detail are topics relating to the average time spent in a state, various chain configurations, and n-state Markov chain simulations used for verifying experiments involving various diagram configurations.

R Packages worth a look

‘ggplot2’ Based Tool to Facilitate Diagnostic Plots for NLME Models (ggPMX)
At Novartis, we aimed at standardizing the set of diagnostic plots used for modeling activities in order to reduce the overall effort required for gene …

A Tidy Wrapper Around ‘gtrendsR’ (trendyy)
Access Google Trends information. This package provides a tidy wrapper to the ‘gtrendsR’ package. Use four spaces when indenting paragraphs within the …

Plotting, Smoothing and Growth Trait Extraction for Longitudinal Data (growthPheno)
Assists in producing longitudinal or profile plots of measured traits. These allow checks to be made for anomalous data and growth patterns in the data …

Bayesian Inference for Multinomial Models with Inequality Constraints (multinomineq)
Implements Gibbs sampling and Bayes factors for multinomial models with linear inequality constraints on the vector of probability parameters. As speci …

Regression Models and Utilities for Repeated Measures and Panel Data (panelr)
Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take …

Simulating Pollen Curves from Virtual Taxa with Different Life and Niche Traits (virtualPollen)
Tools to generate virtual environmental drivers with a given temporal autocorrelation, and to simulate pollen curves at annual resolution over millenni …

Book Memo: “Basic Elements of Computational Statistics”

This textbook on computational statistics presents tools and concepts of univariate and multivariate statistical data analysis with a strong focus on applications and implementations in the statistical software R. It covers mathematical, statistical as well as programming problems in computational statistics and contains a wide variety of practical examples. In addition to the numerous R snippets presented in the text, all computer programs (quantlets) and data sets for the book are available on GitHub and referred to in the book. This enables the reader to fully reproduce as well as modify and adjust all examples to their needs. The book is intended for advanced undergraduate and first-year graduate students as well as for data analysts new to the job who would like a tour of the various statistical tools in a data analysis workshop. The experienced reader with a good knowledge of statistics and programming might skip some sections on univariate models and enjoy the various mathematical roots of multivariate techniques.

Document worth reading: “Model Selection Techniques — An Overview”

In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performances. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection. Model Selection Techniques — An Overview