Book Memo: “Mathematical Theories of Machine Learning”

Theory and Applications
This book studies mathematical theories of machine learning. The first part of the book explores the optimality and adaptivity of choosing step sizes of gradient descent for escaping strict saddle points in non-convex optimization problems. In the second part, the authors propose algorithms to find local minima in non-convex optimization and to obtain global minima to some degree, drawing on Newton’s second law without friction. In the third part, the authors study the problem of subspace clustering with noisy and missing data, a problem well motivated by practical applications: data subject to stochastic Gaussian noise and/or incomplete data with uniformly missing entries. In the last part, the authors introduce a novel VAR model with Elastic-Net regularization and its equivalent Bayesian model, allowing for both stable sparsity and group selection.

R Packages worth a look

Implementation of SCORE, SCORE+ and Mixed-SCORE (ScorePlus)
Implementation of the community detection algorithm SCORE from J. Jin (2015) <arXiv:1211.5803>, and SCORE+ from J. Jin, Z. Ke and S. Luo (2018) …

Fitting Discrete Distribution Models to Count Data (scModels)
Provides functions for fitting discrete distribution models to count data. Included are the Poisson, the negative binomial and, most importantly, a new …

Interface to the ‘JWSACruncher’ of ‘JDemetra+’ (rjwsacruncher)
‘JDemetra+’ (<https://…/jdemetra-app>) is the seasonal adjustment softw …

Creating Visuals for Publication (utile.visuals)
A small set of functions for making visuals for publication in ‘ggplot2’. Key functions include geom_stepconfint() for drawing a step confidence interv …

Distilled News

What is TensorFrames? TensorFlow + Apache Spark

TensorFrames is an open-source library created by Apache Spark contributors. Its functions and parameters are named the same as in the TensorFlow framework. Under the hood, it is an Apache Spark DSL (domain-specific language) wrapper for Apache Spark DataFrames. It allows us to manipulate DataFrames with TensorFlow functionality. And no, it is not the pandas DataFrame; it is based on the Apache Spark DataFrame.


Measuring Performance: AUPRC

The area under the precision-recall curve (AUPRC) is another performance metric that you can use to evaluate a classification model. If your model achieves a perfect AUPRC, it means your model can find all of the positive samples (perfect recall) without accidentally marking any negative samples as positive (perfect precision). It’s important to consider both recall and precision together, because you could achieve perfect recall (but bad precision) using a naive classifier that marked everything positive, and you could achieve perfect precision (but bad recall) using a naive classifier that marked everything negative.
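
As a quick illustration, here is a minimal sketch of computing AUPRC with scikit-learn; the labels and scores are made up, and average_precision_score is used as the usual summary of the precision-recall curve.

```python
# Minimal AUPRC sketch with scikit-learn (illustrative labels/scores).
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # ground-truth labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9])  # model scores

auprc = average_precision_score(y_true, y_score)      # summary of the PR curve
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(f"AUPRC: {auprc:.3f}")
```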


Introduction to U-Net and Res-Net for Image Segmentation

Human beings see images by capturing reflected light rays, which is a very complex task. So how can machines be programmed to perform a similar task? A computer sees images as matrices that need to be processed to extract meaning. Image segmentation is the method of partitioning an image into segments, with each segment representing a different entity. Convolutional Neural Networks are successful for simpler images but haven’t given good results for complex images. This is where architectures like U-Net and Res-Net come into play.
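
To make the idea concrete, here is a minimal sketch of a U-Net-style encoder-decoder with a single skip connection in Keras; the input shape and filter counts are illustrative assumptions, and a real U-Net stacks several such levels.

```python
# Minimal U-Net-style sketch: one downsampling level, one skip connection.
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(128, 128, 1))            # illustrative input size

# Encoder: convolve, then downsample.
c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
p1 = layers.MaxPooling2D(2)(c1)

# Bottleneck.
b = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)

# Decoder: upsample, then concatenate the skip connection from c1.
u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(b)
u1 = layers.concatenate([u1, c1])
c2 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)

# Per-pixel probabilities for binary segmentation.
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c2)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```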


Top 10 Neural Network Architectures

10 Neural Network Architectures
• Perceptrons
• Convolutional Neural Networks
• Recurrent Neural Networks
• Long / Short Term Memory
• Gated Recurrent Unit
• Hopfield Network
• Boltzmann Machine
• Deep Belief Networks
• Autoencoders
• Generative Adversarial Network


AI adoption is being fueled by an improved tool ecosystem

In this post, I share slides and notes from a keynote that Roger Chen and I gave at the 2019 Artificial Intelligence conference in New York City. In this short summary, I highlight results from a survey (AI Adoption in the Enterprise) and describe recent trends in AI. Over the past decade, AI and machine learning (ML) have become extremely active research areas: the website arxiv.org had an average daily upload of around 100 machine learning papers in 2018. With all the research that has been conducted over the past few years, it’s fair to say that we have now entered the implementation phase for many AI technologies. Companies are beginning to translate research results and developments into products and services.


The Kalman Filter and (Maximum) Likelihood

I first realized the power of the Kalman Filter during Kaggle’s Web Traffic Time Series Forecasting competition, a contest requiring prediction of future web traffic volumes for thousands of Wikipedia pages. In this contest, simple heuristics like ‘median of medians’ were hard to beat and my own modeling choices were mostly ineffective. Of course I blamed my tools, and wondered if anything in the traditional statistical toolbox was up for this task. Then I read a post by a user known only as ‘os,’ who took 8th place with Kalman filters.
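
For readers who want to see the mechanics, here is a minimal sketch (not ‘os’s solution) of a 1D Kalman filter for a local-level model; the noise variances Q and R and the toy series are assumed values.

```python
# Minimal 1D Kalman filter for a local-level (random-walk) model.
import numpy as np

def kalman_1d(y, Q=1.0, R=10.0, x0=0.0, P0=1e6):
    x, P = x0, P0
    estimates = []
    for obs in y:
        P = P + Q                 # predict: random walk, mean carries over
        K = P / (P + R)           # Kalman gain: trust in the new observation
        x = x + K * (obs - x)     # update state toward the observation
        P = (1 - K) * P           # shrink the state uncertainty
        estimates.append(x)
    return np.array(estimates)

y = np.array([12.0, 15.0, 11.0, 14.0, 90.0, 13.0])  # toy traffic series
print(kalman_1d(y))               # the spike at index 4 is heavily smoothed
```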


Machine Learning Python Keras Dropout Layer Explained

Machine learning is ultimately used to predict outcomes given a set of features. Therefore, anything we can do to improve the generalization of our model is seen as a net gain. Dropout is a technique used to prevent a model from overfitting. Dropout works by randomly setting the outgoing edges of hidden units (neurons that make up hidden layers) to 0 at each update of the training phase. If you take a look at the Keras documentation for the dropout layer, you’ll see a link to a white paper written by Geoffrey Hinton and colleagues, which goes into the theory behind dropout.
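
A minimal sketch of what this looks like in Keras follows; the layer sizes and the 0.5 rate are illustrative.

```python
# Minimal Keras model with Dropout; dropout is active only during training.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),
    Dropout(0.5),   # randomly zero 50% of the previous layer's outputs
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```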


Robotic Process Automation

A recent finding from KPMG’s Global Sourcing Advisory Pulse Survey ‘Robotic Revolution’, suggests that technology experts believe ‘The opportunities [from RPA] are many – so are the adoption challenges… For most organisations, taking advantage of higher-end RPA opportunities will be easier said than done.’


Anomaly detection in time series with Prophet library

First of all, let’s define what an anomaly in a time series is. Anomaly detection for time series can be formulated as finding outlier data points relative to some standard or usual signal. While there are plenty of anomaly types, we’ll focus only on the most important ones from a business perspective, such as unexpected spikes, drops, trend changes and level shifts. You can solve this problem in two ways: supervised and unsupervised. While the first approach needs some labeled data, the second does not; you need just the raw data. In this article, we will focus on the second approach.
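
A minimal sketch of the unsupervised approach with Prophet might look like the following; the file name is an assumption (Prophet expects ds/y columns), and points outside the model’s uncertainty interval are flagged as anomalies.

```python
# Minimal Prophet anomaly-flagging sketch: outliers fall outside the
# model's uncertainty interval.
import pandas as pd
from fbprophet import Prophet   # `from prophet import Prophet` in newer releases

df = pd.read_csv("series.csv")              # assumed: columns ds (date), y (value)
df["ds"] = pd.to_datetime(df["ds"])

m = Prophet(interval_width=0.99)            # wide interval -> fewer false alarms
m.fit(df)
forecast = m.predict(df[["ds"]])

merged = df.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
merged["anomaly"] = (merged["y"] < merged["yhat_lower"]) | \
                    (merged["y"] > merged["yhat_upper"])
print(merged.loc[merged["anomaly"], ["ds", "y", "yhat"]])
```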


What you need to know: The Modern Open-Source Data Science/Machine Learning Ecosystem

As we have done before (see 2017 data science ecosystem, 2018 data science ecosystem), we examine which tools were part of the same answer – the skillset of the user. We note that this does not necessarily mean that all tools were used together on each project, but having the knowledge and skills to use both tools X and Y makes it more likely that both X and Y were used together on some projects. The results we see are consistent with this assumption. The top tools show surprising stability – we see essentially the same pattern as last year.


Why correlation might tell us nothing about outliers

We often hear claims à la ‘there is a high correlation between x and y’. This is especially true of alleged findings about human or social behaviour in psychology, the social sciences or economics. A reported Pearson correlation coefficient of 0.8 indeed seems high in many cases and escapes our critical evaluation of its real meaning. So let’s see what correlation actually means and whether it really conveys the information we often believe it does. Inspired by the funny spurious correlations project as well as Nassim Taleb’s Medium post and Twitter rants, in which he laments psychologists’ (and not only psychologists’) total ignorance and misuse of probability and statistics, I decided to reproduce his note on how much information the correlation coefficient conveys under the Gaussian distribution.
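
Taleb’s key observation is easy to reproduce numerically: for a bivariate Gaussian, the mutual information implied by a correlation ρ is MI = -½ ln(1 - ρ²), which stays modest until |ρ| approaches 1. A minimal sketch:

```python
# Mutual information of a bivariate Gaussian as a function of rho:
# MI = -0.5 * ln(1 - rho^2). A rho of 0.8 carries far less information
# than a rho of 0.99.
import numpy as np

for rho in [0.2, 0.5, 0.8, 0.9, 0.99]:
    mi = -0.5 * np.log(1 - rho**2)
    print(f"rho = {rho:.2f}  ->  mutual information = {mi:.3f} nats")
```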


Monte Carlo Simulation and Statistical Probability Distributions in Python

One method that is very useful for data scientists/data analysts to validate methods or data is Monte Carlo simulation. In this article, you will learn how to run a Monte Carlo simulation in Python. Furthermore, you will learn how to generate different statistical probability distributions in Python.
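
As a taste of what the article covers, here is a minimal sketch of a Monte Carlo estimate plus draws from a few common distributions in NumPy; the dice example is illustrative.

```python
# Minimal Monte Carlo sketch: estimate a probability by simulation,
# then sample from a few common distributions.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

dice = rng.integers(1, 7, size=(n, 2)).sum(axis=1)
print("P(sum > 9) ~", (dice > 9).mean())        # exact value is 6/36

normal_draws = rng.normal(loc=0, scale=1, size=n)
poisson_draws = rng.poisson(lam=3, size=n)
binomial_draws = rng.binomial(n=10, p=0.3, size=n)
print(normal_draws.mean(), poisson_draws.mean(), binomial_draws.mean())
```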


Recommendation Systems using UV-Decomposition

There are countless examples of recommender systems in use across nearly every industry in existence today. Most people understand, at a high level, what recommender systems attempt to achieve. However, not many people understand how they work at a deeper level. This is what Leskovec, Rajaraman, and Ullman dive into in Chapter 9 of their book Mining of Massive Datasets. A great example of a recommender system in use today is Netflix. Whenever you log into Netflix there are various sections such as ‘Trending Now’ or ‘New Releases’, but there is also a section titled ‘Top Picks for You’. This section uses a complex formula that tries to estimate which movies you would enjoy the most. The formula takes into account previous movies you have enjoyed as well as other movies people like you have also enjoyed.
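
The underlying machinery can be sketched in a few lines: factor a ratings matrix M into U and V by stochastic gradient descent over the observed entries only. This is a minimal illustration in the spirit of the book’s Chapter 9, not Netflix’s actual formula; the toy matrix and hyperparameters are assumptions.

```python
# Minimal UV-decomposition sketch: approximate M (0 = unrated) as U @ V
# by SGD on observed ratings only.
import numpy as np

M = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
n_users, n_items, k = M.shape[0], M.shape[1], 2

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(k, n_items))

lr, epochs = 0.01, 1000
rows, cols = np.nonzero(M)                  # indices of observed ratings
for _ in range(epochs):
    for i, j in zip(rows, cols):
        err = M[i, j] - U[i] @ V[:, j]
        U[i] += lr * err * V[:, j]          # gradient step on U's row
        V[:, j] += lr * err * U[i]          # gradient step on V's column

print(np.round(U @ V, 2))                   # predictions fill in the blanks
```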


Graph Algorithms (Part 2)

Graphs are becoming central to machine learning these days, whether you’d like to understand the structure of a social network, predict potential connections, detect fraud, understand the behavior of a car rental service’s customers or make real-time recommendations, for example. In this article, we’ll cover:
• The main graph algorithms
• Illustrations and use-cases
• Examples in Python (a small networkx sketch follows this list)
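
As a minimal sketch (not taken from the article), two staple graph algorithms in networkx:

```python
# Two staple graph algorithms on a toy graph: shortest path and PageRank.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")])

print(nx.shortest_path(G, source="A", target="E"))   # -> ['A', 'C', 'D', 'E']
print(nx.pagerank(G))                                # centrality of each node
```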


Kernel Secrets in Machine Learning Pt. 2

In the first article on the topic, Kernel Secrets in Machine Learning, I explained kernels in the most basic way possible. Before reading further, I would advise you to quickly go through that article to get a feel for what a kernel really is, if you have not already. Hopefully, you came to this conclusion: a kernel is a similarity measure between two vectors in a mapped space. Good. Now we can check out and discuss some well-known kernels and also how we combine kernels to produce other kernels. Keep in mind that for the examples we are going to use, the x’s are one-dimensional vectors for plotting purposes and we fix the value of x’ to 2. Without further ado, let’s start kerneling.
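
A minimal sketch of two well-known kernels and of combining them (sums and products of valid kernels are again valid kernels); the inputs echo the article’s setup of one-dimensional x with x’ fixed to 2:

```python
# Two classic kernels, plus sum/product combinations (both valid kernels).
import numpy as np

def rbf_kernel(x, x_prime, gamma=1.0):
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

def poly_kernel(x, x_prime, degree=2, c=1.0):
    return (np.dot(x, x_prime) + c) ** degree

x, x_prime = np.array([1.0]), np.array([2.0])   # x' fixed to 2, as in the text
k_sum = rbf_kernel(x, x_prime) + poly_kernel(x, x_prime)
k_prod = rbf_kernel(x, x_prime) * poly_kernel(x, x_prime)
print(k_sum, k_prod)
```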


Kernel Secrets in Machine Learning Pt. 1

This post is not about deep learning. But it might as well be. Such is the power of kernels. They are universally applicable in any machine learning algorithm. Why, you might ask? I am going to try to answer this question in this article.


How to learn the maths of Data Science using your high school maths knowledge

This post is a part of my forthcoming book on Mathematical foundations of Data Science. In this post, we use the Perceptron algorithm to bridge the gap between high school maths and deep learning.
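
For reference, here is a minimal sketch of the perceptron learning rule on a toy, linearly separable dataset; the data and learning rate are made up.

```python
# Perceptron learning rule: nudge w and b toward each misclassified point.
import numpy as np

X = np.array([[2, 1], [3, 4], [4, 2], [3, 1]], dtype=float)  # toy inputs
y = np.array([-1, 1, 1, -1])                                  # toy labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                       # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:        # misclassified (or on the boundary)
            w += lr * yi * xi             # rotate the separating line
            b += lr * yi                  # shift the separating line

print(w, b)
print(np.sign(X @ w + b))                 # predictions after training
```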


What is Quantum Computing and How is it Useful for Artificial Intelligence?

After decades of heavy slog with no promise of success, quantum computing is suddenly buzzing! Nearly two years ago, IBM made a quantum computer available to the world: the 5-quantum-bit (qubit) resource they now call the IBM Q experience. It was more like a toy for researchers than a way of getting any serious number crunching done. But 70,000 users worldwide have registered for it, and the qubit count in this resource has now quadrupled. With so many promises from quantum computing, and data science currently at the helm, what does quantum computing offer AI? Let us explore that in this blog!


Machine Learning : Unsupervised – Hierarchical Clustering and Bootstrapping

This article is based on the unsupervised learning algorithm of hierarchical clustering. It is a brief illustration, with a practical working example, of forming unsupervised hierarchical clusters and testing them to ensure that you have formed the right clusters. It is a real-world data example that can be studied and evaluated, as the data is provided for personal use and practice. There are variations on each topic in data science, but there is a basic pattern that can be followed to build models. ‘The Datum’ empowers you to access these basic patterns for your lifetime and to build upon them as you progress. Consider ‘The Datum’ blogs your cookbook handout, which will help you learn, refer to, and contribute to the relevant topics. All listings and models are implemented in the R language using RStudio, and image instances of my work are embedded in this article for your reference.
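
The article’s listings are in R; purely as a minimal Python illustration of the same idea, SciPy can build the hierarchy and cut it into clusters (synthetic data stands in for the article’s dataset):

```python
# Agglomerative hierarchical clustering with SciPy on two synthetic groups.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),      # group near the origin
               rng.normal(5, 0.5, (10, 2))])     # group near (5, 5)

Z = linkage(X, method="ward")                    # agglomerative merge tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree at 2 clusters
print(labels)
```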

If you did not already know

Estimation of Distribution Algorithm (EDA) google
Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding the uniform distribution over admissible solutions and ending with the model that generates only the global optima. EDAs belong to the class of evolutionary algorithms. The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an implicit distribution defined by one or more variation operators, whereas EDAs use an explicit probability distribution encoded by a Bayesian network, a multivariate normal distribution, or another model class. As with other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a number of representations, from vectors to LISP-style S-expressions, and the quality of candidate solutions is often evaluated using one or more objective functions.
Level-Based Analysis of the Univariate Marginal Distribution Algorithm
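
A minimal sketch of the univariate case (UMDA) on the OneMax problem shows the ‘build model, sample, re-estimate’ loop; the population size and probability clipping are illustrative choices.

```python
# UMDA sketch on OneMax: fit per-bit probabilities to the elite half,
# then sample the next generation from that explicit model.
import numpy as np

rng = np.random.default_rng(0)
n_bits, pop_size, generations = 20, 50, 30
p = np.full(n_bits, 0.5)                         # model: independent Bernoullis

for _ in range(generations):
    pop = (rng.random((pop_size, n_bits)) < p).astype(int)
    fitness = pop.sum(axis=1)                    # OneMax: count of ones
    elite = pop[np.argsort(fitness)[-pop_size // 2:]]
    p = elite.mean(axis=0).clip(0.05, 0.95)      # re-estimate the distribution

print(p.round(2))                                # probabilities drift toward 1
```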


Abnormal Event Detection Network (AED-Net) google
Detecting anomalies in crowded scenes has been challenging for quite a long time. In this paper, a self-supervised framework, the abnormal event detection network (AED-Net), which is composed of PCAnet and kernel principal component analysis (kPCA), is proposed to address this problem. Using surveillance video sequences of different scenes as raw data, PCAnet is trained to extract high-level semantics of the crowd’s situation. Next, kPCA, a one-class classifier, is trained to determine the anomaly of the scene. In contrast to some prevailing deep learning methods, the framework is completely self-supervised because it utilizes only video sequences in a normal situation. Experiments on global and local abnormal event detection are carried out on the UMN and UCSD datasets, and competitive results with higher EER and AUC compared to other state-of-the-art methods are observed. Furthermore, by adding a local response normalization (LRN) layer, we propose an improvement to the original AED-Net, which is shown to perform better by promoting the framework’s generalization capacity according to the experiments. …

Fixed-Size Ordinally Forgetting Encoding (FOFE) google
Question answering over knowledge base (KB-QA) has recently become a popular research topic in NLP. One popular way to solve the KB-QA problem is to use a pipeline of several NLP modules, including entity discovery and linking (EDL) and relation detection. Recent success on the KB-QA task usually involves complex network structures with sophisticated heuristics. Inspired by a previous work that builds a strong KB-QA baseline, we propose a simple but general neural model composed of fixed-size ordinally forgetting encoding (FOFE) and deep neural networks, called FOFE-net, to solve the KB-QA problem at different stages. For evaluation, we use two popular KB-QA datasets, SimpleQuestions and WebQSP, and a newly created dataset, FreebaseQA. The experimental results show that FOFE-net performs well on the KB-QA subtasks of entity discovery and linking (EDL) and relation detection, which in turn pushes the overall KB-QA system to strong results on all datasets. …

Q-Graph google
Arising user-centric graph applications such as route planning and personalized social network analysis have initiated a shift of paradigms in modern graph processing systems towards multi-query analysis, i.e., processing multiple graph queries in parallel on a shared graph. These applications generate a dynamic number of localized queries around query hotspots such as popular urban areas. However, existing graph processing systems are not yet tailored towards these properties: The employed methods for graph partitioning and synchronization management disregard query locality and dynamism which leads to high query latency. To this end, we propose the system Q-Graph for multi-query graph analysis that considers query locality on three levels. (i) The query-aware graph partitioning algorithm Q-cut maximizes query locality to reduce communication overhead. (ii) The method for synchronization management, called hybrid barrier synchronization, allows for full exploitation of local queries spanning only a subset of partitions. (iii) Both methods adapt at runtime to changing query workloads in order to maintain and exploit locality. Our experiments show that Q-cut reduces average query latency by up to 57 percent compared to static query-agnostic partitioning algorithms. …

Document worth reading: “On the Implicit Assumptions of GANs”

Generative adversarial nets (GANs) have generated a lot of excitement. Despite their popularity, they exhibit a number of well-documented issues in practice, which apparently contradict theoretical guarantees. A number of enlightening papers have pointed out that these issues arise from unjustified assumptions that are commonly made, but the message seems to have been lost amid the optimism of recent years. We believe the identified problems deserve more attention, and highlight the implications on both the properties of GANs and the trajectory of research on probabilistic models. We recently proposed an alternative method that sidesteps these problems. On the Implicit Assumptions of GANs

Distilled News

Applying AutoML to Transformer Architectures

Since it was introduced a few years ago, Google’s Transformer architecture has been applied to challenges ranging from generating fantasy fiction to writing musical harmonies. Importantly, the Transformer’s high performance has demonstrated that feed forward neural networks can be as effective as recurrent neural networks when applied to sequence tasks, such as language modeling and translation. While the Transformer and other feed forward models used for sequence problems are rising in popularity, their architectures are almost exclusively manually designed, in contrast to the computer vision domain where AutoML approaches have found state-of-the-art models that outperform those that are designed by hand. Naturally, we wondered if the application of AutoML in the sequence domain could be equally successful.


Interactive Network Visualization with R

Networks are everywhere. We have social networks like Facebook, competitive product networks, and various networks within an organisation. For STATWORX, it is also a common task to unveil hidden structures and clusters in a network and visualize them for our customers. In the past, we used the tool Gephi to visualize the results of our network analyses. Impressed by its outstandingly pretty and interactive visualizations, our idea was to find a way to produce visualizations of the same quality directly in R and present them to our customers in an R Shiny app.


An End to End Introduction to GANs

I bet most of us have seen a lot of AI-generated human faces in recent times, be it in papers or blogs. We have reached a stage where it is becoming increasingly difficult to distinguish between actual human faces and faces generated by artificial intelligence. In this post, I will help the reader understand how they can create and build such applications on their own. I will try to keep this post as intuitive as possible for starters while not dumbing it down too much.


Anomaly Detection with LSTM in Keras

I read ‘anomaly’ definitions in every kind of context, everywhere. In this chaos the only truth is the variability of this definition, i.e. what counts as an anomaly is completely related to the domain of interest. Detecting this kind of behavior is useful in every business, and the difficulty of detecting these observations depends on the field of application. If you are engaged in a problem of anomaly detection which involves human activity (like prediction of sales or demand), you can take advantage of fundamental assumptions about human behavior and plan a more efficient solution. This is exactly what we are doing in this post. We try to predict taxi demand in NYC in a critical time period. We formulate simple and important assumptions about human behavior, which will permit us to devise a simple solution to forecast anomalies. All the dirty work is done by a loyal LSTM, developed in Keras, which makes predictions and detects anomalies at the same time!
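
The detection step reduces to thresholding forecast residuals. Here is a minimal sketch with made-up numbers standing in for the LSTM’s predictions, using a robust (median absolute deviation) cutoff so the spike itself cannot inflate it.

```python
# Residual-threshold anomaly flagging on toy demand numbers; any
# forecaster (here a stand-in for the LSTM) can supply y_pred.
import numpy as np

y_true = np.array([100, 105, 98, 110, 240, 102, 95], dtype=float)
y_pred = np.array([101, 104, 100, 108, 107, 103, 97], dtype=float)

residuals = y_true - y_pred
med = np.median(residuals)
mad = np.median(np.abs(residuals - med))        # robust spread estimate
anomalies = np.abs(residuals - med) > 5 * mad   # spike can't inflate the cutoff
print(np.where(anomalies)[0])                   # -> [4]
```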


TD3: Learning To Run With AI

This article looks at one of the most powerful and state-of-the-art algorithms in Reinforcement Learning (RL), Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al., 2018). By the end of this article you should have a solid understanding of what makes TD3 perform so well, be capable of implementing the algorithm yourself, and be able to use TD3 to train an agent to successfully run in the HalfCheetah environment.


The Data Fabric, Containers, Kubernetes, Knowledge-Graphs, and more…

In the last article we talked about the building blocks of a knowledge graph; now we will go a step further and learn the basic concepts, technologies and languages we need to understand in order to actually build one.


Advanced Ensemble Classifiers

Ensemble is a Latin-derived word meaning ‘union of parts’. The regular classifiers that are often used are prone to making errors. As much as these errors are inevitable, they can be reduced with the proper construction of a learning classifier. Ensemble learning is a way of generating various base classifiers from which a new classifier is derived that performs better than any constituent classifier. These base classifiers may differ in the algorithm used, hyperparameters, representation or the training set. The key objective of ensemble methods is to reduce bias and variance.
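
A minimal scikit-learn sketch of the idea, with three base classifiers that differ in algorithm and are combined by soft voting (the dataset is synthetic):

```python
# Ensemble of three different base classifiers combined by soft voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft",                       # average predicted probabilities
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```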


A Game of Words: Vectorization, Tagging, and Sentiment Analysis

Full disclosure: I haven’t watched or read Game of Thrones, but I am hoping to learn a lot about it by analyzing the text. If you would like more background about the basic text processing, you can read my other article. The text from all 5 books can be found on Kaggle. In this article I will be taking the cleaned text and using it to explain the following concepts:
• Vectorization: Bag-of-Words, TF-IDF, and Skip-Thought Vectors (a vectorization sketch follows this list)
• After Vectorization
• POS tagging
• Named Entity Recognition (NER)
• Chunking and Chinking
• Sentiment Analysis
• Other NLP packages
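
As promised above, a minimal sketch of bag-of-words and TF-IDF vectorization with scikit-learn; the toy sentences stand in for the books’ text.

```python
# Bag-of-words vs. TF-IDF vectorization of toy sentences.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["winter is coming", "the north remembers", "winter came for the north"]

bow = CountVectorizer().fit_transform(docs)      # raw term counts
tfidf = TfidfVectorizer().fit_transform(docs)    # counts reweighted by rarity
print(bow.toarray())
print(tfidf.toarray().round(2))
```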


Data Science as Software: from Notebooks to Tools [Part 3]

This is the final part of the series of how to go on from Jupyter Notebooks to software solutions in Data Science. Part 1 covered the basics of setting up the working environment and data exploration. Part 2 dived deep into data pre-processing and modelling. Part 3 will deal with how you can move on from Jupyter, front end development and your daily work in the code. The overall agenda of the series is the following:
• Setting up your working environment [Part 1]
• Important modules for data exploration [Part 1]
• Machine Learning Part 1: Data pre-processing [Part 2]
• Machine Learning Part 2: Models [Part 2]
• Moving on from Jupyter [Part 3]
• Shiny stuff: when do we get a front end? [Part 3]
• Your daily work in the code: keeping standards [Part 3]


Machine Learning: Recurrent Neural Networks And Long Short Term Memory (LSTM) Python Keras Example

Recurrent neural networks have a wide array of applications. These include time series analysis, document classification, and speech and voice recognition. In contrast to feedforward artificial neural networks, the predictions made by recurrent neural networks are dependent on previous predictions. To elaborate, imagine we decided to follow an exercise routine where, every day, we alternate between lifting weights, swimming and yoga. We could then build a recurrent neural network to predict today’s workout given what we did yesterday. For example, if we lifted weights yesterday then we’d go swimming today. More often than not, the problems you’ll be tackling in the real world will be a function of the current state as well as other inputs. For instance, suppose we signed up for hockey once a week. If we’re playing hockey on the same day that we’re supposed to lift weights, then we might decide to skip the gym. Thus, our model now has to differentiate between the case where we attended a yoga class yesterday and we’re not playing hockey, and the case where we attended a yoga class yesterday and we are playing hockey today, in which case we’d jump directly to swimming.
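
A minimal Keras sketch of the workout example follows; the one-hot encoding, random toy labels and layer sizes are illustrative, not a trained model.

```python
# Toy LSTM: sequences of past activities (one-hot over weights/swim/yoga)
# predict the next activity.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_activities, seq_len = 3, 7                       # weights, swim, yoga
X = np.eye(n_activities)[np.random.randint(0, 3, (100, seq_len))]
y = np.random.randint(0, 3, 100)                   # toy next-day labels

model = Sequential([
    LSTM(16, input_shape=(seq_len, n_activities)), # state carries the history
    Dense(n_activities, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=5, verbose=0)
```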


Learning Like Babies

Convolutional Neural Nets (CNNs), which have achieved the greatest performance in image classification, were inspired by the mammalian visual cortex. In spite of the drastic progress in automated computer vision systems, most of the success of image classification architectures comes from labeled data. The problem is that most real-world data is not labeled. According to Yann LeCun, father of CNNs and professor at NYU, the next ‘big thing’ in artificial intelligence is semi-supervised learning – a type of machine learning task that makes use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. That is why a large research effort has recently been focused on unsupervised learning without leveraging large amounts of expensive supervision.


Rethinking the Data Science Life Cycle for the Enterprise

While the technology and tools used by data scientists have grown dramatically, the data science lifecycle has stagnated. In fact, little has changed between the earliest versions of CRISP-DM created over 20 years ago and the more recent lifecycles offered by leading vendors such as Google, Microsoft, and DataRobot. Most versions of the data science lifecycle still address the same set of tasks: understanding the business problem, understanding domain data, acquiring and engineering data, model development and training, and model deployment and monitoring (see Figure 1). But enterprise needs have evolved as data science has become embedded in most companies. Today, model reproducibility, traceability, and verifiability have become fundamental requirements for data science in large enterprises. Unfortunately, these requirements are omitted or significantly underplayed in leading AI/ML lifecycles.


Deep Learning for Sentiment Analysis

Sentiment analysis is a classic example of machine learning, which (in a nutshell) refers to ‘a way of “learning” that enables algorithms to evolve’. This ‘learning’ means feeding the algorithm a massive amount of data so that it can adjust itself and continually improve. Sentiment analysis is the automated process of understanding an opinion about a given subject from written or spoken language. In a world where we generate 2.5 quintillion bytes of data every day, sentiment analysis has become a key tool for making sense of that data. This has allowed companies to get key insights and automate all kinds of processes.


Regression with Regularization Techniques.

The article assumes that you have some brief idea about the regression techniques that could predict the required variable from a stratified and equitable distribution of records in a dataset that are implemented by a statistical approach. Just kidding! All you need is adequate math to be able to understand basic graphs. Before entering the topic, a little brush up…


K-Medoids Clustering Using ATS: Unleashing the Power of Templates

k-medoids clustering is a classical clustering machine learning algorithm. It is a sort of generalization of the k-means algorithm. The only difference is that cluster centers must be elements of the dataset; this yields an algorithm which can use any type of distance function, whereas k-means provably converges only with the L2-norm.
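
The article implements this in ATS; as a minimal Python illustration of an alternating k-medoids loop, note that only a pairwise distance matrix is needed, so any metric works:

```python
# Alternating k-medoids sketch: assign points to the nearest medoid,
# then move each medoid to the member minimizing in-cluster cost.
import numpy as np

def k_medoids(D, k, iters=100, seed=0):
    """D: precomputed pairwise distance matrix (any metric)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, k, replace=False)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)       # nearest-medoid labels
        new = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                costs = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(costs)]      # best member as medoid
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

X = np.array([[1, 2], [2, 1], [1, 1], [8, 8], [9, 8], [8, 9]], dtype=float)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)    # L2 here, but any works
print(k_medoids(D, k=2))
```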


How to stop training a neural-network using callback?

Often, when training a very deep neural network, we want to stop training once the training accuracy reaches a certain desired threshold. Thus, we can achieve what we want (optimal model weights) and avoid wasting resources (time and computational power). In this brief tutorial, let’s learn how to achieve this in TensorFlow and Keras, using the callback approach, in 4 simple steps.
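
A minimal sketch of such a callback (the 0.95 threshold is illustrative, and it assumes the model was compiled with metrics=['accuracy']):

```python
# Custom Keras callback that stops training at an accuracy threshold.
from tensorflow.keras.callbacks import Callback

class StopAtAccuracy(Callback):
    def __init__(self, threshold=0.95):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        # 'accuracy' is present when the model was compiled with that metric.
        if logs and logs.get("accuracy", 0) >= self.threshold:
            print(f"\nReached {self.threshold:.0%} accuracy, stopping.")
            self.model.stop_training = True

# usage: model.fit(X, y, epochs=100, callbacks=[StopAtAccuracy(0.95)])
```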

If you did not already know

WeCURE google
Missing data recovery is an important and yet challenging problem in imaging and data science. Successful models often adopt certain carefully chosen regularization. Recently, the low dimension manifold model (LDMM) was introduced by S. Osher et al. and shown effective in image inpainting. They observed that enforcing low dimensionality on the image patch manifold serves as a good image regularizer. In this paper, we observe that having only the low dimension manifold regularization is not enough sometimes, and we need smoothness as well. For that, we introduce a new regularization by combining the low dimension manifold regularization with a higher order Curvature Regularization, and we call this new regularization CURE for short. The key step of solving CURE is to solve a biharmonic equation on a manifold. We further introduce a weighted version of CURE, called WeCURE, in a similar manner as the weighted nonlocal Laplacian (WNLL) method. Numerical experiments for image inpainting and semi-supervised learning show that the proposed CURE and WeCURE significantly outperform LDMM and WNLL respectively. …

Recurrent Convolutional Network (RCN) google
Recently, three dimensional (3D) convolutional neural networks (CNNs) have emerged as dominant methods to capture spatiotemporal representations, by adding to pre-existing 2D CNNs a third, temporal dimension. Such 3D CNNs, however, are anti-causal (i.e., they exploit information from both the past and the future to produce feature representations, thus preventing their use in online settings), constrain the temporal reasoning horizon to the size of the temporal convolution kernel, and are not temporal resolution-preserving for video sequence-to-sequence modelling, as, e.g., in spatiotemporal action detection. To address these serious limitations, we present a new architecture for the causal/online spatiotemporal representation of videos. Namely, we propose a recurrent convolutional network (RCN), which relies on recurrence to capture the temporal context across frames at every level of network depth. Our network decomposes 3D convolutions into (1) a 2D spatial convolution component, and (2) an additional hidden state $1\times 1$ convolution applied across time. The hidden state at any time $t$ is assumed to depend on the hidden state at $t-1$ and on the current output of the spatial convolution component. As a result, the proposed network: (i) provides flexible temporal reasoning, (ii) produces causal outputs, and (iii) preserves temporal resolution. Our experiments on the large-scale ‘Kinetics’ dataset show that the proposed method achieves superior performance compared to 3D CNNs, while being causal and using fewer parameters. …

Parallelizable Stack Long Short-Term Memory google
Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation. …

Social Relationship Graph Generation Network (SRG-GN) google
Socially-intelligent agents are of growing interest in artificial intelligence. To this end, we need systems that can understand social relationships in diverse social contexts. Inferring the social context in a given visual scene not only involves recognizing objects, but also demands a more in-depth understanding of the relationships and attributes of the people involved. To achieve this, one computational approach for representing human relationships and attributes is to use an explicit knowledge graph, which allows for high-level reasoning. We introduce a novel end-to-end-trainable neural network that is capable of generating a Social Relationship Graph – a structured, unified representation of social relationships and attributes – from a given input image. Our Social Relationship Graph Generation Network (SRG-GN) is the first to use memory cells like Gated Recurrent Units (GRUs) to iteratively update the social relationship states in a graph using scene and attribute context. The neural network exploits the recurrent connections among the GRUs to implement message passing between nodes and edges in the graph, and results in significant improvement over previous methods for social relationship recognition. …

Book Memo: “Managing Your Data Science Projects”

Learn Salesmanship, Presentation, and Maintenance of Completed Models
At first glance, the skills required to work in the data science field appear to be self-explanatory. Do not be fooled. Impactful data science demands an interdisciplinary knowledge of business philosophy, project management, salesmanship, presentation, and more. In Managing Your Data Science Projects, author Robert de Graaf explores important concepts that are frequently overlooked in much of the instructional literature that is available to data scientists new to the field. If your completed models are to be used and maintained most effectively, you must be able to present and sell them within your organization in a compelling way.

R Packages worth a look

Composite Grid Gaussian Processes (CGGP)
Run computer experiments using the adaptive composite grid algorithm with a Gaussian process model. The algorithm works best when running an experiment …

Simple Jenkins Client (jenkins)
Manage jobs and builds on your Jenkins CI server <https://…/>. Create and edit projects, s …

Download and Load Various Text Datasets (textdata)
Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled …

Robust Backfitting (RBF)
A robust backfitting algorithm for additive models based on (robust) local polynomial kernel smoothers. It includes both bounded and re-descending (ker …

Document worth reading: “A Survey of the Recent Architectures of Deep Convolutional Neural Networks”

Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNN is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representation from the data. The availability of a large amount of data and improvements in hardware processing units have accelerated the research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on the challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and consequently classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs. A Survey of the Recent Architectures of Deep Convolutional Neural Networks