Distilled News

Using Mixed-Effects Models For Linear Regression

Mixed-effects regression models are a powerful tool for linear regression models when your data contains global and group-level trends. This article walks through an example using fictitious data relating exercise to mood to introduce this concept. R has had an undeserved rough time in the news lately, so this post will use R as a small condolence to the language, though a robust framework exist in Python as well. Mixed-effect models are common in political polling analysis where national-level characteristics are assumed to occur at a state-level while state-level sample sizes may be too small to drive those characteristics on their own. They are also common in scientific experiments where a given effect is assumed to be present among all study individuals which needs to be teased out from a specific effect on a treatment group. In a similar vein, this framework can be helpful in pre/post studies of interventions.

Applying the Universal Machine Learning Workflow to the UCI Mushroom Dataset

This post is intended to demonstrate the universal machine learning workflow as stated by Francois Chollet in Deep Learning with Python.

How To Find Probability From Probability Density Plots

To understand the distribution of data by knowing the actual probability within a range.

An Inutitive Understanding to Fader Networks

A fader network is a neural network that gives you control over a set of attributes of a data item. In this blog, we will talk about fader networks controlling attributes on images as they are the most intuitive to understand.

LSTM Autoencoder for Extreme Rare Event Classification in Keras

Here we will learn the details of data preparation for LSTM models, and build an LSTM Autoencoder for rare-event classification.

DeepFool – A simple and accurate method to fool Deep Neural Networks.

Deep Neural Networks achieve state of the art performances in many tasks but fail miserably on slightly perturbed images, perturbed in a meaningful way (and not randomly).
The DeepFool paper have the following major contributions:
1.Simple and accurate method for computing the robustness of different classifiers to adversarial perturbations.
2.Experiments showing
• DeepFool computes a more optimal adversarial perturbation
• Adversarial Training significantly increases the robustness.

Predicting Wine Quality with Gradient Boosting Machines

This article gives a tutorial for training a Gradient Boosting Machine (GBM) model that predicts the quality of a wine using only the price and a written description of the wine. In many real Data Science projects, it is important to take the projects into the next phase by deploying a live model that others can use to get real-time predictions. In this article, I demonstrate the model training and deployment process using AWS SageMaker to create a live endpoint.

The Hitchhiker’s Guide to Feature Extraction

Good Features are the backbone of any machine learning model. And good feature creation often needs domain knowledge, creativity, and lots of time. In this post, I am going to talk about:
• Various methods of feature creation- Both Automated and manual
• Different Ways to handle categorical features
• Longitude and Latitude features
• Some kaggle tricks
• And some other ideas to think about feature creation.
This post is about useful feature engineering methods and tricks that I have learned and end up using often.

Image Classification with Tensorflow 2.0

In this article, we will address one of the most common AI problems. Here is the scenario: you’re surfing the web at a bar enjoying a beer and some chicken wings when you start to wonder ‘could I write an image classifier using Tensorflow 2.0 and Transfer Learning?’ This article will show an end to end process for accomplishing this. Note, I will not be going into how the model works, that would be a whole other article. This article will show you how to get up and running with Tensorflow 2.0. Once we have that we can go back and play with the params that might make the code more efficient.

An illustrated guide to AI for the handsome corporate executive

A lot of companies are making headways with machine learning (ML) these days. The leaps in image recognition and natural language processing enabled by recent advancements in deep learning and massive increases in available data have put AI* on the corporate radar almost everywhere. ML has outperformed humans and enabled companies to process service requests at a scale quite simply not possible with a 100% human service desk. It has seen successful applications in domains ranging from fraud detection to medical image analysis, from machine translation to personalised recommendations. Since quite a number of the headlines in ML are made by the GAFA lot, it is only natural that you as an executive or service manager are looking towards them when you start out exploring applications of artificial intelligence for your business. Besides the obvious issues in following this approach – your company is probably still a largely offline endeavour – there are some less-discussed issues I want to address in this post.

Handtrack.js: Hand Tracking Interactions in the Browser using Tensorflow.js and 3 lines of code.

A while ago, I was really blown away by results from an experiment using TensorFlow object detection api to track hands in an image. I made the trained model and source code available, and since then it has been used to prototype some rather interesting usecases (a tool to help kids spell, extensions to predict sign language, hand ping pong, etc). However, while many individuals wanted to experiment with the trained model, a large number still had issues setting up Tensorflow (installation, TF version issues, exporting graphs, etc). Luckily, Tensorflow.js addresses several of these installations/distribution issues, as it is optimized to run in the standardized environment of browsers. To this end, I created Handtrack.js as a library to allow developers quickly prototype hand/gesture interactions powered by a trained hand detection model.

Finding Bayesian Legos – Part 2

Last time on Finding Bayesian Legos, my dear friend Joe and I were trying to find a way to help Joe gain confidence that he has indeed picked-up all of the Legos® his children left behind. My first proposal was that we determine how good Joe is at picking up Legos by performing experiments on Joe to determine the probability he would pick a single Lego in any given sweep of the room where the Legos have been left laying. Joe said, ‘No go on this plan! I really dislike picking up Legos. Can’t you find a better way?’

Review: MobileNetV2 – Light Weight Model (Image Classification)

In this story, MobileNetV2, by Google, is briefly reviewed. In the previous version MobileNetV1, Depthwise Separable Convolution is introduced which dramatically reduce the complexity cost and model size of the network, which is suitable to Mobile devices, or any devices with low computational power. In MobileNetV2, a better module is introduced with inverted resiaul structure. Non-linearities in narrow layers are removed this time. With MobileNetV2 as backbone for feature extraction, state-of-the-art performances are also achieved for object detection and semantic segmentation. This is a paper in 2018 CVPR with more than 200 citations. (Sik-Ho Tsang @ Medium)

Text-based Graph Convolutional Network – Bible Book Classification

In this article, I will walk you through the details of text-based Graph Convolutional Network (GCN) and its implementation using PyTorch and standard libraries. The text-based GCN model is an interesting and novel state-of-the-art semi-supervised learning concept that was proposed recently (expanding upon the previous GCN idea by Kipf et al. on non-textual data) which is able to very accurately infer the labels of some unknown textual data given related known labeled textual data. At the highest level, it does so by embedding the entire corpus into a single graph with documents (some labelled and some unlabelled) and words as nodes, with each document-word & word-word edges having some predetermined weights based on their relationships with each other (eg. Tf-idf). A GCN is then trained on this graph with documents nodes that have known labels, and the trained GCN model is then used to infer the labels of unlabelled documents.

ML Approaches for Time Series

In this post I play around with some Machine Learning techniques to analyze time series data and explore their potential use in this case of scenarios. In this first post only the first point of the index is developed. The rest have a separate post which can be accessed from the index.

OCR for Scanned Numbers using Google’s AutoML Vision

In a previous blog post, I detailed the steps I used to create an OCR for scanned numbers, such as the one shown in the figure below, by training a k-Nearest Neighbour (k-NN) machine learning model on a set of features extracted from images (numbers) using Histogram of Oriented Gradients (HOG).

Whats new on arXiv

Ensemble Super-Resolution with A Reference Dataset

By developing sophisticated image priors or designing deep(er) architectures, a variety of image Super-Resolution (SR) approaches have been proposed recently and achieved very promising performance. A natural question that arises is whether these methods can be reformulated into a unifying framework and whether this framework assists in SR reconstruction? In this paper, we present a simple but effective single image SR method based on ensemble learning, which can produce a better performance than that could be obtained from any of SR methods to be ensembled (or called component super-resolvers). Based on the assumption that better component super-resolver should have larger ensemble weight when performing SR reconstruction, we present a Maximum A Posteriori (MAP) estimation framework for the inference of optimal ensemble weights. Specially, we introduce a reference dataset, which is composed of High-Resolution (HR) and Low-Resolution (LR) image pairs, to measure the super-resolution abilities (prior knowledge) of different component super-resolvers. To obtain the optimal ensemble weights, we propose to incorporate the reconstruction constraint, which states that the degenerated HR image should be equal to the LR observation one, as well as the prior knowledge of ensemble weights into the MAP estimation framework. Moreover, the proposed optimization problem can be solved by an analytical solution. We study the performance of the proposed method by comparing with different competitive approaches, including four state-of-the-art non-deep learning based methods, four latest deep learning based methods and one ensemble learning based method, and prove its effectiveness and superiority on three public datasets.

A New Look at an Old Problem: A Universal Learning Approach to Linear Regression

Linear regression is a classical paradigm in statistics. A new look at it is provided via the lens of universal learning. In applying universal learning to linear regression the hypotheses class represents the label y\in {\cal R} as a linear combination of the feature vector x^T\theta where x\in {\cal R}^M, within a Gaussian error. The Predictive Normalized Maximum Likelihood (pNML) solution for universal learning of individual data can be expressed analytically in this case, as well as its associated learnability measure. Interestingly, the situation where the number of parameters M may even be larger than the number of training samples N can be examined. As expected, in this case learnability cannot be attained in every situation; nevertheless, if the test vector resides mostly in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, linear regression can generalize despite the fact that it uses an “over-parametrized” model. We demonstrate the results with a simulation of fitting a polynomial to data with a possibly large polynomial degree.

Is Volatility Rough ?

Rough volatility models are continuous time stochastic volatility models where the volatility process is driven by a fractional Brownian motion with the Hurst parameter less than half, and have attracted much attention since a seminal paper titled ‘Volatility is rough’ was posted on SSRN in 2014 claiming that they explain a scaling property of realized variance time series. From our point of view, the analysis is not satisfactory because the estimation error of the latent volatility was not taken into account; we show by simulations that it in fact results in a fake scaling property. Motivated by this preliminary finding, we construct a quasi-likelihood estimator for a fractional stochastic volatility model and apply it to realized variance time series to examine whether the volatility is really rough. Our quasi-likelihood is based on a central limit theorem for the realized volatility estimation error and a Whittle-type approximation to the auto-covariance of the log-volatility process. We prove the consistency of our estimator under high frequency asymptotics, and examine by simulations the finite sample performance of our estimator. Our empirical study suggests that the volatility is indeed rough; actually it is even rougher than considered in the literature.

Hierarchical Importance Weighted Autoencoders

Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases.

Similarity Grouping-Guided Neural Network Modeling for Maritime Time Series Prediction

Reliable and accurate prediction of time series plays a crucial role in maritime industry, such as economic investment, transportation planning, port planning and design, etc. The dynamic growth of maritime time series has the predominantly complex, nonlinear and non-stationary properties. To guarantee high-quality prediction performance, we propose to first adopt the empirical mode decomposition (EMD) and ensemble EMD (EEMD) methods to decompose the original time series into high- and low-frequency components. The low-frequency components can be easily predicted directly through traditional neural network (NN) methods. It is more difficult to predict high-frequency components due to their properties of weak mathematical regularity. To take advantage of the inherent self-similarities within high-frequency components, these components will be divided into several continuous small (overlapping) segments. The grouped segments with high similarities are then selected to form more proper training datasets for traditional NN methods. This regrouping strategy can assist in enhancing the prediction accuracy of high-frequency components. The final prediction result is obtained by integrating the predicted high- and low-frequency components. Our proposed three-step prediction frameworks benefit from the time series decomposition and similar segments grouping. Experiments on both port cargo throughput and vessel traffic flow have illustrated its superior performance in terms of prediction accuracy and robustness.

Quantifying and Alleviating the Language Prior Problem in Visual Question Answering

Benefiting from the advancement of computer vision, natural language processing and information retrieval techniques, visual question answering (VQA), which aims to answer questions about an image or a video, has received lots of attentions over the past few years. Although some progress has been achieved so far, several studies have pointed out that current VQA models are heavily affected by the \emph{language prior problem}, which means they tend to answer questions based on the co-occurrence patterns of question keywords (e.g., how many) and answers (e.g., 2) instead of understanding images and questions. Existing methods attempt to solve this problem by either balancing the biased datasets or forcing models to better understand images. However, only marginal effects and even performance deterioration are observed for the first and second solution, respectively. In addition, another important issue is the lack of measurement to quantitatively measure the extent of the language prior effect, which severely hinders the advancement of related techniques. In this paper, we make contributions to solve the above problems from two perspectives. Firstly, we design a metric to quantitatively measure the language prior effect of VQA models. The proposed metric has been demonstrated to be effective in our empirical studies. Secondly, we propose a regularization method (i.e., score regularization module) to enhance current VQA models by alleviating the language prior problem as well as boosting the backbone model performance. The proposed score regularization module adopts a pair-wise learning strategy, which makes the VQA models answer the question based on the reasoning of the image (upon this question) instead of basing on question-answer patterns observed in the biased training set. The score regularization module is flexible to be integrated into various VQA models.

Streaming Algorithms for Bin Packing and Vector Scheduling

Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases, but have been less well-studied when the volume of the input is truly massive, and cannot even be read into memory. This is captured by the streaming model of computation, where the aim is to approximate the cost of the solution in one pass over the data, using small space. As a result, streaming algorithms produce concise input summaries that approximately preserve the optimum value. We design the first efficient streaming algorithms for these fundamental problems in combinatorial optimization. For Bin Packing, we provide a streaming asymptotic 1+\varepsilon-approximation with \widetilde{O}\left(\frac{1}{\varepsilon}\right) memory, where \widetilde{O} hides logarithmic factors. Moreover, such a space bound is essentially optimal. Our algorithm implies a streaming d+\varepsilon-approximation for Vector Bin Packing in d dimensions, running in space \widetilde{O}\left(\frac{d}{\varepsilon}\right). For the related Vector Scheduling problem, we show how to construct an input summary in space \widetilde{O}(d^2\cdot m / \varepsilon^2) that preserves the optimum value up to a factor of 2 - \frac{1}{m} +\varepsilon, where m is the number of identical machines.

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (\eg leg as opposed to head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout removes informative pixels on training images by overlaying a patch of either black pixels or random noise. {Such removal is not desirable because it leads to information loss and inefficiency during training.} We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. By making efficient use of training pixels and \mbox{retaining} the regularization effect of regional dropout, CutMix consistently outperforms the state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained ImageNet classifier, when used as a pretrained model, results in consistent performance gains in Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix improves the model robustness against input corruptions and its out-of-distribution detection performances.

The Price of Fairness for Indivisible Goods

We investigate the efficiency of fair allocations of indivisible goods using the well-studied price of fairness concept. Previous work has focused on classical fairness notions such as envy-freeness, proportionality, and equitability. However, these notions cannot always be satisfied for indivisible goods, leading to certain instances being ignored in the analysis. In this paper, we focus instead on notions with guaranteed existence, including envy-freeness up to one good (EF1), balancedness, maximum Nash welfare (MNW), and leximin. We mostly provide tight or asymptotically tight bounds on the worst-case efficiency loss for allocations satisfying these notions.

Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs

We study the problem of knowledge graph (KG) embedding. A widely-established assumption to this problem is that similar entities are likely to have similar relational roles. However, existing related methods derive KG embeddings mainly based on triple-level learning, which lack the capability of capturing long-term relational dependencies of entities. Moreover, triple-level learning is insufficient for the propagation of semantic information among entities, especially for the case of cross-KG embedding. In this paper, we propose recurrent skipping networks (RSNs), which employ a skipping mechanism to bridge the gaps between entities. RSNs integrate recurrent neural networks (RNNs) with residual learning to efficiently capture the long-term relational dependencies within and between KGs. We design an end-to-end framework to support RSNs on different tasks. Our experimental results showed that RSNs outperformed state-of-the-art embedding-based methods for entity alignment and achieved competitive performance for KG completion.

BayesNAS: A Bayesian Approach for Neural Architecture Search

One-Shot Neural Architecture Search (NAS) is a promising method to significantly reduce search time without any separate training. It can be treated as a Network Compression problem on the architecture parameters from an over-parameterized network. However, there are two issues associated with most one-shot NAS methods. First, dependencies between a node and its predecessors and successors are often disregarded which result in improper treatment over zero operations. Second, architecture parameters pruning based on their magnitude is questionable. In this paper, we employ the classic Bayesian learning approach to alleviate these two issues by modeling architecture parameters using hierarchical automatic relevance determination (HARD) priors. Unlike other NAS methods, we train the over-parameterized network for only one epoch then update the architecture. Impressively, this enabled us to find the architecture in both proxy and proxyless tasks on CIFAR-10 within only 0.2 GPU days using a single GPU. As a byproduct, our approach can be transferred directly to compress convolutional neural networks by enforcing structural sparsity which achieves extremely sparse networks without accuracy deterioration.

Introduction to StarNEig — A Task-based Library for Solving Nonsymmetric Eigenvalue Problems

In this paper, we present the StarNEig library for solving dense non-symmetric (generalized) eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library support GPUs. The library is currently in an early beta state and only real arithmetic is supported. Support for complex data types is planned for a future release. This paper is aimed for potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer that should make it easy for a new user to transition to StarNEig. We demonstrate the performance of the library with a small set of computational experiments.

Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks

When constructing models that learn from noisy labels produced by multiple annotators, it is important to accurately estimate the reliability of annotators. Annotators may provide labels of inconsistent quality due to their varying expertise and reliability in a domain. Previous studies have mostly focused on estimating each annotator’s overall reliability on the entire annotation task. However, in practice, the reliability of an annotator may depend on each specific instance. Only a limited number of studies have investigated modelling per-instance reliability and these only considered binary labels. In this paper, we propose an unsupervised model which can handle both binary and multi-class labels. It can automatically estimate the per-instance reliability of each annotator and the correct label for each instance. We specify our model as a probabilistic model which incorporates neural networks to model the dependency between latent variables and instances. For evaluation, the proposed method is applied to both synthetic and real data, including two labelling tasks: text classification and textual entailment. Experimental results demonstrate our novel method can not only accurately estimate the reliability of annotators across different instances, but also achieve superior performance in predicting the correct labels and detecting the least reliable annotators compared to state-of-the-art baselines.

A Review of Keyphrase Extraction

Automated keyphrase extraction is a crucial textual information processing task regarding the most types of digital content management systems. It concerns the selection of representative and characteristic phrases from a document that express all aspects related to its content. This article introduces the task of keyphrase extraction and provides a view of existing work that is well organized and comprehensive. Moreover, it discusses the different evaluation approaches giving meaningful insights and highlighting open issues. Finally, a comparative experimental study for popular unsupervised techniques on five datasets is presented.

Learning to Search Efficiently Using Comparisons

We consider the problem of searching in a set of items by using pairwise comparisons. We aim to locate a target item t by asking an oracle questions of the form ‘Which item from the pair (i,j) is more similar to t?’. We assume a blind setting, where no item features are available to guide the search process; only the oracle sees the features in order to generate an answer. Previous approaches for this problem either assume noiseless answers, or they scale poorly in the number of items, both of which preclude practical applications. In this paper, we present a new scalable learning framework called learn2search that performs efficient comparison-based search on a set of items despite the presence of noise in the answers. Items live in a space of latent features, and we posit a probabilistic model for the oracle comparing two items i and j with respect to a target t. Our algorithm maintains its own representation of the space of items, which it learns incrementally based on past searches. We evaluate the performance of learn2search on both synthetic and real-world data, and show that it learns to search more and more efficiently, over time matching the performance of a scheme with access to the item features.

Multi-View Multiple Clustering

Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhancing the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse clusterings of high-quality. We further extend multiple co-clustering on multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with superior performance to the state-of-the-art methods.

Object Detection in 20 Years: A Survey

Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today’s object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century’s time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.

Multi-View Multi-Instance Multi-Label Learning based on Collaborative Matrix Factorization

Multi-view Multi-instance Multi-label Learning(M3L) deals with complex objects encompassing diverse instances, represented with different feature views, and annotated with multiple labels. Existing M3L solutions only partially explore the inter or intra relations between objects (or bags), instances, and labels, which can convey important contextual information for M3L. As such, they may have a compromised performance. In this paper, we propose a collaborative matrix factorization based solution called M3Lcmf. M3Lcmf first uses a heterogeneous network composed of nodes of bags, instances, and labels, to encode different types of relations via multiple relational data matrices. To preserve the intrinsic structure of the data matrices, M3Lcmf collaboratively factorizes them into low-rank matrices, explores the latent relationships between bags, instances, and labels, and selectively merges the data matrices. An aggregation scheme is further introduced to aggregate the instance-level labels into bag-level and to guide the factorization. An empirical study on benchmark datasets show that M3Lcmf outperforms other related competitive solutions both in the instance-level and bag-level prediction.

Partially Specified Space Time Autoregressive Model with Artificial Neural Network

The space time autoregressive model has been widely applied in science, in areas such as economics, public finance, political science, agricultural economics, environmental studies and transportation analyses. The classical space time autoregressive model is a linear model for describing spatial correlation. In this work, we expand the classical model to include related exogenous variables, possibly non-Gaussian, high volatility errors, and a nonlinear neural network component. The nonlinear neural network component allows for more model flexibility, the ability to learn and model nonlinear and complex relationships. We use a maximum likelihood approach for model parameter estimation. We establish consistency and asymptotic normality for these estimators under some standard conditions on the space time model and neural network component. We investigate the quality of the asymptotic approximations for finite samples by means of numerical simulation studies. For illustration, we include a real world application.

Fast Proper Orthogonal Decomposition Using Improved Sampling and Iterative Techniques for Singular Value Decomposition

Proper Orthogonal Decomposition (POD), also known as Principal component analysis (PCA), is a dimensionality reduction technique used to capture the energetically dominant features of datasets, known as eigenfeatures or POD modes. These modes can be obtained by finding a low rank approximation of the data matrix using singular value decomposition (SVD). In this paper, we explore random sampling techniques, which is one approach to obtain an approximate low rank description of the dataset in a computationally efficient manner. We analyse the performance of two algorithms proposed by [1], namely LTSVD and CTSVD, and discuss their advantages and limitations. We modify the two algorithms to improve the runtime of the methods and prove the equivalence of our modified algorithms with LTSVD and CTSVD. The modifications we propose are independent of sampling probability distributions and can be used to improve runtime whenever sampling is done with replacement. We use the modified algorithms along with a previously proposed merging technique to obtain the SVD of large matrices that do not fit in memory. We also propose an iterative algorithm to improve the approximation of the POD modes and the subspace spanned by the modes. Unlike the previous methods for multiple rounds of sampling, we obtain an updated approximation to the POD modes in each iteration and stop when the modes or subspace spanned by the modes have converged. The performance of our proposed solutions is analysed using four datasets of various sizes for single and multi-threaded execution. In all cases, we obtain a significant speedup over using a truncated SVD. The speedup of our modified LTSVD and CTSVD algorithms with respect to the existing algorithms depends on the error parameters. For low values of error parameters, we get upto 2-3x speedup. The results of the iterative algorithms have significantly better accuracies.

Federated Multi-task Hierarchical Attention Model for Sensor Analytics

Sensors are an integral part of modern Internet of Things (IoT) applications. There is a critical need for the analysis of heterogeneous multivariate temporal data obtained from the individual sensors of these systems. In this paper we particularly focus on the problem of the scarce amount of training data available per sensor. We propose a novel federated multi-task hierarchical attention model (FATHOM) that jointly trains classification/regression models from multiple sensors. The attention mechanism of the proposed model seeks to extract feature representations from the input and learn a shared representation focused on time dimensions across multiple sensors. The underlying temporal and non-linear relationships are modeled using a combination of attention mechanism and long-short term memory (LSTM) networks. We find that our proposed method outperforms a wide range of competitive baselines in both classification and regression settings on activity recognition and environment monitoring datasets. We further provide visualization of feature representations learned by our model at the input sensor level and central time level.

R Packages worth a look

Toolkit to Validate New Data for a Predictive Model (recorder)
A lightweight toolkit to validate new observations when computing their predictions with a predictive model. The validation process consists of two ste …

Simplifies Exploratory Data Analysis (explore)
Interactive data exploration with one line of code or use a easy to remember set of tidy functions. Introduces two main verbs for data analysis. descri …

Amazon Web Services Networking & Content Delivery APIs (paws.networking)
Interface to Amazon Web Services networking and content delivery APIs, including ‘Route 53’ Domain Name System service, ‘CloudFront’ content delivery, …

Structural Equation Modeling Tables (semTable)
For confirmatory factor analysis (‘CFA’) and structural equation models (‘SEM’) estimated with the ‘lavaan’ package, this package provides functions to …

A Monte Carlo Based Heterogeneity Test for Meta-Analysis (mc.heterogeneity)
Implements a Monte Carlo Based Heterogeneity Test for standardized mean differences (d), Fisher-transformed Pearson’s correlations (r), and natural-log …

Plot ForestGEO Data (fgeo.plot)
To help you access, transform, analyze, and visualize ForestGEO data, we developed a collection of R packages (<<a href="

Book Memo: “Probabilistic Data Structures and Algorithms for Big Data Applications”

A technical book about popular space-efficient data structures and fast algorithms that are extremely useful in modern Big Data applications. The purpose of this book is to introduce technology practitioners, including software architects and developers, as well as technology decision makers to probabilistic data structures and algorithms. Reading this book, you will get a theoretical and practical understanding of probabilistic data structures and learn about their common uses.

If you did not already know

Neural Attentive Item Similarity Model (NAIS) google
Item-to-item collaborative filtering (aka. item-based CF) has been long used for building recommender systems in industrial settings, owing to its interpretability and efficiency in real-time personalization. It builds a user’s profile as her historically interacted items, recommending new items that are similar to the user’s profile. As such, the key to an item-based CF method is in the estimation of item similarities. Early approaches use statistical measures such as cosine similarity and Pearson coefficient to estimate item similarities, which are less accurate since they lack tailored optimization for the recommendation task. In recent years, several works attempt to learn item similarities from data, by expressing the similarity as an underlying model and estimating model parameters by optimizing a recommendation-aware objective function. While extensive efforts have been made to use shallow linear models for learning item similarities, there has been relatively less work exploring nonlinear neural network models for item-based CF. In this work, we propose a neural network model named Neural Attentive Item Similarity model (NAIS) for item-based CF. The key to our design of NAIS is an attention network, which is capable of distinguishing which historical items in a user profile are more important for a prediction. Compared to the state-of-the-art item-based CF method Factored Item Similarity Model (FISM), our NAIS has stronger representation power with only a few additional parameters brought by the attention network. Extensive experiments on two public benchmarks demonstrate the effectiveness of NAIS. This work is the first attempt that designs neural network models for item-based CF, opening up new research possibilities for future developments of neural recommender systems. …

Verity google
Integrity and security of the data in database systems are typically maintained with access control policies and firewalls. However, insider attacks — where someone with an intimate knowledge of the system and administrative privileges tampers with the data — pose a unique challenge. Measures like append only logging prove to be insufficient because an attacker with administrative privileges can alter logs and login records to eliminate the trace of attack, thus making insider attacks hard to detect. In this paper, we propose Verity — first of a kind system to the best of our knowledge. Verity serves as a dataless framework by which any blockchain network can be used to store fixed-length metadata about tuples from any SQL database, without complete migration of the database. Verity uses a formalism for parsing SQL queries and query results to check the respective tuples’ integrity using blockchains to detect insider attacks. We have implemented our technique using Hyperledger Fabric, Composer REST API, and SQLite database. Using TPC-H data and SQL queries of varying complexity and types, our experiments demonstrate that any overhead of integrity checking remains constant per tuple in a query’s results, and scales linearly. …

Spider google
We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task where different complex SQL queries and databases appear in train and test sets. In this way, the task requires the model to generalize well to both new SQL queries and new database schemas. Spider is distinct from most of the previous semantic parsing tasks because they all use a single database and the exact same programs in the train set and the test set. We experiment with various state-of-the-art models and the best model achieves only 14.3% exact matching accuracy on a database split setting. This shows that Spider presents a strong challenge for future research. Our dataset and task are publicly available at https://…/spider.

Distilled News

Recurrence in biological and artificial neural networks

Recurrence is an overloaded term in the context of neural networks, with disparate colloquial meanings in the machine learning and the neuroscience communities. The difference is narrowing, however, as the artificial neural networks (ANNs) used for practical applications are increasingly sophisticated and more like biological neural networks (BNNs) in some ways (yet still vastly different on the whole). In this post we’ll highlight the historic differences in the use of term recurrence within these two communities, highlight some fairly recent deep learning ANN models that creep towards the neuroscience, point to some neuroscience studies that shine light on the function of recurrence, and speculate on future advancements.

Free Book: Classification and Regression In a Weekend

This tutorial began as a series of weekend workshops created by Ajit Jaokar and Dan Howarth. The idea was to work with a specific (longish) program such that we explore as much of it as possible in one weekend. This book is an attempt to take this idea online. The best way to use this book is to work with the Python code as much as you can. The code has comments. But you can extend the comments by the concepts explained here.

Master Your Hypothesis Test

A tutorial on Power, Bootstrapping, Sample Selection, and Outcome Analysis.

Time Series Analysis, Visualization & Forecasting with LSTM

Statistics normality test, Dickey-Fuller test for stationarity, Long short-term memory.

Using Data Cubes with R

Data cubes are a popular way to display multidimensional data. This makes the method suitable for big data. Giving the incredible growth of data it is natural that the method have become increasingly popular. In this article you learn to use R for data cubes.

Understanding Objective Functions in Deep Learning

Data has consumed our day to day lives. The amount of data that’s is available in the web or from other variety of sources is more than enough to get an idea about any entity. The past few years has seen exponential rise in the volume which has resulted into the adaptation of the term Big Data. Most of these generated data are unstructured and could up in any format. Previously computers were not equipped to understand such unstructured data but modern computers coupled with some programs are able to mind such data and extract relevant information from it which has certainly helped many business. Machine Learning is the study of predictive analytics where the structured or unstructured data are analysed and new results are predicted after the model is trained to learn the patterns from historical data. There are several pre-programmed Machine Learning algorithms which helps in building the model and the choice of the algorithm to be used completely depends on the problem statement, the architecture and the relationship among the variables. However, the traditional state-of-the-art Machine Learning algorithms like Support Vector Machines, Logistic Regression, Random Forest, etc., often lacks efficiency when the size of the data increases. This problem is resolved by the advent of Deep Learning which is a sub-field of Machine Learning. The idea behind Deep Learning is more or less akin to our brain. The neural networks in Deep Learning works almost similarly to the neurons in the human brain.

Time Series Forecasting with TensorFlow.js

Pull stock prices from online API and perform predictions using Recurrent Neural Network & Long Short Term Memory (LSTM) with TensorFlow.js framework.

7 Steps to Mastering SQL for Data Science – 2019 Edition

Follow these updated 7 steps to go from SQL data science newbie to practitioner in a hurry. We consider only the necessary concepts and skills, and provide quality resources for each.

Automating Trading and Market Making With Artificial Intelligence

The goal is to capture information in a market’s order books and use that information to predict market movement/direction. That prediction can enable repricing of orders and more efficient market making. Such an approach allows the market maker to provide liquidity whilst making profits at the same time. Market makers are essential to modern markets. They provide the markets with necessary liquidity and make sure the bid/ask spread is reasonably narrow to allow efficient purchasing. This can be taken a step further by market makers that do more than simply provide a constant bidding and asking price. Some market makers trade at higher frequencies and constantly take advantage of inefficiencies as well as small swings in asset prices.

One of the most fascinating advancements in the world of machine learning, is the development of abilities to teach a machine how to understand human communication. This very arm of machine learning is called as Natural Language Processing. This post is an attempt at explaining the basics of Natural Language Processing and how a rapid progress has been made in it with the advancements of deep learning and neural networks.

Recommendation Systems in the Real world

An overview of the process of designing and building a recommendation system pipeline.

PyCharm for Data Scientists

I have recently started using PyCharm as an alternative to Spyder, and am loving it. This article talks about some of the features of PyCharm that made me completely transition to PyCharm from Spyder. The below features are in comparison with Spyder, and not general IDEs.

60+ useful graph visualization libraries

We outline 60+ graph visualization libraries that allow users to build applications to display and interact with network representations of data.

Confidence Intervals in One Picture

Confidence intervals (CIs) tell you how much uncertainty a statistic has. The intervals are connected to confidence levels and the two terms are easily confused, especially if you’re new to statistics. Confidence Intervals in One Picture is an intro to CIs, and explains how each part interacts with margins of error and where the different components come from.

Synchronous Kernels-only Competitions: Real ML in Real Time

We are pleased to share that we now support a general synchronous Kernels-only (KO) format: when you submit a Kernel, Kaggle will run the code against both the public test set and a withheld private test set in real time. To kick things off, you’re invited to join Instant Gratification, our first synchronous Kernels-only competition using our new framework.

Document worth reading: “When deep learning meets security”

Deep learning is an emerging research field that has proven its effectiveness towards deploying more efficient intelligent systems. Security, on the other hand, is one of the most essential issues in modern communication systems. Recently many papers have shown that using deep learning models can achieve promising results when applied to the security domain. In this work, we provide an overview for the recent studies that apply deep learning techniques to the field of security. When deep learning meets security

Whats new on arXiv

Fast and Secure Distributed Learning in High Dimension

Modern machine learning is distributed and the work of several machines is typically aggregated by \emph{averaging} which is the optimal rule in terms of speed, offering a speedup of n (with respect to using a single machine) when n processes are learning together. Distributing data and models poses however fundamental vulnerabilities, be they to software bugs, asynchrony, or worse, to malicious attackers controlling some machines or injecting misleading data in the network. Such behavior is best modeled as Byzantine failures, and averaging does not tolerate a single one from a worker. Krum, the first provably Byzantine resilient aggregation rule for SGD only uses one worker per step, which hampers its speed of convergence, especially in best case conditions when none of the workers is actually Byzantine. An idea, coined multi-Krum, of using m different workers per step was mentioned, without however any proof neither on its Byzantine resilience nor on its slowdown. More recently, it was shown that in high dimensional machine learning, guaranteeing convergence is not a sufficient condition for \emph{strong} Byzantine resilience. A improvement on Krum, coined Bulyan, was proposed and proved to guarantee stronger resilience. However, Bulyan suffers from the same weakness of Krum: using only one worker per step. This adds up to the aforementioned open problem and leaves the crucial need for both fast and strong Byzantine resilience unfulfilled. The present paper proposes using Bulyan over Multi-Krum (we call it Multi-Bulyan), a combination for which we provide proofs of strong Byzantine resilience, as well as an {\frac{m}{n}} slowdown, compared to averaging, the fastest (but non Byzantine resilient) rule for distributed machine learning, finally we prove that Multi-Bulyan inherits the O(d) merits of both multi-Krum and Bulyan.

Modeling Combinatorial Evolution in Time Series Prediction

Time series modeling aims to capture the intrinsic factors underpinning observed data and its evolution. However, most existing studies ignore the evolutionary relations among these factors, which are what cause the combinatorial evolution of a given time series. In this paper, we propose to represent time-varying relations among intrinsic factors of time series data by means of an evolutionary state graph structure. Accordingly, we propose the Evolutionary Graph Recurrent Networks (EGRN) to learn representations of these factors, along with the given time series, using a graph neural network framework. The learned representations can then be applied to time series classification tasks. From our experiment results, based on six real-world datasets, it can be seen that our approach clearly outperforms ten state-of-the-art baseline methods (e.g. +5% in terms of accuracy, and +15% in terms of F1 on average). In addition, we demonstrate that due to the graph structure’s improved interpretability, our method is also able to explain the logical causes of the predicted events.

Capturing Evolution Genes for Time Series Data

The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments’ distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach can not only achieve a good prediction results (e.g., averagely +10.56% in terms of F1), but is also able to provide explanations of the results.

Hyperparameter Estimation in Bayesian MAP Estimation: Parameterizations and Consistency

The Bayesian formulation of inverse problems is attractive for three primary reasons: it provides a clear modelling framework; means for uncertainty quantification; and it allows for principled learning of hyperparameters. The posterior distribution may be explored by sampling methods, but for many problems it is computationally infeasible to do so. In this situation maximum a posteriori (MAP) estimators are often sought. Whilst these are relatively cheap to compute, and have an attractive variational formulation, a key drawback is their lack of invariance under change of parameterization. This is a particularly significant issue when hierarchical priors are employed to learn hyperparameters. In this paper we study the effect of the choice of parameterization on MAP estimators when a conditionally Gaussian hierarchical prior distribution is employed. Specifically we consider the centred parameterization, the natural parameterization in which the unknown state is solved for directly, and the noncentred parameterization, which works with a whitened Gaussian as the unknown state variable, and arises when considering dimension-robust MCMC algorithms; MAP estimation is well-defined in the nonparametric setting only for the noncentred parameterization. However, we show that MAP estimates based on the noncentred parameterization are not consistent as estimators of hyperparameters; conversely, we show that limits of finite-dimensional centred MAP estimators are consistent as the dimension tends to infinity. We also consider empirical Bayesian hyperparameter estimation, show consistency of these estimates, and demonstrate that they are more robust with respect to noise than centred MAP estimates. An underpinning concept throughout is that hyperparameters may only be recovered up to measure equivalence, a well-known phenomenon in the context of the Ornstein-Uhlenbeck process.

Digital Passport: A Novel Technological Strategy for Intellectual Property Protection of Convolutional Neural Networks

In order to prevent deep neural networks from being infringed by unauthorized parties, we propose a generic solution which embeds a designated digital passport into a network, and subsequently, either paralyzes the network functionalities for unauthorized usages or maintain its functionalities in the presence of a verified passport. Such a desired network behavior is successfully demonstrated in a number of implementation schemes, which provide reliable, preventive and timely protections against tens of thousands of fake-passport deceptions. Extensive experiments also show that the deep neural network performance under unauthorized usages deteriorate significantly (e.g. with 33% to 82% reductions of CIFAR10 classification accuracies), while networks endorsed with valid passports remain intact.

Statistical inference with anchored Bayesian mixture of regressions models: A case study analysis of allometric data

We present a case study in which we use a mixture of regressions model to improve on an ill-fitting simple linear regression model relating log brain mass to log body mass for 100 placental mammalian species. The slope of this regression model is of particular scientific interest because it corresponds to a constant that governs a hypothesized allometric power law relating brain mass to body mass. A specific line of investigation is to determine whether the regression parameters vary across subgroups of related species. We model these data using an anchored Bayesian mixture of regressions model, which modifies the standard Bayesian Gaussian mixture by pre-assigning small subsets of observations to given mixture components with probability one. These observations (called anchor points) break the relabeling invariance typical of exchangeable model specifications (the so-called label-switching problem). A careful choice of which observations to pre-classify to which mixture components is key to the specification of a well-fitting anchor model. In the article we compare three strategies for the selection of anchor points. The first assumes that the underlying mixture of regressions model holds and assigns anchor points to different components to maximize the information about their labeling. The second makes no assumption about the relationship between x and y and instead identifies anchor points using a bivariate Gaussian mixture model. The third strategy begins with the assumption that there is only one mixture regression component and identifies anchor points that are representative of a clustering structure based on case-deletion importance sampling weights. We compare the performance of the three strategies on the allometric data set and use auxiliary taxonomic information about the species to evaluate the model-based classifications estimated from these models.

Large-Scale Spectrum Occupancy Learning via Tensor Decomposition and LSTM Networks

A new paradigm for large-scale spectrum occupancy learning based on long short-term memory (LSTM) recurrent neural networks is proposed. Studies have shown that spectrum usage is a highly correlated time series. Moreover, there is a correlation for occupancy of spectrum between different frequency channels. Therefore, revealing all these correlations using learning and prediction of one-dimensional time series is not a trivial task. In this paper, we introduce a new framework for representing the spectrum measurements in a tensor format. Next, a time-series prediction method based on CANDECOMP/PARFAC (CP) tensor decomposition and LSTM recurrent neural networks is proposed. The proposed method is computationally efficient and is able to capture different types of correlation within the measured spectrum. Moreover, it is robust against noise and missing entries of sensed spectrum. The superiority of the proposed method is evaluated over a large-scale synthetic dataset in terms of prediction accuracy and computational efficiency.

Enabling Explainable Fusion in Deep Learning with Fuzzy Integral Neural Networks

Information fusion is an essential part of numerous engineering systems and biological functions, e.g., human cognition. Fusion occurs at many levels, ranging from the low-level combination of signals to the high-level aggregation of heterogeneous decision-making processes. While the last decade has witnessed an explosion of research in deep learning, fusion in neural networks has not observed the same revolution. Specifically, most neural fusion approaches are ad hoc, are not understood, are distributed versus localized, and/or explainability is low (if present at all). Herein, we prove that the fuzzy Choquet integral (ChI), a powerful nonlinear aggregation function, can be represented as a multi-layer network, referred to hereafter as ChIMP. We also put forth an improved ChIMP (iChIMP) that leads to a stochastic gradient descent-based optimization in light of the exponential number of ChI inequality constraints. An additional benefit of ChIMP/iChIMP is that it enables eXplainable AI (XAI). Synthetic validation experiments are provided and iChIMP is applied to the fusion of a set of heterogeneous architecture deep models in remote sensing. We show an improvement in model accuracy and our previously established XAI indices shed light on the quality of our data, model, and its decisions.

Prediction and outlier detection: a distribution-free prediction set with a balanced objective

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction set) that constructs a prediction set C(x) which tries to optimize out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers x, for which the method returns no prediction (corresponding to C(x) equal to the empty set). BCOPS combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also develop a variant of BCOPS in the online setting where we optimize the misclassification loss averaged over a proxy of the out-of-sample distribution. We also describe new methods for the evaluation of out-of-sample performance with mismatched data. We prove asymptotic consistency and efficiency of the proposed methods under suitable assumptions and illustrate our methods on real data examples.

Learning Robotic Manipulation through Visual Planning and Acting

Planning for robotic manipulation requires reasoning about the changes a robot can affect on objects. When such interactions can be modelled analytically, as in domains with rigid objects, efficient planning algorithms exist. However, in both domestic and industrial domains, the objects of interest can be soft, or deformable, and hard to model analytically. For such cases, we posit that a data-driven modelling approach is more suitable. In recent years, progress in deep generative models has produced methods that learn to `imagine’ plausible images from data. Building on the recent Causal InfoGAN generative model, in this work we learn to imagine goal-directed object manipulation directly from raw image data of self-supervised interaction of the robot with the object. After learning, given a goal observation of the system, our model can generate an imagined plan — a sequence of images that transition the object into the desired goal. To execute the plan, we use it as a reference trajectory to track with a visual servoing controller, which we also learn from the data as an inverse dynamics model. In a simulated manipulation task, we show that separating the problem into visual planning and visual tracking control is more sample efficient and more interpretable than alternative data-driven approaches. We further demonstrate our approach on learning to imagine and execute in 3 environments, the final of which is deformable rope manipulation on a PR2 robot.

Knowledge Graph Convolutional Networks for Recommender Systems with Label Smoothness Regularization

Knowledge graphs capture interlinked information between entities and they represent an attractive source of structured information that can be harnessed for recommender systems. However, existing recommender engines use knowledge graphs by manually designing features, do not allow for end-to-end training, or provide poor scalability. Here we propose Knowledge Graph Convolutional Networks (KGCN), an end-to-end trainable framework that harnesses item relationships captured by the knowledge graph to provide better recommendations. Conceptually, KGCN computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relations for a given user and then transforming the knowledge graph into a user-specific weighted graph. Then, KGCN applies a graph convolutional neural network that computes an embedding of an item node by propagating and aggregating knowledge graph neighborhood information. Moreover, to provide better inductive bias KGCN uses label smoothness (LS), which provides regularization over edge weights and we prove that it is equivalent to label propagation scheme on a graph. Finally, We unify KGCN and LS regularization, and present a scalable minibatch implementation for KGCN-LS model. Experiments show that KGCN-LS outperforms strong baselines in four datasets. KGCN-LS also achieves great performance in sparse scenarios and is highly scalable with respect to the knowledge graph size.

Controlled Natural Languages and Default Reasoning

Controlled natural languages (CNLs) are effective languages for knowledge representation and reasoning. They are designed based on certain natural languages with restricted lexicon and grammar. CNLs are unambiguous and simple as opposed to their base languages. They preserve the expressiveness and coherence of natural languages. In this report, we focus on a class of CNLs, called machine-oriented CNLs, which have well-defined semantics that can be deterministically translated into formal languages, such as Prolog, to do logical reasoning. Over the past 20 years, a number of machine-oriented CNLs emerged and have been used in many application domains for problem solving and question answering. However, few of them support non-monotonic inference. In our work, we propose non-monotonic extensions of CNL to support defeasible reasoning. In the first part of this report, we survey CNLs and compare three influential systems: Attempto Controlled English (ACE), Processable English (PENG), and Computer-processable English (CPL). We compare their language design, semantic interpretations, and reasoning services. In the second part of this report, we first identify typical non-monotonicity in natural languages, such as defaults, exceptions and conversational implicatures. Then, we propose their representation in CNL and the corresponding formalizations in a form of defeasible reasoning known as Logic Programming with Defaults and Argumentation Theory (LPDA).

Hadamard Matrix Guided Online Hashing

Online image hashing has received increasing research attention recently, which receives large-scale data in a streaming manner to update the hash functions on-the-fly. Its key challenge lies in the difficulty in balancing the learning timeliness and model accuracy. To this end, most works exploit a supervised setting, i.e., using class labels to boost the hashing performance, which defects in two aspects: First, large amount of training batches are required to learn up-to-date hash functions, which however largely increase the learning complexity. Second, strong constraints, e.g., orthogonal or similarity preserving, are used, which are however typically relaxed and lead to large accuracy drop. To handle the above challenges, in this paper, a novel supervised online hashing scheme termed Hadamard Matrix Guided Online Hashing (HMOH) is proposed. Our key innovation lies in the construction and usage of Hadamard matrix, which is an orthogonal binary matrix and is built via Sylvester method. To release the need of strong constraints, we regard each column of Hadamard matrix as the target code for each class label, which by nature satisfies several desired properties of hashing codes. To accelerate the online training, the LSH is first adopted to align the length of target code and the to-be-learned binary code. And then, we treat the learning of hash functions as a set of binary classification problems to fit the assigned target code. Finally, we propose to ensemble the learned models in all rounds to maximally preserve the information of past streaming data. The superior accuracy and efficiency of the proposed method are demonstrated through extensive experiments on three widely-used datasets comparing to various state-of-the-art methods.

Structural Equation Modeling using Computation Graphs

Structural equation modeling (SEM) is evolving as available data is becoming more complex, reaching the limits of what traditional estimation approaches can achieve. As SEM expands to ever larger, more complex applications, the estimation challenge grows and currently available methods will be insufficient. To overcome this challenge in SEM, we see an opportunity to use existing solutions from the field of deep learning, which has been pioneering methods for estimation of complex models for decades. To this end, this paper introduces computation graphs, a flexible method of specifying objective functions. When combined with state-of-the-art optimizers, we argue that our computation graph approach is capable not only of estimating SEM models, but also of rapidly extending them — without the need of bespoke software development for each new extension. We show that several SEM improvements follow naturally from our approach; not only existing extensions such as least absolute deviation estimation and penalized regression models, but also novel extensions such as spike-and-slab penalties for sparse factor analysis. By applying computation graphs to SEM, we hope to greatly accelerate the process of developing SEM techniques, paving the way for novel applications. The accompanying R package tensorsem is under active development.

Stability Properties of Graph Neural Networks

Data stemming from networks exhibit an irregular support, whereby each data element is related by arbitrary pairwise relationships determined by the network. Graph neural networks (GNNs) have emerged as information processing architectures that exploit the particularities of this underlying support. The use of nonlinearities in GNNs, coupled with the fact that filters are learned from data, raises mathematical challenges that have precluded the development of theoretical results that would give insight in the reasons for the remarkable performance of GNNs. In this work, we prove the property of stability, that states that a small change in the support of the data leads to a small (bounded) change in the output of the GNN. More specifically, we prove that the bound on the output difference of the GNN computed on one graph or another, is proportional to the difference between the graphs and the design parameters of the GNN, as long as the trained filters are integral Lipschitz. We exploit this result to provide some insights in the crucial effect that nonlinearities have in obtaining an architecture that is both stable and selective, a feat that is impossible to achieve if using only linear filters.

GraphSE$^2$: An Encrypted Graph Database for Privacy-Preserving Social Search

In this paper, we propose GraphSE^2, an encrypted graph database for online social network services to address massive data breaches. GraphSE^2 preserves the functionality of social search, a key enabler for quality social network services, where social search queries are conducted on a large-scale social graph and meanwhile perform set and computational operations on user-generated contents. To enable efficient privacy-preserving social search, GraphSE^2 provides an encrypted structural data model to facilitate parallel and encrypted graph data access. It is also designed to decompose complex social search queries into atomic operations and realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE^2 with various queries supported in the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE^2 is practical for querying a social graph with a million of users.

Interpret Federated Learning with Shapley Values

Federated Learning is introduced to protect privacy by distributing training data into multiple parties. Each party trains its own model and a meta-model is constructed from the sub models. In this way the details of the data are not disclosed in between each party. In this paper we investigate the model interpretation methods for Federated Learning, specifically on the measurement of feature importance of vertical Federated Learning where feature space of the data is divided into two parties, namely host and guest. For host party to interpret a single prediction of vertical Federated Learning model, the interpretation results, namely the feature importance, are very likely to reveal the protected data from guest party. We propose a method to balance the model interpretability and data privacy in vertical Federated Learning by using Shapley values to reveal detailed feature importance for host features and a unified importance value for federated guest features. Our experiments indicate robust and informative results for interpreting Federated Learning models.

Segregation Network for Multi-Class Novelty Detection

The problem of multiple class novelty detection is gaining increasing importance due to the large availability of multimedia data and the increasing requirement of the classification models to work in an open set scenario. To this end, novelty detection tries to answer this important question: given a test example should we even try to classify it? In this work, we design a novel deep learning framework, termed Segregation Network, which is trained using the mixup technique. We construct interpolated points using convex combinations of pairs of training data and use our novel loss function for prediction of its constituent classes. During testing, for each input query, mixed samples with the known class prototypes are generated and passed through the proposed network. The output of the network reveals the constituent classes which can be used to determine whether the incoming data is from the known class set or not. Our algorithm is trained using just the data from the known classes and does not require any auxiliary dataset or attributes. Extensive evaluation on two benchmark datasets namely Caltech-256 and Stanford Dogs and comparison with the state-of-the-art justifies the effectiveness of the proposed framework.

Novel Algorithms based on Majorization Minimization for Nonnegative Matrix Factorization

Matrix decomposition is ubiquitous and has applications in various fields like speech processing, data mining and image processing to name a few. Under matrix decomposition, nonnegative matrix factorization is used to decompose a nonnegative matrix into a product of two nonnegative matrices which gives some meaningful interpretation of the data. Thus, nonnegative matrix factorization has an edge over the other decomposition techniques. In this paper, we propose two novel iterative algorithms based on Majorization Minimization (MM)-in which we formulate a novel upper bound and minimize it to get a closed form solution at every iteration. Since the algorithms are based on MM, it is ensured that the proposed methods will be monotonic. The proposed algorithms differ in the updating approach of the two nonnegative matrices. The first algorithm-Iterative Nonnegative Matrix Factorization (INOM) sequentially updates the two nonnegative matrices while the second algorithm-Parallel Iterative Nonnegative Matrix Factorization (PARINOM) parallely updates them. We also prove that the proposed algorithms converge to the stationary point of the problem. Simulations were conducted to compare the proposed methods with the existing ones and was found that the proposed algorithms performs better than the existing ones in terms of computational speed and convergence. KeyWords: Nonnegative matrix factorization, Majorization Minimization, Big Data, Parallel, Multiplicative Update

Boosting Generative Models by Leveraging Cascaded Meta-Models

Deep generative models are effective methods of modeling data. However, it is not easy for a single generative model to faithfully capture the distributions of complex data such as images. In this paper, we propose an approach for boosting generative models, which cascades meta-models together to produce a stronger model. Any hidden variable meta-model (e.g., RBM and VAE) which supports likelihood evaluation can be leveraged. We derive a decomposable variational lower bound of the boosted model, which allows each meta-model to be trained separately and greedily. Besides, our framework can be extended to semi-supervised boosting, where the boosted model learns a joint distribution of data and labels. Finally, we combine our boosting framework with the multiplicative boosting framework, which further improves the learning power of generative models.

Deep Learning: a new definition of artificial neuron with double weight

Deep learning is a subset of a broader family of machine learning methods based on learning data representations. These models are inspired by human biological nervous systems, even if there are various differences pertaining to the structural and functional properties of biological brains. The elementary constituents of deep learning models are neurons, which can be considered as functions that receive inputs and produce an output that is a weighted sum of the inputs fed through an activation function. Several models of neurons were proposed in the course of the years that are all based on learnable parameters called weights. In this paper we present a new type of artificial neuron, the double-weight neuron,characterized by additional learnable weights that lead to a more complex and accurate system. We tested a feed-forward and convolutional neural network consisting of double-weight neurons on the MNIST dataset, and we tested a convolution network on the CIFAR-10 dataset. For MNIST we find a \approx 4\% and \approx 1\% improved classification accuracy, respectively, when compared to a standard feed-forward and convolutional neural network built with the same sets of hyperparameters. For CIFAR-10 we find a \approx 12\% improved classification accuracy. We thus conclude that this novel artificial neuron can be considered as a valuable alternative to common ones.

Understanding eWhoring

In this paper, we describe a new type of online fraud, referred to as ‘eWhoring’ by offenders. This crime script analysis provides an overview of the ‘eWhoring’ business model, drawing on more than 6,500 posts crawled from an online underground forum. This is an unusual fraud type, in that offenders readily share information about how it is committed in a way that is almost prescriptive. There are economic factors at play here, as providing information about how to make money from ‘eWhoring’ can increase the demand for the types of images that enable it to happen. We find that sexualised images are typically stolen and shared online. While some images are shared for free, these can quickly become ‘saturated’, leading to the demand for (and trade in) more exclusive ‘packs’. These images are then sold to unwitting customers who believe they have paid for a virtual sexual encounter. A variety of online services are used for carrying out this fraud type, including email, video, dating sites, social media, classified advertisements, and payment platforms. This analysis reveals potential interventions that could be applied to each stage of the crime commission process to prevent and disrupt this crime type.

Dissecting Graph Neural Networks on Graph Classification

Graph Neural Nets (GNNs) have received increasing attentions, partially due to their superior performance in many node and graph classification tasks. However, there is a lack of understanding on what they are learning and how sophisticated the learned graph functions are. In this work, we first propose Graph Feature Network (GFN), a simple lightweight neural net defined on a set of graph augmented features. We then propose a dissection of GNNs on graph classification into two parts: 1) the graph filtering, where graph-based neighbor aggregations are performed, and 2) the set function, where a set of hidden node features are composed for prediction. To test the importance of these two parts separately, we prove and leverage the connection that GFN can be derived by linearizing graph filtering part of GNN. Empirically we perform evaluations on common graph classification benchmarks. To our surprise, we find that, despite the simplification, GFN could match or exceed the best accuracies produced by recently proposed GNNs, with a fraction of computation cost. Our results provide new perspectives on both the functions that GNNs learned and the current benchmarks for evaluating them.

Language in Our Time: An Empirical Analysis of Hashtags

Hashtags in online social networks have gained tremendous popularity during the past five years. The resulting large quantity of data has provided a new lens into modern society. Previously, researchers mainly rely on data collected from Twitter to study either a certain type of hashtags or a certain property of hashtags. In this paper, we perform the first large-scale empirical analysis of hashtags shared on Instagram, the major platform for hashtag-sharing. We study hashtags from three different dimensions including the temporal-spatial dimension, the semantic dimension, and the social dimension. Extensive experiments performed on three large-scale datasets with more than 7 million hashtags in total provide a series of interesting observations. First, we show that the temporal patterns of hashtags can be categorized into four different clusters, and people tend to share fewer hashtags at certain places and more hashtags at others. Second, we observe that a non-negligible proportion of hashtags exhibit large semantic displacement. We demonstrate hashtags that are more uniformly shared among users, as quantified by the proposed hashtag entropy, are less prone to semantic displacement. In the end, we propose a bipartite graph embedding model to summarize users’ hashtag profiles, and rely on these profiles to perform friendship prediction. Evaluation results show that our approach achieves an effective prediction with AUC (area under the ROC curve) above 0.8 which demonstrates the strong social signals possessed in hashtags.

Deep Layered LMS Predictor

In this study, we present a new approach to design a Least Mean Squares (LMS) predictor. This approach exploits the concept of deep neural networks and their supremacy in terms of performance and accuracy. The new LMS predictor is implemented as a deep neural network using multiple non linear LMS filters. The network consists of multiple layers with nonlinear activation functions, where each neuron in the hidden layers corresponds to a certain FIR filter output which goes through nonlinearity. The output of the last layer is the prediction. We hypothesize that this approach will outperform the traditional adaptive filters.

Explainable AI for Trees: From Local Explanations to Global Understanding

Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.

VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository

Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs. These exemplars often lack the characteristics of real-world datasets, and their one-off nature makes it difficult to compare different techniques. In this paper, we present VizNet: a large-scale corpus of over 31 million datasets compiled from open data repositories and online visualization galleries. On average, these datasets comprise 17 records over 3 dimensions and across the corpus, we find 51% of the dimensions record categorical data, 44% quantitative, and only 5% temporal. VizNet provides the necessary common baseline for comparing visualization design techniques, and developing benchmark models and algorithms for automating visual analysis. To demonstrate VizNet’s utility as a platform for conducting online crowdsourced experiments at scale, we replicate a prior study assessing the influence of user task and data distribution on visual encoding effectiveness, and extend it by considering an additional task: outlier detection. To contend with running such studies at scale, we demonstrate how a metric of perceptual effectiveness can be learned from experimental results, and show its predictive power across test datasets.

Kyrix: Interactive Visual Data Exploration at Scale

Scalable interactive visual data exploration is crucial in many domains due to increasingly large datasets generated at rapid rates. Details-on-demand provides a useful interaction paradigm for exploring large datasets, where users start at an overview, find regions of interest, zoom in to see detailed views, zoom out and then repeat. This paradigm is the primary user interaction mode of widely-used systems such as Google Maps, Aperture Tiles and ForeCache. These earlier systems, however, are highly customized with hardcoded visual representations and optimizations. A more general framework is needed to facilitate the development of visual data exploration systems at scale. In this paper, we present Kyrix, an end-to-end system for developing scalable details-on-demand data exploration applications. Kyrix provides developers with a declarative model for easy specification of general visualizations. Behind the scenes, Kyrix utilizes a suite of performance optimization techniques to achieve a response time within 500ms for various user interactions. We also report results from a performance study which shows that a novel dynamic fetching scheme adopted by Kyrix outperforms tile-based fetching used in earlier systems.

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

Intrinsic rewards are introduced to simulate how human intelligence works, which are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards. However, none of the existing intrinsic reward approaches can achieve human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (mega-reward) which, to our knowledge, is the first approach that achieves comparable human-level performance in intrinsically-motivated play. The intuition of mega-rewards comes from the observation that infants’ intelligence develops when they try to gain more control on entities in an environment; therefore, mega-reward aims to maximize the control capabilities of agents on given entities in a given environment. To formalize mega-reward, a relational transition model is proposed to bridge the gaps between direct and latent control. Experimental studies show that mega-reward can (i) greatly outperform all state-of-the-art intrinsic reward approaches, (ii) generally achieves the same level of performance as Ex-PPO and professional human-level scores; and (iii) has also superior performance when it is incorporated with extrinsic reward.

Predictive Ensemble Learning with Application to Scene Text Detection

Deep learning based approaches have achieved significant progresses in different tasks like classification, detection, segmentation, and so on. Ensemble learning is widely known to further improve performance by combining multiple complementary models. It is easy to apply ensemble learning for classification tasks, for example, based on averaging, voting, or other methods. However, for other tasks (like object detection) where the outputs are varying in quantity and unable to be simply compared, the ensemble of multiple models become difficult. In this paper, we propose a new method called Predictive Ensemble Learning (PEL), based on powerful predictive ability of deep neural networks, to directly predict the best performing model among a pool of base models for each test example, thus transforming ensemble learning to a traditional classification task. Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression. Experimental results show the possibility and potential of PEL in predicting different models’ performance based only on a query example, which can be extended for ensemble learning in many other complex tasks.

Learning to Convolve: A Generalized Weight-Tying Approach

Recent work (Cohen & Welling, 2016) has shown that generalizations of convolutions, based on group theory, provide powerful inductive biases for learning. In these generalizations, filters are not only translated but can also be rotated, flipped, etc. However, coming up with exact models of how to rotate a 3 x 3 filter on a square pixel-grid is difficult. In this paper, we learn how to transform filters for use in the group convolution, focussing on roto-translation. For this, we learn a filter basis and all rotated versions of that filter basis. Filters are then encoded by a set of rotation invariant coefficients. To rotate a filter, we switch the basis. We demonstrate we can produce feature maps with low sensitivity to input rotations, while achieving high performance on MNIST and CIFAR-10.

On Graph Classification Networks, Datasets and Baselines

Graph classification receives a great deal of attention from the non-Euclidean machine learning community. Recent advances in graph coarsening have enabled the training of deeper networks and produced new state-of-the-art results in many benchmark tasks. We examine how these architectures train and find that performance is highly-sensitive to initialisation and depends strongly on jumping-knowledge structures. We then show that, despite the great complexity of these models, competitive performance is achieved by the simplest of models — structure-blind MLP, single-layer GCN and fixed-weight GCN — and propose these be included as baselines in future.

Document worth reading: “Machine Learning for Data-Driven Movement Generation: a Review of the State of the Art”

The rise of non-linear and interactive media such as video games has increased the need for automatic movement animation generation. In this survey, we review and analyze different aspects of building automatic movement generation systems using machine learning techniques and motion capture data. We cover topics such as high-level movement characterization, training data, features representation, machine learning models, and evaluation methods. We conclude by presenting a discussion of the reviewed literature and outlining the research gaps and remaining challenges for future work. Machine Learning for Data-Driven Movement Generation: a Review of the State of the Art