Book Memo: “Probabilistic Data Structures and Algorithms for Big Data Applications”

A technical book about popular space-efficient data structures and fast algorithms that are extremely useful in modern Big Data applications. The purpose of this book is to introduce technology practitioners, including software architects and developers, as well as technology decision makers to probabilistic data structures and algorithms. Reading this book, you will get a theoretical and practical understanding of probabilistic data structures and learn about their common uses.

If you did not already know

Neural Attentive Item Similarity Model (NAIS) google
Item-to-item collaborative filtering (aka. item-based CF) has been long used for building recommender systems in industrial settings, owing to its interpretability and efficiency in real-time personalization. It builds a user’s profile as her historically interacted items, recommending new items that are similar to the user’s profile. As such, the key to an item-based CF method is in the estimation of item similarities. Early approaches use statistical measures such as cosine similarity and Pearson coefficient to estimate item similarities, which are less accurate since they lack tailored optimization for the recommendation task. In recent years, several works attempt to learn item similarities from data, by expressing the similarity as an underlying model and estimating model parameters by optimizing a recommendation-aware objective function. While extensive efforts have been made to use shallow linear models for learning item similarities, there has been relatively less work exploring nonlinear neural network models for item-based CF. In this work, we propose a neural network model named Neural Attentive Item Similarity model (NAIS) for item-based CF. The key to our design of NAIS is an attention network, which is capable of distinguishing which historical items in a user profile are more important for a prediction. Compared to the state-of-the-art item-based CF method Factored Item Similarity Model (FISM), our NAIS has stronger representation power with only a few additional parameters brought by the attention network. Extensive experiments on two public benchmarks demonstrate the effectiveness of NAIS. This work is the first attempt that designs neural network models for item-based CF, opening up new research possibilities for future developments of neural recommender systems. …

Verity google
Integrity and security of the data in database systems are typically maintained with access control policies and firewalls. However, insider attacks — where someone with an intimate knowledge of the system and administrative privileges tampers with the data — pose a unique challenge. Measures like append only logging prove to be insufficient because an attacker with administrative privileges can alter logs and login records to eliminate the trace of attack, thus making insider attacks hard to detect. In this paper, we propose Verity — first of a kind system to the best of our knowledge. Verity serves as a dataless framework by which any blockchain network can be used to store fixed-length metadata about tuples from any SQL database, without complete migration of the database. Verity uses a formalism for parsing SQL queries and query results to check the respective tuples’ integrity using blockchains to detect insider attacks. We have implemented our technique using Hyperledger Fabric, Composer REST API, and SQLite database. Using TPC-H data and SQL queries of varying complexity and types, our experiments demonstrate that any overhead of integrity checking remains constant per tuple in a query’s results, and scales linearly. …

Spider google
We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task where different complex SQL queries and databases appear in train and test sets. In this way, the task requires the model to generalize well to both new SQL queries and new database schemas. Spider is distinct from most of the previous semantic parsing tasks because they all use a single database and the exact same programs in the train set and the test set. We experiment with various state-of-the-art models and the best model achieves only 14.3% exact matching accuracy on a database split setting. This shows that Spider presents a strong challenge for future research. Our dataset and task are publicly available at https://…/spider.

Distilled News

Recurrence in biological and artificial neural networks

Recurrence is an overloaded term in the context of neural networks, with disparate colloquial meanings in the machine learning and the neuroscience communities. The difference is narrowing, however, as the artificial neural networks (ANNs) used for practical applications are increasingly sophisticated and more like biological neural networks (BNNs) in some ways (yet still vastly different on the whole). In this post we’ll highlight the historic differences in the use of term recurrence within these two communities, highlight some fairly recent deep learning ANN models that creep towards the neuroscience, point to some neuroscience studies that shine light on the function of recurrence, and speculate on future advancements.

Free Book: Classification and Regression In a Weekend

This tutorial began as a series of weekend workshops created by Ajit Jaokar and Dan Howarth. The idea was to work with a specific (longish) program such that we explore as much of it as possible in one weekend. This book is an attempt to take this idea online. The best way to use this book is to work with the Python code as much as you can. The code has comments. But you can extend the comments by the concepts explained here.

Master Your Hypothesis Test

A tutorial on Power, Bootstrapping, Sample Selection, and Outcome Analysis.

Time Series Analysis, Visualization & Forecasting with LSTM

Statistics normality test, Dickey-Fuller test for stationarity, Long short-term memory.

Using Data Cubes with R

Data cubes are a popular way to display multidimensional data. This makes the method suitable for big data. Giving the incredible growth of data it is natural that the method have become increasingly popular. In this article you learn to use R for data cubes.

Understanding Objective Functions in Deep Learning

Data has consumed our day to day lives. The amount of data that’s is available in the web or from other variety of sources is more than enough to get an idea about any entity. The past few years has seen exponential rise in the volume which has resulted into the adaptation of the term Big Data. Most of these generated data are unstructured and could up in any format. Previously computers were not equipped to understand such unstructured data but modern computers coupled with some programs are able to mind such data and extract relevant information from it which has certainly helped many business. Machine Learning is the study of predictive analytics where the structured or unstructured data are analysed and new results are predicted after the model is trained to learn the patterns from historical data. There are several pre-programmed Machine Learning algorithms which helps in building the model and the choice of the algorithm to be used completely depends on the problem statement, the architecture and the relationship among the variables. However, the traditional state-of-the-art Machine Learning algorithms like Support Vector Machines, Logistic Regression, Random Forest, etc., often lacks efficiency when the size of the data increases. This problem is resolved by the advent of Deep Learning which is a sub-field of Machine Learning. The idea behind Deep Learning is more or less akin to our brain. The neural networks in Deep Learning works almost similarly to the neurons in the human brain.

Time Series Forecasting with TensorFlow.js

Pull stock prices from online API and perform predictions using Recurrent Neural Network & Long Short Term Memory (LSTM) with TensorFlow.js framework.

7 Steps to Mastering SQL for Data Science – 2019 Edition

Follow these updated 7 steps to go from SQL data science newbie to practitioner in a hurry. We consider only the necessary concepts and skills, and provide quality resources for each.

Automating Trading and Market Making With Artificial Intelligence

The goal is to capture information in a market’s order books and use that information to predict market movement/direction. That prediction can enable repricing of orders and more efficient market making. Such an approach allows the market maker to provide liquidity whilst making profits at the same time. Market makers are essential to modern markets. They provide the markets with necessary liquidity and make sure the bid/ask spread is reasonably narrow to allow efficient purchasing. This can be taken a step further by market makers that do more than simply provide a constant bidding and asking price. Some market makers trade at higher frequencies and constantly take advantage of inefficiencies as well as small swings in asset prices.

One of the most fascinating advancements in the world of machine learning, is the development of abilities to teach a machine how to understand human communication. This very arm of machine learning is called as Natural Language Processing. This post is an attempt at explaining the basics of Natural Language Processing and how a rapid progress has been made in it with the advancements of deep learning and neural networks.

Recommendation Systems in the Real world

An overview of the process of designing and building a recommendation system pipeline.

PyCharm for Data Scientists

I have recently started using PyCharm as an alternative to Spyder, and am loving it. This article talks about some of the features of PyCharm that made me completely transition to PyCharm from Spyder. The below features are in comparison with Spyder, and not general IDEs.

60+ useful graph visualization libraries

We outline 60+ graph visualization libraries that allow users to build applications to display and interact with network representations of data.

Confidence Intervals in One Picture

Confidence intervals (CIs) tell you how much uncertainty a statistic has. The intervals are connected to confidence levels and the two terms are easily confused, especially if you’re new to statistics. Confidence Intervals in One Picture is an intro to CIs, and explains how each part interacts with margins of error and where the different components come from.

Synchronous Kernels-only Competitions: Real ML in Real Time

We are pleased to share that we now support a general synchronous Kernels-only (KO) format: when you submit a Kernel, Kaggle will run the code against both the public test set and a withheld private test set in real time. To kick things off, you’re invited to join Instant Gratification, our first synchronous Kernels-only competition using our new framework.

Document worth reading: “When deep learning meets security”

Deep learning is an emerging research field that has proven its effectiveness towards deploying more efficient intelligent systems. Security, on the other hand, is one of the most essential issues in modern communication systems. Recently many papers have shown that using deep learning models can achieve promising results when applied to the security domain. In this work, we provide an overview for the recent studies that apply deep learning techniques to the field of security. When deep learning meets security

Whats new on arXiv

Fast and Secure Distributed Learning in High Dimension

Modern machine learning is distributed and the work of several machines is typically aggregated by \emph{averaging} which is the optimal rule in terms of speed, offering a speedup of n (with respect to using a single machine) when n processes are learning together. Distributing data and models poses however fundamental vulnerabilities, be they to software bugs, asynchrony, or worse, to malicious attackers controlling some machines or injecting misleading data in the network. Such behavior is best modeled as Byzantine failures, and averaging does not tolerate a single one from a worker. Krum, the first provably Byzantine resilient aggregation rule for SGD only uses one worker per step, which hampers its speed of convergence, especially in best case conditions when none of the workers is actually Byzantine. An idea, coined multi-Krum, of using m different workers per step was mentioned, without however any proof neither on its Byzantine resilience nor on its slowdown. More recently, it was shown that in high dimensional machine learning, guaranteeing convergence is not a sufficient condition for \emph{strong} Byzantine resilience. A improvement on Krum, coined Bulyan, was proposed and proved to guarantee stronger resilience. However, Bulyan suffers from the same weakness of Krum: using only one worker per step. This adds up to the aforementioned open problem and leaves the crucial need for both fast and strong Byzantine resilience unfulfilled. The present paper proposes using Bulyan over Multi-Krum (we call it Multi-Bulyan), a combination for which we provide proofs of strong Byzantine resilience, as well as an {\frac{m}{n}} slowdown, compared to averaging, the fastest (but non Byzantine resilient) rule for distributed machine learning, finally we prove that Multi-Bulyan inherits the O(d) merits of both multi-Krum and Bulyan.

Modeling Combinatorial Evolution in Time Series Prediction

Time series modeling aims to capture the intrinsic factors underpinning observed data and its evolution. However, most existing studies ignore the evolutionary relations among these factors, which are what cause the combinatorial evolution of a given time series. In this paper, we propose to represent time-varying relations among intrinsic factors of time series data by means of an evolutionary state graph structure. Accordingly, we propose the Evolutionary Graph Recurrent Networks (EGRN) to learn representations of these factors, along with the given time series, using a graph neural network framework. The learned representations can then be applied to time series classification tasks. From our experiment results, based on six real-world datasets, it can be seen that our approach clearly outperforms ten state-of-the-art baseline methods (e.g. +5% in terms of accuracy, and +15% in terms of F1 on average). In addition, we demonstrate that due to the graph structure’s improved interpretability, our method is also able to explain the logical causes of the predicted events.

Capturing Evolution Genes for Time Series Data

The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments’ distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach can not only achieve a good prediction results (e.g., averagely +10.56% in terms of F1), but is also able to provide explanations of the results.

Hyperparameter Estimation in Bayesian MAP Estimation: Parameterizations and Consistency

The Bayesian formulation of inverse problems is attractive for three primary reasons: it provides a clear modelling framework; means for uncertainty quantification; and it allows for principled learning of hyperparameters. The posterior distribution may be explored by sampling methods, but for many problems it is computationally infeasible to do so. In this situation maximum a posteriori (MAP) estimators are often sought. Whilst these are relatively cheap to compute, and have an attractive variational formulation, a key drawback is their lack of invariance under change of parameterization. This is a particularly significant issue when hierarchical priors are employed to learn hyperparameters. In this paper we study the effect of the choice of parameterization on MAP estimators when a conditionally Gaussian hierarchical prior distribution is employed. Specifically we consider the centred parameterization, the natural parameterization in which the unknown state is solved for directly, and the noncentred parameterization, which works with a whitened Gaussian as the unknown state variable, and arises when considering dimension-robust MCMC algorithms; MAP estimation is well-defined in the nonparametric setting only for the noncentred parameterization. However, we show that MAP estimates based on the noncentred parameterization are not consistent as estimators of hyperparameters; conversely, we show that limits of finite-dimensional centred MAP estimators are consistent as the dimension tends to infinity. We also consider empirical Bayesian hyperparameter estimation, show consistency of these estimates, and demonstrate that they are more robust with respect to noise than centred MAP estimates. An underpinning concept throughout is that hyperparameters may only be recovered up to measure equivalence, a well-known phenomenon in the context of the Ornstein-Uhlenbeck process.

Digital Passport: A Novel Technological Strategy for Intellectual Property Protection of Convolutional Neural Networks

In order to prevent deep neural networks from being infringed by unauthorized parties, we propose a generic solution which embeds a designated digital passport into a network, and subsequently, either paralyzes the network functionalities for unauthorized usages or maintain its functionalities in the presence of a verified passport. Such a desired network behavior is successfully demonstrated in a number of implementation schemes, which provide reliable, preventive and timely protections against tens of thousands of fake-passport deceptions. Extensive experiments also show that the deep neural network performance under unauthorized usages deteriorate significantly (e.g. with 33% to 82% reductions of CIFAR10 classification accuracies), while networks endorsed with valid passports remain intact.

Statistical inference with anchored Bayesian mixture of regressions models: A case study analysis of allometric data

We present a case study in which we use a mixture of regressions model to improve on an ill-fitting simple linear regression model relating log brain mass to log body mass for 100 placental mammalian species. The slope of this regression model is of particular scientific interest because it corresponds to a constant that governs a hypothesized allometric power law relating brain mass to body mass. A specific line of investigation is to determine whether the regression parameters vary across subgroups of related species. We model these data using an anchored Bayesian mixture of regressions model, which modifies the standard Bayesian Gaussian mixture by pre-assigning small subsets of observations to given mixture components with probability one. These observations (called anchor points) break the relabeling invariance typical of exchangeable model specifications (the so-called label-switching problem). A careful choice of which observations to pre-classify to which mixture components is key to the specification of a well-fitting anchor model. In the article we compare three strategies for the selection of anchor points. The first assumes that the underlying mixture of regressions model holds and assigns anchor points to different components to maximize the information about their labeling. The second makes no assumption about the relationship between x and y and instead identifies anchor points using a bivariate Gaussian mixture model. The third strategy begins with the assumption that there is only one mixture regression component and identifies anchor points that are representative of a clustering structure based on case-deletion importance sampling weights. We compare the performance of the three strategies on the allometric data set and use auxiliary taxonomic information about the species to evaluate the model-based classifications estimated from these models.

Large-Scale Spectrum Occupancy Learning via Tensor Decomposition and LSTM Networks

A new paradigm for large-scale spectrum occupancy learning based on long short-term memory (LSTM) recurrent neural networks is proposed. Studies have shown that spectrum usage is a highly correlated time series. Moreover, there is a correlation for occupancy of spectrum between different frequency channels. Therefore, revealing all these correlations using learning and prediction of one-dimensional time series is not a trivial task. In this paper, we introduce a new framework for representing the spectrum measurements in a tensor format. Next, a time-series prediction method based on CANDECOMP/PARFAC (CP) tensor decomposition and LSTM recurrent neural networks is proposed. The proposed method is computationally efficient and is able to capture different types of correlation within the measured spectrum. Moreover, it is robust against noise and missing entries of sensed spectrum. The superiority of the proposed method is evaluated over a large-scale synthetic dataset in terms of prediction accuracy and computational efficiency.

Enabling Explainable Fusion in Deep Learning with Fuzzy Integral Neural Networks

Information fusion is an essential part of numerous engineering systems and biological functions, e.g., human cognition. Fusion occurs at many levels, ranging from the low-level combination of signals to the high-level aggregation of heterogeneous decision-making processes. While the last decade has witnessed an explosion of research in deep learning, fusion in neural networks has not observed the same revolution. Specifically, most neural fusion approaches are ad hoc, are not understood, are distributed versus localized, and/or explainability is low (if present at all). Herein, we prove that the fuzzy Choquet integral (ChI), a powerful nonlinear aggregation function, can be represented as a multi-layer network, referred to hereafter as ChIMP. We also put forth an improved ChIMP (iChIMP) that leads to a stochastic gradient descent-based optimization in light of the exponential number of ChI inequality constraints. An additional benefit of ChIMP/iChIMP is that it enables eXplainable AI (XAI). Synthetic validation experiments are provided and iChIMP is applied to the fusion of a set of heterogeneous architecture deep models in remote sensing. We show an improvement in model accuracy and our previously established XAI indices shed light on the quality of our data, model, and its decisions.

Prediction and outlier detection: a distribution-free prediction set with a balanced objective

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction set) that constructs a prediction set C(x) which tries to optimize out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers x, for which the method returns no prediction (corresponding to C(x) equal to the empty set). BCOPS combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also develop a variant of BCOPS in the online setting where we optimize the misclassification loss averaged over a proxy of the out-of-sample distribution. We also describe new methods for the evaluation of out-of-sample performance with mismatched data. We prove asymptotic consistency and efficiency of the proposed methods under suitable assumptions and illustrate our methods on real data examples.

Learning Robotic Manipulation through Visual Planning and Acting

Planning for robotic manipulation requires reasoning about the changes a robot can affect on objects. When such interactions can be modelled analytically, as in domains with rigid objects, efficient planning algorithms exist. However, in both domestic and industrial domains, the objects of interest can be soft, or deformable, and hard to model analytically. For such cases, we posit that a data-driven modelling approach is more suitable. In recent years, progress in deep generative models has produced methods that learn to `imagine’ plausible images from data. Building on the recent Causal InfoGAN generative model, in this work we learn to imagine goal-directed object manipulation directly from raw image data of self-supervised interaction of the robot with the object. After learning, given a goal observation of the system, our model can generate an imagined plan — a sequence of images that transition the object into the desired goal. To execute the plan, we use it as a reference trajectory to track with a visual servoing controller, which we also learn from the data as an inverse dynamics model. In a simulated manipulation task, we show that separating the problem into visual planning and visual tracking control is more sample efficient and more interpretable than alternative data-driven approaches. We further demonstrate our approach on learning to imagine and execute in 3 environments, the final of which is deformable rope manipulation on a PR2 robot.

Knowledge Graph Convolutional Networks for Recommender Systems with Label Smoothness Regularization

Knowledge graphs capture interlinked information between entities and they represent an attractive source of structured information that can be harnessed for recommender systems. However, existing recommender engines use knowledge graphs by manually designing features, do not allow for end-to-end training, or provide poor scalability. Here we propose Knowledge Graph Convolutional Networks (KGCN), an end-to-end trainable framework that harnesses item relationships captured by the knowledge graph to provide better recommendations. Conceptually, KGCN computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relations for a given user and then transforming the knowledge graph into a user-specific weighted graph. Then, KGCN applies a graph convolutional neural network that computes an embedding of an item node by propagating and aggregating knowledge graph neighborhood information. Moreover, to provide better inductive bias KGCN uses label smoothness (LS), which provides regularization over edge weights and we prove that it is equivalent to label propagation scheme on a graph. Finally, We unify KGCN and LS regularization, and present a scalable minibatch implementation for KGCN-LS model. Experiments show that KGCN-LS outperforms strong baselines in four datasets. KGCN-LS also achieves great performance in sparse scenarios and is highly scalable with respect to the knowledge graph size.

Controlled Natural Languages and Default Reasoning

Controlled natural languages (CNLs) are effective languages for knowledge representation and reasoning. They are designed based on certain natural languages with restricted lexicon and grammar. CNLs are unambiguous and simple as opposed to their base languages. They preserve the expressiveness and coherence of natural languages. In this report, we focus on a class of CNLs, called machine-oriented CNLs, which have well-defined semantics that can be deterministically translated into formal languages, such as Prolog, to do logical reasoning. Over the past 20 years, a number of machine-oriented CNLs emerged and have been used in many application domains for problem solving and question answering. However, few of them support non-monotonic inference. In our work, we propose non-monotonic extensions of CNL to support defeasible reasoning. In the first part of this report, we survey CNLs and compare three influential systems: Attempto Controlled English (ACE), Processable English (PENG), and Computer-processable English (CPL). We compare their language design, semantic interpretations, and reasoning services. In the second part of this report, we first identify typical non-monotonicity in natural languages, such as defaults, exceptions and conversational implicatures. Then, we propose their representation in CNL and the corresponding formalizations in a form of defeasible reasoning known as Logic Programming with Defaults and Argumentation Theory (LPDA).

Hadamard Matrix Guided Online Hashing

Online image hashing has received increasing research attention recently, which receives large-scale data in a streaming manner to update the hash functions on-the-fly. Its key challenge lies in the difficulty in balancing the learning timeliness and model accuracy. To this end, most works exploit a supervised setting, i.e., using class labels to boost the hashing performance, which defects in two aspects: First, large amount of training batches are required to learn up-to-date hash functions, which however largely increase the learning complexity. Second, strong constraints, e.g., orthogonal or similarity preserving, are used, which are however typically relaxed and lead to large accuracy drop. To handle the above challenges, in this paper, a novel supervised online hashing scheme termed Hadamard Matrix Guided Online Hashing (HMOH) is proposed. Our key innovation lies in the construction and usage of Hadamard matrix, which is an orthogonal binary matrix and is built via Sylvester method. To release the need of strong constraints, we regard each column of Hadamard matrix as the target code for each class label, which by nature satisfies several desired properties of hashing codes. To accelerate the online training, the LSH is first adopted to align the length of target code and the to-be-learned binary code. And then, we treat the learning of hash functions as a set of binary classification problems to fit the assigned target code. Finally, we propose to ensemble the learned models in all rounds to maximally preserve the information of past streaming data. The superior accuracy and efficiency of the proposed method are demonstrated through extensive experiments on three widely-used datasets comparing to various state-of-the-art methods.

Structural Equation Modeling using Computation Graphs

Structural equation modeling (SEM) is evolving as available data is becoming more complex, reaching the limits of what traditional estimation approaches can achieve. As SEM expands to ever larger, more complex applications, the estimation challenge grows and currently available methods will be insufficient. To overcome this challenge in SEM, we see an opportunity to use existing solutions from the field of deep learning, which has been pioneering methods for estimation of complex models for decades. To this end, this paper introduces computation graphs, a flexible method of specifying objective functions. When combined with state-of-the-art optimizers, we argue that our computation graph approach is capable not only of estimating SEM models, but also of rapidly extending them — without the need of bespoke software development for each new extension. We show that several SEM improvements follow naturally from our approach; not only existing extensions such as least absolute deviation estimation and penalized regression models, but also novel extensions such as spike-and-slab penalties for sparse factor analysis. By applying computation graphs to SEM, we hope to greatly accelerate the process of developing SEM techniques, paving the way for novel applications. The accompanying R package tensorsem is under active development.

Stability Properties of Graph Neural Networks

Data stemming from networks exhibit an irregular support, whereby each data element is related by arbitrary pairwise relationships determined by the network. Graph neural networks (GNNs) have emerged as information processing architectures that exploit the particularities of this underlying support. The use of nonlinearities in GNNs, coupled with the fact that filters are learned from data, raises mathematical challenges that have precluded the development of theoretical results that would give insight in the reasons for the remarkable performance of GNNs. In this work, we prove the property of stability, that states that a small change in the support of the data leads to a small (bounded) change in the output of the GNN. More specifically, we prove that the bound on the output difference of the GNN computed on one graph or another, is proportional to the difference between the graphs and the design parameters of the GNN, as long as the trained filters are integral Lipschitz. We exploit this result to provide some insights in the crucial effect that nonlinearities have in obtaining an architecture that is both stable and selective, a feat that is impossible to achieve if using only linear filters.

GraphSE$^2$: An Encrypted Graph Database for Privacy-Preserving Social Search

In this paper, we propose GraphSE^2, an encrypted graph database for online social network services to address massive data breaches. GraphSE^2 preserves the functionality of social search, a key enabler for quality social network services, where social search queries are conducted on a large-scale social graph and meanwhile perform set and computational operations on user-generated contents. To enable efficient privacy-preserving social search, GraphSE^2 provides an encrypted structural data model to facilitate parallel and encrypted graph data access. It is also designed to decompose complex social search queries into atomic operations and realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE^2 with various queries supported in the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE^2 is practical for querying a social graph with a million of users.

Interpret Federated Learning with Shapley Values

Federated Learning is introduced to protect privacy by distributing training data into multiple parties. Each party trains its own model and a meta-model is constructed from the sub models. In this way the details of the data are not disclosed in between each party. In this paper we investigate the model interpretation methods for Federated Learning, specifically on the measurement of feature importance of vertical Federated Learning where feature space of the data is divided into two parties, namely host and guest. For host party to interpret a single prediction of vertical Federated Learning model, the interpretation results, namely the feature importance, are very likely to reveal the protected data from guest party. We propose a method to balance the model interpretability and data privacy in vertical Federated Learning by using Shapley values to reveal detailed feature importance for host features and a unified importance value for federated guest features. Our experiments indicate robust and informative results for interpreting Federated Learning models.

Segregation Network for Multi-Class Novelty Detection

The problem of multiple class novelty detection is gaining increasing importance due to the large availability of multimedia data and the increasing requirement of the classification models to work in an open set scenario. To this end, novelty detection tries to answer this important question: given a test example should we even try to classify it? In this work, we design a novel deep learning framework, termed Segregation Network, which is trained using the mixup technique. We construct interpolated points using convex combinations of pairs of training data and use our novel loss function for prediction of its constituent classes. During testing, for each input query, mixed samples with the known class prototypes are generated and passed through the proposed network. The output of the network reveals the constituent classes which can be used to determine whether the incoming data is from the known class set or not. Our algorithm is trained using just the data from the known classes and does not require any auxiliary dataset or attributes. Extensive evaluation on two benchmark datasets namely Caltech-256 and Stanford Dogs and comparison with the state-of-the-art justifies the effectiveness of the proposed framework.

Novel Algorithms based on Majorization Minimization for Nonnegative Matrix Factorization

Matrix decomposition is ubiquitous and has applications in various fields like speech processing, data mining and image processing to name a few. Under matrix decomposition, nonnegative matrix factorization is used to decompose a nonnegative matrix into a product of two nonnegative matrices which gives some meaningful interpretation of the data. Thus, nonnegative matrix factorization has an edge over the other decomposition techniques. In this paper, we propose two novel iterative algorithms based on Majorization Minimization (MM)-in which we formulate a novel upper bound and minimize it to get a closed form solution at every iteration. Since the algorithms are based on MM, it is ensured that the proposed methods will be monotonic. The proposed algorithms differ in the updating approach of the two nonnegative matrices. The first algorithm-Iterative Nonnegative Matrix Factorization (INOM) sequentially updates the two nonnegative matrices while the second algorithm-Parallel Iterative Nonnegative Matrix Factorization (PARINOM) parallely updates them. We also prove that the proposed algorithms converge to the stationary point of the problem. Simulations were conducted to compare the proposed methods with the existing ones and was found that the proposed algorithms performs better than the existing ones in terms of computational speed and convergence. KeyWords: Nonnegative matrix factorization, Majorization Minimization, Big Data, Parallel, Multiplicative Update

Boosting Generative Models by Leveraging Cascaded Meta-Models

Deep generative models are effective methods of modeling data. However, it is not easy for a single generative model to faithfully capture the distributions of complex data such as images. In this paper, we propose an approach for boosting generative models, which cascades meta-models together to produce a stronger model. Any hidden variable meta-model (e.g., RBM and VAE) which supports likelihood evaluation can be leveraged. We derive a decomposable variational lower bound of the boosted model, which allows each meta-model to be trained separately and greedily. Besides, our framework can be extended to semi-supervised boosting, where the boosted model learns a joint distribution of data and labels. Finally, we combine our boosting framework with the multiplicative boosting framework, which further improves the learning power of generative models.

Deep Learning: a new definition of artificial neuron with double weight

Deep learning is a subset of a broader family of machine learning methods based on learning data representations. These models are inspired by human biological nervous systems, even if there are various differences pertaining to the structural and functional properties of biological brains. The elementary constituents of deep learning models are neurons, which can be considered as functions that receive inputs and produce an output that is a weighted sum of the inputs fed through an activation function. Several models of neurons were proposed in the course of the years that are all based on learnable parameters called weights. In this paper we present a new type of artificial neuron, the double-weight neuron,characterized by additional learnable weights that lead to a more complex and accurate system. We tested a feed-forward and convolutional neural network consisting of double-weight neurons on the MNIST dataset, and we tested a convolution network on the CIFAR-10 dataset. For MNIST we find a \approx 4\% and \approx 1\% improved classification accuracy, respectively, when compared to a standard feed-forward and convolutional neural network built with the same sets of hyperparameters. For CIFAR-10 we find a \approx 12\% improved classification accuracy. We thus conclude that this novel artificial neuron can be considered as a valuable alternative to common ones.

Understanding eWhoring

In this paper, we describe a new type of online fraud, referred to as ‘eWhoring’ by offenders. This crime script analysis provides an overview of the ‘eWhoring’ business model, drawing on more than 6,500 posts crawled from an online underground forum. This is an unusual fraud type, in that offenders readily share information about how it is committed in a way that is almost prescriptive. There are economic factors at play here, as providing information about how to make money from ‘eWhoring’ can increase the demand for the types of images that enable it to happen. We find that sexualised images are typically stolen and shared online. While some images are shared for free, these can quickly become ‘saturated’, leading to the demand for (and trade in) more exclusive ‘packs’. These images are then sold to unwitting customers who believe they have paid for a virtual sexual encounter. A variety of online services are used for carrying out this fraud type, including email, video, dating sites, social media, classified advertisements, and payment platforms. This analysis reveals potential interventions that could be applied to each stage of the crime commission process to prevent and disrupt this crime type.

Dissecting Graph Neural Networks on Graph Classification

Graph Neural Nets (GNNs) have received increasing attentions, partially due to their superior performance in many node and graph classification tasks. However, there is a lack of understanding on what they are learning and how sophisticated the learned graph functions are. In this work, we first propose Graph Feature Network (GFN), a simple lightweight neural net defined on a set of graph augmented features. We then propose a dissection of GNNs on graph classification into two parts: 1) the graph filtering, where graph-based neighbor aggregations are performed, and 2) the set function, where a set of hidden node features are composed for prediction. To test the importance of these two parts separately, we prove and leverage the connection that GFN can be derived by linearizing graph filtering part of GNN. Empirically we perform evaluations on common graph classification benchmarks. To our surprise, we find that, despite the simplification, GFN could match or exceed the best accuracies produced by recently proposed GNNs, with a fraction of computation cost. Our results provide new perspectives on both the functions that GNNs learned and the current benchmarks for evaluating them.

Language in Our Time: An Empirical Analysis of Hashtags

Hashtags in online social networks have gained tremendous popularity during the past five years. The resulting large quantity of data has provided a new lens into modern society. Previously, researchers mainly rely on data collected from Twitter to study either a certain type of hashtags or a certain property of hashtags. In this paper, we perform the first large-scale empirical analysis of hashtags shared on Instagram, the major platform for hashtag-sharing. We study hashtags from three different dimensions including the temporal-spatial dimension, the semantic dimension, and the social dimension. Extensive experiments performed on three large-scale datasets with more than 7 million hashtags in total provide a series of interesting observations. First, we show that the temporal patterns of hashtags can be categorized into four different clusters, and people tend to share fewer hashtags at certain places and more hashtags at others. Second, we observe that a non-negligible proportion of hashtags exhibit large semantic displacement. We demonstrate hashtags that are more uniformly shared among users, as quantified by the proposed hashtag entropy, are less prone to semantic displacement. In the end, we propose a bipartite graph embedding model to summarize users’ hashtag profiles, and rely on these profiles to perform friendship prediction. Evaluation results show that our approach achieves an effective prediction with AUC (area under the ROC curve) above 0.8 which demonstrates the strong social signals possessed in hashtags.

Deep Layered LMS Predictor

In this study, we present a new approach to design a Least Mean Squares (LMS) predictor. This approach exploits the concept of deep neural networks and their supremacy in terms of performance and accuracy. The new LMS predictor is implemented as a deep neural network using multiple non linear LMS filters. The network consists of multiple layers with nonlinear activation functions, where each neuron in the hidden layers corresponds to a certain FIR filter output which goes through nonlinearity. The output of the last layer is the prediction. We hypothesize that this approach will outperform the traditional adaptive filters.

Explainable AI for Trees: From Local Explanations to Global Understanding

Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.

VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository

Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs. These exemplars often lack the characteristics of real-world datasets, and their one-off nature makes it difficult to compare different techniques. In this paper, we present VizNet: a large-scale corpus of over 31 million datasets compiled from open data repositories and online visualization galleries. On average, these datasets comprise 17 records over 3 dimensions and across the corpus, we find 51% of the dimensions record categorical data, 44% quantitative, and only 5% temporal. VizNet provides the necessary common baseline for comparing visualization design techniques, and developing benchmark models and algorithms for automating visual analysis. To demonstrate VizNet’s utility as a platform for conducting online crowdsourced experiments at scale, we replicate a prior study assessing the influence of user task and data distribution on visual encoding effectiveness, and extend it by considering an additional task: outlier detection. To contend with running such studies at scale, we demonstrate how a metric of perceptual effectiveness can be learned from experimental results, and show its predictive power across test datasets.

Kyrix: Interactive Visual Data Exploration at Scale

Scalable interactive visual data exploration is crucial in many domains due to increasingly large datasets generated at rapid rates. Details-on-demand provides a useful interaction paradigm for exploring large datasets, where users start at an overview, find regions of interest, zoom in to see detailed views, zoom out and then repeat. This paradigm is the primary user interaction mode of widely-used systems such as Google Maps, Aperture Tiles and ForeCache. These earlier systems, however, are highly customized with hardcoded visual representations and optimizations. A more general framework is needed to facilitate the development of visual data exploration systems at scale. In this paper, we present Kyrix, an end-to-end system for developing scalable details-on-demand data exploration applications. Kyrix provides developers with a declarative model for easy specification of general visualizations. Behind the scenes, Kyrix utilizes a suite of performance optimization techniques to achieve a response time within 500ms for various user interactions. We also report results from a performance study which shows that a novel dynamic fetching scheme adopted by Kyrix outperforms tile-based fetching used in earlier systems.

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

Intrinsic rewards are introduced to simulate how human intelligence works, which are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards. However, none of the existing intrinsic reward approaches can achieve human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (mega-reward) which, to our knowledge, is the first approach that achieves comparable human-level performance in intrinsically-motivated play. The intuition of mega-rewards comes from the observation that infants’ intelligence develops when they try to gain more control on entities in an environment; therefore, mega-reward aims to maximize the control capabilities of agents on given entities in a given environment. To formalize mega-reward, a relational transition model is proposed to bridge the gaps between direct and latent control. Experimental studies show that mega-reward can (i) greatly outperform all state-of-the-art intrinsic reward approaches, (ii) generally achieves the same level of performance as Ex-PPO and professional human-level scores; and (iii) has also superior performance when it is incorporated with extrinsic reward.

Predictive Ensemble Learning with Application to Scene Text Detection

Deep learning based approaches have achieved significant progresses in different tasks like classification, detection, segmentation, and so on. Ensemble learning is widely known to further improve performance by combining multiple complementary models. It is easy to apply ensemble learning for classification tasks, for example, based on averaging, voting, or other methods. However, for other tasks (like object detection) where the outputs are varying in quantity and unable to be simply compared, the ensemble of multiple models become difficult. In this paper, we propose a new method called Predictive Ensemble Learning (PEL), based on powerful predictive ability of deep neural networks, to directly predict the best performing model among a pool of base models for each test example, thus transforming ensemble learning to a traditional classification task. Taking scene text detection as the application, where no suitable ensemble learning strategy exists, PEL can significantly improve the performance, compared to either individual state-of-the-art models, or the fusion of multiple models by non-maximum suppression. Experimental results show the possibility and potential of PEL in predicting different models’ performance based only on a query example, which can be extended for ensemble learning in many other complex tasks.

Learning to Convolve: A Generalized Weight-Tying Approach

Recent work (Cohen & Welling, 2016) has shown that generalizations of convolutions, based on group theory, provide powerful inductive biases for learning. In these generalizations, filters are not only translated but can also be rotated, flipped, etc. However, coming up with exact models of how to rotate a 3 x 3 filter on a square pixel-grid is difficult. In this paper, we learn how to transform filters for use in the group convolution, focussing on roto-translation. For this, we learn a filter basis and all rotated versions of that filter basis. Filters are then encoded by a set of rotation invariant coefficients. To rotate a filter, we switch the basis. We demonstrate we can produce feature maps with low sensitivity to input rotations, while achieving high performance on MNIST and CIFAR-10.

On Graph Classification Networks, Datasets and Baselines

Graph classification receives a great deal of attention from the non-Euclidean machine learning community. Recent advances in graph coarsening have enabled the training of deeper networks and produced new state-of-the-art results in many benchmark tasks. We examine how these architectures train and find that performance is highly-sensitive to initialisation and depends strongly on jumping-knowledge structures. We then show that, despite the great complexity of these models, competitive performance is achieved by the simplest of models — structure-blind MLP, single-layer GCN and fixed-weight GCN — and propose these be included as baselines in future.

Document worth reading: “Machine Learning for Data-Driven Movement Generation: a Review of the State of the Art”

The rise of non-linear and interactive media such as video games has increased the need for automatic movement animation generation. In this survey, we review and analyze different aspects of building automatic movement generation systems using machine learning techniques and motion capture data. We cover topics such as high-level movement characterization, training data, features representation, machine learning models, and evaluation methods. We conclude by presenting a discussion of the reviewed literature and outlining the research gaps and remaining challenges for future work. Machine Learning for Data-Driven Movement Generation: a Review of the State of the Art

Document worth reading: “AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions”

In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation ranging from template-based methods to neural network-based methods emerged. Meanwhile, the research objectives have also changed from generating smooth and coherent sentences to infusing personalized traits to enrich the diversification of newly generated content. With the rapid development of text generation solutions, one comprehensive survey is urgent to summarize the achievements and track the state of the arts. In this survey paper, we present the general systematical framework, illustrate the widely utilized models and summarize the classic applications of text generation. AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions

Distilled News

Classification and Regression Analysis with Decision Trees

A decision tree is a supervised machine learning model used to predict a target by learning decision rules from features. As the name suggests, we can think of this model as breaking down our data by making a decision based on asking a series of questions.

How to Decide Between Amazon SageMaker and Microsoft Azure Machine Learning Studio

I recently published a walk-thru of Microsoft Azure Machine Learning Studio (Studio) https://…tudio-clarifies-data-science-8e8d3e6ed64e and was favorably impressed with the simplicity and power. But there are other tools that also claim to make machine learning easier and speed model development. I am wondering how they compare? So, this week, I am taking a look at Amazon SageMaker (SageMaker) and how it compares to Studio.

Uber datasets in BigQuery: Driving times around SF (and your city too)

Uber keeps adding new cities to their public data program – let’s load them into BigQuery. We’ll take advantage of the latest new features: Native GIS functions, partitioning, clustering, and fast dashboards with BI Engine.

Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model

Speech-to-speech translation systems have been developed over the past several decades with the goal of helping people who speak different languages to communicate with each other. Such systems have usually been broken into three separate components: automatic speech recognition to transcribe the source speech as text, machine translation to translate the transcribed text into the target language, and text-to-speech synthesis (TTS) to generate speech in the target language from the translated text. Dividing the task into such a cascade of systems has been very successful, powering many commercial speech-to-speech translation products, including Google Translate.

Building Recommender systems with Azure Machine Learning service

Recommendation systems are used in a variety of industries, from retail to news and media. If you’ve ever used a streaming service or ecommerce site that has surfaced recommendations for you based on what you’ve previously watched or purchased, you’ve interacted with a recommendation system. With the availability of large amounts of data, many businesses are turning to recommendation systems as a critical revenue driver. However, finding the right recommender algorithms can be very time consuming for data scientists. This is why Microsoft has provided a GitHub repository with Python best practice examples to facilitate the building and evaluation of recommendation systems using Azure Machine Learning services.

The Inherent Insecurity in Neural Networks and Machine Learning Based Applications

Deep neural networks are inherently fuzzy. Each type of neural network (Traditional, Convolutional, Recurrent, etc.) have a set of weight connections (W41,W42, … W87 parameters) that are randomly initialized and updated as data is pumped through the system and errors are back propagated to correct the weight connection values. After training, these weights approximate a function that fits the input and output of the trained data. However, the distribution of weight values is not perfect and can only generalize based on inputs and outputs that the neural network has seen. The problem with neural networks is that they will never be perfect and they fail gracefully (not letting you know that they failed incorrectly – often times classifying with high confidence). Ideally, you want a system to notify you when there is a failure. With neural networks if you feed it a random set of static images, it may provide incorrect output object classifications with high confidence.

IBM’s Quest to Solve the Continual Learning Problem and Build Neural Networks Without Amnesia

I often joke that neural networks suffers from a continuous amnesia problem in the sense that they every time they are retrained they lost the knowledge accumulated in previous iterations. Building neural networks that can learn incrementally without forgetting is one of the existential challenges facing the current generation of deep learning solutions. Recently, researchers from IBM published a paper proposing a method for continual learning proposing that allow the implementation of neural networks that can build incremental knowledge. Neural networks have achieved impressive milestones in the last few years from beating Go to multi-player games. However, neural network architectures remain constrained to very specific domains and unable to transition its knowledge into new areas. Furthermore, current neural network models are only effective if trained over large stationary distributions of data and struggle when training over changing non-stationary distributions of data. In other words, neural networks can effectively solve many tasks when trained from scratch and continually sample from all tasks many times until training has converged. Meanwhile, they struggle when training incrementally if there is a time dependence to the data received. Paradoxically, most real world AI scenarios are based on incremental, and not stationary, knowledge. Throughout the history of artificial intelligence(AI), there have been several theories and proposed models to deal with the continual learning challenge.

We Need to Talk, AI

A Comic Essay on Artificial Intelligence

SingularityNET’s AI Team Experiments With Generative Capsule Networks

The SingularityNET AI team has been experimenting with Generative Capsule Networks (CapsNets), which have the potential to structure SingularityNET’s disparate data and services. Generative CapsNets have demonstrated the capability to generalize the process of generating shifted images for never encountered shifts far outside the range of the training set. CapsNets have shown better performance in the task of one-shot transferring the capability to reconstruct rotated images from some classes to others, and to extrapolate outside the range of the training set. Models with better generalization and transfer learning performance would greatly benefit SingularityNET, which nodes will be heavily reused for different tasks and trained on user data, which might be not well-prepared.

Time Series Forecasting – A Getting Started Guide

When I started writing this post I thought of just explaining how to do predictions with a ‘simple’ time series (aka univariate time series). The challenging part of the project I was in, however, was the fact that the prediction needed to be made in conjunction with multiple variables. For this reason, I decided to bring this guide a little bit closer to reality and use a multivariate time series.

Tutorial for Using Confidence Intervals & Bootstrapping

In this tutorial I will attempt to show how the use of bootstrapping and confidence intervals can help with highlighting statistically significant differences between sample distributions.

The Power of Visualization

This article focuses on the importance of visualization with data. The amount and complexity of information produced in science, engineering, business, and everyday human activity is increasing at staggering rates. Good visualizations not only present a visual interpretation of data, but do so by improving comprehension, communication, and decision making. The importance of visualization is a topic taught to almost every data scientist in an entry-level course at university but is mastered by very few individuals. It is often regarded as obvious or unimportant due to its inherently subjective nature. In this article, I hope to dispel some of those thoughts and show you that visualization is incredibly important, not just in the field of data science, but for communicating any form of information. I will aim to show the reader, through multiple examples, the impact a well-designed visualization can have at communicating an idea or piece of information. In addition, I will discuss the best practices for making effective visualizations, and how one can go about developing their own visualizations and the resources that are available to go about doing this. I hope you enjoy this visual journey and learn something in the process.

Multivariate Outlier Detection

I was given 3 GB of Machine Generated data being fed by 120 sensors (5 records every second) in an excel format. The task in hand was to mine out interesting patterns, if any, from the data. I fed the data in R in my local machine and performed various descriptive and exploratory analysis to have some insights. Customer was also looking for some low cost maintenance mechanisms for their machines. So I thought if I could study the outliers and provide some information about system health. This could also be monitored real time using dashboards and if possible could forecast at a near future time point for early alarm and predictive maintenance. So this became a case of outlier detection in 120 dimensional space. Now, as I studied, values in around 90 columns were found to be constant over the entire time period and were contributing nothing towards system noise. So I dropped them.

From Word Embeddings to Pretrained Language Models – A New Age in NLP – Part 2

For words to be processed by machine learning models, they need some form of numeric representation that models can use in their calculation. This is part 2 of a two part series where I look at how the word to vector representation methodologies have evolved over time. If you haven’t read Part 1 of this series, I recommend checking that out first!

From Word Embeddings to Pretrained Language Models – A New Age in NLP – Part 1

For words to be processed by machine learning models, they need some form of numeric representation that models can use in their calculation. This is part 1 of a 2 part series where I look at how the word to vector representation methodologies have evolved over time. Part 2 can be found here.

Book Memo: “Probabilistic Data Structures and Algorithms for Big Data Applications”

A technical book about popular space-efficient data structures and fast algorithms that are extremely useful in modern Big Data applications. The purpose of this book is to introduce technology practitioners, including software architects and developers, as well as technology decision makers to probabilistic data structures and algorithms. Reading this book, you will get a theoretical and practical understanding of probabilistic data structures and learn about their common uses.