# Whats new on arXiv

Artificial Intelligence (AI) systems sometimes make errors and will make errors in the future, from time to time. These errors are usually unexpected, and can lead to dramatic consequences. Intensive development of AI and its practical applications makes the problem of errors more important. Total re-engineering of the systems can create new errors and is not always possible due to the resources involved. The important challenge is to develop fast methods to correct errors without damaging existing skills. We formulated the technical requirements to the ‘ideal’ correctors. Such correctors include binary classifiers, which separate the situations with high risk of errors from the situations where the AI systems work properly. Surprisingly, for essentially high-dimensional data such methods are possible: simple linear Fisher discriminant can separate the situations with errors from correctly solved tasks even for exponentially large samples. The paper presents the probabilistic basis for fast non-destructive correction of AI systems. A series of new stochastic separation theorems is proven. These theorems provide new instruments for fast non-iterative correction of errors of legacy AI systems. The new approaches become efficient in high-dimensions, for correction of high-dimensional systems in high-dimensional world (i.e. for processing of essentially high-dimensional data by large systems).
Learning from positive and unlabeled data or PU learning is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.
We use topological data analysis as a tool to analyze the fit of mathematical models to experimental data. This study is built on data obtained from motion tracking groups of aphids in [Nilsen et al., PLOS One, 2013] and two random walk models that were proposed to describe the data. One model incorporates social interactions between the insects, and the second model is a control model that excludes these interactions. We compare data from each model to data from experiment by performing statistical tests based on three different sets of measures. First, we use time series of order parameters commonly used in collective motion studies. These order parameters measure the overall polarization and angular momentum of the group, and do not rely on a priori knowledge of the models that produced the data. Second, we use order parameter time series that do rely on a priori knowledge, namely average distance to nearest neighbor and percentage of aphids moving. Third, we use computational persistent homology to calculate topological signatures of the data. Analysis of the a priori order parameters indicates that the interactive model better describes the experimental data than the control model does. The topological approach performs as well as these a priori order parameters and better than the other order parameters, suggesting the utility of the topological approach in the absence of specific knowledge of mechanisms underlying the data.
Transfer learning is one of the subjects undergoing intense study in the area of machine learning. In object recognition and object detection there are known experiments for the transferability of parameters, but not for neural networks which are suitable for object-detection in real time embedded applications, such as the SqueezeDet neural network. We use transfer learning to accelerate the training of SqueezeDet to a new group of classes. Also, experiments are conducted to study the transferability and co-adaptation phenomena introduced by the transfer learning process. To accelerate training, we propose a new implementation of the SqueezeDet training which provides a faster pipeline for data processing and achieves $1.8$ times speedup compared to the initial implementation. Finally, we created a mechanism for automatic hyperparamer optimization using an empirical method.
Temporal segmentation of long videos is an important problem, that has largely been tackled through supervised learning, often requiring large amounts of annotated training data. In this paper, we tackle the problem of self-supervised temporal segmentation of long videos that alleviate the need for any supervision. We introduce a self-supervised, predictive learning framework that draws inspiration from cognitive psychology to segment long, visually complex videos into individual, stable segments that share the same semantics. We also introduce a new adaptive learning paradigm that helps reduce the effect of catastrophic forgetting in recurrent neural networks. Extensive experiments on three publicly available datasets – Breakfast Actions, 50 Salads, and INRIA Instructional Videos datasets show the efficacy of the proposed approach. We show that the proposed approach is able to outperform weakly-supervised and other unsupervised learning approaches by up to 24% and have competitive performance compared to fully supervised approaches. We also show that the proposed approach is able to learn highly discriminative features that help improve action recognition when used in a representation learning paradigm.
Academic literature on machine learning modeling fails to address how to make machine learning models work for enterprises. For example, existing machine learning processes cannot address how to define business use cases for an AI application, how to convert business requirements from offering managers into data requirements for data scientists, and how to continuously improve AI applications in term of accuracy and fairness, and how to customize general purpose machine learning models with industry, domain, and use case specific data to make them more accurate for specific situations etc. Making AI work for enterprises requires special considerations, tools, methods and processes. In this paper we present a maturity framework for machine learning model lifecycle management for enterprises. Our framework is a re-interpretation of the software Capability Maturity Model (CMM) for machine learning model development process. We present a set of best practices from our personal experience of building large scale real-world machine learning models to help organizations achieve higher levels of maturity independent of their starting point.
The doctrinal paradox is analysed from a probabilistic point of view assuming a simple parametric model for the committee’s behaviour. The well known issue-by-issue and case-by-case majority rules are compared in this model, by means of the concepts of false positive rate (FPR), false negative rate (FNR) and Receiver Operating Characteristics (ROC) space. We introduce also a new rule that we call path-by-path, which is somehow halfway between the other two. Under our model assumptions, the issue-by-issue rule is shown to be the best of the three according to an optimality criterion based in ROC maps, for all values of the model parameters (committee size and competence of its members), when equal weight is given to FPR an FNR. For unequal weights, the relative goodness of the rules depends on the values of the competence and the weights, in a way which is precisely described. The results are illustrated with some numerical examples.
Artificial intelligence systems are being increasingly deployed due to their potential to increase the efficiency, scale, consistency, fairness, and accuracy of decisions. However, as many of these systems are opaque in their operation, there is a growing demand for such systems to provide explanations for their decisions. Conventional approaches to this problem attempt to expose or discover the inner workings of a machine learning model with the hope that the resulting explanations will be meaningful to the consumer. In contrast, this paper suggests a new approach to this problem. It introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction accuracy for these two examples.
Social goods, such as healthcare, smart city, and information networks, often produce ordered event data in continuous time. The generative processes of these event data can be very complex, requiring flexible models to capture their dynamics. Temporal point processes offer an elegant framework for modeling event data without discretizing the time. However, the existing maximum-likelihood-estimation (MLE) learning paradigm requires hand-crafting the intensity function beforehand and cannot directly monitor the goodness-of-fit of the estimated model in the process of training. To alleviate the risk of model-misspecification in MLE, we propose to generate samples from the generative model and monitor the quality of the samples in the process of training until the samples and the real data are indistinguishable. We take inspiration from reinforcement learning (RL) and treat the generation of each event as the action taken by a stochastic policy. We parameterize the policy as a flexible recurrent neural network and gradually improve the policy to mimic the observed event distribution. Since the reward function is unknown in this setting, we uncover an analytic and nonparametric form of the reward function using an inverse reinforcement learning formulation. This new RL framework allows us to derive an efficient policy gradient algorithm for learning flexible point process models, and we show that it performs well in both synthetic and real data.
Nearest Neighbors Algorithm is a Lazy Learning Algorithm, in which the algorithm tries to approximate the predictions with the help of similar existing vectors in the training dataset. The predictions made by the K-Nearest Neighbors algorithm is based on averaging the target values of the spatial neighbors. The selection process for neighbors in the Hermitian space is done with the help of distance metrics such as Euclidean distance, Minkowski distance, Mahalanobis distance etc. A majority of the metrics such as Euclidean distance are scale variant, meaning that the results could vary for different range of values used for the features. Standard techniques used for the normalization of scaling factors are feature scaling method such as Z-score normalization technique, Min-Max scaling etc. Scaling methods uniformly assign equal weights to all the features, which might result in a non-ideal situation. This paper proposes a novel method to assign weights to individual feature with the help of out of bag errors obtained from constructing multiple decision tree models.
In stream processing, stream join is one of the critical sources of performance bottlenecks. The sliding-window-based stream join provides a precise result but consumes considerable computational resources. The current solutions lack support for the join predicates on large windows. These algorithms and their hardware accelerators are either limited to equi-join or use a nested loop join to process all the requests. In this paper, we present a new algorithm called PanJoin which has high throughput on large windows and supports both equi-join and non-equi-join. PanJoin implements three new data structures to reduce computations during the probing phase of stream join. We also implement the most hardware-friendly data structure, called BI-Sort, on FPGA. Our evaluation shows that PanJoin outperforms several recently proposed stream join methods by more than 1000x, and it also adapts well to highly skewed data.
We propose a general theoretical method for analyzing the risk bound in the presence of adversaries. In particular, we try to fit the adversarial learning problem into the minimax framework. We first show that the original adversarial learning problem could be reduced to a minimax statistical learning problem by introducing a transport map between distributions. Then we prove a risk bound for this minimax problem in terms of covering numbers. In contrast to previous minimax bounds in \cite{lee,far}, our bound is informative when the radius of the ambiguity set is small. Our method could be applied to multi-class classification problems and commonly-used loss functions such as hinge loss and ramp loss. As two illustrative examples, we derive the adversarial risk bounds for kernel-SVM and deep neural networks. Our results indicate that a stronger adversary might have a negative impact on the complexity of the hypothesis class and the existence of margin could serve as a defense mechanism to counter adversarial attacks.
As the first robotic platforms slowly approach our everyday life, we can imagine a near future where service robots will be easily accessible by non-expert users through vocal interfaces. The capability of managing natural language would indeed speed up the process of integrating such platform in the ordinary life. Semantic parsing is a fundamental task of the Natural Language Understanding process, as it allows extracting the meaning of a user utterance to be used by a machine. In this paper, we present a preliminary study to semantically parse user vocal commands for a House Service robot, using a multi-layer Long-Short Term Memory neural network with attention mechanism. The system is trained on the Human Robot Interaction Corpus, and it is preliminarily compared with previous approaches.
Anomaly detection in supercomputers is a very difficult problem due to the big scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unhealthy states). We propose a novel approach for anomaly detection in High Performance Computing systems based on a Machine (Deep) Learning technique, namely a type of neural network called autoencoder. The key idea is to train a set of autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes and, after training, use them to identify abnormal conditions. This is different from previous approaches which where based on learning the abnormal condition, for which there are much smaller datasets (since it is very hard to identify them to begin with). We test our approach on a real supercomputer equipped with a fine-grained, scalable monitoring infrastructure that can provide large amount of data to characterize the system behaviour. The results are extremely promising: after the training phase to learn the normal system behaviour, our method is capable of detecting anomalies that have never been seen before with a very good accuracy (values ranging between 88% and 96%).

# If you did not already know

automated CLAUse DETectEr (Claudette)
Machine Learning Powered Analysis of Consumer Contracts and Privacy Policies. CLAUDETTE – ‘automated CLAUse DETectEr’ – is an interdisciplinary research project hosted at the Law Department of the European University Institute, led by professors Giovanni Sartor and Hans-W. Micklitz, in cooperation with engineers from University of Bologna and University of Modena and Reggio Emilia. The research objective is to test to what extent is it possible to automate reading and legal assessment of online consumer contracts and privacy policies, to evaluate their compliance with EU´s unfair contractual terms law and personal data protection law (GDPR), using machine learning and grammar-based approaches. The idea arose out of bewilderment. Having read dozens of terms of service and of privacy policies of online platforms, we came to conclusion that despite substantive law in place, and despite enforcers´ competence for abstract control, providers of online services still tend to use unfair and unlawful clauses in these documents. Hence, the idea to automate parts of enforcement process by delegating certain tasks to machines. On one hand, we believe that relying on automation can increase quality and effectiveness of legal work of enforcers. On the other, we want to empower consumers themselves, by giving them tools to quickly assess whether what they agree to online is fair and/or lawful. …

Dfuntest
New ideas in distributed systems (algorithms or protocols) are commonly tested by simulation, because experimenting with a prototype deployed on a realistic platform is cumbersome. However, a prototype not only measures performance but also verifies assumptions about the underlying system. We developed dfuntest – a testing framework for distributed applications that defines abstractions and test structure, and automates experiments on distributed platforms. Dfuntest aims to be jUnit’s analogue for distributed applications; a framework that enables the programmer to write robust and flexible scenarios of experiments. Dfuntest requires minimal bindings that specify how to deploy and interact with the application. Dfuntest’s abstractions allow execution of a scenario on a single machine, a cluster, a cloud, or any other distributed infrastructure, e.g. on PlanetLab. A scenario is a procedure; thus, our framework can be used both for functional tests and for performance measurements. We show how to use dfuntest to deploy our DHT prototype on 60 PlanetLab nodes and verify whether the prototype maintains a correct topology. …

MapReduce for C (MR4C)
MR4C is an implementation framework that allows you to run native code within the Hadoop execution framework. Pairing the performance and flexibility of natively developed algorithms with the unfettered scalability and throughput inherent in Hadoop, MR4C enables large-scale deployment of advanced data processing applications. …

# Whats new on arXiv

Item-based Collaborative Filtering(short for ICF) has been widely adopted in recommender systems in industry, owing to its strength in user interest modeling and ease in online personalization. By constructing a user’s profile with the items that the user has consumed, ICF recommends items that are similar to the user’s profile. With the prevalence of machine learning in recent years, significant processes have been made for ICF by learning item similarity (or representation) from data. Nevertheless, we argue that most existing works have only considered linear and shallow relationship between items, which are insufficient to capture the complicated decision-making process of users. In this work, we propose a more expressive ICF solution by accounting for the nonlinear and higher-order relationship among items. Going beyond modeling only the second-order interaction (e.g. similarity) between two items, we additionally consider the interaction among all interacted item pairs by using nonlinear neural networks. Through this way, we can effectively model the higher-order relationship among items, capturing more complicated effects in user decision-making. For example, it can differentiate which historical itemsets in a user’s profile are more important in affecting the user to make a purchase decision on an item. We treat this solution as a deep variant of ICF, thus term it as DeepICF. To justify our proposal, we perform empirical studies on two public datasets from MovieLens and Pinterest. Extensive experiments verify the highly positive effect of higher-order item interaction modeling with nonlinear neural networks. Moreover, we demonstrate that by more fine-grained second-order interaction modeling with attention network, the performance of our DeepICF method can be further improved.
Learning representation on graph plays a crucial role in numerous tasks of pattern recognition. Different from grid-shaped images/videos, on which local convolution kernels can be lattices, however, graphs are fully coordinate-free on vertices and edges. In this work, we propose a Gaussian-induced convolution (GIC) framework to conduct local convolution filtering on irregular graphs. Specifically, an edge-induced Gaussian mixture model is designed to encode variations of subgraph region by integrating edge information into weighted Gaussian models, each of which implicitly characterizes one component of subgraph variations. In order to coarsen a graph, we derive a vertex-induced Gaussian mixture model to cluster vertices dynamically according to the connection of edges, which is approximately equivalent to the weighted graph cut. We conduct our multi-layer graph convolution network on several public datasets of graph classification. The extensive experiments demonstrate that our GIC is effective and can achieve the state-of-the-art results.
Matrix factorization (MF) has been widely used to discover the low-rank structure and to predict the missing entries of data matrix. In many real-world learning systems, the data matrix can be very high-dimensional but sparse. This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal. For efficiency concern, existing work typically applies a uniform weight on missing entries to allow a fast learning algorithm. However, this simplification will decrease modeling fidelity, resulting in suboptimal performance for downstream applications. In this work, we weight the missing data non-uniformly, and more generically, we allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method, for which the time complexity is determined by the number of observed entries in the data matrix, rather than the matrix size. The key idea is two-fold: 1) we apply truncated SVD on the weight matrix to get a more compact representation of the weights, and 2) we learn MF parameters with element-wise alternating least squares (eALS) and memorize the key intermediate variables to avoid repeating computations that are unnecessary. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method.
I describe an optimal control view of adversarial machine learning, where the dynamical system is the machine learner, the input are adversarial actions, and the control costs are defined by the adversary’s goals to do harm and be hard to detect. This view encompasses many types of adversarial machine learning, including test-item attacks, training-data poisoning, and adversarial reward shaping. The view encourages adversarial machine learning researcher to utilize advances in control theory and reinforcement learning.
Knowledge graph embedding has been an active research topic for knowledge base completion, with progressive improvement from the initial TransE, TransH, DistMult et al to the current state-of-the-art ConvE. ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model knowledge graphs. The model can be efficiently trained and scalable to large knowledge graphs. However, there is no structure enforcement in the embedding space of ConvE. The recent graph convolutional network (GCN) provides another way of learning graph node embedding by successfully utilizing graph connectivity structure. In this work, we propose a novel end-to-end Structure-Aware Convolutional Networks (SACN) that take the benefit of GCN and ConvE together. SACN consists of an encoder of a weighted graph convolutional network (WGCN), and a decoder of a convolutional network called Conv-TransE. WGCN utilizes knowledge graph node structure, node attributes and relation types. It has learnable weights that collect adaptive amount of information from neighboring graph nodes, resulting in more accurate embeddings of graph nodes. In addition, the node attributes are added as the nodes and are easily integrated into the WGCN. The decoder Conv-TransE extends the state-of-the-art ConvE to be translational between entities and relations while keeps the state-of-the-art performance as ConvE. We demonstrate the effectiveness of our proposed SACN model on standard FB15k-237 and WN18RR datasets, and present about 10% relative improvement over the state-of-the-art ConvE in terms of HITS@1, HITS@3 and HITS@10.
Generating paraphrases, that is, different variations of a sentence conveying the same meaning, is an important yet challenging task in NLP. Automatically generating paraphrases has its utility in many NLP tasks like question answering, information retrieval, conversational systems to name a few. In this paper, we introduce iterative refinement of generated paraphrases within VAE based generation framework. Current sequence generation models lack the capability to (1) make improvements once the sentence is generated; (2) rectify errors made while decoding. We propose a technique to iteratively refine the output using multiple decoders, each one attending on the output sentence generated by the previous decoder. We improve current state of the art results significantly – with over 9% and 28% absolute increase in METEOR scores on Quora question pairs and MSCOCO datasets respectively. We also show qualitatively through examples that our re-decoding approach generates better paraphrases compared to a single decoder by rectifying errors and making improvements in paraphrase structure, inducing variations and introducing new but semantically coherent information.
Genetic Programming (GP) is an evolutionary computation technique to solve problems in an automated, domain-independent way. Rather than identifying the optimum of a function as in more traditional evolutionary optimization, the aim of GP is to evolve computer programs with a given functionality. A population of programs is evolved using variation operators inspired by Darwinian evolution (crossover and mutation) and natural selection principles to guide the search process towards better programs. While many GP applications have produced human competitive results, the theoretical understanding of what problem characteristics and algorithm properties allow GP to be effective is comparatively limited. Compared to traditional evolutionary algorithms for function optimization, GP applications are further complicated by two additional factors: the variable length representation of candidate programs, and the difficulty of evaluating their quality efficiently. Such difficulties considerably impact the runtime analysis of GP where space complexity also comes into play. As a result initial complexity analyses of GP focused on restricted settings such as evolving trees with given structures or estimating the quality of solutions using only a small polynomial number of input/output examples. However, the first runtime analyses concerning GP applications for evolving proper functions with defined input/output behavior have recently appeared. In this chapter, we present an overview of the state-of-the-art.
Cybersecurity attacks in Cloud data centres are increasing alongside the growth of the Cloud services market. Existing research proposes a number of anomaly detection systems for detecting such attacks. However, these systems encounter a number of challenges, specifically due to the unknown behaviour of the attacks and the occurrence of genuine Cloud workload spikes, which must be distinguished from attacks. In this paper, we discuss these challenges and investigate the issues with the existing Cloud anomaly detection approaches. Then, we propose a Real-time Anomaly Detection System (RADS) for Cloud data centres, which uses a one class classification algorithm and a window-based time series analysis to address the challenges. Specifically, RADS can detect VM-level anomalies occurring due to DDoS and cryptomining attacks. We evaluate the performance of RADS by running lab-based experiments and by using real-world Cloud workload traces. Evaluation results demonstrate that RADS can achieve 90-95% accuracy with a low false positive rate of 0-3%. The results further reveal that RADS experiences fewer false positives when using its window-based time series analysis in comparison to using state-of-the-art average or entropy based analysis.
Binary classification problems can be naturally modeled as bipartite graphs, where we attempt to classify right nodes based on their left adjacencies. We consider the case of labeled bipartite graphs in which some labels and edges are not trustworthy. Our goal is to reduce noise by identifying and fixing these labels and edges. We first propose a geometric technique for generating random graph instances with untrustworthy labels and analyze the resulting graph properties. We focus on generating graphs which reflect real-world data, where degree and label frequencies follow power law distributions. We review several algorithms for the problem of detection and correction, proposing novel extensions and making observations specific to the bipartite case. These algorithms range from math programming algorithms to discrete combinatorial algorithms to Bayesian approximation algorithms to machine learning algorithms. We compare the performance of all these algorithms using several metrics and, based on our observations, identify the relative strengths and weaknesses of each individual algorithm.
An interpretable generative model for handwritten digits synthesis is proposed in this work. Modern image generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), are trained by backpropagation (BP). The training process is complex and the underlying mechanism is difficult to explain. We propose an interpretable multi-stage PCA method to achieve the same goal and use handwritten digit images synthesis as an illustrative example. First, we derive principal-component-analysis-based (PCA-based) transform kernels at each stage based on the covariance of its inputs. This results in a sequence of transforms that convert input images of correlated pixels to spectral vectors of uncorrelated components. In other words, it is a whitening process. Then, we can synthesize an image based on random vectors and multi-stage transform kernels through a coloring process. The generative model is a feedforward (FF) design since no BP is used in model parameter determination. Its design complexity is significantly lower, and the whole design process is explainable. Finally, we design an FF generative model using the MNIST dataset, compare synthesis results with those obtained by state-of-the-art GAN and VAE methods, and show that the proposed generative model achieves comparable performance.
The literature on the reproducibility crisis presents several putative causes for the proliferation of irreproducible results, including HARKing, p-hacking and publication bias. Without a theory of reproducibility, however, it is difficult to determine whether these putative causes can explain most irreproducible results. Drawing from an historically informed conception of science that is open and collaborative, we identify the components of an idealized experiment and analyze these components as a precursor to develop such a theory. Openness, we suggest, has long been intuitively proposed as a solution to irreproducibility. However, this intuition has not been validated in a theoretical framework. Our concern is that the under-theorizing of these concepts can lead to flawed inferences about the (in)validity of experimental results or integrity of individual scientists. We use probabilistic arguments and examine how openness of experimental components relates to reproducibility of results. We show that there are some impediments to obtaining reproducible results that precede many of the causes often cited in literature on the reproducibility crisis. For example, even if erroneous practices such as HARKing, p-hacking, and publication bias were absent at the individual and system level, reproducibility may still not be guaranteed.
The paper proposes an on-line monitoring framework for continuous real-time safety/security in learning-based control systems (specifically application to a unmanned ground vehicle). We monitor validity of mappings from sensor inputs to actuator commands, controller-focused anomaly detection (CFAM), and from actuator commands to sensor inputs, system-focused anomaly detection (SFAM). CFAM is an image conditioned energy based generative adversarial network (EBGAN) in which the energy based discriminator distinguishes between proper and anomalous actuator commands. SFAM is based on an action condition video prediction framework to detect anomalies between predicted and observed temporal evolution of sensor data. We demonstrate the effectiveness of the approach on our autonomous ground vehicle for indoor environments and on Udacity dataset for outdoor environments.
Incorporating knowledge graph into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user’s interest. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within and holistic semantics of a path. In this paper, we contribute a new model named Knowledge-aware Path Recurrent Network (KPRN) to exploit knowledge graph for recommendation. KPRN can generate path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we allow effective reasoning on paths to infer the underlying rationale of a user-item interaction. Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a certain level of explainability. We conduct extensive experiments on two datasets about movie and music, demonstrating significant improvements over state-of-the-art solutions Collaborative Knowledge Base Embedding and Neural Factorization Machine.
Interactive Machine Learning (IML) is an iterative learning process that tightly couples a human with a machine learner, which is widely used by researchers and practitioners to effectively solve a wide variety of real-world application problems. Although recent years have witnessed the proliferation of IML in the field of visual analytics, most recent surveys either focus on a specific area of IML or aim to summarize a visualization field that is too generic for IML. In this paper, we systematically review the recent literature on IML and classify them into a task-oriented taxonomy built by us. We conclude the survey with a discussion of open challenges and research opportunities that we believe are inspiring for future work in IML.
In the world of targeted learning, cross-validated targeted maximum likelihood estimators, CV-TMLE \parencite{Zheng:2010aa}, has a distinct advantage over TMLE \parencite{Laan:2006aa} in that one less condition is required of CV-TMLE in order to achieve asymptotic efficiency in the nonparametric or semiparametric settings. CV-TMLE as originally formulated, consists of averaging usually 10 (for 10-fold cross-validation) parameter estimates, each of which is performed on a validation set separate from where the initial fit was trained. The targeting step is usually performed as a pooled regression over all validation folds but in each fold, we separately evaluate any means as well as the parameter estimate. One nice thing about CV-TMLE, is that we average 10 plug-in estimates so the plug-in quality of preserving the natural parameter bounds is respected. Our adjustment of this procedure also preserves the plug-in characteristic as well as avoids the donsker condtion. The advantage of our procedure is the implementation of the targeting is identical to that of a regular TMLE, once all the validation set initial predictions have been formed. In short, we stack the validation set predictions and pretend as if we have a regular TMLE, which is not necessarily quite a plug-in estimator on each fold but overall will perform asymptotically the same and might have some slight advantage, a subject for future research. In the case of average treatment effect, treatment specific mean and mean outcome under a stochastic intervention, the procedure overlaps exactly with the originally formulated CV-TMLE with a pooled regression for the targeting.
Anomaly detection using dimensionality reduction has been an essential technique for monitoring multidimensional data. Although deep learning-based methods have been well studied for their remarkable detection performance, their interpretability is still a problem. In this paper, we propose a novel algorithm for estimating the dimensions contributing to the detected anomalies by using variational autoencoders (VAEs). Our algorithm is based on an approximative probabilistic model that considers the existence of anomalies in the data, and by maximizing the log-likelihood, we estimate which dimensions contribute to determining data as an anomaly. The experiments results with benchmark datasets show that our algorithm extracts the contributing dimensions more accurately than baseline methods.
Concepts, which represent a group of different instances sharing common properties, are essential information in knowledge representation. Most conventional knowledge embedding methods encode both entities (concepts and instances) and relations as vectors in a low dimensional semantic space equally, ignoring the difference between concepts and instances. In this paper, we propose a novel knowledge graph embedding model named TransC by differentiating concepts and instances. Specifically, TransC encodes each concept in knowledge graph as a sphere and each instance as a vector in the same semantic space. We use the relative positions to model the relations between concepts and instances (i.e., instanceOf), and the relations between concepts and sub-concepts (i.e., subClassOf). We evaluate our model on both link prediction and triple classification tasks on the dataset based on YAGO. Experimental results show that TransC outperforms state-of-the-art methods, and captures the semantic transitivity for instanceOf and subClassOf relation. Our codes and datasets can be obtained from h…/ github.com/davidlvxin/TransC.
Data similarity (or distance) computation is a fundamental research topic which underpins many high-level applications based on similarity measures in machine learning and data mining. However, in large-scale real-world scenarios, the exact similarity computation has become daunting due to ‘3V’ nature (volume, velocity and variety) of big data. In such cases, the hashing techniques have been verified to efficiently conduct similarity estimation in terms of both theory and practice. Currently, MinHash is a popular technique for efficiently estimating the Jaccard similarity of binary sets and furthermore, weighted MinHash is generalized to estimate the generalized Jaccard similarity of weighted sets. This review focuses on categorizing and discussing the existing works of weighted MinHash algorithms. In this review, we mainly categorize the Weighted MinHash algorithms into quantization-based approaches, ‘active index’-based ones and others, and show the evolution and inherent connection of the weighted MinHash algorithms, from the integer weighted MinHash algorithms to real-valued weighted MinHash ones (particularly the Consistent Weighted Sampling scheme). Also, we have developed a python toolbox for the algorithms, and released it in our github. Based on the toolbox, we experimentally conduct a comprehensive comparative study of the standard MinHash algorithm and the weighted MinHash ones.
Recent work has shown that exploiting relations between labels improves the performance of multi-label classification. We propose a novel framework based on generative adversarial networks (GANs) to model label dependency. The discriminator learns to model label dependency by discriminating real and generated label sets. To fool the discriminator, the classifier, or generator, learns to generate label sets with dependencies close to real data. Extensive experiments and comparisons on two large-scale image classification benchmark datasets (MS-COCO and NUS-WIDE) show that the discriminator improves generalization ability for different kinds of models
We suggest a new methodology for analysis and approximate computations of the Partition Functions (PF) of Graphical Models (GM) in the Normal Factor Graph representation that combines the gauge transformation (GT) technique from (Chertkov, Chernyak 2006) with the technique developed in (Straszak, Vishnoi 2017) based on the recent progress in the field of real stable polynomials. We show that GTs (while keeping PF invariant) allow representation of PF as a sum of polynomials of variables associated with edges of the graph. A special belief propagation (BP) gauge makes a single out term of the series least sensitive to variations then resulting in the loop series for PF introduced in (Chertkov, Chernyak 2006). In addition to restating the known results in the polynomial form, we also discover a new relation between the computationally tractable BP term (single out term of the loop series evaluated at the BP gauge) and the PF: sequential application of differential operators, each associated with an edge of the graph, to the BP polynomial results in the PF. Each term in the sequence corresponds to a BP polynomial of a modified GM derived by contraction of an edge. Even though complexity of computing factors in the derived GMs grow exponentially with the number of eliminated edges, polynomials associated with the new factors remain real stable if the original factors have this property. Moreover, we show that BP estimations for the PF do not decrease with eliminations, thus resulting overall in the proof that the BP solution of the original GM gives a lower bound for PF. The proof extends results of (Straszak, Vishnoi 2017) from bipartite to general graphs, however, it is limited to the case when the BP solution is feasible.

# Magister Dixit

“Understanding correlation, multivariate regression and all aspects of massaging data together to look at it from different angles for use in predictive and prescriptive modeling is the backbone knowledge that’s really step one of revealing intelligence…. If you don’t have this, all the data collection and presentation polishing in the world is meaningless.” Mitchell A. Sanders ( August 27, 2013 )

# Book Memo: “From Human Attention to Computational Attention”

 A Multidisciplinary Approach This both accessible and exhaustive book will help to improve modeling of attention and to inspire innovations in industry. It introduces the study of attention and focuses on attention modeling, addressing such themes as saliency models, signal detection and different types of signals, as well as real-life applications. The book is truly multi-disciplinary, collating work from psychology, neuroscience, engineering and computer science, amongst other disciplines. What is attention? We all pay attention every single moment of our lives. Attention is how the brain selects and prioritizes information. The study of attention has become incredibly complex and divided: this timely volume assists the reader by drawing together work on the computational aspects of attention from across the disciplines. Those working in the field as engineers will benefit from this book’s introduction to the psychological and biological approaches to attention, and neuroscientists can learn about engineering work on attention. The work features practical reviews and chapters that are quick and easy to read, as well as chapters which present deeper, more complex knowledge. Everyone whose work relates to human perception, to image, audio and video processing will find something of value in this book, from students to researchers and those in industry.

# Document worth reading: “Deep Reinforcement Learning: An Overview”

In recent years, a specific machine learning method called deep learning has gained huge attraction, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also been shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for the problems with high dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures such as autoencoders, convolutional neural networks and recurrent neural networks which have successfully been come together with the reinforcement learning framework. Deep Reinforcement Learning: An Overview

# R Packages worth a look

Visualization of Subgroups for Decision Trees (visTree)
Provides a visualization for characterizing subgroups defined by a decision tree structure. The visualization simplifies the ability to interpret indiv …

A ‘ggplot2’-Plot of Composition of Solvency II SCR: SF and IM (ggsolvencyii)
An implementation of ‘ggplot2’-methods to present the composition of Solvency II Solvency Capital Requirement (SCR) as a series of concentric circle-pa …

Inference and Learning in Stochastic Automata (SAutomata)
Machine learning provides algorithms that can learn from data and make inferences or predictions. Stochastic automata is a class of input/output device …

# Distilled News

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, Tensorflow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more
The windows clipboard is a quick way to get data in and out of R. How can we exploit this feature to accomplish our basic data exploration needs and when might its use be inappropriate? Read on.
his is code that will accompany an article that will appear in a special edition of a German IT magazine. The article is about explaining black-box machine learning models.
In the first article of this series, I built an Alpine-based Docker image with R base packages from Alpine’s native repositories, as well as one image with R compiled from source code. The images are hosted on Docker Hub, velaco/alpine-r repository. The next step was either to address the fatal errors I found while testing the installation of R or to proceed building an image with Shiny Server. The logical choice would have been to pass all tests with R’s base packages before proceeding, but I was a bit impatient and wanted to go through the process of building a Shiny Server as soon as possible. After two weeks of trial and error, I finally have a container that can start the server and run Shiny apps.
In this blog post, I will show you how you can quickly and easily forecast a univariate time series. I am going to use data from the EU Open Data Portal on air passenger transport. You can find the data here. I downloaded the data in the TSV format for Luxembourg Airport, but you could repeat the analysis for any airport.
Last week at the ODSC West conference, I was thrilled with the interest in my Using AI for Good workshop: it was wonderful to find a room full of data scientists eager to learn how data science and artificial intelligence can be used to help people and the planet. The workshop was focused around projects from the Microsoft AI for Good program. I’ve included some details about the projects below, and you can also check out the workshop slides and the accompanying Jupyter Notebooks that demonstrate the underlying AI methods used in the projects.
I did a remote install of Ubuntu Server today. This was somewhat novel because it’s the first time that I have not had physical access to the machine I was installing on. The server install went very smoothly indeed.
Interchanging RMarkdown and ‘spinnable’ R
Why do people act the way they do? Why do they buy products, quit their jobs, or change partners? Many of these motives can be inducted from people’s behaviour, and these behaviours are reflected in data. Companies have lots of data about their clients, employees, suppliers… Let’s put that data to work to do some smart data discovery and see what we could learn.
A job title indicates a lot about someone’s role and responsibilities. It says if they manage a team, if they control a budget, and their level of specialization. Knowing this is useful when automating business development or client outreach. For example, a company that sells voice recognition software may want to send messages to:
• CTOs and technical directors informing them of the price and benefits of using the voice recognition software.
• Potential investors or advisors messages inviting them to see the company’s potential market size.
• Founders and engineers instructing them how to use the software.
Training a software to classify job titles is a multi-text text classification problem. For this task, we can use the Python Natural Language Toolkit (NLTK) and Bayesian classification.
Uber has been one of the most active contributors to open source machine learning technologies in the last few years. While companies like Google or Facebook have focused their contributions in new deep learning stacks like TensorFlow, Caffe2 or PyTorch, the Uber engineering team has really focused on tools and best practices for building machine learning at scale in the real world. Technologies such as Michelangelo, Horovod, PyML, Pyro are some of examples of Uber’s contributions to the machine learning ecosystem. With only a small group of companies developing large scale machine learning solutions, the lessons and guidance from Uber becomes even more valuable for machine learning practitioners (I certainly learned a lot and have regularly written about Uber’s efforts).
Before you start learning Python, choose the IDE that suits you the best. As Python is one of the leading programming languages, there is a multitude of IDEs available. So the question is, ‘Which is the best Python IDE for Data Science?’
Digit recognition is not something that difficult or advanced. It is kind of ‘Hello world!’ program – not that cool, but you start exactly here. So I decided to share my work and at the same time refresh the knowledge – it’s being a long ago I played with images.
Data Science is the current buzzword in the market. Every company at the moment is looking to hire Data Science Professionals to solve some Data problem that they themselves are not aware of currently. Machine Learning has taken over the industry by storm and we have a bunch of self taught Data Scientists in the market. Since this Data Science word is an altogether different universe, it is very difficult to set up priorities on what to learn and what not to. So in this case the Harvard Business Review published an article on what you as a company or individual should give importance to. Let’s have a look.
A decision tree is a flowchart-like structure in which each internal node represents a test on a feature (e.g. whether a coin flip comes up heads or tails) , each leaf node represents a class label (decision taken after computing all features) and branches represent conjunctions of features that lead to those class labels. The paths from root to leaf represent classification rules. Below diagram illustrate the basic flow of decision tree for decision making with labels (Rain(Yes), No Rain(No)).
Using bash scripts to create data pipelines is incredibly useful as a data scientist. The possibilities with these scripts are almost endless, but here, I will be going through a tutorial on a very basic bash script to download data and count the number of rows and cols in a dataset. Once you get the hang of using bash scripts, you can have the basics for creating IoT devices, and much much more as this all works with a Raspberry Pi. One cool project that you could use this for is to download all of your twitter messages using the twitter api and then predict whether or not a message from a user on Twitter is spam or not. It could run on a Raspberry Pi server from your room! That is a little out of the scope of this tutorial though, so we will begin by looking at a dataset for cars speed in San Francisco!
I’ve been an avid reader on medium/towards data science for a while now, and I’ve enjoyed the diversity and openness of the subjects tackled by many authors. I wish to contribute to this awesome community by creating my own series of articles ‘From Scratch’, where I explain and implement/build anything from scratch (not necessarily in data science, you need only propose!) Why do I want to do that? In the current state of things, we are in possession of such powerful libraries and tools that can do a lot of the work for us. Most experienced authors are well aware of the complexities of implementing such tools. As such, they make use of them to provide short, accessible and to the point reads to users from diverse backgrounds. In many of the articles that I enjoyed, I failed to understand how this or that algorithm is implemented in practise. What are their limitations? Why were they invented? When should they be used?
Sometimes starting with examples might be a faster way to learn something rather than going theory first before getting into detailed examples. That’s what I will attempt to do here using an example from the official PyTorch tutorial that implements backpropogation and reverse engineer the math and subsequently the concept behind it.
This post follows Machine Learning Introduction 4. In the previous post, we described Machine Learning for marketing attribution. In this post, we will illuminate some of the details we ignored in that section. We will inspect a dataset about Marketing Attribution, perform one-hot encoding of our brands, manipulate our one-hot encoding to learn custom business insights, normalize our features, inspect our model inputs once this is all done, and interpret our outputs in detail.
In this post, we’re going to extend our understanding of gradient descent and apply it to a multivariate function.
This series aims to share my own endeavour to understand, explore and experiment on topics in machine learning. Mathematical notations in this particular post and the next one on multivariate gradient descent will be mostly in line with those used in the Machine Learning course by Andrew Ng. Understanding and being able to play around with the maths behind is the key in understanding machine learning. It allows us to choose the most suitable algorithms and tailor them according to the problems we want to solve. However, I have encountered many tutorials and lectures where equations used are simply impenetrable. All the symbols look cryptic and there seems to be a huge gap between what is being explained and those equations. I just can’t connect all the dots. Unfortunately, more often than not, maths hinders understanding when some knowledge is assumed and important steps are skipped. Therefore, wherever possible, I will expand the equations and avoid shortcuts, so that everyone can follow along how we reach from the left side of the equation to the right.

# Distilled News

The use of synthetic data generated by Generative Adversarial Networks (GANs) has become quite a popular method to do data augmentation for many applications. While practitioners celebrate this as an economical way to get more synthetic data that can be used to train downstream classifiers, it is not clear that they recognize the inherent pitfalls of this technique. In this paper, we aim to exhort practitioners against deriving any false sense of security against data biases based on data augmentation. To drive this point home, we show that starting with a dataset consisting of head-shots of engineering researchers, GAN-based augmentation ‘imagines’ synthetic engineers, most of whom have masculine features and white skin color (inferred from a human subject study conducted on Amazon Mechanical Turk). This demonstrates how biases inherent in the training data are reinforced, and sometimes even amplified, by GAN-based data augmentation; it should serve as a cautionary tale for the lay practitioners.
In this study, we develop a computer-aided material design system to represent and extract knowledge related to material design from natural language texts. A machine learning model is trained on a text corpus weakly labeled by minimal annotated relationship data (~100 labeled relationships) to extract knowledge from scientific articles. The knowledge is represented by relationships between scientific concepts, such as {annealing, grain size, strength}. The extracted relationships are represented as a knowledge graph formatted according to design charts, inspired by the process-structure-property-performance (PSPP) reciprocity. The design chart provides an intuitive effect of processes on properties and prospective processes to achieve the certain desired properties. Our system semantically searches the scientific literature and provides knowledge in the form of a design chart, and we hope it contributes more efficient developments of new materials.
One of the most frustrating issues I face in my professional life is the plentitude of ineffective reports generated within my company. Wherever I look around me is plenty of junk charts, like barplot showing useless 3D effects or ambiguous and crowded pie charts.
This is a simple package implementation for the Extended Isolation Forest method. It is an improvement on the original algorithm Isolation Forest which is described (among other places) in this paper for detecting anomalies and outliers from a data point distribution. The original algorithm suffers from an inconsistency in producing anomaly scores due to slicing operations. Even though the slicing hyperplanes are selected at random, they are always parallel to the coordinate reference frame. The shortcoming can be seen in score maps as presented in the example notebooks in this repository. In order to improve the situation, we propose an extension which allows the hyperplanes to be taken at random angles. The way in which this is done gives rise to multiple levels of extension depending on the dimensionality of the problem. For an N dimensional dataset, Extended Isolation Forest has N levels of extension, with 0 being identical to the case of standard Isolation Forest, and N-1 being the fully extended version. Here we provide the source code for the algorithm as well as documented example notebooks to help get started. Various visualizations are provided such as score distributions, score maps, aggregate slicing of the domain, and tree and whole forest visualizations. most examples are in 2D. We present one 3D example. However, the algorithm works readily with higher dimensional data.
Hidden Markov Models or HMMs are the most common models used for dealing with temporal Data. They also frequently come up in different ways in a Data Science Interview usually without the word HMM written over it. In such a scenario it is necessary to discern the problem as an HMM problem by knowing characteristics of HMMs.
Effective communication from machines and embedded sensors, actuators in industries are crucial to achieve industrial digitalization. Efficient remote monitoring as well as maintenance methodologies helps to accomplish and transform the existing industries to Smart Factories. Monitoring and maintenance leads to the aggregation of the real-time data from sensors via different existing and new industrial communication protocols. Development of user-friendly interface allows remote Condition Monitoring (CM). Context aware analysis of real-time and historical data provides capability to accomplish active Predictive Maintenance (PdM). Both CM and PdM needs access to the machine process data, industrial net-work and communication layer. Furthermore, data flow between individual components from the Cyber-Physical System (CPS) components starting from the actual machine to the database or analyze engine to the real visualization is important. Security and safety aspects on the application, communication, network and data flow level should be considered. This thesis presents a case study on benefits of PdM and CM, the security and safety aspect of the system and the current challenges and improvements. Components of the CPS ecosystem are taken into consideration to further investigate the individual components which en-ables predictive maintenance and condition monitoring. Additionally, safety and security aspects of each component is analyzed. Moreover, the current challenges and the possible improvements of the PdM and CM systems are analyzed. Also, challenges and improvements regarding the components is taken into consideration. Finally, based on the research, possible improvements have been proposed and validated by the researcher. For the new digital era of secure and robust PdM 4.0, the improvements are vital references.
A step-by-step overview of how to begin your project, including advice on how to craft a wise performance metric, setting up testing criteria to overcome human bias, and more.
Get the guidebook / whitepaper for a look at how today’s top data-driven companies scale their advanced analytics & machine learning efforts.
In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in New York last September. As the data community begins to deploy more machine learning (ML) models, I wanted to review some important considerations.
I’m pleased to share Part I of my new book ‘Introduction to Reproducible Science in R’. The purpose of this book is to approach model development and software development holistically to help make science and research more reproducible. The need for such a book arose from observing some of the challenges that I’ve seen teaching graduate courses in natural language processing and machine learning, as well as training my own staff to become effective data scientists. While quantitative reasoning and mathematics are important, often I found that the primary obstacle to good data science was reproducibility and repeatability: it’s difficult to quickly reproduce someone else’s results.
At Wonder Technologies, we have spent a lot of time building Deep learning systems that understand the world through audio. From deep learning based voice extraction to teaching computers how to read our emotions, we needed to use a wide set of data to deliver APIs that worked even in the craziest sound environments. Here is a list of datasets that I found pretty useful for our research and that I’ve personally used to make my audio related models perform much better in real-world environments.
Statistics is one of the most important skills required by a data scientist. There is a lot of mathematics involved in statistics and it can be difficult to grasp. So in this tutorial we are going to go through some of the concepts of statistsics to learn and understand inferential statistics and master it.
My workplace works with large-scale databases that, amongst many things, contains data about people. For each person in the DB we have a unique identifier, which is composed of the person’s first name, last name, zip code. We hold ~500MM people in our DB, which can essentially have duplicates if there is a little change in the person’s name. For example, Rob Rosen and Robert Rosen (with the same zip code) will be treated as two different people. I want to note that if we get the same person an additional time, we just update the record’s timestamp, so there is no need for this sort of deduping. In addition, I would like to give credit to my co-worker Jonathan Harel who assisted me in the research for this project.
If you’ve been following our tech blog lately, you might have noticed we’re using a special type of neural networks called Mixture Density Network (MDN). MDNs do not only predict the expected value of a target, but also the underlying probability distribution. This blogpost will focus on how to implement such a model using Tensorflow, from the ground up, including explanations, diagrams and a Jupyter notebook with the entire source code.
Once we are done with training our model, we just can’t assume that it is going to work well on data that it has not seen before. In other words, we cant be sure that the model will have the desired accuracy and variance in production environment. We need some kind of assurance of the accuracy of the predictions that our model is putting out. For this, we need to validate our model. This process of deciding whether the numerical results quantifying hypothesised relationships between variables, are acceptable as descriptions of the data, is known as validation.
For understanding the mathematics for machine learning algorithms, especially deep learning algorithms, it is essential to build up the mathematical concepts from foundational to more advanced. Unfortunately, Mathematical theories are too hard/abstract/dry to digest in many cases. Imagine you are eating a pizza, it is always easier and more fun to go with a coke. The purpose of this article is to provide intuitive examples for fundamental mathematical theories to make the learning experience more enjoyable and memorable, which is to serve chicken wings with beer, fries with ketchup, and rib-eye with wine.
One of the main time investments I’ve seen data scientists make when building data products is manually performing feature engineering. While tools such as auto-sklearn and Auto-Keras have been able to automate much of the model fitting process when building a predictive model, determining which features to use as input to the fitting process is usually a manual process. I recently started using the FeatureTools library, which enables data scientists to also automate feature engineering.
As researchers, especially in (overly) prolific fields like Deep Learning, we often find ourselves overwhelmed by the huge amount of papers to read and keep track of in our work. I think one big reason for this is insufficient use of existing tools and services that aim to make our life easier. Another reason is the lack of a really good product which meets all our needs under one interface, but that is a topic for another post. Lately I’ve been getting into a new subfield of ML and got extremely frustrated with the process of prioritizing, reading and managing the relevant papers… I ended up looking for tools to help me deal with this overload and want to share with you the products and services that I’ve found. The goal is to improve the workflow and quality of life of anyone who works with scientific papers.
In this article, I introduce a new hyperbolic tangent based activation function, tangent linear unit (TaLU), for neural networks. The function was evaluated for performance using CIFAR-10 and CIFAR-100 database. The performance of the proposed activation function was in par or better than other activation functions such as: standard rectified linear unit (ReLU), leaky rectified linear unit (Leaky ReLU), and exponential linear unit (ELU).
In this tutorial, we will learn how to build a custom real-time object classifier to detect any object of your choice! We will be using BeautifulSoup and Selenium to scrape training images from Shutterstock, Amazon’s Mechanical Turk (or BBox Label Tool) to label images with bounding boxes, and YOLOv3 to train our custom detection model.

# Document worth reading: “Visions of a generalized probability theory”

In this Book we argue that the fruitful interaction of computer vision and belief calculus is capable of stimulating significant advances in both fields. From a methodological point of view, novel theoretical results concerning the geometric and algebraic properties of belief functions as mathematical objects are illustrated and discussed in Part II, with a focus on both a perspective ‘geometric approach’ to uncertainty and an algebraic solution to the issue of conflicting evidence. In Part III we show how these theoretical developments arise from important computer vision problems (such as articulated object tracking, data association and object pose estimation) to which, in turn, the evidential formalism is able to provide interesting new solutions. Finally, some initial steps towards a generalization of the notion of total probability to belief functions are taken, in the perspective of endowing the theory of evidence with a complete battery of estimation and inference tools to the benefit of all scientists and practitioners. Visions of a generalized probability theory