**Probabilistic Kernel Support Vector Machines**

We propose a probabilistic enhancement of standard {\em kernel Support Vector Machines} for binary classification, in order to address the case when, along with given data sets, a description of uncertainty (e.g., error bounds) may be available on each datum. In the present paper, we specifically consider Gaussian distributions to model uncertainty. Thereby, our data consist of pairs

,

, along with an indicator

to declare membership in one of two categories for each pair. These pairs may be viewed to represent the mean and covariance, respectively, of random vectors

taking values in a suitable linear space (typically

). Thus, our setting may also be viewed as a modification of Support Vector Machines to classify distributions, albeit, at present, only Gaussian ones. We outline the formalism that allows computing suitable classifiers via a natural modification of the standard “kernel trick.” The main contribution of this work is to point out a suitable kernel function for applying Support Vector techniques to the setting of uncertain data for which a detailed uncertainty description is also available (herein, “Gaussian points”).

**Learning to Engage with Interactive Systems: A field Study**

Physical agents that can autonomously generate engaging, life-like behaviour will lead to more responsive and interesting robots and other autonomous systems. Although many advances have been made for one-to-one interactions in well controlled settings, future physical agents should be capable of interacting with humans in natural settings, including group interaction. In order to generate engaging behaviours, the autonomous system must first be able to estimate its human partners’ engagement level. In this paper, we propose an approach for estimating engagement from behaviour and use the measure within a reinforcement learning framework to learn engaging interactive behaviours. The proposed approach is implemented in an interactive sculptural system in a museum setting. We compare the learning system to a baseline using pre-scripted interactive behaviours. Analysis based on sensory data and survey data shows that adaptable behaviours within a perceivable and understandable range can achieve higher engagement and likeability.

**Infinite Probabilistic Databases**

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and datalog queries.

**No Adjective Ordering Mystery, and No Raven Paradox, Just an Ontological Mishap**

In the concluding remarks of Ontological Promiscuity Hobbs (1985) made what we believe to be a very insightful observation: given that semantics is an attempt at specifying the relation between language and the world, if ‘one can assume a theory of the world that is isomorphic to the way we talk about it … then semantics becomes nearly trivial’. But how exactly can we rectify our logical formalisms so that semantics, an endeavor that has occupied the most penetrating minds for over two centuries, can become (nearly) trivial, and what exactly does it mean to assume a theory of the world in our semantics? In this paper we hope to provide answers for both questions. First, we believe that a commonsense theory of the world can (and should) be embedded in our semantic formalisms resulting in a logical semantics grounded in commonsense metaphysics. Moreover, we believe the first step to accomplishing this vision is rectifying what we think was a crucial oversight in logical semantics, namely the failure to distinguish between two fundamentally different types of concepts: (i) ontological concepts, that correspond to what Cocchiarella (2001) calls first-intension concepts and are types in a strongly-typed ontology; and (ii) logical concepts (or second intension concepts), that are predicates corresponding to properties of (and relations between) objects of various ontological types1. In such a framework, which we will refer to henceforth by ontologik, it will be shown how type unification and other type operations can be used to account for the `missing text phenomenon’ (MTP) (see Saba, 2019a) that is at the heart of most challenges in the semantics of natural language, by uncovering the significant amount of missing text that is never explicitly stated in everyday discourse, but is often implicitly assumed as shared background knowledge.

**Improving interactive reinforcement learning: What makes a good teacher?**

Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. In this regard, a variant of interactive reinforcement learning is policy shaping which uses a parent-like trainer to propose the next action to be performed and by doing so reduces the search space by advice. On some occasions, the trainer may be another artificial agent which in turn was trained using reinforcement learning methods to afterward becoming an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others to become a better trainer-agent. Using a polymath agent, as compared to a specialist agent, an advisor leads to a larger reward and faster convergence of the reward signal and also to a more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters in order to determine how influential they are in the apprenticeship process, where the consistency of feedback is much more relevant when dealing with different learner obedience parameters.

**GraphTSNE: A Visualization Technique for Graph-Structured Data**

We present GraphTSNE, a novel visualization technique for graph-structured data based on t-SNE. The growing interest in graph-structured data increases the importance of gaining human insight into such datasets by means of visualization. However, among the most popular visualization techniques, classical t-SNE is not suitable on such datasets because it has no mechanism to make use of information from graph connectivity. On the other hand, standard graph visualization techniques, such as Laplacian Eigenmaps, have no mechanism to make use of information from node features. Our proposed method GraphTSNE is able to produce visualizations which account for both graph connectivity and node features. It is based on scalable and unsupervised training of a graph convolutional network on a modified t-SNE loss. By assembling a suite of evaluation metrics, we demonstrate that our method produces desirable visualizations on three benchmark datasets.

**Deep Comprehensive Correlation Mining for Image Clustering**

Recent developed deep unsupervised methods allow us to jointly learn representation and cluster unlabelled data. These deep clustering methods %like DAC start with mainly focus on the correlation among samples, e.g., selecting high precision pairs to gradually tune the feature representation, which neglects other useful correlations. In this paper, we propose a novel clustering framework, named deep comprehensive correlation mining(DCCM), for exploring and taking full advantage of various kinds of correlations behind the unlabeled data from three aspects: 1) Instead of only using pair-wise information, pseudo-label supervision is proposed to investigate category information and learn discriminative features. 2) The features’ robustness to image transformation of input space is fully explored, which benefits the network learning and significantly improves the performance. 3) The triplet mutual information among features is presented for clustering problem to lift the recently discovered instance-level deep mutual information to a triplet-level formation, which further helps to learn more discriminative features. Extensive experiments on several challenging datasets show that our method achieves good performance, e.g., attaining

clustering accuracy on CIFAR-10, and

on CIFAR-100, both of which significantly surpass the state-of-the-art results more than

.

**Human-Guided Learning of Column Networks: Augmenting Deep Learning with Advice**

Recently, deep models have been successfully applied in several applications, especially with low-level representations. However, sparse, noisy samples and structured domains (with multiple objects and interactions) are some of the open challenges in most deep models. Column Networks, a deep architecture, can succinctly capture such domain structure and interactions, but may still be prone to sub-optimal learning from sparse and noisy samples. Inspired by the success of human-advice guided learning in AI, especially in data-scarce domains, we propose Knowledge-augmented Column Networks that leverage human advice/knowledge for better learning with noisy/sparse samples. Our experiments demonstrate that our approach leads to either superior overall performance or faster convergence (i.e., both effective and efficient).

**LeanResNet: A Low-cost yet Effective Convolutional Residual Networks**

Convolutional Neural Networks (CNNs) filter the input data using a series of spatial convolution operators with compact stencils and point-wise non-linearities. Commonly, the convolution operators couple features from all channels, which leads to immense computational cost in the training of and prediction with CNNs. To improve the efficiency of CNNs, we introduce lean convolution operators that reduce the number of parameters and computational complexity. Our new operators can be used in a wide range of existing CNNs. Here, we exemplify their use in residual networks (ResNets), which have been very reliable for a few years now and analyzed intensively. In our experiments on three image classification problems, the proposed LeanResNet yields results that are comparable to other recently proposed reduced architectures using similar number of parameters.

**The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent**

The goal of this paper is to study why stochastic gradient descent (SGD) is efficient for neural networks, and how neural net design affects SGD. In particular, we investigate how overparameterization — an increase in the number of parameters beyond the number of training data — affects the dynamics of SGD. We introduce a simple concept called gradient confusion. When confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, we show that SGD has better convergence properties than predicted by classical theory. Using theoretical and experimental results, we study how overparameterization affects gradient confusion, and thus the convergence of SGD, on linear models and neural networks. We show that increasing the number of parameters of linear models or increasing the width of neural networks leads to lower gradient confusion, and thus faster and easier model training. We also show how overparameterization by increasing the depth of neural networks results in higher gradient confusion, making deeper models harder to train. Finally, we observe empirically that techniques like batch normalization and skip connections reduce gradient confusion, which helps reduce the training burden of deep networks.

**A Hitchhiker’s Guide to Statistical Comparisons of Reinforcement Learning Algorithms**

Consistently checking the statistical significance of experimental results is the first mandatory step towards reproducible science. This paper presents a hitchhiker’s guide to rigorous comparisons of reinforcement learning algorithms. After introducing the concepts of statistical testing, we review the relevant statistical tests and compare them empirically in terms of false positive rate and statistical power as a function of the sample size (number of seeds) and effect size. We further investigate the robustness of these tests to violations of the most common hypotheses (normal distributions, same distributions, equal variances). Beside simulations, we compare empirical distributions obtained by running Soft-Actor Critic and Twin-Delayed Deep Deterministic Policy Gradient on Half-Cheetah. We conclude by providing guidelines and code to perform rigorous comparisons of RL algorithm performances.

**SR-GAN: Semantic Rectifying Generative Adversarial Network for Zero-shot Learning**

The existing Zero-Shot learning (ZSL) methods may suffer from the vague class attributes that are highly overlapped for different classes. Unlike these methods that ignore the discrimination among classes, in this paper, we propose to classify unseen image by rectifying the semantic space guided by the visual space. First, we pre-train a Semantic Rectifying Network (SRN) to rectify semantic space with a semantic loss and a rectifying loss. Then, a Semantic Rectifying Generative Adversarial Network (SR-GAN) is built to generate plausible visual feature of unseen class from both semantic feature and rectified semantic feature. To guarantee the effectiveness of rectified semantic features and synthetic visual features, a pre-reconstruction and a post reconstruction networks are proposed, which keep the consistency between visual feature and semantic feature. Experimental results demonstrate that our approach significantly outperforms the state-of-the-arts on four benchmark datasets.

**The Varchenko Determinant**

Varchenko introduced a distance function on chambers of a hyperplane arrangement which gives rise to a determinant indexed by chambers whose entry in position

is the distance between

and

, and proved that that determinant has a nice factorization: that is the Varchenko determinant. Recently, Aguiar and Mahajan defined a generalization of that distance function, and proved that, for a central hyperplane arrangement, the determinant given rise by their distance function has also a nice factorization. We prove that, for any hyperplane arrangement, the determinant given rise by the distance function of Aguiar and Mahajan has a nice factorization. We also prove that the same is true for the determinant indexed by chambers of an apartment.

**Copula-like Variational Inference**

This paper considers a new family of variational distributions motivated by Sklar’s theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of state space. Then, the proposed variational densities that we suggest can be seen as arising from these copula-like densities used as base distributions on the hypercube with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity

. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.

**A Discussion on Solving Partial Differential Equations using Neural Networks**

Can neural networks learn to solve partial differential equations (PDEs)? We investigate this question for two (systems of) PDEs, namely, the Poisson equation and the steady Navier–Stokes equations. The contributions of this paper are five-fold. (1) Numerical experiments show that small neural networks (< 500 learnable parameters) are able to accurately learn complex solutions for systems of partial differential equations. (2) It investigates the influence of random weight initialization on the quality of the neural network approximate solution and demonstrates how one can take advantage of this non-determinism using ensemble learning. (3) It investigates the suitability of the loss function used in this work. (4) It studies the benefits and drawbacks of solving (systems of) PDEs with neural networks compared to classical numerical methods. (5) It proposes an exhaustive list of possible directions of future work.

**Tutorial: Safe and Reliable Machine Learning**

This document serves as a brief overview of the ‘Safe and Reliable Machine Learning’ tutorial given at the 2019 ACM Conference on Fairness, Accountability, and Transparency (FAT* 2019). The talk slides can be found here:

https://bit.ly/2Gfsukp, while a video of the talk is available here:

https://youtu.be/FGLOCkC4KmE, and a complete list of references for the tutorial here:

https://bit.ly/2GdLPme.