Uncheatable Machine Learning Inference

Classification-as-a-Service (CaaS) is widely deployed today in machine intelligence stacks for a diverse set of applications, ranging from medical prognosis and computer vision to natural language processing and identity fraud detection. Training complex models on large datasets to perform inference for these problems can be very resource-intensive. A CaaS provider may cheat a customer by fraudulently bypassing expensive training procedures in favor of weaker, less computationally intensive algorithms which yield results of reduced quality. Given a classification service supplier S, intermediary CaaS provider P claiming to use S as a classification backend, and customer C, our work addresses the following questions: (i) how can P's claim to be using S be verified by C? (ii) how might S make performance guarantees that may be verified by C? and (iii) how might one design a decentralized system that incentivizes service proofing and accountability? To this end, we propose a variety of methods for C to evaluate the service claims made by P using probabilistic performance metrics, instance seeding, and steganography. We also propose a method of measuring the robustness of a model using a blackbox adversarial procedure, which may then be used as a benchmark or comparison to a claim made by S. Finally, we propose the design of a smart contract-based decentralized system that incentivizes service accountability to serve as a trusted Quality of Service (QoS) auditor.

Using Semantic Role Knowledge for Relevance Ranking of Key Phrases in Documents: An Unsupervised Approach

In this paper, we investigate the integration of sentence position and semantic role of words in a PageRank system to build a key phrase ranking method. We present the evaluation results of our approach on three scientific articles. We show that semantic role information, when integrated with a PageRank system, can become a new lexical feature. Our approach improved on the state-of-the-art baseline approaches across all the data sets.
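
One way to picture the idea of feeding sentence position and semantic role into a PageRank system is a PageRank whose teleport vector is non-uniform. The sketch below is only an illustration under that assumption: the function name `biased_pagerank`, the co-occurrence edges, and the per-word `bias` scores (standing in for some combination of position and role weights) are hypothetical and are not taken from the paper.

```python
def biased_pagerank(edges, bias, damping=0.85, iters=50):
    """Power iteration for PageRank on an undirected word graph.

    edges: iterable of (word, word) co-occurrence pairs (hypothetical input).
    bias:  dict word -> positive score; assumed here to encode sentence
           position and semantic role. Every word in `edges` must be a key.
    """
    nodes = sorted(bias)
    total = sum(bias.values())
    # Normalised teleport distribution: favoured words are revisited more often.
    teleport = {n: bias[n] / total for n in nodes}

    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    rank = dict(teleport)
    for _ in range(iters):
        # Mass sitting on isolated words is redistributed via the teleport vector.
        dangling = sum(rank[n] for n in nodes if not neighbors[n])
        new_rank = {}
        for n in nodes:
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            new_rank[n] = ((1 - damping) * teleport[n]
                           + damping * (incoming + dangling * teleport[n]))
        rank = new_rank
    return rank


if __name__ == "__main__":
    edges = [("semantic", "role"), ("role", "ranking"), ("ranking", "phrase")]
    bias = {"semantic": 2.0, "role": 2.0, "ranking": 1.0, "phrase": 1.0}
    for word, score in sorted(biased_pagerank(edges, bias).items(),
                              key=lambda kv: -kv[1]):
        print(f"{word}: {score:.3f}")
```

Keeping the bias in the teleport vector, rather than in the edge weights, leaves the graph itself untouched, so the same graph can be re-ranked under different weighting schemes.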

Science needs to rethink how it interacts with big data: Five principles for effective scientific big data systems

We should be in a golden age of scientific discovery, given that we have more data and more compute power available than ever before. But paradoxically, in many data-driven fields, the eureka moments are becoming more and more rare. Scientists, and the software tools they use, are struggling to keep pace with the explosion in the volume and complexity of scientific data. We describe here five architectural principles we believe are essential in order to create effective, robust, and flexible platforms that make use of the best of emerging technology.

Decision Making with Argumentation Graphs

This work is about making decisions by digital means. The task was for the students of Heinrich-Heine-University to distribute funds, with the proposals made by the students themselves without further influence. For this purpose, dialog-based argumentation is used to give the participants a better understanding of various arguments. In addition, a software service has been developed which allows the students to express their preferences for various proposals. An experiment was carried out at the university to test whether students are satisfied with this type of participation. The results indicate that the procedure is well accepted and thus successful. However, improvements to the process itself were necessary during the experiment and should be considered for future procedures. Further procedures are desired.

Multivariate Convolutional Sparse Coding with Low Rank Tensor

This paper introduces a new multivariate convolutional sparse coding based on tensor algebra with a general model enforcing both element-wise sparsity and low-rankness of the activation tensors. By using the CP decomposition, this model achieves a significantly more efficient encoding of the multivariate signal, particularly in the high order/dimension setting, resulting in better performance. We prove that our model is closely related to the Kruskal tensor regression problem, offering interesting theoretical guarantees to our setting. Furthermore, we provide an efficient optimization algorithm based on alternating optimization to solve this model. Finally, we evaluate our algorithm with a large range of experiments, highlighting its advantages and limitations.

Gradient Boosting Survival Tree with Applications in Credit Scoring

Credit scoring (Thomas et al., 2002) plays a vital role in the field of consumer finance. Survival analysis (Banasik et al., 1999) provides an advanced solution to the credit-scoring problem by quantifying the probability of survival time. In order to deal with highly heterogeneous industrial data collected in the Chinese consumer finance market, we propose a nonparametric ensemble tree model called gradient boosting survival tree (GBST) that extends the survival tree models (Gordon and Olshen, 1985; Ishwaran et al., 2008) with a gradient boosting algorithm (Friedman, 2001). The survival tree ensemble is learned by minimizing the negative log-likelihood in an additive manner. The proposed model optimizes the survival probability simultaneously for each time period, which can reduce the overall error significantly. Finally, as a test of the applicability, we apply the GBST model to quantify the credit risk with large-scale real market data. The results show that the GBST model outperforms the existing survival models measured by the concordance index (C-index), Kolmogorov-Smirnov (KS) index, as well as by the area under the receiver operating characteristic curve (AUC) of each time period.
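
For readers unfamiliar with the concordance index mentioned above, the following is a minimal sketch of the common pairwise definition (often attributed to Harrell). It is an assumption that this matches the variant used in the paper; the abstract does not say how ties or censoring are handled, and the function name is illustrative.

```python
def concordance_index(times, events, risk):
    """Pairwise C-index for right-censored data.

    times:  observed time for each subject (event or censoring time).
    events: 1/True if the event was observed, 0/False if censored.
    risk:   predicted risk score; higher means the event is expected sooner.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. Tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Perfectly ordered risks: the earliest event has the highest risk.
    print(concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of subjects by risk. The double loop is quadratic in the number of subjects, which is fine for illustration but not for large-scale data.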

No Need of Data Pre-processing: A General Framework for Radio-Based Device-Free Context Awareness

Device-free context awareness is important to many applications. There are two broadly used approaches for device-free context awareness, i.e. video-based and radio-based. Video-based applications can deliver good performance, but privacy is a serious concern. Radio-based context awareness has drawn researchers' attention instead because it does not violate privacy and radio signals can penetrate obstacles. Recently, deep learning has been introduced into radio-based device-free context awareness and helps boost the recognition accuracy. The present works design explicit methods for each radio-based application. They also use one additional step to extract features before conducting classification and exploit deep learning as a classification tool. The additional initial data processing step introduces unnecessary noise and information loss. Without initial data processing, it is, however, challenging to explore patterns of raw signals. In this paper, we are the first to propose an innovative deep learning based general framework for both signal processing and classification. The key novelty of this paper is that the framework can be generalised for all the radio-based context awareness applications. We also eliminate the additional effort to extract features from raw radio signals. We conduct extensive evaluations to show the superior performance of our proposed method and its generalisation.

TEASER: Early and Accurate Time Series Classification

Early time series classification (eTSC) is the problem of classifying a time series after as few measurements as possible with the highest possible accuracy. The most critical issue of any eTSC method is to decide when enough data of a time series has been seen to take a decision: Waiting for more data points usually makes the classification problem easier but delays the time in which a classification is made; in contrast, earlier classification has to cope with less input data, often leading to inferior accuracy. The state-of-the-art eTSC methods compute a fixed optimal decision time assuming that every time series has the same defined start time (like turning on a machine). However, in many real-life applications measurements start at arbitrary times (like measuring heartbeats of a patient), implying that the best time for taking a decision varies heavily between time series. We present TEASER, a novel algorithm that models eTSC as a two-tier classification problem: In the first tier, a classifier periodically assesses the incoming time series to compute class probabilities. However, these class probabilities are only used as output label if a second-tier classifier decides that the predicted label is reliable enough, which can happen after a different number of measurements. In an evaluation using 45 benchmark datasets, TEASER is two to three times earlier at predictions than its competitors while reaching the same or an even higher classification accuracy. We further show TEASER's superior performance using real-life use cases, namely energy monitoring and gait detection.
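
The two-tier control flow described above can be sketched roughly as follows. This is not the TEASER implementation: `first_tier` and `second_tier` are hypothetical callables standing in for the trained classifiers, `step` is an assumed assessment interval, and falling back to the full-length prediction when nothing is accepted is an assumption made only to keep the sketch total.

```python
def classify_early(series, step, first_tier, second_tier):
    """Return (label, number_of_measurements_used).

    first_tier(prefix)  -> dict mapping class label to probability.
    second_tier(probs)  -> True if the predicted label is judged reliable.
    Both callables are placeholders for trained models.
    """
    # Periodically assess growing prefixes of the incoming series.
    for end in range(step, len(series) + 1, step):
        probs = first_tier(series[:end])
        label = max(probs, key=probs.get)
        # The label is only emitted once the second tier accepts it,
        # so the decision time can differ from one series to the next.
        if second_tier(probs):
            return label, end
    # Assumption: if no prefix was accepted, use the whole series.
    probs = first_tier(series)
    return max(probs, key=probs.get), len(series)


if __name__ == "__main__":
    # Toy stand-ins: confidence grows with the number of measurements seen.
    def toy_first_tier(prefix):
        conf = min(0.5 + 0.05 * len(prefix), 1.0)
        return {"up": conf, "down": 1.0 - conf}

    def toy_second_tier(probs):
        return max(probs.values()) >= 0.75

    print(classify_early(list(range(20)), 2, toy_first_tier, toy_second_tier))
```

Separating the "what is the label" question from the "is it reliable yet" question is what lets the stopping point vary per series instead of being fixed in advance.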

Probabilistic Models with Deep Neural Networks

Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference was feasible, and (ii) small or medium-sized data sets which fit within the main memory of the computer. However, developments in variational inference, a general form of approximate probabilistic inference originating in statistical physics, are allowing probabilistic modeling to overcome these restrictions: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computation engines allow probabilistic modeling to be applied to massive data sets. One important practical consequence of these advances is the possibility to include deep neural networks within a probabilistic model to capture complex non-linear stochastic relationships between random variables. These advances in conjunction with the release of novel probabilistic modeling toolboxes have greatly expanded the scope of application of probabilistic models, and allow these models to take advantage of the recent strides made by the deep learning community. In this paper we review the main concepts, methods and tools needed to use deep neural networks within a probabilistic modeling framework.

RCE: An Integration Environment for Engineering and Science

We present RCE (Remote Component Environment), an open-source framework developed primarily at DLR (German Aerospace Center) that enables its users to construct and execute multidisciplinary engineering workflows comprising multiple disciplinary tools. To this end, RCE supplies users with an easy-to-use graphical interface that allows for the intuitive integration of disciplinary tools. Users can execute the individual tools on arbitrary nodes present in the network and all data accrued during the execution of the workflow are collected and stored centrally. Hence, RCE makes it easy for collaborating engineers to contribute their individual disciplinary tools to a multidisciplinary design or analysis, and simplifies the subsequent analysis of the workflow’s results.

Bayesian Inference for Large Scale Image Classification

Bayesian inference promises to ground and improve the performance of deep neural networks. It promises to be robust to overfitting, to simplify the training procedure and the space of hyperparameters, and to provide a calibrated measure of uncertainty that can enhance decision making, agent exploration and prediction fairness. Markov Chain Monte Carlo (MCMC) methods enable Bayesian inference by generating samples from the posterior distribution over model parameters. Despite the theoretical advantages of Bayesian inference and the similarity between MCMC and optimization methods, the performance of sampling methods has so far lagged behind optimization methods for large scale deep learning tasks. We aim to fill this gap and introduce ATMC, an adaptive noise MCMC algorithm that estimates and is able to sample from the posterior of a neural network. ATMC dynamically adjusts the amount of momentum and noise applied to each parameter update in order to compensate for the use of stochastic gradients. We use a ResNet architecture without batch normalization to test ATMC on the Cifar10 benchmark and the large scale ImageNet benchmark and show that, despite the absence of batch normalization, ATMC outperforms a strong optimization baseline in terms of both classification accuracy and test log-likelihood. We show that ATMC is intrinsically robust to overfitting on the training data and that ATMC provides a better calibrated measure of uncertainty compared to the optimization baseline.

Making GDPR Usable: A Model to Support Usability Evaluations of Privacy

We introduce a new perspective on the evaluation of privacy, where rights of the data subjects, privacy principles, and usability criteria are intertwined. This new perspective is visually represented through a cube where each of its three axes of variability captures, respectively: principles, rights, and usability criteria. In this way, our model, called Usable Privacy Cube (or UP Cube), brings out two perspectives on privacy: that of the data subjects and that of the controllers/processors. In the long run, the UP Cube is meant to be the model behind a new certification methodology capable of evaluating the usability of privacy. Our research builds on the criteria proposed by the EuroPriSe certification scheme by adding usability criteria to their evaluation. We slightly reorganize the criteria of EuroPriSe to fit with the UP Cube model, i.e., we show how the EuroPriSe can be viewed as a combination of only principles and rights, forming the basis of the UP Cube. Usability criteria are defined based on goals that we extract from the data protection regulations, at the same time considering the needs, goals and characteristics of different types of users and their context of use. The criteria are designed to produce measurements of the level of usability with which the privacy goals of the data protection are reached. Considering usability criteria allows for greater business differentiation beyond GDPR compliance.

A Benchmark of Visual Storytelling in Social Media

Media editors in the newsroom are constantly pressed to provide a ‘like-being there’ coverage of live events. Social media provides a disorganised collection of images and videos that media professionals need to grasp before publishing their latest news updates. Automated news visual storyline editing with social media content can be very challenging, as it not only entails the task of finding the right content but also making sure that news content evolves coherently over time. To tackle these issues, this paper proposes a benchmark for assessing social media visual storylines. The SocialStories benchmark, comprising a total of 40 curated stories covering sports and cultural events, provides the experimental setup and introduces novel quantitative metrics to perform a rigorous evaluation of visual storytelling with social media data.

Deep Kernel Learning for Clustering

We propose a deep learning approach for discovering kernels tailored to identifying clusters over sample data. Our neural network produces sample embeddings that are motivated by, and are at least as expressive as, spectral clustering. Our training objective, based on the Hilbert-Schmidt Independence Criterion, can be optimized via gradient adaptations on the Stiefel manifold, leading to significant acceleration over spectral methods relying on eigendecompositions. Finally, our trained embedding can be directly applied to out-of-sample data. We show experimentally that our approach outperforms several state-of-the-art deep clustering methods, as well as traditional approaches such as k-means and spectral clustering over a broad array of real-life and synthetic datasets.
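
As background for the Hilbert-Schmidt criterion behind the training objective (usually abbreviated HSIC), here is a minimal sketch of the standard biased empirical estimator, tr(KHLH) / (n-1)^2, with Gaussian kernels. The choice of kernel, the bandwidth `sigma`, and the function names are assumptions for illustration; the abstract does not specify how the paper instantiates the objective.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel over the rows of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a flat vector as n one-dimensional samples
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y."""
    K = gaussian_kernel(x, sigma)
    L = gaussian_kernel(y, sigma)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 20)
    print("HSIC(x, x)        =", hsic(x, x))
    print("HSIC(x, constant) =", hsic(x, np.zeros_like(x)))
```

The estimator is close to zero when the two samples carry no shared structure (a constant `y` gives a Gram matrix that centering wipes out) and grows as dependence increases, which is what makes it usable as a differentiable dependence measure.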

BERT-based Ranking for Biomedical Entity Normalization

Developing high-performance entity normalization algorithms that can alleviate the term variation problem is of great interest to the biomedical community. Although deep learning-based methods have been successfully applied to biomedical entity normalization, they often depend on traditional context-independent word embeddings. Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT) and BERT for Clinical Text Mining (ClinicalBERT) were recently introduced to pre-train contextualized word representation models using bidirectional Transformers, advancing the state-of-the-art for many natural language processing tasks. In this study, we proposed an entity normalization architecture by fine-tuning the pre-trained BERT / BioBERT / ClinicalBERT models and conducted extensive experiments to evaluate the effectiveness of the pre-trained models for biomedical entity normalization using three different types of datasets. Our experimental results show that the best fine-tuned models consistently outperformed previous methods and advanced the state-of-the-art for biomedical entity normalization, with up to 1.17% increase in accuracy.

VisualBERT: A Simple and Performant Baseline for Vision and Language

We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.

On the Adversarial Robustness of Neural Networks without Weight Transport

Neural networks trained with backpropagation, the standard algorithm of deep learning which uses weight transport, are easily fooled by existing gradient-based adversarial attacks. This class of attacks is based on certain small perturbations of the inputs to make networks misclassify them. We show that less biologically implausible deep neural networks trained with feedback alignment, which do not use weight transport, can be harder to fool, providing actual robustness. Tested on MNIST, deep neural networks trained without weight transport (1) have an adversarial accuracy of 98% compared to 0.03% for neural networks trained with backpropagation and (2) generate non-transferable adversarial examples. However, this gap decreases on CIFAR-10 but remains significant, particularly for small perturbation magnitudes less than 1/2.
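
To make the difference between backpropagation and feedback alignment concrete, the sketch below shows one training step of a two-layer network in which the error is sent backwards through a fixed random matrix `B` instead of the transpose of the forward weights. The network size, tanh activation, squared-error loss, and all names are assumptions for illustration and do not reproduce the paper's experimental setup.

```python
import numpy as np


def fa_step(W1, W2, B, x, y, lr=0.1):
    """One feedback-alignment update for y_hat = tanh(x @ W1) @ W2.

    Shapes: x (n, d), W1 (d, h), W2 (h, k), B (k, h), y (n, k).
    B is a fixed random feedback matrix and is never updated.
    """
    n = x.shape[0]
    h = np.tanh(x @ W1)          # hidden activations
    out = h @ W2                 # linear output layer
    err = out - y                # gradient of 0.5 * squared error w.r.t. out
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))

    # Backpropagation would use err @ W2.T here (weight transport).
    # Feedback alignment replaces W2.T with the fixed matrix B.
    delta_h = (err @ B) * (1.0 - h ** 2)

    W2_new = W2 - lr * (h.T @ err) / n       # output layer: same as backprop
    W1_new = W1 - lr * (x.T @ delta_h) / n   # hidden layer: uses B, not W2.T
    return W1_new, W2_new, loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 4))
    y = rng.normal(size=(32, 2))
    W1 = rng.normal(scale=0.5, size=(4, 8))
    W2 = rng.normal(scale=0.5, size=(8, 2))
    B = rng.normal(scale=0.5, size=(2, 8))
    for _ in range(5):
        W1, W2, loss = fa_step(W1, W2, B, x, y)
        print(f"loss = {loss:.4f}")
```

Passing `B = W2.T` recovers an ordinary backpropagation step, which makes the function convenient for side-by-side comparisons of the two update rules.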