EvalAI: Towards Better Evaluation Systems for AI Agents

We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale. EvalAI is built to provide the research community with a scalable solution to the critical need of evaluating machine learning models and agents acting in an environment, either against annotations or with a human in the loop. This will help researchers, students, and data scientists create, collaborate on, and participate in AI challenges organized around the globe. By simplifying and standardizing the process of benchmarking these models, EvalAI seeks to lower the barrier to entry for participating in the global scientific effort to push the frontiers of machine learning and artificial intelligence, thereby increasing the rate of measurable progress in this domain.


NeurAll: Towards a Unified Model for Visual Perception in Automated Driving

Convolutional Neural Networks (CNNs) are successfully used for important automotive visual perception tasks, including object recognition, motion and depth estimation, and visual SLAM. However, these tasks are typically explored and modeled independently. In this paper, we propose a joint multi-task network design called NeurAll for learning all tasks simultaneously. Our main motivation is the computational efficiency gained by sharing the expensive initial convolutional layers among all tasks. Indeed, the main bottleneck in automated driving systems is the limited processing power available on deployment hardware. Joint learning may also improve accuracy for some tasks and reduce development effort. It also offers scalability: further tasks can be added by leveraging existing features, which can lead to better generalization. We survey various CNN-based solutions for visual perception tasks in automated driving. Then we propose a unified CNN model for the important tasks and discuss several advanced optimization and architecture design techniques to improve the baseline model. The paper is partly a review and partly a position paper, with demonstrations of several preliminary results that are promising for future research. First, we show that an efficient two-task model performing semantic segmentation and object detection achieves accuracies similar to separate models on various datasets, with reduced runtime. We then show that using depth regression as an auxiliary task improves semantic segmentation, and that multi-stream semantic segmentation outperforms one-stream semantic segmentation. The two-task network runs at 30 fps on an automotive-grade low-power SoC at 1280×384 image resolution.


Max-Min Fair Sensor Scheduling: Game-theoretic Perspective and Algorithmic Solution

We consider the design of a fair sensor schedule for a number of sensors monitoring different linear time-invariant processes. The objective is to minimize the largest average remote estimation error among all processes. We first consider a general setup for the max-min fair allocation problem. By reformulating the problem into an equivalent form, we transform the fair resource allocation problem into a zero-sum game between a ‘judge’ and a resource allocator. We propose an equilibrium-seeking procedure and show that this game has a unique Nash equilibrium in pure strategies. We then apply the result to the sensor scheduling problem and show that the max-min fair sensor scheduling policy can be achieved.
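The step from a worst-case objective to a zero-sum game can be illustrated with a standard identity: the largest of finitely many errors equals the best convex combination of them, because a linear function over the probability simplex attains its maximum at a vertex. The notation below is assumed for illustration and is not taken from the paper: $e_i(\pi)$ is the average estimation error of process $i$ under schedule $\pi$, and $\Delta_N$ is the simplex over the $N$ processes. In this reading the allocator chooses $\pi$, the judge chooses weights $p$, and the payoff is the weighted error.

```latex
\max_{1 \le i \le N} e_i(\pi) \;=\; \max_{p \in \Delta_N} \sum_{i=1}^{N} p_i \, e_i(\pi),
\qquad
\Delta_N = \Big\{ p \in \mathbb{R}^{N}_{\ge 0} : \sum_{i=1}^{N} p_i = 1 \Big\}

\min_{\pi} \, \max_{1 \le i \le N} e_i(\pi) \;=\; \min_{\pi} \, \max_{p \in \Delta_N} \sum_{i=1}^{N} p_i \, e_i(\pi)
```

The maximizing $p$ puts all of its weight on the currently worst process, which is why the judge can be read as "pointing at" the process that the allocator must serve better.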


Hybrid Forest: A Concept Drift Aware Data Stream Mining Algorithm

With a growing number of online control systems in organizations, and a high demand for monitoring and statistics facilities that use data streams to log and control their subsystems, data stream mining is becoming increasingly vital. Hoeffding Trees (also called Very Fast Decision Trees, VFDT), a Big Data approach to classification and regression on data streams, have shown good performance in handling these challenges and in enabling anytime prediction. Although these methods outperform other methods, e.g. Artificial Neural Networks (ANN) and Support Vector Regression (SVR), they suffer from high latency in adapting to new concepts when the statistical distribution of incoming data changes. In this article, we introduce a new algorithm that can properly detect and handle the concept drift phenomenon. The algorithm also benefits from a fast startup ability, which enables systems to predict sooner than other algorithms at the beginning of data stream arrival. We also show that our approach outperforms competing approaches on classification and regression tasks.
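Hoeffding Trees decide when a leaf has seen enough stream examples to split by using the Hoeffding bound. The sketch below shows that standard VFDT split test; it is not the Hybrid Forest drift-handling logic, which the abstract does not specify. The function names and the tie threshold of 0.05 are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """VFDT-style split test on the two best candidate attributes at a leaf."""
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the leader is clearly better, or if the candidates are too close to matter.
    return (best_gain - second_gain) > eps or eps < tie_threshold

print(should_split(0.5, 0.1, value_range=1.0, delta=1e-7, n=1000))   # True
print(should_split(0.30, 0.29, value_range=1.0, delta=1e-7, n=100))  # False
```

Because epsilon shrinks as 1/sqrt(n), a leaf keeps absorbing examples until the gain gap becomes statistically reliable, which is also why such trees react slowly once the data distribution drifts.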


ELKI: A large open-source library for data analysis – ELKI Release 0.7.5 ‘Heidelberg’

This paper documents the release of the ELKI data mining framework, version 0.7.5. ELKI is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. We first outline the motivation for this release and the plans for the future, and then give a brief overview of the new functionality in this version. We also include an appendix presenting an overview of all implemented functionality.


A Shallow Triple Stream Three-dimensional CNN (STSTNet) for Micro-expression Recognition System

In recent years, the state of the art in facial micro-expression recognition has been significantly advanced by the emergence of data-driven approaches based on deep learning. Owing to their superb learning capacity, deep learning methods deliver promising performance beyond traditional handcrafted approaches. Recently, many researchers have focused on developing better networks by increasing their depth, as deep networks can approximate certain function classes more efficiently than shallow ones. In this paper, we aim to design a shallow network that extracts high-level features of micro-expression details. Specifically, we propose a two-layer neural network, the Shallow Triple Stream Three-dimensional CNN (STSTNet). The network learns from three optical flow features (optical strain, and horizontal and vertical optical flow images) computed from the onset and apex frames of each video. Our experimental results demonstrate the viability of the proposed STSTNet, which achieves unweighted average recall (UAR) of 76.05%, 70.13%, 86.86% and 68.10% on the composite, SMIC, CASME II and SAMM databases, respectively.
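The results above are reported as UAR, the mean of per-class recall, which weights every emotion class equally and therefore suits class-imbalanced micro-expression databases better than plain accuracy. A minimal sketch of the metric; the function name and toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class counts equally regardless of its size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

y_true = ["neg", "neg", "neg", "pos"]
y_pred = ["neg", "neg", "neg", "neg"]
print(unweighted_average_recall(y_true, y_pred))  # 0.5, although accuracy is 0.75
```

A classifier that always predicts the majority class scores high accuracy here but only 0.5 UAR, which is the behaviour the metric is meant to expose.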


Machine Learning With Feature Selection Using Principal Component Analysis for Malware Detection: A Case Study

Cyber security threats have been growing significantly in both volume and sophistication over the past decade. This poses great challenges to malware detection without considerable automation. In this paper, we propose a novel approach that extends our recently suggested artificial neural network (ANN) based model with feature selection using the principal component analysis (PCA) technique for malware detection. The effectiveness of the approach is demonstrated on PDF malware detection. A varying number of principal components is examined in the comparative study. Our evaluation shows that the model with PCA can significantly reduce feature redundancy and learning time with minimal information loss, as confirmed by both training and testing results based on around 105,000 real-world PDF documents. Of the evaluated models using PCA, the model with 32 principal feature components exhibits training accuracy very similar to the model using the 48 original features, while achieving around 33% dimensionality reduction and 22% less learning time. The testing results further confirm its effectiveness: the model achieves a 93.17% true positive rate (TPR) while maintaining the same low false positive rate (FPR) of 0.08% as when no feature selection is applied. This significantly outperforms all seven evaluated well-known commercial antivirus (AV) scanners, the best of which achieves a TPR of only 84.53%.
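The 48-to-32 reduction above can be sketched with a plain SVD-based PCA. This is a minimal illustration on synthetic Gaussian data standing in for the 48 PDF features; the function name is an assumption, and the ANN that consumes the projected features is not shown.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X onto its top-k principal components via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)  # rows of Vt: principal axes
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()  # fraction of variance retained
    return X_centered @ Vt[:k].T, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 48))  # synthetic stand-in for 48 PDF features
Z, ratio = pca_reduce(X, 32)
print(Z.shape)                                     # (1000, 32)
print(f"dimension reduction: {1 - 32 / 48:.1%}")   # 33.3%
```

The 33.3% figure is simply 16 dropped dimensions out of 48. The retained variance (`ratio`) on isotropic synthetic data is lower than on real, correlated features, where a few components usually dominate.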


A Bayesian Approach to Joint Estimation of Multiple Graphical Models

The problem of joint estimation of multiple graphical models from high dimensional data has been studied in the statistics and machine learning literature, due to its importance in diverse fields including molecular biology, neuroscience and the social sciences. This work develops a Bayesian approach that decomposes the model parameters across the multiple graphical models into shared components across subsets of models and edges, and idiosyncratic ones. Further, it leverages a novel multivariate prior distribution, coupled with a pseudo-likelihood that enables fast computations through a robust and efficient Gibbs sampling scheme. We establish strong posterior consistency for model selection, as well as estimation of model parameters under high dimensional scaling with the number of variables growing exponentially with the sample size. The efficacy of the proposed approach is illustrated on both synthetic and real data. Keywords: Pseudo-likelihood, Gibbs sampling, posterior consistency, Omics data


A Bandit Framework for Optimal Selection of Reinforcement Learning Agents

Deep Reinforcement Learning has been shown to be very successful in complex games, e.g. Atari or Go. These games have clearly defined rules, and hence allow simulation. In many practical applications, however, interactions with the environment are costly and a good simulator of the environment is not available. Further, as environments differ by application, the optimal inductive bias (architecture, hyperparameters, etc.) of a reinforcement learning agent depends on the application. In this work, we propose a multi-armed bandit framework that selects from a set of different reinforcement learning agents to choose the one with the best inductive bias. To alleviate the problem of sparse rewards, the reinforcement learning agents are augmented with surrogate rewards. This helps the bandit framework select the best agents early, since these rewards are smoother and less sparse than the environment reward. The bandit has the double objective of maximizing the reward while the agents are learning and selecting the best agent after a finite number of learning steps. Our experimental results on standard environments show that the proposed framework consistently selects the optimal agent after a finite number of steps, while collecting more cumulative reward than selecting a sub-optimal architecture or uniformly alternating between different agents.
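The core idea, treating each candidate agent as a bandit arm, can be sketched with UCB1, a standard bandit algorithm. The abstract does not say which bandit algorithm the paper uses, so UCB1 is an assumption here, and the three "agents" are hypothetical stand-ins with fixed Bernoulli reward rates rather than learning agents.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then the arm maximizing mean + sqrt(2 ln t / n_a)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation round: try every agent once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        total += reward
    return counts, means, total

# Hypothetical stand-ins for three RL agents: Bernoulli rewards with fixed success rates.
probs = [0.2, 0.5, 0.8]
rng = random.Random(42)

def pull(arm):
    return 1.0 if rng.random() < probs[arm] else 0.0

counts, means, total = ucb1(pull, n_arms=3, horizon=2000)
print(counts)  # most pulls should go to arm 2
```

UCB1 assumes stationary rewards, whereas the agents in the paper keep learning while they are selected; smoother surrogate rewards, as described above, make the per-arm estimates less noisy in that setting.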


Differential Similarity in Higher Dimensional Spaces: Theory and Applications

This paper presents an extension and an elaboration of the theory of differential similarity, which was originally proposed in arXiv:1401.2411 [cs.LG]. The goal is to develop an algorithm for clustering and coding that combines a geometric model with a probabilistic model in a principled way. For simplicity, the geometric model in the earlier paper was restricted to the three-dimensional case. The present paper removes this restriction, and considers the full n-dimensional case. Although the mathematical model is the same, the strategies for computing solutions in the n-dimensional case are different, and one of the main purposes of this paper is to develop and analyze these strategies. Another main purpose is to devise techniques for estimating the parameters of the model from sample data, again in n dimensions. We evaluate the solution strategies and the estimation techniques by applying them to two familiar real-world examples: the classical MNIST dataset and the CIFAR-10 dataset.


Accelerating Partial Evaluation in Distributed SPARQL Query Evaluation

Partial evaluation has recently been used for processing SPARQL queries over a large resource description framework (RDF) graph in a distributed environment. However, the previous approach is inefficient when dealing with complex queries. In this study, we further improve the ‘partial evaluation and assembly’ framework for answering SPARQL queries over a distributed RDF graph. Our key idea is to leverage the intrinsic structural characteristics of partial matches to filter out irrelevant partial results, while providing performance guarantees on network traffic (data shipment) or computational cost (response time). We also propose an efficient assembly algorithm that uses the characteristics of partial matches to merge them and form final results. To further improve the efficiency of finding partial matches, we propose an optimization that communicates variables’ candidates among sites to avoid redundant computations. In addition, although our approach is partitioning-tolerant, different partitioning strategies yield different performance, so we evaluate several partitioning strategies for our approach. Experiments over both real and synthetic RDF datasets confirm the superiority of our approach.
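The assembly step merges partial matches found at different sites into complete answers. The sketch below shows only the basic compatibility rule (two variable bindings can be joined if they agree on every shared variable) using a naive nested-loop join; it is not the paper's optimized assembly algorithm, and the `Match` type, function names, and `ex:` terms are hypothetical.

```python
from typing import Dict, List

Match = Dict[str, str]  # SPARQL variable -> bound RDF term

def compatible(m1: Match, m2: Match) -> bool:
    """Two partial matches can be joined if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def assemble(site_a: List[Match], site_b: List[Match]) -> List[Match]:
    """Naive nested-loop join of partial matches from two sites."""
    return [{**a, **b} for a in site_a for b in site_b if compatible(a, b)]

site_a = [{"?x": "ex:Alice", "?y": "ex:Bob"}]
site_b = [{"?y": "ex:Bob", "?z": "ex:Carol"}, {"?y": "ex:Dave", "?z": "ex:Eve"}]
print(assemble(site_a, site_b))
# [{'?x': 'ex:Alice', '?y': 'ex:Bob', '?z': 'ex:Carol'}]
```

The nested loop costs time proportional to the product of the two match sets, which is why filtering irrelevant partial matches before assembly, as the abstract describes, pays off.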


Hawkes processes for credit indices time series analysis: How random are trades arrival times?

Targeting a better understanding of credit market dynamics, the authors have studied a stochastic model named the Hawkes process. Describing trade arrival times, this kind of model captures self-excitation and mutual interaction phenomena. The authors propose here a simple yet conclusive method for fitting multidimensional Hawkes processes with exponential kernels, based on non-convex maximum likelihood optimization. The method was successfully tested on simulated data, then applied to new publicly available real trading data for three European credit indices, enabling quantification of self-excitation as well as volume impacts and cross-index influences.
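Maximum likelihood fitting of an exponential-kernel Hawkes process relies on a log-likelihood that can be evaluated in linear time thanks to a recursion over event times. The paper fits the multidimensional case; the sketch below shows only the one-dimensional case with intensity λ(t) = μ + Σ_{t_i < t} α·exp(−β(t − t_i)) on [0, T], and the function name and parameter values are illustrative assumptions.

```python
import math

def hawkes_loglik(times, mu, alpha, beta, T):
    """Log-likelihood of a 1-D Hawkes process with kernel alpha * exp(-beta * t) on [0, T]."""
    loglik = 0.0
    A = 0.0  # A_i = sum_{j<i} exp(-beta * (t_i - t_j)), updated recursively
    prev = None
    for t in times:  # times must be sorted in increasing order
        if prev is not None:
            A = math.exp(-beta * (t - prev)) * (1.0 + A)
        loglik += math.log(mu + alpha * A)  # log-intensity at each event
        prev = t
    # Compensator: integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return loglik - compensator

times = [0.5, 1.2, 1.3, 4.0]
print(hawkes_loglik(times, mu=0.5, alpha=0.8, beta=2.0, T=5.0))
```

The negative of this function can be handed to a generic numerical optimizer over (μ, α, β); the objective is non-convex, as the abstract notes. For the univariate exponential kernel, α/β < 1 is the usual stationarity condition.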


Path Capsule Networks

Capsule network (CapsNet) was introduced as an enhancement over convolutional neural networks, supplementing the latter’s invariance properties with equivariance through pose estimation. CapsNet achieved good performance with a shallow architecture and a significant reduction in parameter count. However, the width of the first layer in CapsNet still accounts for a significant share of its parameters, and the shallowness may limit the representational power of the capsules. To address these limitations, we introduce Path Capsule Network (PathCapsNet), a deep parallel multi-path version of CapsNet. We show that a judicious coordination of depth, max-pooling, regularization by DropCircuit and a new fan-in routing by agreement technique can achieve better or comparable results to CapsNet, while further reducing the parameter count significantly.
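Capsules encode an entity's presence as the length of an output vector, which CapsNet enforces with its "squash" nonlinearity, v = (‖s‖² / (1 + ‖s‖²)) · s / ‖s‖. The sketch below shows this standard CapsNet function only; PathCapsNet's fan-in routing and DropCircuit are not reproduced, and the small `eps` term is an added assumption to avoid division by zero.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """CapsNet squash: keeps the direction of s and maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)  # eps avoids division by zero for s = 0

v = squash(np.array([3.0, 4.0]))
print(np.linalg.norm(v))  # close to 25/26 ≈ 0.9615
```

Long vectors saturate just below length 1 and short vectors shrink towards 0, so a capsule's length can be read as a probability while its orientation carries the pose.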


A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering

In this paper we target the class of modal clustering methods, where clusters are defined in terms of the local modes of the probability density function that generates the data. The most well-known modal clustering method is k-means clustering. Mean Shift clustering is a generalization of k-means clustering that computes arbitrarily shaped clusters, defined as the basins of attraction of the local modes created by the density gradient ascent paths. Despite its potential, Mean Shift is a computationally expensive method for unsupervised learning. We therefore introduce two contributions aiming to provide clustering algorithms with linear time complexity, as opposed to the quadratic time complexity of exact Mean Shift clustering. First, we propose a scalable procedure to approximate the density gradient ascent. Second, we present a scalable cluster labeling technique. Both propositions are based on Locality Sensitive Hashing (LSH) to approximate nearest neighbors. These two techniques may be used for moderate-sized datasets. Furthermore, we show that using our approximation of the density gradient ascent as a pre-processing step in other clustering methods can also improve dedicated classification metrics. For the latter, we propose a distributed implementation written for the Spark/Scala ecosystem. For all the considered clustering methods, we present experimental results illustrating their labeling accuracy and their potential to solve concrete problems.
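For reference, the exact Mean Shift that the paper approximates moves every point to the kernel-weighted mean of the data until it reaches a density mode, then groups points that reach the same mode. The sketch below uses a Gaussian kernel and computes all pairwise distances, which is the quadratic cost that the LSH-based approximation is designed to avoid; LSH itself is not shown. Function names, the bandwidth, and the grouping radius are illustrative assumptions.

```python
import numpy as np

def mean_shift(X, bandwidth, n_iter=200, tol=1e-6):
    """Exact Gaussian-kernel Mean Shift: every point climbs the density gradient."""
    modes = X.astype(float).copy()
    for _ in range(n_iter):
        # Squared distances between current estimates and all data points: O(n^2) per iteration.
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        new_modes = (w @ X) / w.sum(axis=1, keepdims=True)  # kernel-weighted mean
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    return modes

def label_modes(modes, radius):
    """Group converged points whose modes lie within `radius` of an existing center."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < radius:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels), np.array(centers)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
labels, centers = label_modes(mean_shift(X, bandwidth=1.0), radius=0.5)
print(len(centers))
```

Both the gradient ascent and the labeling step compare points with other points, so both are natural targets for approximate nearest-neighbour search.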


Verifiable Smart Contract Portability

With the advent of blockchain technologies, the idea of decentralized applications has gained traction. Smart contracts permit the implementation of application logic to foster distributed systems that are capable of removing intermediaries. Thereby, lock-in effects originating from isolated data storage and central authorities are mitigated. Yet, smart contracts deployed to a ledger create dependencies on the underlying blockchain. Over time, requirements regarding contract execution may diverge from the utilized chain due to contradicting incentives and security or performance issues. To avoid a novel form of lock-in effect towards a host blockchain, we introduce a concept for smart contract portability that permits any user to migrate contract logic and state between blockchains in a flexible and verifiable manner. As the Ethereum Virtual Machine (EVM) is supported by a multitude of blockchain implementations, it constitutes a common execution environment for smart contracts. We provide a toolbox that facilitates smart contract portability between EVM-compatible blockchains without requiring trust in the entity executing the migration process. To prove the concept’s soundness, we transfer token contracts based on the ERC20 standard as well as applications with dependencies on other smart contracts. Our evaluation shows the validity of the ported applications, including their current states.


Deep Hashing using Entropy Regularised Product Quantisation Network

In large-scale systems, approximate nearest neighbour search is a crucial algorithm for efficient data retrieval. Recently, deep learning-based hashing algorithms have been proposed as a promising paradigm for data-dependent schemes. Often their efficacy is only demonstrated on data sets with a fixed, limited number of classes. In practical scenarios, those labels are not always available, or one requires a method that can handle higher input variability as well as higher granularity. To fulfil those requirements, we look at more flexible similarity measures. In this work, we present a novel, flexible, end-to-end trainable network for large-scale data hashing. Our method works by transforming the data distribution to behave as a uniform distribution on a product of spheres. The transformed data is subsequently hashed to a binary form in a way that maximises the entropy of the output (i.e. fully utilises the available bit-rate capacity) while maintaining correctness (i.e. close items hash to the same key in the map). We show that the method outperforms baseline approaches such as locality-sensitive hashing and product quantisation in the limited capacity regime.
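Product quantisation, one of the baselines named above, splits each vector into M sub-vectors and stores, for each, the index of its nearest codeword, giving M·log2(K) bits per item. The sketch below also measures how much of that bit budget the codes actually use, which is the entropy notion the paper maximises. The codebooks are random here (in practice they are usually learned with k-means per subspace), all names are illustrative assumptions, and the paper's own network is not shown.

```python
import numpy as np

def pq_encode(X, codebooks):
    """Product quantisation: nearest codeword index in each of M subspaces.

    X: (n, M * d_sub) data; codebooks: (M, K, d_sub). Returns integer codes (n, M).
    """
    M, K, d_sub = codebooks.shape
    sub = X.reshape(X.shape[0], M, d_sub)
    d2 = ((sub[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(axis=-1)  # (n, M, K)
    return d2.argmin(axis=-1)

def pq_decode(codes, codebooks):
    """Reconstruct vectors by concatenating the selected codewords."""
    return np.concatenate([codebooks[m, codes[:, m]] for m in range(codebooks.shape[0])], axis=1)

def code_entropy_bits(codes, K):
    """Empirical entropy of the codes, summed over subspaces (max: M * log2(K) bits)."""
    total = 0.0
    for m in range(codes.shape[1]):
        p = np.bincount(codes[:, m], minlength=K) / codes.shape[0]
        p = p[p > 0]
        total -= (p * np.log2(p)).sum()
    return total

rng = np.random.default_rng(0)
M, K, d_sub = 4, 16, 8  # 4 subspaces x log2(16) = 16-bit codes
codebooks = rng.normal(size=(M, K, d_sub))  # hypothetical; normally learned by k-means
X = rng.normal(size=(1000, M * d_sub))
codes = pq_encode(X, codebooks)
print(code_entropy_bits(codes, K), "of", M * np.log2(K), "bits used")
```

If some codewords are rarely selected, the entropy falls below the nominal bit budget, meaning part of the code capacity is wasted.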


Towards an All-Purpose Content-Based Multimedia Information Retrieval System

The growth of multimedia collections – in terms of size, heterogeneity, and variety of media types – necessitates systems that are able to conjointly deal with several forms of media, especially when it comes to searching for particular objects. However, existing retrieval systems are organized in silos and treat different media types separately. As a consequence, retrieval across media types is either not supported at all or subject to major limitations. In this paper, we present vitrivr, a content-based multimedia information retrieval stack. As opposed to the keyword search approach implemented by most media management systems, vitrivr makes direct use of the object’s content to facilitate different types of similarity search, such as Query-by-Example or Query-by-Sketch, for and, most importantly, across different media types – namely, images, audio, videos, and 3D models. Furthermore, we introduce a new web-based user interface that enables easy-to-use, multimodal retrieval from and browsing in mixed media collections. The effectiveness of vitrivr is shown on the basis of a user study that involves different query and media types. To the best of our knowledge, the full vitrivr stack is unique in that it is the first multimedia retrieval system that seamlessly integrates support for four different types of media. As such, it paves the way towards an all-purpose, content-based multimedia information retrieval system.


Energy-recycling Blockchain with Proof-of-Deep-Learning

An enormous amount of energy is wasted in the Proof-of-Work (PoW) mechanisms adopted by popular blockchain applications (e.g., PoW-based cryptocurrencies), because miners must conduct a large amount of computation. Owing to this, one serious and growing concern is that the energy waste not only dilutes the value of the blockchain but also hinders its further application. In this paper, we propose a novel blockchain design that fully recycles the energy required for facilitating and maintaining it by re-investing that energy in the computation of deep learning. We realize this by proposing Proof-of-Deep-Learning (PoDL), such that a valid proof for a new block can be generated if and only if a proper deep learning model is produced. We present a proof-of-concept design of PoDL that is compatible with the majority of cryptocurrencies based on hash-based PoW mechanisms. Our benchmark and simulation results show that the proposed design is feasible for various popular cryptocurrencies such as Bitcoin, Bitcoin Cash, and Litecoin.
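To make the "wasted computation" concrete, here is a minimal hash-based PoW: miners search nonces until a digest falls below a target, while verification needs a single hash. This is a simplified sketch under stated assumptions (single SHA-256, an 8-byte little-endian nonce, a made-up header); real chains such as Bitcoin use a specific header format and double SHA-256. It illustrates the mechanism that PoDL replaces, not PoDL itself.

```python
import hashlib

def mine(block_header: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Search for a nonce whose SHA-256 digest, read as an integer, is below the target."""
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = hashlib.sha256(block_header + nonce.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no valid nonce in range

def verify(block_header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed nonce costs one hash."""
    digest = hashlib.sha256(block_header + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

nonce, digest_hex = mine(b"hypothetical block header", difficulty_bits=16)
print(nonce, digest_hex)
```

Mining takes about 2^16 hash evaluations on average at this difficulty, all of which are discarded once a block is found; that asymmetry between costly search and cheap verification is what a useful-work scheme has to preserve.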


Scaling Big Data Platform for Big Data Pipeline

Monitoring and managing High Performance Computing (HPC) systems and environments generates an ever-growing amount of data. Making sense of this data, and building a platform where it can be visualized so that system administrators and management can proactively identify system failures or understand the state of the system, requires the platform to be as efficient and scalable as the underlying database tools used to store and analyze the data. In this paper we show how we leverage Accumulo, d4m, and Unity to build a 3D visualization platform to monitor and manage the Lincoln Laboratory Supercomputer systems, and how we have had to retool our approach to scale with our systems.


A physics-aware, probabilistic machine learning framework for coarse-graining high-dimensional systems in the Small Data regime

The automated construction of coarse-grained models represents a pivotal component in computer simulation of physical systems and is a key enabler in various analysis and design tasks related to uncertainty quantification. Pertinent methods are severely inhibited by the high dimensionality of the parametric input and the limited number of training input/output pairs that can be generated when computationally demanding forward models are considered. Such cases are frequently encountered in the modeling of random heterogeneous media, where the scale of the microstructure necessitates the use of high-dimensional random vectors and very fine discretizations of the governing equations. The present paper proposes a probabilistic Machine Learning framework that is capable of operating in the presence of Small Data by exploiting aspects of the physical structure of the problem as well as contextual knowledge. As a result, it can perform comparably well under extrapolative conditions. It unifies the tasks of dimensionality and model-order reduction through an encoder-decoder scheme that simultaneously identifies a sparse set of salient lower-dimensional microstructural features and calibrates an inexpensive, coarse-grained model that is predictive of the output. Information loss is accounted for and quantified in the form of probabilistic predictive estimates. The learning engine is based on Stochastic Variational Inference. We demonstrate how the variational objectives can be used not only to train the coarse-grained model, but also to suggest refinements that lead to improved predictions.


KTBoost: Combined Kernel and Tree Boosting

In this article, we introduce a novel boosting algorithm called ‘KTBoost’, which combines kernel boosting and tree boosting. In each boosting iteration, the algorithm adds either a regression tree or a reproducing kernel Hilbert space (RKHS) regression function to the ensemble of base learners. Intuitively, discontinuous trees and continuous RKHS regression functions complement each other, and this combination allows for better learning of continuous and discontinuous functions as well as functions that exhibit parts with varying degrees of regularity. We empirically show that KTBoost outperforms both tree and kernel boosting in terms of predictive accuracy on a wide array of data sets.
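The "tree or kernel" choice per iteration can be sketched for squared loss on one-dimensional inputs: at each step fit both a regression stump and a Gaussian kernel ridge regressor to the current residuals, then add whichever fits them better. This is a simplified illustration, not the paper's algorithm. The selection rule (lower residual sum of squares), the depth-1 trees, the kernel width `gamma`, the ridge penalty `lam`, and all names are assumptions.

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 regression tree on 1-D inputs, fitted to residuals r by least squares."""
    xs = np.unique(x)
    best = None
    for thr in (xs[:-1] + xs[1:]) / 2.0:
        left = x <= thr
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r - np.where(left, lv, rv)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda z: np.where(z <= thr, lv, rv)

def fit_kernel_ridge(x, r, gamma=10.0, lam=0.1):
    """RKHS regression function (Gaussian kernel ridge regression) fitted to residuals r."""
    K = np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)
    coef = np.linalg.solve(K + lam * np.eye(len(x)), r)
    return lambda z: np.exp(-gamma * (z[:, None] - x[None, :]) ** 2) @ coef

def kt_boost(x, y, n_iter=50, lr=0.1):
    """Least-squares boosting that adds whichever base learner fits the residuals better."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    learners, choices = [], []
    for _ in range(n_iter):
        r = y - pred  # negative gradient of the squared loss
        candidates = {"tree": fit_stump(x, r), "kernel": fit_kernel_ridge(x, r)}
        name, h = min(candidates.items(), key=lambda kv: ((r - kv[1](x)) ** 2).sum())
        pred = pred + lr * h(x)
        learners.append(h)
        choices.append(name)
    return (lambda z: f0 + lr * sum(h(z) for h in learners)), choices

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.where(x > 0.5, 1.0, 0.0) + 0.3 * np.sin(6 * x) + rng.normal(0.0, 0.05, 200)
predict, choices = kt_boost(x, y)
print(np.mean((y - predict(x)) ** 2), choices.count("tree"), choices.count("kernel"))
```

The target mixes a jump (easy for a stump) with a smooth wave (easy for the kernel), which is the kind of varying regularity the abstract motivates. `choices` records which learner type won each round.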


Data-driven unsupervised clustering of online learner behaviour

The widespread adoption of online courses opens opportunities for the analysis of learner behaviour and for the optimisation of web-based material adapted to observed usage. Here we introduce a mathematical framework for the analysis of time series collected from online engagement of learners, which allows the identification of clusters of learners with similar online behaviour directly from the data, i.e., the groups of learners are not pre-determined subjectively but emerge algorithmically from the analysis and the data. The method uses a dynamic time warping kernel to create a pairwise similarity between time series of learner actions, and combines it with an unsupervised multiscale graph clustering algorithm to cluster groups of learners with similar patterns of behaviour. We showcase our approach on online engagement data of adult learners taking six web-based courses as part of a post-graduate degree at Imperial Business School. Our analysis identifies clusters of learners with statistically distinct patterns of engagement, ranging from distributed to massed learning, with different levels of adherence to pre-planned course structure and/or task completion, and also reveals outlier learners with highly sporadic behaviour. A posteriori comparison with performance showed that, although the majority of low-performing learners are part of the massed learning cluster, the high-performing learners are distributed across clusters with different traits of online engagement. We also show that our methodology is able to identify low-performing learners more accurately than common classification methods based on raw statistics extracted from the data.
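Dynamic time warping aligns two activity sequences by allowing local stretching in time, so two learners who perform the same burst of work in different weeks can still be judged similar. The sketch below computes a DTW distance and turns it into a pairwise similarity matrix using exp(−DTW/σ). That kernel form, σ, the function names, and the toy weekly action counts are assumptions, and the multiscale graph clustering step is not shown.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_similarity_matrix(series, sigma=1.0):
    """Pairwise similarities exp(-DTW / sigma); an assumed kernel form, not necessarily the paper's."""
    k = len(series)
    return [[math.exp(-dtw_distance(series[i], series[j]) / sigma) for j in range(k)] for i in range(k)]

# Hypothetical weekly action counts for three learners.
learners = [[0, 5, 5, 0, 0], [0, 0, 5, 5, 0], [1, 1, 1, 1, 1]]
S = dtw_similarity_matrix(learners)
```

The first two learners have the same burst shifted by one week, so their DTW distance is 0 and their similarity is 1, whereas the evenly spread learner stays far from both. The resulting matrix can serve as the weighted graph input to a clustering algorithm.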


Policy Learning for Fairness in Ranking

Conventional Learning-to-Rank (LTR) methods optimize the utility of the rankings to the users, but they are oblivious to their impact on the ranked items. However, there has been a growing understanding that the latter is important to consider for a wide range of ranking applications (e.g. online marketplaces, job placement, admissions). To address this need, we propose a general LTR framework that can optimize a wide range of utility metrics (e.g. NDCG) while satisfying fairness of exposure constraints with respect to the items. This framework expands the class of learnable ranking functions to stochastic ranking policies, which provides a language for rigorously expressing fairness specifications. Furthermore, we provide a new LTR algorithm called Fair-PG-Rank for directly searching the space of fair ranking policies via a policy-gradient approach. Beyond the theoretical evidence in deriving the framework and the algorithm, we provide empirical results on simulated and real-world datasets verifying the effectiveness of the approach in individual and group-fairness settings.
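Fairness of exposure is defined on a stochastic ranking policy: each item's exposure is its expected position weight over the rankings the policy samples. The sketch below estimates this by Monte Carlo for a Plackett-Luce policy, a common choice for stochastic rankings that is assumed here. The DCG-style position weight 1/log2(1 + rank), the scores, and all names are illustrative assumptions, and the policy-gradient training of Fair-PG-Rank is not shown.

```python
import math
import random

def sample_plackett_luce(scores, rng):
    """Sample a ranking: repeatedly draw the next item with probability proportional to exp(score)."""
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        weights = [math.exp(scores[i]) for i in remaining]
        i = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(i)
        remaining.remove(i)
    return ranking

def expected_exposure(scores, n_samples=10000, seed=0):
    """Monte Carlo estimate of each item's expected position weight under the policy."""
    rng = random.Random(seed)
    exposure = [0.0] * len(scores)
    for _ in range(n_samples):
        for pos, item in enumerate(sample_plackett_luce(scores, rng)):
            exposure[item] += 1.0 / math.log2(pos + 2)  # DCG-style weight for rank pos + 1
    return [e / n_samples for e in exposure]

scores = [2.0, 1.0, 0.0]  # hypothetical policy scores for three items
exposure = expected_exposure(scores)
print([round(e, 3) for e in exposure])
```

Because the policy is stochastic, lower-scored items still receive some exposure. A fairness constraint can then compare exposure across items or groups against their merit, which a deterministic sort by score cannot trade off smoothly.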