This article introduces the concepts around Online Bandit Linear Optimization and explores an efficient setup called SCRiBLe (Self-Concordant Regularization in Bandit Learning) created by Abernethy et. al.\cite{abernethy}. The SCRiBLe setup and algorithm yield a $O(\sqrt{T})$ regret bound and polynomial run time complexity bound on the dimension of the input space. In this article we build up to the bandit linear optimization case and study SCRiBLe.
Single-trial classification of event-related potentials in electroencephalogram (EEG) signals is a very important paradigm of brain-computer interface (BCI). Because of individual differences, usually some subject-specific calibration data are required to tailor the classifier for each subject. Transfer learning has been extensively used to reduce such calibration data requirement, by making use of auxiliary data from similar/relevant subjects/tasks. However, all previous research assumes that all auxiliary data have been labeled. This paper considers a more general scenario, in which part of the auxiliary data could be unlabeled. We propose active semi-supervised transfer learning (ASTL) for offline BCI calibration, which integrates active learning, semi-supervised learning, and transfer learning. Using a visual evoked potential oddball task and three different EEG headsets, we demonstrate that ASTL can achieve consistently good performance across subjects and headsets, and it outperforms some state-of-the-art approaches in the literature.
Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models. The resulting algorithm performs prediction, model selection and CP detection on-line. Its time complexity is linear and its space complexity constant, and thus it is two orders of magnitudes faster than its closest competitor. In addition, it outperforms the state of the art for multivariate data.
Deep learning has made significant improvements at many image processing tasks in recent years, such as image classification, object recognition and object detection. Convolutional neural networks (CNN), which is a popular deep learning architecture designed to process data in multiple array form, show great success to almost all detection \& recognition problems and computer vision tasks. However, the number of parameters in a CNN is too high such that the computers require more energy and larger memory size. In order to solve this problem, we propose a novel energy efficient model Binary Weight and Hadamard-transformed Image Network (BWHIN), which is a combination of Binary Weight Network (BWN) and Hadamard-transformed Image Network (HIN). It is observed that energy efficiency is achieved with a slight sacrifice at classification accuracy. Among all energy efficient networks, our novel ensemble model outperforms other energy efficient models.
There is an increasing demand for algorithms to explain their outcomes. So far, there is no method that explains the rankings produced by a ranking algorithm. To address this gap we propose LISTEN, a LISTwise ExplaiNer, to explain rankings produced by a ranking algorithm. To efficiently use LISTEN in production, we train a neural network to learn the underlying explanation space created by LISTEN; we call this model Q-LISTEN. We show that LISTEN produces faithful explanations and that Q-LISTEN is able to learn these explanations. Moreover, we show that LISTEN is safe to use in a real world environment: users of a news recommendation system do not behave significantly differently when they are exposed to explanations generated by LISTEN instead of manually generated explanations.
Principal component analysis (PCA) is widely used for feature extraction and dimensionality reduction, with documented merits in diverse tasks involving high-dimensional data. Standard PCA copes with one dataset at a time, but it is challenged when it comes to analyzing multiple datasets jointly. In certain data science settings however, one is often interested in extracting the most discriminative information from one dataset of particular interest (a.k.a. target data) relative to the other(s) (a.k.a. background data). To this end, this paper puts forth a novel approach, termed discriminative (d) PCA, for such discriminative analytics of multiple datasets. Under certain conditions, dPCA is proved to be least-squares optimal in recovering the component vector unique to the target data relative to background data. To account for nonlinear data correlations, (linear) dPCA models for one or multiple background datasets are generalized through kernel-based learning. Interestingly, all dPCA variants admit an analytical solution obtainable with a single (generalized) eigenvalue decomposition. Finally, corroborating dimensionality reduction tests using both synthetic and real datasets are provided to validate the effectiveness of the proposed methods.
Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., Mahalanobis distance metric). However, traditional metric learning algorithms are shallow, which just learn one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space In other words, can we learn metric progressively and nonlinearly like deep learning by just using the existing metric learning algorithms To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers modelled after the deep learning. The proposed ODML enjoys some nice properties, indeed can learn metric progressively and performs superiorly on some datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
This paper studies teacher-student optimization on neural networks, i.e., adopting the supervision from a trained (teacher) network to optimize another (student) network. Conventional approaches enforced the student to learn from a strict teacher which fit a hard distribution and achieved high recognition accuracy, but we argue that a more tolerant teacher often educate better students. We start with adding an extra loss term to a patriarch network so that it preserves confidence scores on a primary class (the ground-truth) and several visually-similar secondary classes. The patriarch is also known as the first teacher. In each of the following generations, a student learns from the teacher and becomes the new teacher in the next generation. Although the patriarch is less powerful due to ambiguity, the students enjoy a persistent ability growth as we gradually fine-tune them to fit one-hot distributions. We investigate standard image classification tasks (CIFAR100 and ILSVRC2012). Experiments with different network architectures verify the superiority of our approach, either using a single model or an ensemble of models.
In this short paper, we evaluate the performance of three well-known Machine Learning techniques for predicting the impact of a post in Facebook. Social medias have a huge influence in the social behaviour. Therefore to develop an automatic model for predicting the impact of posts in social medias can be useful to the society. In this article, we analyze the efficiency for predicting the post impact of three popular techniques: Support Vector Regression (SVR), Echo State Network (ESN) and Adaptive Network Fuzzy Inject System (ANFIS). The evaluation was done over a public and well-known benchmark dataset.
Malicious scripts are an important computer infection threat vector. Our analysis reveals that the two most prevalent types of malicious scripts include JavaScript and VBScript. The percentage of detected JavaScript attacks are on the rise. To address these threats, we investigate two deep recurrent models, LaMP (LSTM and Max Pooling) and CPoLS (Convoluted Partitioning of Long Sequences), which process JavaScript and VBScript as byte sequences. Lower layers capture the sequential nature of these byte sequences while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our models are trained in an end-to-end fashion allowing discriminative training even for the sequential processing layers. Evaluating these models on a large corpus of 296,274 JavaScript files indicates that the best performing LaMP model has a 65.9% true positive rate (TPR) at a false positive rate (FPR) of 1.0%. Similarly, the best CPoLS model has a TPR of 45.3% at an FPR of 1.0%. LaMP and CPoLS yield a TPR of 69.3% and 67.9%, respectively, at an FPR of 1.0% on a collection of 240,504 VBScript files.
The estimation of crowd count in images has a wide range of applications such as video surveillance, traffic monitoring, public safety and urban planning. Recently, the convolutional neural network (CNN) based approaches have been shown to be more effective in crowd counting than traditional methods that use handcrafted features. However, the existing CNN-based methods still suffer from large number of parameters and large storage space, which require high storage and computing resources and thus limit the real-world application. Consequently, we propose a deeply-recursive network (DR-ResNet) based on ResNet blocks for crowd counting. The recursive structure makes the network deeper while keeping the number of parameters unchanged, which enhances network capability to capture statistical regularities in the context of the crowd. Besides, we generate a new dataset from the video-monitoring data of Beijing bus station. Experimental results have demonstrated that proposed method outperforms most state-of-the-art methods with far less number of parameters.
Recent CNN-based saliency models have achieved great performance on public datasets, however, most of them are sensitive to distortion (e.g., noise, compression). In this paper, an end-to-end generic salient object segmentation model called Metric Expression Network (MEnet) is proposed to overcome this drawback. Within this architecture, we construct a new topological metric space, with the implicit metric being determined by the deep network. In this way, we succeed in grouping all the pixels within the observed image semantically within this latent space into two regions: a salient region and a non-salient region. With this method, all feature extractions are carried out at the pixel level, which makes the output boundaries of salient object fine-grained. Experimental results show that the proposed metric can generate robust salient maps that allow for object segmentation. By testing the method on several public benchmarks, we show that the performance of MEnet has achieved good results. Furthermore, the proposed method outperforms previous CNN-based methods on distorted images.
The field of retail analytics has been transformed by the availability of rich data which can be used to perform tasks such as demand forecasting and inventory management. However, one task which has proved more challenging is the forecasting of demand for products which exhibit very few sales. The sparsity of the resulting data limits the degree to which traditional analytics can be deployed. To combat this, we represent sales data as a structured sparse multivariate point process which allows for features such as auto-correlation, cross-correlation, and temporal clustering, known to be present in sparse sales data. We introduce a Bayesian point process model to capture these phenomena, which includes a hurdle component to cope with sparsity and an exciting component to cope with temporal clustering within and across products. We then cast this model within a Bayesian hierarchical framework, to allow the borrowing of information across different products, which is key in addressing the data sparsity per product. We conduct a detailed analysis using real sales data to show that this model outperforms existing methods in terms of predictive power and we discuss the interpretation of the inference.
Natural language interfaces for relational databases have been explored for several decades. Majority of the work have focused on translating natural language sentences to SQL queries or narrating SQL queries in natural language. Scant attention has been paid for natural language understanding of query execution plans (QEP) of SQL queries. In this demonstration, we present a novel generic system called NEURON that facilitates natural language interaction with QEPs. NEURON accepts a SQL query (which may include joins, aggregation, nesting, among other things) as input, executes it, and generates a natural language-based description (both in text and voice form) of the execution strategy deployed by the underlying RDBMS. Furthermore, it facilitates understanding of various features related to the QEP through a natural language-based question answering framework. NEURON can be potentially useful to database application developers in comprehending query execution strategies and to database instructors and students for pedagogical support.
The rise of ‘big data’ has led to the frequent need to process and store datasets containing large numbers of high dimensional observations. Due to storage restrictions, these observations might be recorded in a lossy-but-sparse manner, with information collapsed onto a few entries which are considered important. This results in informative missingness in the observed data. Our motivating application comes from retail analytics, where the behaviour of product sales is summarised by the price elasticity of each product with respect to a small number of its top competitors. The resulting data are vectors of order statistics, due to only the top few entries being observed. Interest lies in characterising the behaviour of a product’s competitors, and clustering products based on how their competition is spread across the market. We develop nonparametric Bayesian methodology for modelling vectors of order statistics that utilises a Dirichlet Process Mixture Model with an Exponentiated Weibull kernel. Our approach allows us added flexibility for the distribution of each vector, while providing parameters that characterise the decay of the leading entries. We implement our methods on a retail analytics dataset of the cross-elasticity coefficients, and our analysis reveals distinct types of behaviour across the different products of interest.
The introduction of the schema.org vocabulary was a big step towards making websites machine read- and understandable. Due to schema.org’s RDF-like nature storing annotations in a graph database is easy and efficient. In this paper the authors show how they gather touristic data in the Austrian region of Tirol and provide this data publicly in a knowledge graph. The definition of subsets of the vocabulary is followed by providing means to map data sources efficiently to schema.org and then store the annotated content into the graph. To showcase the consumption of the touristic data four scenarios are described which use the knowledge graph for real life applications and data analysis.
Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems. Here, we highlight a systemic issue of gradient dynamics that arise for saddle point problems, namely the presence of undesired stable stationary points that are no local optima. We propose a novel optimization approach that exploits curvature information in order to escape from these undesired stationary points. We prove that different optimization methods, including gradient method and adagrad, equipped with curvature exploitation can escape non-optimal stationary points. We also provide empirical results on common saddle point problems which confirm the advantage of using curvature exploitation.
This paper explores a variety of topics related to the question of testing the equality of covariance matrices in multivariate linear models, particularly in the MANOVA setting. The main focus is on graphical methods that can be used to address the evaluation of this assumption. We introduce some extensions of data ellipsoids, hypothesis-error (HE) plots and canonical discriminant plots and demonstrate how they can be applied to the testing of equality of covariance matrices. Further, a simple plot of the components of Box’s M test is proposed that shows _how_ groups differ in covariance and also suggests other visualizations and alternative test statistics. These methods are implemented and freely available in the **heplots** and **candisc** packages for R. Examples from the paper are available in supplementary materials.
We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network while longer time-scale dependencies are encoded in the dynamic of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion. We use elastic weights consolidation as a higher-level to prevent catastrophic forgetting in our continuous learning framework.
Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named SASS which is based on the notion of similarities in the agent’s state-action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, and provides a beneficial alternative to using only a single speedup method with minimal human designer effort overhead.
Analysis of longitudinal randomised controlled trials is frequently complicated because patients deviate from the protocol. Where such deviations are relevant for the estimand, we are typically required to make an untestable assumption about post-deviation behaviour in order to perform our primary analysis and estimate the treatment effect. In such settings, it is now widely recognised that we should follow this with sensitivity analyses to explore the robustness of our inferences to alternative assumptions about post-deviation behaviour. Although there has been a lot of work on how to conduct such sensitivity analyses, little attention has been given to the appropriate loss of information due to missing data within sensitivity analysis. We argue more attention needs to be given to this issue, showing it is quite possible for sensitivity analysis to decrease and increase the information about the treatment effect. To address this critical issue, we introduce the concept of information-anchored sensitivity analysis. By this we mean sensitivity analysis in which the proportion of information about the treatment estimate lost due to missing data is the same as the proportion of information about the treatment estimate lost due to missing data in the primary analysis. We argue this forms a transparent, practical starting point for interpretation of sensitivity analysis. We then derive results showing that, for longitudinal continuous data, a broad class of controlled and reference-based sensitivity analyses performed by multiple imputation are information-anchored. We illustrate the theory with simulations and an analysis of a peer review trial, then discuss our work in the context of other recent work in this area. Our results give a theoretical basis for the use of controlled multiple imputation procedures for sensitivity analysis.
Machine Learning techniques are widely used by online services (e.g. Google, Apple) in order to analyze and make predictions on user data. As many of the provided services are user-centric (e.g. personal photo collections, speech recognition, personal assistance), user data generated on personal devices is key to provide the service. In order to protect the data and the privacy of the user, federated learning techniques have been proposed where the data never leaves the user’s device and ‘only’ model updates are communicated back to the server. In our work, we propose a new threat model that is not concerned with learning about the content – but rather is concerned with the linkability of users during such decentralized learning scenarios. We show that model updates are characteristic for users and therefore lend themselves to linkability attacks. We show identification and matching of users across devices in closed and open world scenarios. In our experiments, we find our attacks to be highly effective, achieving 20x-175x chance-level performance. In order to mitigate the risks of linkability attacks, we study various strategies. As adding random noise does not offer convincing operation points, we propose strategies based on using calibrated domain-specific data; we find these strategies offers substantial protection against linkability threats with little effect to utility.
In this work, we argue for the importance of causal reasoning in creating fair algorithms for decision making. We give a review of existing approaches to fairness, describe work in causality necessary for the understanding of causal approaches, argue why causality is necessary for any approach that wishes to be fair, and give a detailed analysis of the many recent approaches to causality-based fairness.
Inspired by recent successes of Monte-Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a model-based reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multi-player online battle arena (MOBA) game King of Glory.