When dealing with non-stationary systems for which many time series are available, it is common to divide time into epochs, i.e., smaller time intervals, and to work with short time series in the hope of achieving approximate stationarity on that time scale. We can then study the time evolution by looking at properties as a function of the epochs. However, when an epoch contains fewer time points than there are time series, this leads to singular correlation matrices and thus poor statistics. In the present paper, we propose an ensemble technique to deal with a large set of short time series without any consideration of non-stationarity. We randomly select subsets of time series and thus create an ensemble of non-singular correlation matrices. As the number of possible selections is binomially large, we obtain good statistics for the eigenvalues of the correlation matrices, which are typically not independent. Having defined the ensemble, we analyze its behavior for constant and block-diagonal correlations and compare numerics with analytic results for the corresponding correlated Wishart ensembles. We discuss differences resulting from spurious correlations due to the repeated use of time series. The usefulness of this technique should extend beyond the stationary case if, on the time scale of the epochs, we have quasi-stationarity at least for most epochs.
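A minimal NumPy sketch of the ensemble construction follows. It assumes the data are stored as a K×T array (K series of length T) and that each draw picks N < T series uniformly at random without replacement, so that each sampled correlation matrix is generically non-singular. The function and parameter names are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def ensemble_eigenvalues(data, n_sub, n_draws, seed=0):
    """Pool eigenvalues of correlation matrices built from random subsets.

    data:    array of shape (K, T), one time series per row.
    n_sub:   series per subset; n_sub < T keeps each matrix non-singular
             (the centred K x T data have rank at most T - 1).
    n_draws: number of random subsets drawn.
    """
    rng = np.random.default_rng(seed)
    n_series, n_times = data.shape
    if n_sub >= n_times:
        raise ValueError("n_sub must be smaller than the series length T")
    eigs = []
    for _ in range(n_draws):
        idx = rng.choice(n_series, size=n_sub, replace=False)
        # np.corrcoef treats each row as one variable (one time series).
        corr = np.corrcoef(data[idx])
        # Symmetric matrix, so eigvalsh returns real eigenvalues.
        eigs.append(np.linalg.eigvalsh(corr))
    return np.concatenate(eigs)


data = np.random.default_rng(1).standard_normal((200, 20))
eigs = ensemble_eigenvalues(data, n_sub=10, n_draws=50)
```

Because different draws share series, the pooled eigenvalues are not independent samples; each draw still contributes N eigenvalues that sum to N, the trace of a correlation matrix.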
The efficient management of data is an important prerequisite for realising the potential of the Internet of Things (IoT). Given the large volume of structured time-series IoT data, two issues stand out: addressing the difficulties of data integration between heterogeneous Things, and improving ingestion and query performance across databases, both on resource-constrained Things and in the cloud. In this paper, we examine the structure of public IoT data and find that the majority exhibit distinctive flat, wide and numerical characteristics, with a mix of evenly and unevenly spaced time series. We investigate the advances in time-series databases for telemetry data and combine these findings with microbenchmarks to determine the best compression techniques and storage data structures, which inform the design of a novel solution optimised for IoT data. A query translation method with low overhead, even on resource-constrained Things, allows us to utilise rich data models like the Resource Description Framework (RDF) for interoperability and data integration on top of the optimised storage. Our solution, TritanDB, shows an order-of-magnitude performance improvement over many state-of-the-art databases within IoT scenarios, on both Things and cloud hardware. Finally, we describe how TritanDB supports various analyses of IoT time-series data, such as forecasting.
Identification of unexpectedly high values in a time series is useful for epidemiologists, economists, and other social scientists interested in the effect of an exposure spike on an outcome variable. However, the best method to identify spikes in time series is not known. This paper aims to fill this gap by testing the performance of several spike detection methods in a simulation setting. We created simulations parameterized by monthly violence rates in nine California cities that represented different series features, and randomly inserted spikes into the series. We then compared the ability of the following methods to detect spikes: ARIMA modeling, Kalman filtering and smoothing, wavelet modeling with soft thresholding, and an iterative outlier detection method. We varied the magnitude of the spikes from 10% to 50% of the mean rate over the study period and varied the number of inserted spikes from 1 to 10. We assessed the performance of each method using sensitivity and specificity. The Kalman filtering and smoothing procedure had the best overall performance. We applied Kalman filtering and smoothing to the monthly violence rates in nine California cities and identified spikes in the rate over the 2005-2012 period.
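The abstract does not specify the state-space model, so the following is only a sketch under stated assumptions: a local-level (random walk plus noise) model with known variances `q` and `r`, a Kalman filter followed by a Rauch-Tung-Striebel smoother, and a hypothetical rule that flags an observation as a spike when it exceeds the smoothed level by more than `k` observation standard deviations. The threshold rule and all names are our own illustration, not the paper's procedure.

```python
import math


def local_level_smoother(y, q, r, p0=1e6):
    """Kalman filter + RTS smoother for the local-level model
    y_t = mu_t + e_t,  mu_t = mu_{t-1} + w_t,  Var(e_t) = r,  Var(w_t) = q."""
    n = len(y)
    m_pred, p_pred = [0.0] * n, [0.0] * n
    m_filt, p_filt = [0.0] * n, [0.0] * n
    for t in range(n):
        # Predict: a random-walk level keeps its mean and gains variance q.
        if t == 0:
            m_pred[t], p_pred[t] = y[0], p0  # vague prior centred on the first point
        else:
            m_pred[t], p_pred[t] = m_filt[t - 1], p_filt[t - 1] + q
        # Update with the observation y_t.
        gain = p_pred[t] / (p_pred[t] + r)
        m_filt[t] = m_pred[t] + gain * (y[t] - m_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]
    # Rauch-Tung-Striebel backward pass (state transition is the identity).
    m_smooth = m_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth


def detect_spikes(y, q, r, k=3.0):
    """Indices where y_t exceeds the smoothed level by more than k * sqrt(r)."""
    level = local_level_smoother(y, q, r)
    threshold = k * math.sqrt(r)
    return [t for t, (obs, mu) in enumerate(zip(y, level)) if obs - mu > threshold]
```

Keeping `q` small relative to `r` makes the smoothed level slow to follow a single jump, so an isolated spike leaves a large positive residual.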
The evaluation of interactive machine learning systems remains a difficult task. These systems learn from and adapt to the human, but at the same time, the human receives feedback and adapts to the system. Getting a clear understanding of these subtle mechanisms of co-operation and co-adaptation is challenging. In this chapter, we report on our experience in designing and evaluating various interactive machine learning applications from different domains. We argue for coupling two types of validation: algorithm-centered analysis, to study the computational behaviour of the system; and human-centered evaluation, to observe the utility and effectiveness of the application for end-users. We use a visual analytics application for guided search, built using an interactive evolutionary approach, as an exemplar of our work. Our observation is that human-centered design and evaluation complement algorithmic analysis, and can play an important role in addressing the ‘black-box’ effect of machine learning. Finally, we discuss research opportunities that require human-computer interaction methodologies, in order to support both the visible and hidden roles that humans play in interactive machine learning.
The exponential growth in the use of large deep neural networks has accelerated the need for training these networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node/card cannot satisfy the compute, memory, and I/O requirements of today’s state-of-the-art deep neural networks. However, scaling synchronous Stochastic Gradient Descent (SGD) is still a challenging problem and requires continued research and development, entailing innovations that span algorithms, frameworks, communication libraries, and system design. In this paper, we describe the philosophy, design, and implementation of the Intel Machine Learning Scalability Library (MLSL) and present proof points demonstrating the scaling of deep learning (DL) training to hundreds and thousands of nodes across Cloud and HPC systems.
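To make "synchronous SGD" concrete, here is a plain-Python simulation of data-parallel training with gradient averaging. It does not use or imitate the MLSL API. `allreduce_mean` is a hypothetical stand-in for a collective allreduce, and the scalar least-squares model is an assumption chosen only to keep the example small.

```python
def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y) ** 2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Hypothetical stand-in for an allreduce: every worker gets the mean."""
    return sum(values) / len(values)


def sync_sgd(shards, w0, lr, steps):
    w = w0
    for _ in range(steps):
        # On a real cluster each gradient would be computed on a different node.
        grads = [local_gradient(w, shard) for shard in shards]
        # Synchronous step: all workers apply the same averaged gradient.
        w -= lr * allreduce_mean(grads)
    return w


data = [(i / 10, 3 * i / 10) for i in range(1, 17)]  # y = 3x
shards = [data[i::4] for i in range(4)]              # 4 equal-size shards
w_dist = sync_sgd(shards, w0=0.0, lr=0.1, steps=200)
w_single = sync_sgd([data], w0=0.0, lr=0.1, steps=200)
```

With equal-size shards, the averaged gradient equals the full-batch gradient, so the distributed run reproduces the single-node trajectory. The hard part at scale is making the allreduce fast, which is the kind of problem a communication library addresses.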
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call ‘direct optimization’, requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort, where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon-to-be open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially supported frameworks include TensorFlow, MXNet, and the Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations).
Neural text generation models are often autoregressive language models or seq2seq models. These models generate text by sampling words sequentially, with each word conditioned on the previous words, and are state-of-the-art for several machine translation and summarization benchmarks. These benchmarks are often defined by validation perplexity, even though this is not a direct measure of the quality of the generated text. Additionally, these models are typically trained via maximum likelihood and teacher forcing. These methods are well-suited to optimizing perplexity but can result in poor sample quality, since generating text requires conditioning on sequences of words that may never have been observed at training time. We propose to improve sample quality using Generative Adversarial Networks (GANs), which explicitly train the generator to produce high-quality samples and have been very successful in image generation. GANs were originally designed to output differentiable values, so discrete language generation is challenging for them. We claim that validation perplexity alone is not indicative of the quality of text generated by a model. We introduce an actor-critic conditional GAN that fills in missing text conditioned on the surrounding context. We show, qualitatively and quantitatively, evidence that this produces more realistic conditional and unconditional text samples compared to a maximum-likelihood-trained model.
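For reference, validation perplexity is the exponential of the average negative log-likelihood per token, $\exp\bigl(-\tfrac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_{<i})\bigr)$. The short sketch below computes it from per-token log-probabilities. The function name is illustrative.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities log p(w_i | w_<i), one per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

A model that spreads probability uniformly over 4 tokens has perplexity 4. Because the quantity depends only on the probabilities the model assigns to reference text, not on text the model itself samples, it measures sample quality only indirectly.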
Online Reputation Monitoring (ORM) is concerned with the use of computational tools to measure the reputation of entities online, such as politicians or companies. In practice, current ORM methods are constrained to the generation of data analytics reports, which aggregate statistics of popularity and sentiment on social media. We argue that this format is too restrictive, as end users often want the flexibility to search for entity-centric information that is not available in predefined charts. As such, we propose the inclusion of entity retrieval capabilities as a first step towards extending current ORM capabilities. However, an entity’s reputation is also influenced by the entity’s relationships with other entities. Therefore, we address the problem of Entity-Relationship (E-R) retrieval, in which the goal is to search for multiple connected entities. This is a challenging problem that traditional entity search systems cannot cope with. Besides E-R retrieval, we also believe ORM would benefit from text-based entity-centric prediction capabilities, such as predicting entity popularity on social media based on news events or the outcome of political surveys. However, none of these tasks can provide useful results without effective entity disambiguation and sentiment analysis tailored to the context of ORM. Consequently, this thesis addresses two computational problems in Online Reputation Monitoring: Entity Retrieval and Text Mining. We researched and developed methods to extract, retrieve and predict entity-centric information spread across the Web.
This paper is devoted to a detailed convergence analysis of the method of codifferential descent (MCD), developed by Professor V.F. Demyanov for solving a large class of nonsmooth nonconvex optimization problems. We propose a generalization of the MCD that is more suitable for applications than the original method and that utilizes only a part of a codifferential on every iteration, which allows one to reduce the overall complexity of the method. Using some general results on uniformly codifferentiable functions obtained in this paper, we prove the global convergence of the generalized MCD in the infinite-dimensional case. We also propose and analyse a quadratic regularization of the MCD, which is the first general method for minimizing a codifferentiable function over a convex set. Apart from convergence analysis, we also discuss the robustness of the MCD with respect to computational errors, possible step size rules, and the choice of the algorithm's parameters. At the end of the paper, we estimate the rate of convergence of the MCD for a class of nonsmooth nonconvex functions that arises, in particular, in cluster analysis. We prove that under some general assumptions the method converges with a linear rate, and that it converges quadratically provided a certain first-order sufficient optimality condition holds.
This paper describes and discusses Bayesian Neural Networks (BNNs) and showcases a few of their applications to classification and regression problems. BNNs comprise a probabilistic model and a neural network. The intent of such a design is to combine the strengths of neural networks and stochastic modeling. Neural networks can approximate continuous functions. Stochastic models allow the direct specification of a model with known interactions between parameters to generate data. During the prediction phase, stochastic models generate a complete posterior distribution and produce probabilistic guarantees on the predictions. Thus BNNs are a unique combination of neural networks and stochastic models, with the stochastic model forming the core of this integration. BNNs can then produce probabilistic guarantees on their predictions and also generate the distribution of the parameters they have learnt from the observations. This means that, in the parameter space, one can deduce the nature and shape of the neural network’s learnt parameters. These two characteristics make BNNs highly attractive to theoreticians as well as practitioners. Recently there has been a lot of activity in this area, with the advent of numerous probabilistic programming libraries such as PyMC3, Edward, and Stan. Furthermore, this area is rapidly gaining ground as a standard machine learning approach for numerous problems.
This paper studies an online optimization problem with switching costs and a finite prediction window. We propose two computationally efficient algorithms: Receding Horizon Gradient Descent (RHGD) and Receding Horizon Accelerated Gradient (RHAG). Both algorithms require only a finite number of gradient evaluations at each time step. We show that both the dynamic regret and the competitive ratio of the proposed algorithms decay exponentially fast with the length of the prediction window, and that the decay rate of RHAG is larger than that of RHGD. Moreover, we provide a fundamental lower bound on the dynamic regret for general online algorithms with a finite prediction window. The lower bound matches the dynamic regret of RHAG, meaning that the performance cannot improve significantly even with more computation. Lastly, we present simulation results to test our algorithms numerically.
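The following sketch shows the receding-horizon idea in a simplified form. It is not RHGD itself. The paper's RHGD schedules its gradient updates differently across the window. Here, the scalar stage costs $\tfrac{a}{2}(x_t-\theta_t)^2$, the switching cost $\tfrac{\beta}{2}(x_t-x_{t-1})^2$, the fixed number of gradient steps per time step, and all names are our assumptions. At each time $t$, the predicted targets $\theta_t,\dots,\theta_{t+W-1}$ are known. The planner takes a few gradient steps on the window cost with $x_{t-1}$ fixed, commits the first planned decision, and shifts the plan forward.

```python
def window_grad(plan, x_prev, thetas, a, beta):
    """Gradient of sum_k a/2 (x_k - theta_k)^2 + beta/2 (x_k - x_{k-1})^2."""
    W = len(plan)
    g = [0.0] * W
    for k in range(W):
        left = x_prev if k == 0 else plan[k - 1]
        g[k] = a * (plan[k] - thetas[k]) + beta * (plan[k] - left)
        if k + 1 < W:
            # x_k also appears in the switching cost of the next step.
            g[k] -= beta * (plan[k + 1] - plan[k])
    return g


def receding_horizon_gd(thetas, W, a=1.0, beta=1.0, lr=0.2, iters=5, x0=0.0):
    """Simplified receding-horizon gradient scheme (illustrative, not RHGD)."""
    x_prev, plan, decisions = x0, [x0] * W, []
    for t in range(len(thetas)):
        window = thetas[t:t + W]    # predictions available at time t
        plan = plan[:len(window)]   # the window shrinks near the horizon end
        for _ in range(iters):      # finite number of gradient evaluations
            g = window_grad(plan, x_prev, window, a, beta)
            plan = [p - lr * gk for p, gk in zip(plan, g)]
        x_prev = plan[0]            # commit the first decision
        decisions.append(x_prev)
        plan = plan[1:] + [plan[-1]]  # warm-start the next window
    return decisions
```

The step size `lr = 0.2` is at most $1/L$ here, because the window Hessian has eigenvalues below $a + 4\beta = 5$. This keeps plain gradient descent stable. An accelerated variant would replace the inner loop with a momentum update.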
We introduce a correlation coefficient that is specifically designed to deal with a variety of ranking formats, including those containing non-strict (i.e., with-ties) and incomplete (i.e., null) preferences. The new measure, which can be regarded as a generalization of the seminal Kendall tau correlation coefficient, is proven to be equivalent to a recently developed axiomatic ranking distance. In an effort to further unify and enhance the two robust ranking methodologies, this work proves the equivalence of an additional axiomatic-distance and correlation-coefficient pairing in the space of non-strict incomplete rankings. In particular, the bridging of these complementary theories reinforces the singular suitability of the featured correlation coefficient for solving the general consensus ranking problem. This premise is further bolstered by an accompanying set of experiments on random instances, which are generated via a sampling technique developed herein and connected with the classic Mallows distribution of ranking data. To carry out these experiments, we devise a specialized branch and bound algorithm that efficiently provides the full set of alternative optimal solutions. Applying the algorithm to the generated random instances reveals that the featured correlation coefficient consistently yields fewer alternative optimal solutions as the data become noisier.
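The abstract does not give the new coefficient's formula. To show how a Kendall-type coefficient can handle ties, here is a sketch of the earlier $\tau_x$ of Emond and Mason for complete rankings with ties. It uses a score matrix with $a_{ij}=1$ if item $i$ is ranked ahead of or tied with item $j$, $a_{ij}=-1$ if it is ranked behind, and $a_{ii}=0$, and sets $\tau_x=\sum_{i,j} a_{ij} b_{ij} / \bigl(n(n-1)\bigr)$. The treatment of incomplete (null) preferences in the paper's generalization is not covered here, and the function names are our own.

```python
def score_matrix(ranking):
    """ranking[i] is the position of item i (1 = best); equal positions are ties."""
    n = len(ranking)
    return [[0 if i == j else (1 if ranking[i] <= ranking[j] else -1)
             for j in range(n)] for i in range(n)]


def tau_x(r1, r2):
    """Tie-aware Kendall-type correlation between two complete rankings."""
    n = len(r1)
    if n != len(r2) or n < 2:
        raise ValueError("rankings must cover the same n >= 2 items")
    a, b = score_matrix(r1), score_matrix(r2)
    total = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return total / (n * (n - 1))
```

Because a tie scores $+1$ in both directions, two identical rankings (including all-tied ones) always reach $\tau_x = 1$. Unlike Kendall's $\tau_b$, this does not occur for ties under tie-adjusted normalizations.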
This paper investigates and evaluates support vector machine active learning algorithms for use with imbalanced datasets, which commonly arise in many applications such as information extraction. Algorithms based on closest-to-hyperplane selection and query-by-committee selection are combined with methods for addressing imbalance, such as positive amplification based on prevalence statistics from initial random samples. Three algorithms (ClosestPA, QBagPA, and QBoostPA) are presented and carefully evaluated on datasets for text classification and relation extraction. The ClosestPA algorithm is shown to consistently outperform the other two in a variety of ways, and insights are provided as to why this is the case.
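Closest-to-hyperplane selection queries the unlabeled examples on which the current linear SVM is least certain, that is, those with the smallest $|w \cdot x + b|$. The sketch below shows only that selection step for given weights `w` and bias `b`. Training the SVM and the positive-amplification component are left out, and the function name is hypothetical.

```python
def closest_to_hyperplane(pool, w, b, k=1):
    """Return indices of the k pool examples nearest the hyperplane w.x + b = 0.

    Ranking by |w.x + b| gives the same order as the geometric distance
    |w.x + b| / ||w||, so the norm of w is not needed.
    """
    def margin(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)

    return sorted(range(len(pool)), key=lambda i: margin(pool[i]))[:k]
```

In an active learning loop, the selected examples would be sent for labeling, added to the training set, and the SVM retrained before the next query.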
Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with its success in many other application domains, deep learning has also been widely used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
Graphical lasso is one of the most widely used estimators for inferring genetic networks. Despite its widespread use, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reaction and flow cytometry. The combination of censoring and high dimensionality makes inference of the underlying genetic networks from these data very challenging. In this paper, we propose an $\ell_1$-penalized Gaussian graphical model for censored data and derive two EM-like algorithms for inference. Through an extensive simulation study, we evaluate the computational efficiency of the proposed algorithms and show that our proposal outperforms existing competitors when censored data are available. We apply the proposed method to gene expression data from microfluidic RT-qPCR technology in order to infer the regulatory mechanisms of blood development.
We propose a novel Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP) such that a linear time property is satisfied. We convert the property into a Limit-Deterministic Büchi Automaton (LDBA) and then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton according to the accepting conditions of the LDBA. With this reward function, RL synthesizes a policy that satisfies the property; as such, the policy synthesis procedure is ‘constrained’ by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP, and a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for synthesis compared to existing approaches.
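A small sketch of the product construction follows, assuming a toy five-cell line MDP, hypothetical labels `key` (cell 0) and `goal` (cell 4), and the property "eventually key, and after that eventually goal". A deterministic Büchi automaton, which is trivially limit-deterministic, tracks progress. Tabular Q-learning runs on product states `(s, q)`, with reward 1 on entering the accepting automaton state. Two simplifications are ours: episodes end there, which is valid only because that state is absorbing, and the values are discounted, so they are not the satisfaction probabilities the paper computes.

```python
import random

LEFT, RIGHT = 0, 1
N_STATES, START = 5, 2
LABELS = {0: "key", 4: "goal"}  # hypothetical atomic propositions per MDP state
ACCEPTING = {"q2"}              # accepting automaton state(s)


def mdp_step(s, a):
    """Deterministic move on a line of N_STATES cells, clipped at the ends."""
    return max(0, s - 1) if a == LEFT else min(N_STATES - 1, s + 1)


def automaton_step(q, label):
    """Deterministic Büchi automaton for 'eventually key, then eventually goal'."""
    if q == "q0" and label == "key":
        return "q1"
    if q == "q1" and label == "goal":
        return "q2"
    return q  # otherwise stay; q2 is absorbing


def train(episodes=3000, max_steps=50, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on the product state (s, q)."""
    rng = random.Random(seed)
    Q = {}

    def value(s, q, a):
        return Q.get((s, q, a), 0.0)

    for _ in range(episodes):
        s, q = START, "q0"
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.choice((LEFT, RIGHT))
            else:
                a = max((LEFT, RIGHT), key=lambda b: value(s, q, b))
            s_next = mdp_step(s, a)
            # Product transition: the automaton reads the label of the new MDP state.
            q_next = automaton_step(q, LABELS.get(s_next))
            accepted = q_next in ACCEPTING
            if accepted:
                target = 1.0  # reward on entering the accepting set
            else:
                target = gamma * max(value(s_next, q_next, b) for b in (LEFT, RIGHT))
            Q[(s, q, a)] = value(s, q, a) + alpha * (target - value(s, q, a))
            if accepted:
                break
            s, q = s_next, q_next
    return Q


def greedy_action(Q, s, q):
    return max((LEFT, RIGHT), key=lambda a: Q.get((s, q, a), 0.0))
```

The automaton state is what lets the learned policy differ at the same MDP cell. From cell 2, it should head left while the key is still needed (`q0`) and right once it has been collected (`q1`).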