Approximate computing is a promising design paradigm for overcoming the energy and performance challenges of computationally demanding applications. When the accuracy is configurable, quality can also be traded off against energy efficiency or delay. For this technique to be usable, a satisfactory user experience must be ensured, which requires error predictors that detect unacceptable approximation errors. In this work, we propose a scheduling-aware feature selection method that leverages the intermediate results of the hardware accelerator to improve prediction accuracy. Additionally, it configures the error predictors according to the energy consumption and latency of the system. The approach exploits flexibility in the prediction time to obtain higher accuracy. The results on various benchmarks demonstrate significant improvements in prediction accuracy over prior work that used only the accelerator inputs for prediction.
When designing a neural caption generator, a convolutional neural network can be used to extract image features. Is it possible to also use a neural language model to extract sentence prefix features? We answer this question by trying different ways to transfer the recurrent neural network and embedding layer from a neural language model to an image caption generator. We find that image caption generators with transferred parameters perform better than those trained from scratch, even when they are simply pre-trained on the text of the same captions dataset they will later be trained on. We also find that the best language models (in terms of perplexity) do not result in the best caption generators after transfer learning.
State-of-the-art computer vision techniques based on supervised machine learning are in general data hungry. Curating their data poses the challenges of expensive human labeling, inadequate computing resources and longer experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage and representation and can be used to eliminate redundancy, thus lending itself well to training data subset selection. These functions can also help improve the efficiency of active learning, further reducing human labeling effort by selecting a subset of the examples obtained using conventional uncertainty-sampling-based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training-data subset selection and for reducing labeling effort. We demonstrate this for a variety of computer vision tasks including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity-based subset selection done in the right way can increase accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows complex machine learning models such as Convolutional Neural Networks to be trained with much less training data and lower labeling costs while incurring minimal performance loss.
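As an illustration of the Facility-Location model named above, the following is a minimal sketch of greedy subset selection. The abstract does not specify the similarity kernel or the optimizer, so the Gaussian kernel with unit bandwidth, the plain greedy loop, and the function name `facility_location_greedy` are all assumptions made for this example.

```python
import numpy as np


def facility_location_greedy(features, k):
    """Greedily pick k rows that maximize sum_i max_{j in S} sim(i, j).

    The similarity is a Gaussian kernel with unit bandwidth, which is an
    assumption of this sketch and not taken from the abstract.
    """
    x = np.asarray(features, dtype=float)
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    sim = np.exp(-d2)

    n = sim.shape[0]
    best_cover = np.zeros(n)  # current max similarity of each point to the subset
    selected = []
    for _ in range(min(k, n)):
        # Marginal gain of adding column j: total improvement in coverage.
        gains = np.maximum(sim - best_cover[:, None], 0.0).sum(axis=0)
        if selected:
            gains[selected] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected
```

With two well-separated clusters and `k=2`, the second pick comes from the cluster not yet covered, because points near an already selected example add almost no marginal gain. This is the redundancy-elimination behaviour the abstract describes.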
We present a study of the generation of events from a physical process with generative deep learning. To simulate physical processes, it is important not only to produce physical events, but also to produce the events with the right frequency of occurrence (density). We investigate the feasibility of learning the event generation and the frequency of occurrence with Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) to produce events like Monte Carlo generators. We study three toy models from high energy physics, i.e., a simple two-body decay, the processes $e^+e^-\to Z \to l^+l^-$ and $p p \to t\bar{t}$ including the decay of the top quarks and a simulation of the detector response. We show that GANs and the standard VAE do not produce the right distributions. By buffering density information of Monte Carlo events in latent space given the encoder of a VAE, we are able to construct a prior for the sampling of new events from the decoder that yields distributions that are in very good agreement with real Monte Carlo events and are generated $\mathcal{O}(10^8)$ times faster. Applications of this work include generic density estimation and sampling, targeted event generation via a principal component analysis of encoded events in the latent space, and the possibility of generating better random numbers for importance sampling, e.g., for the phase space integration of matrix elements in quantum perturbation theories. The method also allows event generators to be built directly from real data events.
Learning informative representations of data is one of the primary goals of deep learning, but there is still little understanding as to what representations a neural network actually learns. To better understand this, subspace match was recently proposed as a method for assessing the similarity of the representations learned by neural networks. It has been shown that two networks with the same architecture trained from different initializations learn representations that at hidden layers show low similarity when assessed with subspace match, even when the output layers show high similarity and the networks largely exhibit similar performance on classification tasks. In this note, we present a simple example motivated by standard results in commutative algebra to illustrate how this can happen, and show that although the subspace match at a hidden layer may be 0, the representations learned may be isomorphic as vector spaces. This leads us to conclude that a subspace match comparison of learned representations may well be uninformative, and it points to the need for better methods of understanding learned representations.
Although multivariate count data are routinely collected in many application areas, there is surprisingly little work developing flexible models for characterizing their dependence structure. This is particularly true when interest focuses on inferring the conditional independence graph. In this article, we propose a new class of pairwise Markov random field-type models for the joint distribution of a multivariate count vector. By employing a novel type of transformation, we avoid restricting to non-negative dependence structures or inducing other restrictions through truncations. Taking a Bayesian approach to inference, we choose a Dirichlet process prior for the distribution of a random effect to induce great flexibility in the specification. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for posterior computation. We prove various theoretical properties, including posterior consistency, and show that our COunt Nonparametric Graphical Analysis (CONGA) approach has good performance relative to competitors in simulation studies. The methods are motivated by an application to neuron spike count data in mice.
Pollutant emissions from coal-burning power plants have been deemed to adversely impact ambient air quality and public health conditions. Despite the noticeable reduction in emissions and the improvement of air quality since the Clean Air Act (CAA) became the law, the public-health benefits from changes in emissions have not been widely evaluated yet. In terms of the chain of accountability (HEI Accountability Working Group, 2003), the link between pollutant emissions from the power plants (SO2) and public health conditions (respiratory diseases) accounting for changes in ambient air quality (PM2.5) is unknown. We provide the first assessment of the longitudinal effect of specific pollutant emission (SO2) on public health outcomes that is mediated through changes in the ambient air quality. It is of particular interest to examine the extent to which the effect that is mediated through changes in local ambient air quality differs from year to year. In this paper, we propose a Bayesian approach to estimate novel causal estimands: time-varying mediation effects in the presence of mediators and responses measured every year. We replace the commonly invoked sequential ignorability assumption with a new set of assumptions which are sufficient to identify the distributions of the natural indirect and direct effects in this setting.
Blockchain technologies are expected to make a significant impact on a variety of industries. However, one issue holding them back is their limited transaction throughput, especially compared to established solutions such as distributed database systems. In this paper, we re-architect a modern permissioned blockchain system, Hyperledger Fabric, to increase transaction throughput from 3,000 to 20,000 transactions per second. We focus on performance bottlenecks beyond the consensus mechanism, and we propose architectural changes that reduce computation and I/O overhead during transaction ordering and validation to greatly improve throughput. Notably, our optimizations are fully plug-and-play and do not require any interface changes to Hyperledger Fabric.
The design of reward functions in reinforcement learning is a human skill that comes with experience. Unfortunately, there is no methodology in the literature that guides a human in designing a reward function or allows a human to transfer the skills developed in designing reward functions to another human in a systematic manner. In this paper, we use Systematic Instructional Design, an approach in human education, to engineer a machine education methodology to design reward functions for reinforcement learning. We demonstrate the methodology in designing a hierarchical genetic reinforcement learner that adopts a neural network representation to evolve a swarm controller for an agent shepherding a boids-based swarm. The results reveal that the methodology is able to guide the design of hierarchical reinforcement learners, with each model in the hierarchy learning incrementally through a multi-part reward function. The hierarchy acts as a decision fusion function that combines the individual behaviours and skills learnt by each instruction to create a smart shepherd to control the swarm.
Long-term temporal correlations observed in event sequences of natural and social phenomena have been characterized by algebraically decaying autocorrelation functions. Such temporal correlations can be understood not only by heterogeneous interevent times (IETs) but also by correlations between IETs. In contrast to the role of the heterogeneity of IETs in the autocorrelation function, little is yet known about the effects of correlations between IETs. To investigate these effects, we devise an analytically solvable model with a tunable memory coefficient $M$ between two consecutive IETs. We derive an analytic form of the autocorrelation function as a function of $M$ for arbitrary IET distributions, which is numerically confirmed, particularly for heavy-tailed IET distributions. Our analytic approach enables us to better understand long-term temporal correlations due to the correlations between IETs.
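To make the quantity $M$ concrete, here is a minimal sketch of how a memory coefficient can be estimated from an observed IET sequence. It assumes that $M$ is the Pearson correlation coefficient between consecutive IETs, which is the common definition in the burstiness literature; the abstract does not spell out the definition, and the function name `memory_coefficient` is hypothetical.

```python
import numpy as np


def memory_coefficient(iets):
    """Pearson correlation between consecutive interevent times.

    Assumes M = corr(tau_i, tau_{i+1}); this definition is an assumption
    of the sketch, not a statement taken from the abstract.
    """
    x = np.asarray(iets, dtype=float)
    if x.size < 3:
        raise ValueError("need at least three interevent times")
    a, b = x[:-1], x[1:]  # (tau_i, tau_{i+1}) pairs
    sa, sb = a.std(), b.std()
    if sa == 0.0 or sb == 0.0:
        raise ValueError("interevent times must not be constant")
    return float(np.mean((a - a.mean()) * (b - b.mean())) / (sa * sb))
```

An uncorrelated (shuffled) sequence gives a value close to 0, while a sequence in which long IETs tend to follow long ones gives a positive value, whatever the shape of the IET distribution itself.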
Computational Intelligence algorithms have gained a lot of attention from researchers in recent years due to their ability to deliver near-optimal solutions. In this paper, we propose a new hierarchy which classifies algorithms based on their sources of inspiration. The algorithms are divided into two broad domains, namely modeling of the human mind and nature-inspired intelligence. Algorithms in the modeling of the human mind domain take their motivation from the manner in which humans perceive and deal with information. Similarly, algorithms in the nature-inspired intelligence domain are based on ordinary phenomena occurring in nature. The latter is further broken into swarm intelligence, geosciences and artificial immune systems. Geoscience-based is the new domain, whose algorithms are based on geographic phenomena on the Earth's surface. A comprehensive tabular comparison is made among the algorithms in each domain on various attributes such as problem-solving method, application, characteristics and more. For further insight, we examine a variant of every algorithm and its implementation for a specific application. To better understand performance and efficiency, we compare the performance of select algorithms on the Traveling Salesman Problem.
Traditional multi-armed bandit problems are geared towards finding the arm with the highest expected value, an objective that is risk-neutral. In several practical applications, e.g., finance, a risk-sensitive objective is to control the worst-case losses, and Conditional Value-at-Risk (CVaR) is a popular risk measure for modelling this objective. We consider the CVaR optimization problem in a best-arm identification framework under a fixed budget. First, we derive a novel two-sided concentration bound for a well-known CVaR estimator based on the empirical distribution function, assuming that the underlying distribution is unbounded, but either sub-Gaussian or light-tailed. This bound may be of independent interest. Second, we adapt the well-known successive rejects algorithm to incorporate a CVaR-based criterion and derive an upper bound on the probability of incorrect identification for our proposed algorithm.
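The two ingredients named above can be sketched as follows. This is an illustrative reading and not the authors' exact procedure. It assumes that samples are losses, that the best arm is the one with the lowest CVaR, that the estimator is the standard order-statistic form, and that the phase lengths follow the usual successive rejects schedule. The names `empirical_cvar` and `successive_rejects_cvar`, and the convention that each arm is a callable taking a random generator, are hypothetical.

```python
import math

import numpy as np


def empirical_cvar(samples, alpha):
    """Empirical CVaR of losses at level alpha in (0, 1).

    Uses VaR = the ceil(n * alpha)-th order statistic and
    CVaR = VaR + sum(max(x - VaR, 0)) / (n * (1 - alpha)).
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    var = x[math.ceil(n * alpha) - 1]
    return float(var + np.maximum(x - var, 0.0).sum() / (n * (1.0 - alpha)))


def successive_rejects_cvar(arms, budget, alpha, rng):
    """Return the index of the arm with the lowest estimated CVaR.

    `arms` is a list of callables arm(rng) -> one loss sample (assumed API).
    In each of K - 1 phases, the active arm with the highest empirical CVaR
    is rejected.
    """
    num_arms = len(arms)
    if num_arms > 1 and budget <= num_arms:
        raise ValueError("budget must exceed the number of arms")
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    active = list(range(num_arms))
    draws = [[] for _ in range(num_arms)]
    n_prev = 0
    for phase in range(1, num_arms):
        # Cumulative number of pulls per active arm at the end of this phase.
        n_k = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - phase)))
        for a in active:
            draws[a].extend(arms[a](rng) for _ in range(n_k - n_prev))
        n_prev = n_k
        worst = max(active, key=lambda a: empirical_cvar(draws[a], alpha))
        active.remove(worst)
    return active[0]
```

Arms with identical means but different spreads are indistinguishable under the risk-neutral objective, yet a CVaR-based criterion separates them, since heavier upper tails produce larger CVaR estimates.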
Model interpretation is essential in many application scenarios, and building a classification model that is easy to interpret may provide useful information for further studies and improvement. It is common to encounter a lengthy set of variables in modern data analysis, especially when data are collected in some automatic way. Such datasets may not be collected with a specific analysis target and usually contain redundant features, which make no contribution to the analysis task of interest. Variable selection is a common way to improve model interpretability and is widely used with some parametric classification models. There is a lack of studies on variable selection in nonparametric classification models such as density estimation-based methods, and this is especially the case for multiple-class classification. In this study, we address multiple-class classification problems using the idea of sparse nonparametric density estimation and propose a method for identifying high-impact variables for each class. We present the asymptotic properties and the computation procedure for the proposed method, together with some suggested sample sizes. We also report numerical results using both synthesized and real data sets.
As one of the most popular services in online communities, social recommendation has attracted increasing research effort recently. Among all the recommendation tasks, an important one is social item recommendation over high-speed social media streams. Existing streaming recommendation techniques are not effective for handling social users with diverse interests. Meanwhile, approaches for recommending items to a particular user are not efficient when applied to a huge number of users over high-speed streams. In this paper, we propose a novel framework for social recommendation in streaming environments. Specifically, we first propose a novel Bi-Layer Hidden Markov Model (BiHMM) that adaptively captures the behaviors of social users and their interactions with influential official accounts to predict their long-term and short-term interests. Then, we design a new probabilistic entity matching scheme for effectively identifying the relevance score of a streaming item to a user. Following that, we propose a novel indexing scheme called {\Tree} for improving the efficiency of our solution. Extensive experiments are conducted to demonstrate the high performance of our approach in terms of recommendation quality and time cost.
Deep Neural Networks (DNNs) have revolutionized numerous applications, but the demand for ever more performance remains unabated. Scaling DNN computations to larger clusters is generally done by distributing tasks in batch mode using methods such as distributed synchronous SGD. Among the issues with this approach is that to make the distributed cluster work with high utilization, the workload distributed to each node must be large, which implies nontrivial growth in the SGD mini-batch size. In this paper, we propose a framework called FPDeep, which uses a hybrid of model and layer parallelism to configure distributed reconfigurable clusters to train DNNs. This approach has numerous benefits. First, the design does not suffer from batch size growth. Second, novel workload and weight partitioning leads to balanced loads of both among nodes. And third, the entire system is a fine-grained pipeline. This leads to high parallelism and utilization and also minimizes the time features need to be cached while waiting for back-propagation. As a result, storage demand is reduced to the point where only on-chip memory is used for the convolution layers. We evaluate FPDeep with the Alexnet, VGG-16, and VGG-19 benchmarks. Experimental results show that FPDeep has good scalability to a large number of FPGAs, with the limiting factor being the FPGA-to-FPGA bandwidth. With 6 transceivers per FPGA, FPDeep shows linearity up to 83 FPGAs. Energy efficiency is evaluated with respect to GOPs/J. FPDeep provides, on average, 6.36x higher energy efficiency than comparable GPU servers.
Network inference has been attracting increasing attention in several fields, notably systems biology, control engineering and biomedicine. To develop a therapy, it is essential to understand the connectivity of biochemical units and the internal working mechanisms of the target network. A network is mainly characterized by its topology and internal dynamics. In particular, sparse topology and stable system dynamics are fundamental properties of many real-world networks. In recent years, kernel-based methods have been popular in the system identification community. By incorporating empirical Bayes, this framework, which we call KEB, is able to promote system stability and impose sparse network topology. Nevertheless, KEB may not be ideal for topology detection due to local optima and numerical errors. Here, therefore, we propose an alternative, data-driven, method that is designed to greatly improve inference accuracy, compared with KEB. The proposed method uses dynamical structure functions to describe networks so that the information of unmeasurable nodes is encoded in the model. A powerful numerical sampling method, namely reversible jump Markov chain Monte Carlo (RJMCMC), is applied to explore full Bayesian models effectively. Monte Carlo simulations indicate that our approach produces more accurate networks compared with KEB methods. Furthermore, simulations of a synthetic biological network demonstrate that the performance of the proposed method is superior to that of the state-of-the-art method, namely iCheMA. The implication is that the proposed method can be used in a wide range of applications, such as controller design, machinery fault diagnosis and therapy development.
Fabricating neural models for a wide range of mobile devices demands specific network designs due to highly constrained resources. Both evolutionary algorithms (EA) and reinforcement learning methods (RL) have been introduced to address Neural Architecture Search, and distinct efforts to integrate both categories have also been proposed. However, these combinations usually concentrate on a single objective such as the error rate of image classification. They also fail to harness the benefits of both sides. In this paper, we present a new multi-objective oriented algorithm called MoreMNAS (Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search) that leverages the virtues of both EA and RL. In particular, we incorporate a variant of the multi-objective genetic algorithm NSGA-II, in which the search space is composed of various cells so that crossovers and mutations can be performed at the cell level. Moreover, reinforced control is mixed with a random process to regulate arbitrary mutation, maintaining a delicate balance between exploration and exploitation. Therefore, not only does our method prevent the searched models from degrading during the evolution process, but it also makes better use of learned knowledge. Our preliminary experiments conducted in the Super Resolution (SR) domain deliver models that rival some state-of-the-art methods with far fewer FLOPs. More results will be disclosed very soon.
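Multi-objective search of the kind described above rests on Pareto dominance: a candidate is kept only if no other candidate is at least as good on every objective and strictly better on at least one. The sketch below shows that filter alone, assuming every objective is minimized (for example, an error measure and a FLOPs count). It is not NSGA-II itself and not the MoreMNAS implementation, and the name `pareto_front` is hypothetical.

```python
def pareto_front(points):
    """Return indices of points not dominated by any other point.

    Every objective is assumed to be minimized. Point q dominates p when q is
    no worse than p in all objectives and strictly better in at least one.
    """
    def dominates(q, p):
        no_worse = all(qk <= pk for qk, pk in zip(q, p))
        strictly_better = any(qk < pk for qk, pk in zip(q, p))
        return no_worse and strictly_better

    front = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(i)
    return front
```

For the candidates `[(1, 5), (2, 2), (3, 3), (5, 1)]`, the point `(3, 3)` is dominated by `(2, 2)`, and the remaining three form the trade-off front from which a designer would pick according to the device budget.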
We propose a procedure to decide between the null hypothesis of (strict) stationarity and the alternative of non-stationarity, in the context of a Random Coefficient AutoRegression (RCAR). The procedure is based on randomising a diagnostic which diverges to positive infinity under the null, and drifts to zero under the alternative. Thence, we propose a randomised test which can be used directly and – building on it – a decision rule to discern between the null and the alternative. The procedure can be applied under very general circumstances: albeit developed for an RCAR model, it can be used in the case of a standard AR(1) model, without requiring any modifications or prior knowledge. Also, the test works (again with no modification or prior knowledge being required) in the presence of infinite variance, and in general requires minimal assumptions on the existence of moments.
Rare events attract increasing attention and interest in many big data scenarios such as anomaly detection and security systems. To characterize the importance of rare events from a probabilistic perspective, the message importance measure (MIM) has been proposed as a kind of semantic analysis tool. Similar to Shannon entropy, the MIM has its own special function in information processing, in which the parameter $\varpi$ of the MIM plays a vital role. In fact, the parameter $\varpi$ determines the properties of the MIM, based on which the MIM has three working regions, where the parameter satisfies $0 \le \varpi \le 2/\max\{p(x_i)\}$, $\varpi > 2/\max\{p(x_i)\}$ and $\varpi < 0$, respectively. Furthermore, in the case $0 \le \varpi \le 2/\max\{p(x_i)\}$, there is some similarity between the MIM and Shannon entropy in information compression and transmission, which provides a new viewpoint for information theory. This paper first constructs a system model based on the message importance measure and proposes the message importance loss to enrich information processing strategies. Moreover, we propose the message importance loss capacity to measure the information importance harvest in a transmission. Furthermore, the message importance distortion function is presented to give an upper bound on information compression based on the message importance measure. Additionally, bitrate transmission constrained by the message importance loss is investigated to broaden the scope of Shannon information theory.