Heterogeneous programming has started becoming the norm in order to achieve better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering efforts are undertaken in order to enable existing programming languages to perform heterogeneous execution mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows developers to program GPGPUs directly from Java. By using the Jacc framework, developers have the ability to add GPGPU support into their applications with minimal code refactoring. To simplify the development of GPGPU applications we allow developers to model heterogeneous code using two key abstractions: \textit{tasks}, which encapsulate all the information needed to execute code on a GPGPU; and \textit{task graphs}, which capture the inter-task control-flow of the application. Using this information the Jacc runtime is able to automatically handle data movement and synchronization between the host and the GPGPU; eliminating the need for explicitly managing disparate memory spaces. In order to generate highly parallel GPGPU code, Jacc provides developers with the ability to decorate key aspects of their code using annotations. The compiler, in turn, exploits this information in order to automatically generate code without requiring additional code refactoring. Finally, we demonstrate the advantages of Jacc, both in terms of programmability and performance, by evaluating it against existing Java frameworks. Experimental results show an average performance speedup of 32x and a 4.4x code decrease across eight evaluated benchmarks on a NVIDIA Tesla K20m GPU.
We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and na\'{i}ve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods.
Fractional imputation (FI) is a relatively new method of imputation for handling item nonresponse in survey sampling. In FI, several imputed values with their fractional weights are created for each missing item. Each fractional weight represents the conditional probability of the imputed value given the observed data, and the parameters in the conditional probabilities are often computed by an iterative method such as EM algorithm. The underlying model for FI can be fully parametric, semiparametric, or nonparametric, depending on plausibility of assumptions and the data structure. In this paper, we give an overview of FI, introduce key ideas and methods to readers who are new to the FI literature, and highlight some new development. We also provide guidance on practical implementation of FI and valid inferential tools after imputation. We demonstrate the empirical performance of FI with respect to multiple imputation using a pseudo finite population generated from a sample in Monthly Retail Trade Survey in US Census Bureau.
Anomaly detection is an important task in many real world applications such as fraud detection, suspicious activity detection, health care monitoring etc. In this paper, we tackle this problem from supervised learning perspective in online learning setting. We maximize well known \emph{Gmean} metric for class-imbalance learning in online learning framework. Specifically, we show that maximizing \emph{Gmean} is equivalent to minimizing a convex surrogate loss function and based on that we propose novel online learning algorithm for anomaly detection. We then show, by extensive experiments, that the performance of the proposed algorithm with respect to $sum$ metric is as good as a recently proposed Cost-Sensitive Online Classification(CSOC) algorithm for class-imbalance learning over various benchmarked data sets while keeping running time close to the perception algorithm. Our another conclusion is that other competitive online algorithms do not perform consistently over data sets of varying size. This shows the potential applicability of our proposed approach.
Heterogeneous systems consisting of general-purpose processors and different types of hardware accelerators are becoming more and more common in HPC systems. Especially FPGAs provide a promising opportunity to improve both performance and energy efficiency of such systems. Adding FPGAs to clouds or data centers allows easy access to such reconfigurable resources. In this paper we present our cloud service models and cloud hypervisor called RC3E, which integrates virtualized FPGA-based hardware accelerators into a cloud environment. With our hardware and software framework, multiple (virtual) user designs can be executed on a single physical FPGA device. We demonstrate the performance of our approach by implementing up to four virtual user cores on a single device and present future perspectives for FPGAs in cloud-based data environments.
This extended abstract presents ThreadPoolComposer, a high-level synthesis-based development framework and meta-toolchain that provides a uniform programming interface for FPGAs portable across multiple platforms.