Making decisions in the presence of a strategic opponent requires one to take into account the opponent’s ability to actively mask its intended objective. To describe such strategic situations, we introduce the non-cooperative inverse reinforcement learning (N-CIRL) formalism. The N-CIRL formalism consists of two agents with completely misaligned objectives, where only one of the agents knows the true objective function. Formally, we model the N-CIRL formalism as a zero-sum Markov game with one-sided incomplete information. Through interacting with the more informed player, the less informed player attempts to both infer, and act according to, the true objective function. As a result of the one-sided incomplete information, the multi-stage game can be decomposed into a sequence of single-stage games expressed by a recursive formula. Solving this recursive formula yields the value of the N-CIRL game and the more informed player’s equilibrium strategy. Another recursive formula, constructed by forming an auxiliary game, termed the dual game, yields the less informed player’s strategy. Building upon these two recursive formulas, we develop a computationally tractable algorithm to approximately solve for the equilibrium strategies. Finally, we demonstrate the benefits of our N-CIRL formalism over the existing multi-agent IRL formalism via extensive numerical simulation in a novel cyber security setting.
Data sets for fairness relevant tasks can lack examples or be biased according to a specific label in a sensitive attribute. We demonstrate the usefulness of weight based meta-learning approaches in such situations. For models that can be trained through gradient descent, we demonstrate that there are some parameter configurations that allow models to be optimized from a few number of gradient steps and with minimal data which are both fair and accurate. To learn such weight sets, we adapt the popular MAML algorithm to Fair-MAML by the inclusion of a fairness regularization term. In practice, Fair-MAML allows practitioners to train fair machine learning models from only a few examples when data from related tasks is available. We empirically exhibit the value of this technique by comparing to relevant baselines.
We investigate randomized methods for solving overdetermined linear least-squares problems, where the Hessian is approximated based on a random projection of the data matrix. We consider a random subspace embedding which is either drawn at the beginning and then fixed, or, refreshed at each iteration. We provide an exact finite-time analysis of the refreshed embeddings method for a broad class of random matrices, an exact asymptotic analysis of the fixed embedding method with a Gaussian matrix, and a non-asymptotic analysis of the fixed embedding method for Gaussian and SRHT matrices, with and without momentum acceleration. Surprisingly, we show that, for Gaussian matrices, the refreshed sketching method with no momentum yields the same asymptotic rate of convergence as the fixed embedding method accelerated with momentum. Furthermore, we characterize optimal step sizes and prove that, for a broad class of random matrices including the Gaussian ensemble, momentum does not accelerate the refreshed embeddings method. Hence, among the class of randomized algorithms we consider, a fixed subspace embedding with momentum yields the fastest rate of convergence, along with the lowest computational complexity. Then, picking the accelerated, fixed embedding method as the algorithm of choice, we obtain a faster algorithm by optimizing over the choice of the sketching dimension. Our choice of the sketch size yields an algorithm, for solving overdetermined least-squares problem, with a lower computational complexity compared to current state-of-the-art least-squares iterative methods based on randomized pre-conditioners. In particular, given the sketched data matrix, as the sample size grows, the resulting computational complexity becomes sub-linear in the problem dimensions. We validate numerically our guarantees on large sample datasets, both for Gaussian and SRHT embeddings.
We define and study the problem of modular concept learning, that is, learning a concept that is a cross product of component concepts. If an element’s membership in a concept depends solely on it’s membership in the components, learning the concept as a whole can be reduced to learning the components. We analyze this problem with respect to different types of oracle interfaces, defining different sets of queries. If a given oracle interface cannot answer questions about the components, learning can be difficult, even when the components are easy to learn with the same type of oracle queries. While learning from superset queries is easy, learning from membership, equivalence, or subset queries is harder. However, we show that these problems become tractable when oracles are given a positive example and are allowed to ask membership queries.
This paper introduces a new way to calculate distance-based statistics, particularly when the data are multivariate. The main idea is to pre-calculate the optimal projection directions given the variable dimension, and to project multidimensional variables onto these pre-specified projection directions; by subsequently utilizing the fast algorithm that is developed in Huo and Sz\’ekely [2016] for the univariate variables, the computational complexity can be improved from $O(m^2)$ to $O(n m \cdot \mbox{log}(m))$, where $n$ is the number of projection directions and $m$ is the sample size. When $n \ll m/\log(m)$, computational savings can be achieved. The key challenge is how to find the optimal pre-specified projection directions. This can be obtained by minimizing the worse-case difference between the true distance and the approximated distance, which can be formulated as a nonconvex optimization problem in a general setting. In this paper, we show that the exact solution of the nonconvex optimization problem can be derived in two special cases: the dimension of the data is equal to either $2$ or the number of projection directions. In the generic settings, we propose an algorithm to find some approximate solutions. Simulations confirm the advantage of our method, in comparison with the pure Monte Carlo approach, in which the directions are randomly selected rather than pre-calculated.
Driven by the visions of Internet of Things and 5G communications, the edge computing systems integrate computing, storage and network resources at the edge of the network to provide computing infrastructure, enabling developers to quickly develop and deploy edge applications. Nowadays the edge computing systems have received widespread attention in both industry and academia. To explore new research opportunities and assist users in selecting suitable edge computing systems for specific applications, this survey paper provides a comprehensive overview of the existing edge computing systems and introduces representative projects. A comparison of open source tools is presented according to their applicability. Finally, we highlight energy efficiency and deep learning optimization of edge computing systems. Open issues for analyzing and designing an edge computing system are also studied in this survey.
The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is able to retain accuracy based on an offline profiling, and it is kept constant for DNN inference. In this work, we explore the use of dynamic precision selection during DNN inference. We focus on Long Short Term Memory (LSTM) networks, which represent the state-of-the-art networks for applications such as machine translation and speech recognition. Unlike conventional DNNs, LSTM networks remember information from previous evaluations by storing data in the LSTM cell state. Our key observation is that the cell state determines the amount of precision required: time steps where the cell state changes significantly require higher precision, whereas time steps where the cell state is stable can be computed with lower precision without any loss in accuracy. Based on this observation, we implement a novel hardware scheme that tracks the evolution of the elements in the LSTM cell state and dynamically selects the appropriate precision in each time step. For a set of popular LSTM networks, our scheme selects the lowest precision for more than 66% of the time, outperforming systems that fix the precision statically. We evaluate our proposal on top of a modern accelerator highly optimized for LSTM computation, and show that it provides 1.56x speedup and 23% energy savings on average without any loss in accuracy. The extra hardware to determine the appropriate precision represents a small area overhead of 8.8%.
Have you ever wondered how a song might sound if performed by a different artist? In this work, we propose SCM-GAN, an end-to-end non-parallel song conversion system powered by generative adversarial and transfer learning that allows users to listen to a selected target singer singing any song. SCM-GAN first separates songs into vocals and instrumental music using a U-Net network, then converts the vocal segments to the target singer using advanced CycleGAN-VC, before merging the converted vocals with their corresponding background music. SCM-GAN is first initialized with feature representations learned from a state-of-the-art voice-to-voice conversion and then trained on a dataset of non-parallel songs. Furthermore, SCM-GAN is evaluated against a set of metrics including global variance GV and modulation spectra MS on the 24 Mel-cepstral coefficients (MCEPs). Transfer learning improves the GV by 35% and the MS by 13% on average. A subjective comparison is conducted to test the user satisfaction with the quality and the naturalness of the conversion. Results show above par similarity between SCM-GAN’s output and the target (70\% on average) as well as great naturalness of the converted songs.
Support Vector Machines (SVMs) can solve structured multi-output learning problems such as multi-label classification, multiclass classification and vector regression. SVM training is expensive especially for large and high dimensional datasets. The bottleneck of the SVM training often lies in the kernel value computation. In many real-world problems, the same kernel values are used in many iterations during the training, which makes the caching of kernel values potentially useful. The majority of the existing studies simply adopt the LRU (least recently used) replacement strategy for caching kernel values. However, as we analyze in this paper, the LRU strategy generally achieves high hit ratio near the final stage of the training, but does not work well in the whole training process. Therefore, we propose a new caching strategy called EFU (less frequently used) which replaces the less frequently used kernel values that enhances LFU (least frequently used). Our experimental results show that EFU often has 20\% higher hit ratio than LRU in the training with the Gaussian kernel. To further optimize the strategy, we propose a caching strategy called HCST (hybrid caching for the SVM training), which has a novel mechanism to automatically adapt the better caching strategy in the different stages of the training. We have integrated the caching strategy into ThunderSVM, a recent SVM library on many-core processors. Our experiments show that HCST adaptively achieves high hit ratios with little runtime overhead among different problems including multi-label classification, multiclass classification and regression problems. Compared with other existing caching strategies, HCST achieves 20\% more reduction in training time on average.
Despite the recent surge of interest in designing and guaranteeing mathematical formulations of fairness, virtually all existing notions of algorithmic fairness fail to be adaptable to the intricacies and nuances of the decision-making context at hand. We argue that capturing such factors is an inherently human task, as it requires knowledge of the social background in which machine learning tools impact real people’s outcomes and a deep understanding of the ramifications of automated decisions for decision subjects and society. In this work, we present a framework to construct a context-dependent mathematical formulation of fairness utilizing people’s judgment of fairness. We utilize the theoretical model of Heidari et al. (2019)—which shows that most existing formulations of algorithmic fairness are special cases of economic models of Equality of Opportunity (EOP)—and present a practical human-in-the-loop approach to pinpoint the fairness notion in the EOP family that best captures people’s perception of fairness in the given context. To illustrate our framework, we run human-subject experiments designed to learn the parameters of Heidari et al.’s EOP model (including circumstance, desert, and utility) in a hypothetical recidivism decision-making scenario. Our work takes an initial step toward democratizing the formulation of fairness and utilizing human-judgment to tackle a fundamental shortcoming of automated decision-making systems: that the machine on its own is incapable of understanding and processing the human aspects and social context of its decisions.
Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to ‘remove’ data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
This paper presents a detailed comparison of a recently proposed algorithm for optimizing decision trees, tree alternating optimization (TAO), with other popular, established algorithms, such as CART and C5.0. We compare their performance on a number of datasets of different size, dimensionality and number of classes, across different performance factors: accuracy and tree size (in terms of the number of leaves or the depth of the tree). We find that TAO achieves higher accuracy in every single dataset, often by a large margin.
Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised classification tasks. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers’ performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data.
Generative Adversarial Networks (GANs) have become a very popular tool for implicitly learning high-dimensional probability distributions. Several improvements have been made to the original GAN formulation to address some of its shortcomings like mode collapse, convergence issues, entanglement, poor visual quality etc. While a significant effort has been directed towards improving the visual quality of images generated by GANs, it is rather surprising that objective image quality metrics have neither been employed as cost functions nor as regularizers in GAN objective functions. In this work, we show how a distance metric that is a variant of the Structural SIMilarity (SSIM) index (a popular full-reference image quality assessment algorithm), and a novel quality aware discriminator gradient penalty function that is inspired by the Natural Image Quality Evaluator (NIQE, a popular no-reference image quality assessment algorithm) can each be used as excellent regularizers for GAN objective functions. Specifically, we demonstrate state-of-the-art performance using the Wasserstein GAN gradient penalty (WGAN-GP) framework over CIFAR-10, STL10 and CelebA datasets.