Shannon Entropy for Neutrosophic Information

The paper presents an extension of Shannon entropy for neutrosophic information. This extension uses a new formula for distance between two neutrosophic triplets. In addition, the obtained results are particularized for bifuzzy, intuitionistic and paraconsistent fuzzy information.

Text Similarity in Vector Space Models: A Comparative Study

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. In other cases, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
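
As a rough illustration of the TFIDF baseline discussed above, the sketch below builds sparse TF-IDF vectors and compares documents by cosine similarity. It is not the authors' pipeline: the smoothed IDF formula, the whitespace tokenization, the toy documents, and the names `tfidf_vectors` and `cosine` are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: relative term frequency times a smoothed IDF,
        # log((1 + N) / (1 + df)) + 1, which keeps every weight positive.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "patents": two about wind turbines, one about networking.
docs = [
    "rotor blade assembly for a wind turbine".split(),
    "wind turbine blade with a reinforced rotor hub".split(),
    "method for encrypting network packets".split(),
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(sim_related > sim_unrelated)
```

The two turbine documents share several content terms, so their similarity should exceed that of the turbine and networking pair, which overlap only on a function word.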

Relay: A New IR for Machine Learning Frameworks

Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon’s deep learning framework MxNet.

Omega-Regular Objectives in Model-Free Reinforcement Learning

We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.

AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

Sequences play an important role in many applications and systems. Discovering sequences with desired properties has long been an interesting intellectual pursuit. This paper puts forth a new paradigm, AlphaSeq, to discover desired sequences algorithmically using deep reinforcement learning (DRL) techniques. AlphaSeq treats the sequence discovery problem as an episodic symbol-filling game, in which a player fills symbols in the vacant positions of a sequence set sequentially during an episode of the game. Each episode ends with a completely-filled sequence set, upon which a reward is given based on the desirability of the sequence set. AlphaSeq models the game as a Markov Decision Process (MDP), and adapts the DRL framework of AlphaGo to solve the MDP. Sequences discovered improve progressively as AlphaSeq, starting as a novice, learns to become an expert game player through many episodes of game playing. Compared with traditional sequence construction by mathematical tools, AlphaSeq is particularly suitable for problems with complex objectives intractable to mathematical analysis. We demonstrate the searching capabilities of AlphaSeq in two applications: 1) AlphaSeq successfully rediscovers a set of ideal complementary codes that can zero-force all potential interferences in multi-carrier CDMA systems. 2) AlphaSeq discovers new sequences that triple the signal-to-interference ratio — benchmarked against the well-known Legendre sequence — of a mismatched filter estimator in pulse compression radar systems.

Throughput Optimizations for FPGA-based Deep Neural Network Inference

Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from such networks is rapidly rising. In this paper, we propose novel architectures for the inference of previously learned and arbitrary deep neural networks on FPGA-based SoCs that are able to overcome these limitations. Our key contributions include the reuse of previously transferred weight matrices across multiple input samples, which we refer to as batch processing, and the usage of compressed weight matrices, also known as pruning. An extensive evaluation of these optimizations is presented. Both techniques allow a significant mitigation of data transfers and speed up the network inference by one order of magnitude. At the same time, we surpass the data throughput of fully-featured x86-based systems while only using a fraction of their energy consumption.

Active Fairness in Algorithmic Decision Making

Society increasingly relies on machine learning models for automated decision making. Yet, efficiency gains from automation have come paired with concern for algorithmic discrimination that can systematize inequality. Substantial work in algorithmic fairness has surged, focusing on either post-processing trained models, constraining learning processes, or pre-processing training data. Recent work has proposed optimal post-processing methods that randomize classification decisions on a fraction of individuals in order to achieve fairness measures related to parity in errors and calibration. These methods, however, have raised concern due to the information inefficiency, intra-group unfairness, and Pareto sub-optimality they entail. The present work proposes an alternative active framework for fair classification, where, in deployment, a decision-maker adaptively acquires information according to the needs of different groups or individuals, towards balancing disparities in classification performance. We propose two such methods, where information collection is adapted to group- and individual-level needs respectively. We show on real-world datasets that these can achieve: 1) calibration and single error parity (e.g., equal opportunity); and 2) parity in both false positive and false negative rates (i.e., equal odds). Moreover, we show that, by leveraging their additional degree of freedom, active approaches can outperform randomization-based classifiers previously considered optimal, while also avoiding limitations such as intra-group unfairness.

Bayesian approach to extreme-value statistics based on conditional maximum-entropy method

Recently, the conditional maximum-entropy method (abbreviated as C-MaxEnt) has been proposed for selecting priors in Bayesian statistics in a very simple way. Here, it is examined for extreme-value statistics. For the Weibull type as an explicit example, it is shown how C-MaxEnt can give rise to a prior satisfying Jeffreys’ rule.

Adversarial Attacks and Defences: A Survey

Deep learning has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems which were difficult to solve using traditional machine learning techniques in the past. In the last few years, deep learning has advanced radically, to the point that it can surpass human-level performance on a number of tasks. As a consequence, deep learning is being extensively used in most recent day-to-day applications. However, deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can lead the model to misclassify its input. In recent times, different types of adversaries, depending on their threat model, have leveraged these vulnerabilities to compromise deep learning systems where adversaries have high incentives. Hence, it is extremely important to make deep learning algorithms robust against these adversaries. However, there are only a few strong countermeasures which can be used in all types of attack scenarios to design a robust deep learning system. In this paper, we attempt to provide a detailed discussion of different types of adversarial attacks with various threat models and also elaborate on the efficiency and challenges of recent countermeasures against them.

Reconciling Feature-Reuse and Overfitting in DenseNet with Specialized Dropout

Recently, convolutional neural networks (CNNs) have achieved great accuracy in visual recognition tasks. DenseNet has become one of the most popular CNN models due to its effectiveness in feature-reuse. However, like other CNN models, DenseNets also face an overfitting problem, if not a more severe one. The existing dropout method can be applied, but it is not as effective due to the introduced nonlinear connections. In particular, the property of feature-reuse in DenseNet will be impeded, and the dropout effect will be weakened by the spatial correlation inside feature maps. To address these problems, we design a specialized dropout method from three aspects: dropout location, dropout granularity, and dropout probability. The insights attained here could potentially be applied as a general approach for boosting the accuracy of other CNN models with similar nonlinear connections. Experimental results show that DenseNets with our specialized dropout method yield better accuracy compared to vanilla DenseNet and state-of-the-art CNN models, and that this accuracy boost increases with the model depth.

Barrier Certificates for Assured Machine Teaching

Machine teaching has received significant attention in the past few years as a paradigm shift from machine learning. While machine learning is often concerned with improving the performance of learners, machine teaching pertains to the efficiency of teachers. For example, machine teaching seeks to find the optimal (minimum) number of data samples needed for teaching a target hypothesis to a learner. Hence, it is natural to raise the question of how we can provide assurances for teaching given a machine teaching algorithm. In this paper, we address this question by borrowing notions from control theory. We begin by proposing a model based on partially observable Markov decision processes (POMDPs) for a class of machine teaching problems. We then show that the POMDP formulation can be cast as a special hybrid system, i.e., a discrete-time switched system. Subsequently, we use barrier certificates to verify properties of this special hybrid system. We show how the computation of the barrier certificate can be decomposed and numerically implemented as the solution to a sum-of-squares (SOS) program. For illustration, we show how the proposed framework based on control theory can be used to verify the teaching performance of two well-known machine teaching methods.

Cost-Bounded Active Classification Using Partially Observable Markov Decision Processes

Active classification, i.e., the sequential decision-making process aimed at data acquisition for classification purposes, arises naturally in many applications, including medical diagnosis, intrusion detection, and object tracking. In this work, we study the problem of actively classifying dynamical systems with a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the dynamical system, and observe its reactions so that the true model is determined efficiently with high confidence. To this end, we present a decision-theoretic framework based on partially observable Markov decision processes (POMDPs). The proposed framework relies on assigning a classification belief (a probability distribution) to each candidate MDP model. Given an initial belief, some misclassification probabilities, a cost bound, and a finite time horizon, we design POMDP strategies leading to classification decisions. We present two different approaches to find such strategies. The first approach computes the optimal strategy ‘exactly’ using value iteration. To overcome the computational complexity of finding exact solutions, the second approach is based on adaptive sampling to approximate the optimal probability of reaching a classification decision. We illustrate the proposed methodology using two examples from medical diagnosis and intruder detection.

Predicting the Generalization Gap in Deep Networks with Margin Distributions

As shown in recent research, deep neural networks can perfectly fit randomly labeled data, but with very poor accuracy on held out data. This phenomenon indicates that loss functions such as cross-entropy are not a reliable indicator of generalization. This leads to the crucial question of how the generalization gap should be predicted from the training data and network parameters. In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap. Our measure is based on the concept of margin distribution, i.e., the distribution of the distances of training points to the decision boundary. We find that it is necessary to use margin distributions at multiple layers of a deep network. On the CIFAR-10 and the CIFAR-100 datasets, our proposed measure correlates very strongly with the generalization gap. In addition, we find the following other factors to be of importance: normalizing margin values for scale independence, using characterizations of margin distribution rather than just the margin (closest distance to decision boundary), and working in log space instead of linear space (effectively using a product of margins rather than a sum). Our measure can be easily applied to feedforward deep networks with any architecture and may point towards new training loss functions that could enable better generalization.

Improved Gradient-Based Optimization Over Discrete Distributions

In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze existing solutions including finite-difference (FD) estimators and continuous relaxation (CR) estimators in terms of bias and variance. We show that the commonly used Gumbel-Softmax estimator is biased and propose a simple method to reduce it. We also derive a simpler piece-wise linear continuous relaxation that also possesses reduced bias. We demonstrate empirically that reduced bias leads to a better performance in variational inference and on binary optimization tasks.
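
For readers unfamiliar with the continuous relaxation mentioned above, the following is a minimal sketch of standard Gumbel-Softmax sampling: Gumbel noise is added to the logits and a temperature-scaled softmax is applied. It shows only the commonly used relaxation, not the bias-reduction method or the piece-wise linear relaxation proposed in the paper, which the abstract does not specify. The function name, the clamping constant, and the example probabilities are assumptions.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed (soft) one-hot sample from a categorical distribution."""
    perturbed = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(perturbed)
    exps = [math.exp(x - top) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [math.log(p) for p in (0.7, 0.2, 0.1)]  # hypothetical class probabilities
sample = gumbel_softmax_sample(logits, temperature=0.5, rng=rng)
print(sample)
```

Lower temperatures push samples toward one-hot vectors, while higher temperatures smooth them toward uniform. The index of the largest entry is distributed according to the original categorical distribution (the Gumbel-max trick) regardless of temperature.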

Knowledge-guided Semantic Computing Network

It is very useful to integrate human knowledge and experience into traditional neural networks for faster learning speed, fewer training samples and better interpretability. However, due to the obscure and indescribable black-box nature of neural networks, it is very difficult to design their architecture, interpret their features and predict their performance. Inspired by the human visual cognition process, we propose a knowledge-guided semantic computing network which includes two modules: a knowledge-guided semantic tree and a data-driven neural network. The semantic tree is pre-defined to describe the spatial structural relations of different semantics, which corresponds to the tree-like description of objects based on human knowledge. The object recognition process through the semantic tree only needs simple forward computing without training. In addition, to enhance the recognition ability of the semantic tree with respect to diversity, randomness and variability, we use the traditional neural network to help the semantic tree learn some indescribable features. Only in this case is the training process needed. The experimental results on the MNIST and GTSRB datasets show that, compared with the traditional data-driven network, our proposed semantic computing network can achieve better performance with fewer training samples and lower computational complexity. In particular, our model also has better adversarial robustness than traditional neural networks with the help of human knowledge.

On Minimizing the Completion Times of Long Flows over Inter-Datacenter WAN

Long flows contribute huge volumes of traffic over inter-datacenter WAN. The Flow Completion Time (FCT) is a vital network performance metric that affects the running time of distributed applications and the users’ quality of experience. Flow routing techniques based on propagation or queuing latency or instantaneous link utilization are insufficient for minimization of the long flows’ FCT. We propose a routing approach that uses the remaining sizes and paths of all ongoing flows to minimize the worst-case completion time of incoming flows assuming no knowledge of future flow arrivals. Our approach can be formulated as an NP-Hard graph optimization problem. We propose BWRH, a heuristic to quickly generate an approximate solution. We evaluate BWRH against several real WAN topologies and two different traffic patterns. We see that BWRH provides solutions with an average optimality gap of less than 0.25%. Furthermore, we show that compared to other popular routing heuristics, BWRH reduces the mean and tail FCT by up to 1.46× and 1.53×, respectively.

Stakeholders in Explainable AI

There is general consensus that it is important for artificial intelligence (AI) and machine learning systems to be explainable and/or interpretable. However, there is no general consensus over what is meant by ‘explainable’ and ‘interpretable’. In this paper, we argue that this lack of consensus is due to there being several distinct stakeholder communities. We note that, while the concerns of the individual communities are broadly compatible, they are not identical, which gives rise to different intents and requirements for explainability/interpretability. We use the software engineering distinction between validation and verification, and the epistemological distinctions between knowns/unknowns, to tease apart the concerns of the stakeholder communities and highlight the areas where their foci overlap or diverge. It is not the purpose of the authors of this paper to ‘take sides’ – we count ourselves as members, to varying degrees, of multiple communities – but rather to help disambiguate what stakeholders mean when they ask ‘Why?’ of an AI.

Reinforcement Learning in R

Reinforcement learning refers to a group of methods from artificial intelligence where an agent performs learning through trial and error. It differs from supervised learning, since reinforcement learning requires no explicit labels; instead, the agent interacts continuously with its environment. That is, the agent starts in a specific state and then performs an action, based on which it transitions to a new state and, depending on the outcome, receives a reward. Different strategies (e.g. Q-learning) have been proposed to maximize the overall reward, resulting in a so-called policy, which defines the best possible action in each state. Mathematically, this process can be formalized by a Markov decision process and it has been implemented by packages in R; however, there is currently no package available for reinforcement learning. As a remedy, this paper demonstrates how to perform reinforcement learning in R and, for this purpose, introduces the ReinforcementLearning package. The package provides a remarkably flexible framework and is easily applied to a wide range of different problems. We demonstrate its use by drawing upon common examples from the literature (e.g. finding optimal game strategies).
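
The state, action, reward loop and the Q-learning strategy described above can be sketched in a few lines. The example below is written in Python rather than R and does not use or imitate the ReinforcementLearning package's API. The five-state chain environment, the uniformly random behaviour policy, the constants, and the names `step` and `train` are all assumptions chosen for illustration.

```python
import random

N_STATES = 5           # states 0..4 on a line; state 4 is terminal
ACTIONS = (0, 1)       # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Hypothetical environment: reward 1.0 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy policy: the action with the highest learned value in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Because Q-learning is off-policy, the agent can explore at random and still recover the greedy policy from the learned table. In this chain the greedy action in every non-terminal state is to move right, toward the rewarding state.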

Training Machine Learning Models by Regularizing their Explanations

Neural networks are among the most accurate supervised learning methods in use today. However, their opacity makes them difficult to trust in critical applications, especially when conditions in training may differ from those in practice. Recent efforts to develop explanations for neural networks and machine learning models more generally have produced tools to shed light on the implicit rules behind predictions. These tools can help us identify when models are right for the wrong reasons. However, they do not always scale to explaining predictions for entire datasets, are not always at the right level of abstraction, and most importantly cannot correct the problems they reveal. In this thesis, we explore the possibility of training machine learning models (with a particular focus on neural networks) using explanations themselves. We consider approaches where models are penalized not only for making incorrect predictions but also for providing explanations that are either inconsistent with domain knowledge or overly complex. These methods let us train models which can not only provide more interpretable rationales for their predictions but also generalize better when training data is confounded or meaningfully different from test data (even adversarially so).

Detecting Changes in User Preferences using Hidden Markov Models for Sequential Recommendation Tasks

Recommender systems help users find relevant items of interest based on the past preferences of those users. In many domains, however, the tastes and preferences of users change over time due to a variety of factors, and recommender systems should capture these dynamics in user preferences in order to remain tuned to the most current interests of users. In this work we present a recommendation framework based on Hidden Markov Models (HMM) which takes into account the dynamics of user preferences. We propose an HMM-based approach to detecting change points in the sequence of user interactions that reflect significant changes in preference according to the sequential behavior of all the users in the data. The proposed framework leverages the identified change points to generate recommendations in two ways. In one approach, change points are used to create a sequence-aware non-negative matrix factorization model to generate recommendations that are aligned with the current tastes of the user. In the second approach, the HMM is used directly to generate recommendations taking into account the identified change points. These models are evaluated in terms of accuracy of change point detection and also the effectiveness of recommendations using a real music streaming dataset.

Resource Management in Fog/Edge Computing: A Survey

Contrary to using distant and centralized cloud data center resources, employing decentralized resources at the edge of a network for processing data closer to user devices, such as smartphones and tablets, is an upcoming computing paradigm, referred to as fog/edge computing. Fog/edge resources are typically resource-constrained, heterogeneous, and dynamic compared to the cloud, thereby making resource management an important challenge that needs to be addressed. This article reviews publications as early as 1991, with 85% of the publications between 2013-2018, to identify and classify the architectures, infrastructure, and underlying algorithms for managing resources in fog/edge computing.

Mini-batch Serialization: CNN Training with Inter-layer Data Reuse

Training convolutional neural networks (CNNs) requires intense computations and high memory bandwidth. We find that bandwidth today is over-provisioned because most memory accesses in CNN training can be eliminated by rearranging computation to better utilize on-chip buffers and avoid traffic resulting from large per-layer memory footprints. We introduce the MBS CNN training approach that significantly reduces memory traffic by partially serializing mini-batch processing across groups of layers. This optimizes reuse within on-chip buffers and balances both intra-layer and inter-layer reuse. We also introduce the WaveCore CNN training accelerator that effectively trains CNNs in the MBS approach with high functional-unit utilization. Combined, WaveCore and MBS reduce DRAM traffic by 73%, improve performance by 45%, and save 24% system energy for modern deep CNN training compared to conventional training mechanisms and accelerators.

An Overview of Blockchain Integration with Robotics and Artificial Intelligence

Blockchain technology is growing every day at a fast pace, and it can be integrated with many systems, notably robotics with AI services. However, this is still a recent field and there is not yet a clear understanding of what it could potentially become. In this paper, we conduct an overview of many different methods and platforms that try to bring the power of blockchain to robotic systems, to improve AI services, or to solve problems that are present in the major blockchains, which can lead to robotic systems with increased capabilities and security. We present an overview, discuss the methods and conclude the paper with our view on the future of the integration of these technologies.

Text Morphing

In this paper, we introduce a novel natural language generation task, termed text morphing, which aims to generate intermediate sentences that are fluent and transition smoothly between two input sentences. We propose Morphing Networks, consisting of editing vector generation networks and sentence editing networks, which are trained jointly. Specifically, the editing vectors are generated with a recurrent neural network model from the lexical gap between the source sentence and the target sentence. Then the sentence editing networks iteratively generate new sentences from the current editing vector and the sentence generated in the previous step. We conduct experiments with 10 million text morphing sequences extracted from the Yelp review dataset. Experimental results show that the proposed method outperforms baselines on the text morphing task. We also discuss directions and opportunities for future research on text morphing.

Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning

Learning in sparse reward settings remains a challenge in Reinforcement Learning, which is often addressed by using intrinsic rewards. One promising strategy is inspired by human curiosity, requiring the agent to learn to predict the future. In this paper a curiosity-driven agent is extended to use these predictions directly for training. To achieve this, the agent predicts the value function of the next state at any point in time. Subsequently, the consistency of this prediction with the current value function is measured, which is then used as a regularization term in the loss function of the algorithm. Experiments were made on grid-world environments as well as on a 3D navigation task, both with sparse rewards. In the first case the extended agent is able to learn significantly faster than the baselines.

On Regularization and Robustness of Deep Neural Networks

Despite their success, deep neural networks suffer from several drawbacks: they lack robustness to small changes of input data known as ‘adversarial examples’ and training them with small amounts of annotated data is challenging. In this work, we study the connection between regularization and robustness by viewing neural networks as elements of a reproducing kernel Hilbert space (RKHS) of functions and by regularizing them using the RKHS norm. Even though this norm cannot be computed, we consider various approximations based on upper and lower bounds. These approximations lead to new strategies for regularization, but also to existing ones such as spectral norm penalties or constraints, gradient penalties, or adversarial training. Besides, the kernel framework allows us to obtain margin-based bounds on adversarial generalization. We study the obtained algorithms for learning on small datasets, learning adversarially robust models, and discuss implications for learning implicit generative models.

Deep Quality-Value (DQV) Learning

We introduce a novel Deep Reinforcement Learning (DRL) algorithm called Deep Quality-Value (DQV) Learning. Similarly to Advantage-Actor-Critic methods, DQV uses a Value neural network for estimating the temporal-difference errors which are then used by a second Quality network for directly learning the state-action values. We first test DQV’s update rules with Multilayer Perceptrons as function approximators on two classic RL problems, and then extend DQV with the use of Deep Convolutional Neural Networks, ‘Experience Replay’ and ‘Target Neural Networks’ for tackling four games of the Atari Arcade Learning environment. Our results show that DQV learns significantly faster and better than Deep Q-Learning and Double Deep Q-Learning, suggesting that our algorithm can potentially be a better performing synchronous temporal difference algorithm than what is currently present in DRL.
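
One way to read the description above is that both the state values and the state-action values are moved toward the same temporal-difference target, which bootstraps from the value estimate of the next state. The sketch below is a tabular interpretation of that idea under stated assumptions, not the paper's neural-network implementation. The function name `dqv_update`, the table layout, and the step size and discount are hypothetical.

```python
def dqv_update(v, q, state, action, reward, next_state, done,
               alpha=0.5, gamma=0.9):
    """Apply one tabular DQV-style update in place (assumed form).

    v: list of state values; q: list of per-state lists of action values.
    """
    # Shared TD target, bootstrapped from the value table (not from max over Q).
    target = reward + (0.0 if done else gamma * v[next_state])
    # The quality table learns state-action values from the value-based target.
    q[state][action] += alpha * (target - q[state][action])
    # The value table is updated toward the same target.
    v[state] += alpha * (target - v[state])


# Two hand-made transitions on a hypothetical three-state problem.
v = [0.0, 0.0, 0.0]
q = [[0.0, 0.0] for _ in range(3)]
dqv_update(v, q, state=1, action=1, reward=1.0, next_state=2, done=True)
dqv_update(v, q, state=0, action=1, reward=0.0, next_state=1, done=False)
print(v, q)
```

The difference from tabular Q-learning in this reading is the bootstrap term: the target uses the separately learned state value of the next state instead of the maximum over next-state action values.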

Marrying Tracking with ELM: A Metric Constraint Guided Multiple Feature Fusion Method

Object tracking is an important problem in computer vision and surveillance systems. Existing models mainly exploit single-view features (e.g., color, texture, shape) to solve the problem, and so fail to describe objects comprehensively. In this paper, we approach the problem from a multi-view perspective by leveraging complementary and latent multi-view information, so as to be robust to partial occlusion and background clutter, especially when nearby objects are similar to the target, while also addressing tracking drift. However, one major problem is that a multi-view fusion strategy inevitably makes tracking inefficient. To this end, we propose to marry ELM (extreme learning machine) to multi-view fusion to train the global hidden output weight, so as to effectively exploit the local information from each view. Following this principle, we propose a novel method to obtain the optimal sample as the target object, which avoids tracking drift resulting from noisy samples. Our method is evaluated on 12 challenging image sequences with different attributes, including illumination, occlusion, and deformation, and demonstrates better performance than several state-of-the-art methods in terms of effectiveness and robustness.

Privacy-preserving Stochastic Gradual Learning

It is challenging for stochastic optimization to handle large-scale sensitive data safely. Recently, Duchi et al. proposed a private sampling strategy to address privacy leakage in stochastic optimization. However, this strategy degrades robustness, since it is equivalent to injecting noise into each gradient, which adversely affects updates of the primal variable. To address this challenge, we introduce a robust stochastic optimization method under the framework of local privacy, which is called Privacy-pREserving StochasTIc Gradual lEarning (PRESTIGE). PRESTIGE bridges private updates of the primal variable (by private sampling) with gradual curriculum learning (CL). Specifically, the noise injection leads to the issue of label noise, but the robust learning process of CL can combat label noise. Thus, PRESTIGE yields ‘private but robust’ updates of the primal variable on the private curriculum, namely a reordered label sequence provided by CL. In theory, we reveal the convergence rate and maximum complexity of PRESTIGE. Empirical results on six datasets show that PRESTIGE achieves a good tradeoff between privacy preservation and robustness over baselines.

Extending Stan for Deep Probabilistic Programming

Deep probabilistic programming combines deep neural networks (for automatic hierarchical representation learning) with probabilistic models (for principled handling of uncertainty). Unfortunately, it is difficult to write deep probabilistic models, because existing programming frameworks lack concise, high-level, and clean ways to express them. To ease this task, we extend Stan, a popular high-level probabilistic programming language, to use deep neural networks written in PyTorch. Training deep probabilistic models works best with variational inference, so we also extend Stan for that. We implement these extensions by translating Stan programs to Pyro. Our translation clarifies the relationship between different families of probabilistic programming languages. Overall, our paper is a step towards making deep probabilistic programming easier.

Bayesian Transfer Reinforcement Learning with Prior Knowledge Rules

We propose a probabilistic framework to directly insert prior knowledge into reinforcement learning (RL) algorithms by defining the behaviour policy as a Bayesian posterior distribution. Such a posterior combines task-specific information with prior knowledge, thus enabling transfer learning across tasks. The resulting method is flexible and can be easily incorporated into any standard off-policy or on-policy algorithm, such as those based on temporal differences and policy gradients. We develop a specific instance of this Bayesian transfer RL framework by expressing prior knowledge as general deterministic rules that can be useful in a large variety of tasks, such as navigation tasks. We also elaborate on recent probabilistic and entropy-regularised RL by developing a novel temporal learning algorithm, and show how to combine it with Bayesian transfer RL. Finally, we demonstrate our method for solving mazes and show that significant speed-ups can be obtained.
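
As a rough illustration of a behaviour policy formed as a posterior over actions, the sketch below multiplies a prior over actions by a task-specific term and renormalizes. This is not the paper's algorithm: the softmax over Q-values used as the task-specific term, the `temperature` parameter, and the example rule are all assumptions made for illustration.

```python
import math

def posterior_policy(q_values, prior, temperature=1.0):
    """Posterior over actions: prior(a) * exp(Q(a) / temperature), renormalized.

    ASSUMPTION: the task-specific term is a softmax over Q-values; the
    abstract does not specify its form.
    """
    m = max(q_values)  # subtract the max for numerical stability
    likelihood = [math.exp((q - m) / temperature) for q in q_values]
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical deterministic rule: action 2 walks into a wall, so its prior is 0.
prior = [0.5, 0.5, 0.0]
q_values = [1.0, 1.0, 5.0]  # a poorly trained critic prefers action 2
policy = posterior_policy(q_values, prior)
```

Here the rule overrides the untrained value estimate: action 2 receives zero probability regardless of its Q-value, and the remaining mass is split between the other two actions.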

Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Neural Networks

Deep neural networks have been shown to be vulnerable to adversarial examples, perturbed inputs that are designed specifically to produce intentional errors in the learning algorithms. However, existing attacks are either computationally expensive or require extensive knowledge of the target model and its dataset to succeed. Hence, these methods are not practical in a deployed adversarial setting. In this paper we introduce an exploratory approach for generating adversarial examples using procedural noise. We show that it is possible to construct practical black-box attacks with low computational cost against robust neural network architectures such as Inception v3 and Inception ResNet v2 on the ImageNet dataset. We show that these attacks successfully cause misclassification with a low number of queries, significantly outperforming state-of-the-art black-box attacks. Our attack demonstrates the fragility of these neural networks to Perlin noise, a type of procedural noise used for generating realistic textures. Perlin noise attacks achieve at least 90% top-1 error across all classifiers. More worryingly, we show that most Perlin noise perturbations are ‘universal’ in that they generalize, as adversarial examples, across large portions of the dataset, with up to 73% of images misclassified using a single perturbation. These findings suggest a systemic fragility of DNNs that needs to be explored further. We also show the limitations of adversarial training, a technique used to enhance robustness against adversarial examples. Thus, the attacker just needs to change the perspective used to generate adversarial examples in order to craft successful attacks and, for the defender, it is difficult to foresee a priori all possible types of adversarial perturbations.

Identifying Bias in AI using Simulation

Machine-learned models exhibit bias, often because the datasets used to train them are biased. This presents a serious problem for the deployment of such technology, as the resulting models might perform poorly on populations that are minorities within the training set and ultimately present higher risks to them. We propose to use high-fidelity computer simulations to interrogate and diagnose biases within ML classifiers. We present a framework that leverages Bayesian parameter search to efficiently characterize the high-dimensional feature space and more quickly identify weaknesses in performance. We apply our approach to an example domain, face detection, and show that it can be used to help identify demographic biases in commercial face application programming interfaces (APIs).

Chasing Similarity: Distribution-aware Aggregation Scheduling (Extended Version)

Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an optional local pre-aggregation step and then repartitions the intermediate result across the network. While local pre-aggregation works well for low-cardinality aggregations, the network communication cost remains significant for high-cardinality aggregations even after local pre-aggregation. The problem is that the repartition-based algorithm for high-cardinality aggregation does not fully utilize the network. In this work, we first formulate a mathematical model that captures the performance of parallel aggregation. We prove that finding optimal aggregation plans from a known data distribution is NP-hard, assuming the Small Set Expansion conjecture. We propose GRASP, a GReedy Aggregation Scheduling Protocol that decomposes parallel aggregation into phases. GRASP is distribution-aware as it aggregates the most similar partitions in each phase to reduce the transmitted data size in subsequent phases. In addition, GRASP takes the available network bandwidth into account when scheduling aggregations in each phase to maximize network utilization. The experimental evaluation on real data shows that GRASP outperforms repartition-based aggregation by 3.5x and LOOM by 2.0x.
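
The idea of aggregating the most similar partitions first can be sketched as a single greedy pairing phase. The sketch below is not GRASP itself: it ignores network bandwidth, and the use of Jaccard similarity over group-by key sets, the function names, and the node names are assumptions made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of group-by keys."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def greedy_phase(partitions):
    """Pair partitions greedily in order of decreasing key-set similarity.

    partitions: dict mapping a (hypothetical) node name to its set of keys.
    Returns (sender, receiver) pairs; each node takes part in at most one pair.
    """
    ranked = sorted(
        combinations(sorted(partitions), 2),
        key=lambda pair: jaccard(partitions[pair[0]], partitions[pair[1]]),
        reverse=True,
    )
    used, plan = set(), []
    for src, dst in ranked:
        if src not in used and dst not in used:
            plan.append((src, dst))
            used.update((src, dst))
    return plan

partitions = {
    "n1": {"a", "b", "c"},
    "n2": {"a", "b", "d"},
    "n3": {"x", "y"},
    "n4": {"x", "z"},
}
plan = greedy_phase(partitions)
```

Merging n1 with n2 collapses the shared keys a and b, so the merged partition holds four keys rather than six; pairing n1 with n3 instead would collapse nothing and leave more data to ship in later phases.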

FIRE-DES++: Enhanced Online Pruning of Base Classifiers for Dynamic Ensemble Selection

Despite being very effective in several classification tasks, Dynamic Ensemble Selection (DES) techniques can select classifiers that classify all samples in the region of competence as being from the same class. The Frienemy Indecision REgion DES (FIRE-DES) tackles this problem by pre-selecting classifiers that correctly classify at least one pair of samples from different classes in the region of competence of the test sample. However, FIRE-DES applies the pre-selection for the classification of a test sample if and only if its region of competence is composed of samples from different classes (indecision region), even though this criterion is not reliable for determining whether a test sample is located close to the borders of classes (true indecision region) when the region of competence is obtained using the classical nearest neighbors approach. Because of that, FIRE-DES mistakes noisy regions for true indecision regions, leading to the pre-selection of incompetent classifiers, and mistakes true indecision regions for safe regions, leaving samples in such regions without any pre-selection. To tackle these issues, we propose FIRE-DES++, an enhanced FIRE-DES that removes noise and reduces the overlap of classes in the validation set, and defines the region of competence using an equal number of samples of each class, avoiding the selection of a region of competence with samples of a single class. Experiments are conducted using FIRE-DES++ with 8 different dynamic selection techniques on 64 classification datasets. Experimental results show that FIRE-DES++ increases the classification performance of all DES techniques considered in this work, outperforming FIRE-DES with 7 out of the 8 DES techniques, and outperforming state-of-the-art DES frameworks.
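
A region of competence built from an equal number of samples per class can be sketched as a per-class nearest-neighbour search. This is only a minimal illustration of that one idea, assuming Euclidean distance; the function name and toy data are hypothetical, and the noise-removal step of FIRE-DES++ is not shown.

```python
import math
from collections import defaultdict

def balanced_region(query, samples, labels, k_per_class):
    """Return indices of the k_per_class nearest samples from each class."""
    by_class = defaultdict(list)
    for idx, (x, y) in enumerate(zip(samples, labels)):
        by_class[y].append((math.dist(query, x), idx))
    region = []
    for y in sorted(by_class):
        # Sorting (distance, index) pairs orders each class by distance.
        nearest = sorted(by_class[y])[:k_per_class]
        region.extend(idx for _, idx in nearest)
    return region

# Toy validation set: three class-0 points near the query, two class-1 points far away.
samples = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (6.0, 6.0)]
labels = [0, 0, 0, 1, 1]
region = balanced_region((0.0, 0.1), samples, labels, k_per_class=2)
```

A plain 3-nearest-neighbour search around this query would return only class-0 samples, whereas the balanced version always includes both classes in the region.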

Deep Factor Model

We propose to represent a return model and a risk model in a unified manner with deep learning, which is a representative model that can express a nonlinear relationship. Although deep learning performs quite well, it has significant disadvantages such as a lack of transparency and limitations to the interpretability of the prediction. This tends to cause practical problems in terms of accountability. Thus, we construct a multifactor model by using interpretable deep learning. We implement deep learning as a return model to predict stock returns with various factors. Then, we present the application of layer-wise relevance propagation (LRP) to decompose attributes of the predicted return as a risk model. By applying LRP on an individual stock or a portfolio basis, we can determine which factor contributes to the prediction. We call this model a deep factor model. We then perform an empirical analysis on the Japanese stock market and show that our deep factor model has better predictive capability than the traditional linear model or other machine learning methods. In addition, we illustrate which factor contributes to the prediction.

Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network

We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. First, although recent work has demonstrated that fusing randomness can improve the robustness of neural networks (Liu 2017), we noticed that adding noise blindly to all the layers is not the optimal way to incorporate randomness. Instead, we model randomness under the framework of Bayesian Neural Network (BNN) to formally learn the posterior distribution of models in a scalable way. Second, we formulate the mini-max problem in BNN to learn the best model distribution under adversarial attacks, leading to an adversarially trained Bayesian neural net. Experimental results demonstrate that the proposed algorithm achieves state-of-the-art performance under strong attacks. On CIFAR-10 with a VGG network, our model leads to a 14% accuracy improvement compared with adversarial training (Madry 2017) and random self-ensemble (Liu 2017) under PGD attack with 0.035 distortion, and the gap becomes even larger on a subset of ImageNet.

Laver tables and combinatorics

The Laver tables are finite combinatorial objects with a simple elementary definition, which were introduced by R. Laver from considerations of logic and set theory. Although these objects exhibit some fascinating properties, they seem to have escaped notice from the combinatorics community. My aim is to give a short introduction to this topic, presenting the definition and main properties and stating a few open problems, which should arouse the interest of combinatorialists.

Identifying Rumor Sources Using Dominant Eigenvalue of Nonbacktracking Matrix
Non-linear Attributed Graph Clustering by Symmetric NMF with PU Learning
Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction
Enhancing Geometric Deep Learning via Graph Filter Deconvolution
Hindi-English Code-Switching Speech Corpus
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation
Nonlinear control for an uncertain electromagnetic actuator
A Preliminary Report on Probabilistic Attack Normal Form for Constellation Semantics
Efficient Seismic fragility curve estimation by Active Learning on Support Vector Machines
Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection
NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation
A Lightweight Music Texture Transfer System
Automatic Data Expansion for Customer-care Spoken Language Understanding
On the Cauchy problem for parabolic integro-differential equations with space-dependent coefficients in generalized Hölder classes
Direct optimization of F-measure for retrieval-based personal question answering
Extremal properties of the multivariate extended skew-normal distribution
Learning Robust, Transferable Sentence Representations for Text Classification
A belief combination rule for a large number of sources
Numerical investigation of evapotranspiration processes in a forested watershed of Central Siberia
Caulking the Leakage Effect in MEEG Source Connectivity Analysis
A nonparametric approach to assess undergraduate performance
Talaia: a Real time Monitor of Social Media and Digital Press
Fluctuation-dissipation relations for stochastic gradient descent
A new approach to the Kasami codes of type 2
Single Snapshot Super-Resolution DOA Estimation for Arbitrary Array Geometries
B2BII – Data conversion from Belle to Belle II
Feedback control of parametrized PDEs via model order reduction and dynamic programming principle
Explainable Black-Box Attacks Against Model-based Authentication
Minimization of Gini impurity via connections with the k-means problem
Orthomodular lattices can be converted into left residuated l-groupoids
Out of Time Ordered Correlators and Entanglement Growth in the Random Field XX Spin Chain
Semiparametric efficient estimation of structural nested mean models with irregularly spaced observations
Adversarial Domain Adaptation for Stable Brain-Machine Interfaces
Estimation-Based Model Predictive Control for Automatic Crosswind Stabilization of Hybrid Aerial Vehicles
Sparse graphs with no polynomial-sized anticomplete pairs
Proof of the Kalai-Meshulam conjecture
Sharp Space-Time Regularity of the Solution to Stochastic Heat Equation Driven by Fractional-Colored Noise
Differentially Private Contextual Linear Bandits
PLL and Costas loop based carrier recovery circuits for 4QAM: non-linear analysis and simulation
Deep Residual Network for Off-Resonance Artifact Correction with Application to Pediatric Body Magnetic Resonance Angiography with 3D Cones
Hoffmann-Ostenhof’s conjecture for claw-free cubic graphs
Millimeter Wave Beam Training: A Survey
TS-MPC for Autonomous Vehicles including a dynamic TS-MHE-UIO
Feedback Stabilization Using Koopman Operator
Cell Grid Architecture for Maritime Route Prediction on AIS Data Streams
The Partially Observable Games We Play for Cyber Deception
Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes
Temporal Cliques admit Sparse Spanners
Superimposition-guided Facial Reconstruction from Skull
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture
Open-Ended Content-Style Recombination Via Leakage Filtering
DeepSSM: A Deep Learning Framework for Statistical Shape Modeling from Raw Images
A Problem of Covering the Plane by Circular Discs with a Constraint
Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds
Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data
Multilevel Optimal Transport: a Fast Approximation of Wasserstein-1 distances
Visual Object Tracking based on Adaptive Siamese and Motion Estimation Network
Discovering Interactions Using Covariate Informed Random Partition Models
On the Convergence and Robustness of Batch Normalization
Generalization and Regularization in DQN
Resilient Structural Stabilizability of Undirected Networks
Robot Vision: Calibration of Wide-Angle Lens Cameras Using Collinearity Condition and K-Nearest Neighbour Regression
Optimization of Circuits for IBM’s five-qubit Quantum Computers
A Framework to Support the Trust Process in News and Social Media
A Graph Partitioning Algorithm with Application in Synthesizing Single Flux Quantum Logic Circuits
A Simple Framework for Stability Analysis of State-Dependent Networks of Heterogeneous Agents
FusedLSTM: Fusing frame-level and video-level features for Content-based Video Relevance Prediction
Mean Field Production Output Control with Sticky Prices: Nash and Social Solutions
Modelling Errors in X-ray Fluoroscopic Imaging Systems Using Photogrammetric Bundle Adjustment With a Data-Driven Self-Calibration Approach
Resource Allocation for Secure Communications in Cooperative Cognitive Wireless Powered Communication Networks
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Interpreting Adversarial Robustness: A View from Decision Surface in Input Space
M^3RL: Mind-aware Multi-agent Management Reinforcement Learning
Linear compactness and combinatorial bialgebras
Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning
Optimization of the Principal Eigenvalue for Elliptic Operators
Distributed Finite-time Least Squares Solver for Network Linear Equations
NICE: Noise Injection and Clamping Estimation for Neural Network Quantization
On the stable property of projective dimension
Refining Manually-Designed Symbol Grounding and High-Level Planning by Policy Gradients
Foggy: A Platform for Workload Orchestration in a Fog Computing Environment
Continuous Learning of Context-dependent Processing in Neural Networks
Collaborative target-tracking control using multiple autonomous fixed-wing UAVs with constant speeds: Theory and experiments
Elementary moves on lattice polytopes
Design of Intra-body Nano-communication Network for Future Nano-medicine
Wearable Posture Monitoring System with Vibration Feedback
An Empirical Investigation of Four Well-Known Polynomial-Size VRP Formulations
Time-Adaptive Unit Commitment
Non-local NetVLAD Encoding for Video Classification
To compress or not to compress: Understanding the Interactions between Adversarial Attacks and Neural Network Compression
Which country epitomizes the world? A study from the perspective of demographic composition
Why echo chambers form and network interventions fail: Selection outpaces influence in dynamic networks
Parameter Estimation for the Single-Look $\mathcal{G}^0$ Distribution
Rainbow simplices in triangulations of manifolds
Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation
A Survey of e-Biodiversity: Concepts, Practices, and Challenges
Toward single particle reconstruction without particle picking: Breaking the detection limit
A Cyber-Security Investment Game for Networked Control Systems
Deep Adversarial Training for Multi-Organ Nuclei Segmentation in Histopathology Images
Downlink Spectral Efficiency of Cell-Free Massive MIMO with Full-Pilot Zero-Forcing
Eigenvalue crossing in principal eigenvector localized networks
CAAD 2018: Generating Transferable Adversarial Examples
Changing and unchanging 2-rainbow independent domination
Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information
Computational Convergence Analysis of Distributed Gradient Descent for Smooth Convex Objective Functions
Pulsed Schlieren Imaging of Ultrasonic Haptics and Levitation using Phased Arrays
On the Cauchy problem for nondegenerate parabolic integro-differential equations in the scale of generalized Hölder spaces
A fast quasi-Newton-type method for large-scale stochastic optimisation
Hyperuniform and rigid stable matchings
On-line partitioning of width w posets into w^O(log log w) chains
Bayesian network marker selection via the thresholded graph Laplacian Gaussian prior
Wireless Powered Cooperative Relaying using NOMA with Imperfect CSI
A Note on Congruences of Infinite Bounded Involution Lattices
MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
META-DES: A Dynamic Ensemble Selection Framework using Meta-Learning
Multi-Layer Cyber-Physical Security and Resilience for Smart Grid
Nonparametric Estimation and Identification in Non-Separable Models Using Panel Data
A Sheet of Maple to Compute Second-Order Edgeworth Expansions and Related Quantities of any Function of the Mean of an iid Sample of an Absolutely Continuous Distribution
Automatic Skin Lesion Segmentation Using GrabCut in HSV Colour Space
Analysis of Limited-Memory BFGS on a Class of Nonsmooth Convex Functions
When Adaptive Diffusion Algorithm Converges to True Parameter?
On Exact and $\infty$-Rényi Common Informations
Convergence and perturbation theory for an infinite-dimensional Metropolis-Hastings algorithm with self-decomposable priors
Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources
Pruned and Structurally Sparse Neural Networks
DIMENSION: Dynamic MR Imaging with Both K-space and Spatial Prior Knowledge Obtained via Multi-Supervised Network Training
Newton-MR: Newton’s Method Without Smoothness or Convexity
Correlation Propagation Networks for Scene Text Detection
Posture recognition using an RGB-D camera : exploring 3D body modeling and deep learning approaches
Properties of Switching Jump Diffusions: Maximum Principles and Harnack Inequalities
Recurrence and Ergodicity for A Class of Regime-Switching Jump Diffusions
Finite Horizon Backward Reachability Analysis and Control Synthesis for Uncertain Nonlinear Systems
Convex Relaxation Methods for Community Detection
Sampled-Data State Observation over Lossy Networks under Round-Robin Scheduling
Modeling Uncertainty with Hedged Instance Embedding
Wearable, Epidermal, and Implantable Sensors for Medical Applications
A Deep learning framework for Single sided sound speed inversion in medical ultrasound
On the Winograd Schema Challenge: Levels of Language Understanding and the Phenomenon of the Missing Text
Using Graph-Pattern Association Rules On Yago Knowledge Base
Multi-Level Contextual Network for Biomedical Image Segmentation
Specificity measures and reference
Pixel and Feature Level Based Domain Adaption for Object Detection in Autonomous Driving
Extracting many-particle entanglement entropy from observables using supervised machine learning
Neural Entity Reasoner for Global Consistency in NER
Primal-dual path following method for nonlinear semi-infinite programs with semi-definite constraints
Improving Bag-of-Visual-Words Towards Effective Facial Expressive Image Classification
Spontaneous Facial Expression Recognition using Sparse Representation
An Incremental Iterated Response Model of Pragmatics
Using Hoare logic for quantum circuit optimization
Pseudo-Random Number Generation using Generative Adversarial Networks
Manifold Alignment with Feature Correspondence
Deep, Skinny Neural Networks are not Universal Approximators
Benchmarks of ResNet Architecture for Atrial Fibrillation Classification
Vector Quantized Spectral Clustering applied to Soybean Whole Genome Sequences
Existence of densities for multi-type CBI processes
A Configurable Transport Layer for CAF
Modelling local phase of images and textures with applications in phase denoising and phase retrieval
Quasi-Variational Inequalities in Banach Spaces: Theory and Augmented Lagrangian Methods
Accelerated PDE’s for efficient solution of regularized inversion problems
Nonparametric Regression with Selectively Missing Covariates
Distributed linear regression by averaging
Optical Illusions Images Dataset
Light dual multinets of order six in the projective plane
Dynamic behaviour of a ring coupled boost converter system with passivity-based control
Carr-Nadtochiy’s Weak Reflection Principle for Markov Chains on $\mathbf{Z}^d$
Robust Look-ahead Three-phase Balancing of Uncertain Distribution Loads
Efficient Sequence Labeling with Actor-Critic Training
Lyapunov exponent, universality and phase transition for products of random matrices
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video
Zero-training Sentence Embedding via Orthogonal Basis
Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
An Application of ASP Theories of Intentions to Understanding Restaurant Scenarios: Insights and Narrative Corpus
Online Resource Allocation under Partially Predictable Demand
Mean Field Control for Efficient Mixing of Energy Loads
AgriColMap: Aerial-Ground Collaborative 3D Mapping for Precision Farming
3D-PSRNet: Part Segmented 3D Point Cloud Reconstruction From a Single Image
Interactive Learning with Corrective Feedback for Policies based on Deep Neural Networks
A central limit theorem for almost local additive tree functionals
Optimization of Bit Mapping and Quantized Decoding for Off-the-Shelf Protograph LDPC Codes with Application to IEEE 802.3ca
Automatic Evaluation of Neural Personality-based Chatbots
Deep Learning for End-to-End Atrial Fibrillation Recurrence Estimation
Two new results about quantum exact learning
Few-Shot Goal Inference for Visuomotor Learning and Planning
Eigenvector Delocalization for Non-Hermitian Random Matrices and Applications
An Empirical Evaluation of Time-Aware LSTM Autoencoder on Chronic Kidney Disease
Decentralized Schemes with Overlap for Solving Graph-Structured Optimization Problems
Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering
Hybrid Noise Removal in Hyperspectral Imagery With a Spatial-Spectral Gradient Network
One Network to Solve All ROIs: Deep Learning CT for Any ROI using Differentiated Backprojection
Simple Algorithms for Learning from Random Counterexamples
Interactive Agent Modeling by Learning to Probe
On the Observability Inequality of Coupled Wave Equations: the Case without Boundary
The $\log\log$ growth of channel capacity for nondispersive nonlinear optical fiber channel in intermediate power range. Extension of the model
Investigating Spatial Error Structures in Continuous Raster Data
Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks
A Simple Machine Learning Method for Commonsense Reasoning? A Short Commentary on Trinh & Le (2018)
End-To-End Alzheimer’s Disease Diagnosis and Biomarker Identification
Safe Adaptive Switching among Dynamical Movement Primitives: Application to 3D Limit-Cycle Walkers
Minimum-Link Rectilinear Covering Tour is NP-hard in $R^{4}$
Learnable Pooling Methods for Video Classification
How Secure are Multicarrier Communication Systems Against Signal Exploitation Attacks?
Classifying the near-equality of ribbon Schur functions
Generative Adversarial Network for Medical Images (MI-GAN)
Power and Level Robustness of A Composite Hypothesis Testing under Independent Non-Homogeneous Data