IL-Net: Using Expert Knowledge to Guide the Design of Furcated Neural Networks

Deep neural networks (DNNs) excel at extracting patterns. Through representation learning and automated feature engineering on large datasets, such models have been highly successful in computer vision and natural language applications. However, designing optimal network architectures from a principled or rational approach has been less successful, with the most successful approaches using an additional machine learning algorithm to tune the network hyperparameters. Yet in many technical fields, there exists established domain knowledge and understanding about the subject matter. In this work, we develop a novel furcated neural network architecture that uses domain knowledge as high-level design principles of the network. We demonstrate proof of concept by developing IL-Net, a furcated network for predicting the properties of ionic liquids, a class of complex multi-chemical entities. Compared to existing state-of-the-art approaches, we show that furcated networks can improve model accuracy by approximately 20-35%, without using additional labeled data. Lastly, we distill two key design principles for furcated networks that can be adapted to other domains.

Explainable time series tweaking via irreversible and reversible temporal transformations

Time series classification has received great attention over the past decade, with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking: given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed on the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems, along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.

Do Your Cores Play Nicely? A Portable Framework for Multi-core Interference Tuning and Analysis

Multi-core architectures can be leveraged to allow independent processes to run in parallel. However, due to resources shared across cores, such as caches, distinct processes may interfere with one another, e.g., by affecting execution time. Analysing the extent of this interference is difficult due to (1) the diversity of modern architectures, which may contain different implementations of shared resources, and (2) the complex nature of modern processors, in which interference might arise due to subtle interactions. To address this, we propose a black-box auto-tuning approach that searches for processes that are effective at causing slowdowns for a program when executed in parallel. Such slowdowns provide lower bounds on worst-case execution time, an important metric in systems with real-time constraints. Our approach considers a set of parameterised ‘enemy’ processes and ‘victim’ programs, each targeting a shared resource. The autotuner searches for enemy process parameters that are effective at causing slowdowns in the victim programs. The idea is that victim programs act as a proxy for the shared resource usage of arbitrary programs. We evaluate our approach on 5 different chips and 3 resources (cache, memory bus, and main memory), and consider several search strategies and slowdown metrics. Using enemy processes tuned per chip, we evaluate the slowdowns on the autobench and coremark benchmark suites and show that our method is able to achieve slowdowns in 98% of benchmark/chip combinations and provides results similar to those of manually written enemy processes.
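
The abstract does not name the search strategies here, so the following is a minimal sketch of one simple black-box strategy, random search over a discrete parameter space, that keeps the enemy configuration causing the largest slowdown. The callback `measure_slowdown`, the parameter names `stride` and `buffer_kib`, and the toy objective are all hypothetical; on real hardware the callback would run the victim with and without the enemy process and return the ratio of execution times.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, int]


def random_search(
    space: Dict[str, List[int]],
    measure_slowdown: Callable[[Params], float],
    budget: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Black-box search for enemy-process parameters that maximise the victim's slowdown.

    `measure_slowdown` is a hypothetical callback returning
    (victim time with enemy) / (victim time alone) for the given enemy parameters.
    """
    rng = random.Random(seed)
    best_params: Params = {}
    best_slowdown = float("-inf")
    for _ in range(budget):
        # Draw each parameter independently and uniformly from its candidate values.
        candidate = {name: rng.choice(values) for name, values in space.items()}
        slowdown = measure_slowdown(candidate)
        if slowdown > best_slowdown:
            best_params, best_slowdown = candidate, slowdown
    return best_params, best_slowdown


# Hypothetical enemy parameters and a toy objective standing in for real measurements.
SPACE = {"stride": [1, 2, 4, 8, 16, 32, 64], "buffer_kib": [16, 32, 64, 128, 256, 512]}


def toy_slowdown(p: Params) -> float:
    return 1.0 + 0.5 * (p["buffer_kib"] == 256) + 0.3 * (p["stride"] == 64)


best, score = random_search(SPACE, toy_slowdown, budget=100)
```

Random search needs no model of the chip, which fits the black-box setting; a smarter strategy would only change how `candidate` is proposed.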

Graph Pattern Mining and Learning through User-defined Relations (Extended Version)

In this work, we propose R-GPM, a parallel computing framework for graph pattern mining (GPM) through a user-defined subgraph relation. More specifically, we enable the computation of statistics of patterns through their subgraph classes, generalizing traditional GPM methods. R-GPM provides efficient estimators for these statistics by employing an MCMC sampling algorithm combined with several optimizations. We provide both theoretical guarantees and empirical evaluations of our estimators in application scenarios such as stochastic optimization of deep high-order graph neural network models and pattern (motif) counting. We also propose and evaluate optimizations that improve the accuracy of our estimators while reducing their computational costs by up to three orders of magnitude. Finally, we show that R-GPM is scalable, providing near-linear speedups on 44 cores in all of our tests.

Real-Time Nonparametric Anomaly Detection in High-Dimensional Settings

Timely and reliable detection of abrupt anomalies, e.g., faults and intrusions/attacks, is crucial for real-time monitoring and security of many modern systems, such as the smart grid and Internet of Things (IoT) networks, that produce high-dimensional data. With this goal, we propose effective and scalable algorithms for real-time anomaly detection in high-dimensional settings. Our proposed algorithms are nonparametric (model-free), as both the nominal and anomalous multivariate data distributions are assumed to be unknown. We extract useful univariate summary statistics and perform the anomaly detection task in a single-dimensional space. We model anomalies as persistent outliers and propose to detect them via a cumulative sum (CUSUM)-like algorithm. In case the observed data stream has a low intrinsic dimensionality, we find a low-dimensional submanifold in which the nominal data are embedded and then evaluate whether the sequentially acquired data persistently deviate from the nominal submanifold. Further, in the general case, we determine an acceptance region for nominal data via the Geometric Entropy Minimization (GEM) method and then evaluate whether the sequentially observed data persistently fall outside the acceptance region. We provide an asymptotic lower bound on the average false alarm period of the proposed CUSUM-like algorithm. Moreover, we provide a sufficient condition to asymptotically guarantee that the decision statistic of the proposed algorithm does not diverge in the absence of anomalies. Numerical studies illustrate the effectiveness of the proposed schemes in the quick and accurate detection of changes/anomalies in a variety of high-dimensional settings.
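
A minimal sketch of the general-case idea under stated simplifications: the univariate summary statistic is assumed to be the distance to the k-th nearest nominal point (a k-NN stand-in for the GEM acceptance region), the acceptance threshold is the (1 - alpha) quantile of that distance on held-out nominal data, and the CUSUM-like increment is simply the excess distance over the threshold. The function names and the parameters `k`, `alpha`, and `h` are illustrative, not the paper's exact statistic.

```python
import numpy as np
from typing import Optional


def kth_nn_distance(points: np.ndarray, reference: np.ndarray, k: int) -> np.ndarray:
    """Distance from each row of `points` to its k-th nearest row of `reference`."""
    dists = np.linalg.norm(points[:, None, :] - reference[None, :, :], axis=-1)
    return np.sort(dists, axis=1)[:, k - 1]


def cusum_knn_detector(stream: np.ndarray, nominal: np.ndarray, k: int = 5,
                       alpha: float = 0.05, h: float = 5.0) -> Optional[int]:
    """Return the first index at which the CUSUM-like statistic reaches h, or None."""
    half = len(nominal) // 2
    reference, calibration = nominal[:half], nominal[half:]
    # Acceptance region: k-NN distance below the (1 - alpha) quantile on held-out nominal data.
    threshold = np.quantile(kth_nn_distance(calibration, reference, k), 1 - alpha)
    statistic = 0.0
    for t, x in enumerate(stream):
        d = kth_nn_distance(x[None, :], reference, k)[0]
        # Positive increments only outside the acceptance region; max(0, .) forgets isolated outliers.
        statistic = max(0.0, statistic + (d - threshold))
        if statistic >= h:
            return t
    return None
```

Because single outliers are absorbed by the `max(0, ...)` reset, only observations that persistently fall outside the acceptance region push the statistic to the alarm level `h`.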

SQL-to-Text Generation with Graph-to-Sequence Model

Previous work approaches the SQL-to-text generation task using vanilla Seq2Seq models, which may not fully capture the inherent graph-structured information in SQL queries. In this paper, we first introduce a strategy to represent an SQL query as a directed graph and then employ a graph-to-sequence model to encode the global structure information into node embeddings. This model can effectively learn the correlation between the SQL query pattern and its interpretation. Experimental results on the WikiSQL and Stackoverflow datasets show that our model significantly outperforms the Seq2Seq and Tree2Seq baselines, achieving state-of-the-art performance.
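
The abstract does not spell out the graph construction, so the following is a hypothetical scheme for simple WikiSQL-style queries (one selected column, an optional aggregate, and AND-ed conditions): SELECT, aggregate, WHERE, AND, and condition nodes point toward the columns they act on. The regular expressions and node labels are assumptions of this sketch; the graph-to-sequence encoder that would embed these nodes is not shown.

```python
import re
from typing import List, Tuple

Graph = Tuple[List[str], List[Tuple[int, int]]]

QUERY = re.compile(r"SELECT\s+(?:(\w+)\((\w+)\)|(\w+))\s+FROM\s+\w+(?:\s+WHERE\s+(.+))?$",
                   re.IGNORECASE)
CONDITION = re.compile(r"(\w+)\s*(=|<|>)\s*(.+)$")


def sql_to_graph(query: str) -> Graph:
    """Hypothetical graph for 'SELECT [AGG(]col[)] FROM t [WHERE c op v [AND ...]]'.

    Returns node labels and directed (source, target) edges between node indices."""
    m = QUERY.match(query.strip())
    if m is None:
        raise ValueError(f"unsupported query: {query}")
    agg, agg_col, plain_col, where = m.groups()
    nodes: List[str] = []
    edges: List[Tuple[int, int]] = []

    def add(label: str) -> int:
        nodes.append(label)
        return len(nodes) - 1

    select = add("SELECT")
    column = add(agg_col or plain_col)
    if agg:  # SELECT -> AGG -> column
        agg_node = add(agg.upper())
        edges += [(select, agg_node), (agg_node, column)]
    else:    # SELECT -> column
        edges.append((select, column))
    if where:
        conditions = re.split(r"\s+AND\s+", where.strip(), flags=re.IGNORECASE)
        where_node = add("WHERE")
        edges.append((select, where_node))
        parent = where_node
        if len(conditions) > 1:  # WHERE -> AND -> each condition
            parent = add("AND")
            edges.append((where_node, parent))
        for cond in conditions:
            c = CONDITION.match(cond.strip())
            if c is None:
                raise ValueError(f"unsupported condition: {cond}")
            cond_node = add(f"{c.group(2)} {c.group(3).strip()}")  # e.g. "> 1990"
            col_node = add(c.group(1))
            edges += [(parent, cond_node), (cond_node, col_node)]
    return nodes, edges
```

For example, `SELECT COUNT(name) FROM players WHERE team = 'Bulls' AND year > 1990` yields a SELECT node feeding a COUNT node and a WHERE node, with the two conditions hanging off a shared AND node.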

Online Cyber-Attack Detection in Smart Grid: A Reinforcement Learning Approach

Early detection of cyber-attacks is crucial for the safe and reliable operation of the smart grid. In the literature, outlier detection schemes making sample-by-sample decisions and online detection schemes requiring perfect attack models have been proposed. In this paper, we formulate the online attack/anomaly detection problem as a partially observable Markov decision process (POMDP) problem and propose a universal robust online detection algorithm using the framework of model-free reinforcement learning (RL) for POMDPs. Numerical studies illustrate the effectiveness of the proposed RL-based algorithm in timely and accurate detection of cyber-attacks targeting the smart grid.

Random Warping Series: A Random Features Method for Time-Series Embedding

Time series data analytics has been a problem of substantial interest for decades, and Dynamic Time Warping (DTW) has been the most widely adopted technique to measure dissimilarity between time series. A number of global-alignment kernels have since been proposed in the spirit of DTW to extend its use to kernel-based estimation methods such as support vector machines. However, those kernels suffer from diagonal dominance of the Gram matrix and a quadratic complexity w.r.t. the sample size. In this work, we study a family of alignment-aware positive definite (p.d.) kernels whose feature embedding is given by a distribution of Random Warping Series (RWS). The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoying a Random Features (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in terms of both the number and the length of time series. We also study the convergence of the RF approximation for the domain of time series of unbounded length. Our extensive experiments on 16 benchmark datasets demonstrate that RWS outperforms or matches state-of-the-art classification and clustering methods in both accuracy and computational time. Our code and data are available at https://…/RandomWarpingSeries.
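
A minimal sketch of an RWS-style embedding, following the abstract's description: each feature compares a time series with a short random series via DTW, so one feature costs O(L · D) for a series of length L and a random series of length at most D, which is linear in L. The feature map exp(-gamma * DTW(x, w)), the Gaussian entries, and the uniformly drawn random length are assumptions of this sketch; `sigma`, `gamma`, and `max_len` are illustrative parameters.

```python
import numpy as np
from typing import List


def dtw(x: np.ndarray, y: np.ndarray) -> float:
    """Dynamic time warping distance with absolute-difference cost, O(len(x) * len(y)) time."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def rws_features(series: List[np.ndarray], num_features: int = 64, max_len: int = 10,
                 sigma: float = 1.0, gamma: float = 1.0, seed: int = 0) -> np.ndarray:
    """Embed variable-length series as Z[i, r] = exp(-gamma * DTW(x_i, w_r)) / sqrt(R)."""
    rng = np.random.default_rng(seed)
    # Random warping series: random length in [1, max_len], i.i.d. Gaussian entries (assumed).
    random_series = [rng.normal(0.0, sigma, size=int(rng.integers(1, max_len + 1)))
                     for _ in range(num_features)]
    Z = np.array([[np.exp(-gamma * dtw(x, w)) for w in random_series] for x in series])
    # Z @ Z.T approximates the kernel matrix; each entry costs O(len(x) * max_len).
    return Z / np.sqrt(num_features)
```

Since `Z @ Z.T` is a Gram matrix of explicit features, it is positive semidefinite by construction, and a linear classifier on `Z` avoids forming the N × N kernel matrix at all.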

A linear algorithm for reliable predictive network control

This paper introduces a novel control approach for network scheduling and routing that is predictive and reliable in nature, yet builds upon a linear program, which makes it fast to execute. First, we describe the canonical system model and how we extend it to predict the success of transmissions. We then define a notion of reliability and explain the algorithm. Through extensive simulations, we demonstrate the performance gains over the well-known MaxWeight policy.

Distributed MPC with Prediction of Time-Varying Communication Delay

The novel idea presented in this paper is to interweave distributed model predictive control with a reliable scheduling of the information that is interchanged between the local controllers of the plant subsystems. To this end, a dynamic model of the communication network and a predictive scheduling algorithm are proposed, the latter providing predictions of the delay between sending and receiving information. These predictions can be used by the local subsystem controllers to improve their control performance, as shown for an exemplary platooning scenario.

Marginal Structural Models for Time-varying Endogenous Treatments: A Time-Varying Instrumental Variable Approach

Robins (1998) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. He established identification of MSM parameters under a sequential randomization assumption (SRA), which essentially rules out unmeasured confounding of treatment assignment over time. In this technical report, we consider sufficient conditions for identification of MSM parameters with the aid of a time-varying instrumental variable, when sequential randomization fails to hold due to unmeasured confounding. Our identification conditions essentially require that no unobserved confounder predicts compliance type for the time-varying treatment, the longitudinal generalization of the identifying condition of Wang and Tchetgen Tchetgen (2018). Under this assumption, we derive a large class of semiparametric estimators that extends standard inverse-probability weighting (IPW), the most popular approach for estimating MSMs under SRA, by incorporating the time-varying IV through a modified set of weights. The set of influence functions for MSM parameters is derived under a semiparametric model whose sole restriction on the observed data distribution is the one implied by the MSM, and is shown to provide a rich class of multiply robust estimators, including a locally semiparametric efficient estimator.
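
For orientation, here is a minimal sketch of the standard IPW estimator that the report extends (not its IV-based estimator), for the mean outcome under the regime "treated at every time point". The toy data-generating process has only a baseline confounder L, and the treatment probabilities are assumed known; `simulate` and `ipw_always_treated_mean` are hypothetical helpers. Under this simulation the true regime mean is 1 + 1 + 1 + 2 · 0.5 = 4.

```python
import numpy as np
from typing import Tuple


def ipw_always_treated_mean(treatments: np.ndarray, propensities: np.ndarray,
                            outcomes: np.ndarray) -> float:
    """Normalised (Hajek) IPW estimate of the mean outcome under 'treat at every time point'.

    treatments:   (n, T) array of observed A_it in {0, 1}
    propensities: (n, T) array of P(A_it = 1 | observed history), assumed known here
    outcomes:     (n,) array of Y_i
    """
    follows_regime = np.all(treatments == 1, axis=1)
    # Weight = 1 / prod_t P(A_it = 1 | history) for units that followed the regime, 0 otherwise.
    weights = np.where(follows_regime, 1.0 / np.prod(propensities, axis=1), 0.0)
    return float(np.sum(weights * outcomes) / np.sum(weights))


def simulate(n: int, seed: int = 0) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Toy data with a confounder L that raises both treatment probability and outcome."""
    rng = np.random.default_rng(seed)
    L = rng.binomial(1, 0.5, size=n)
    p = 0.3 + 0.4 * L                                  # P(A_t = 1 | L) at both time points
    A = rng.binomial(1, p[:, None], size=(n, 2))
    Y = 1.0 + A[:, 0] + A[:, 1] + 2.0 * L + rng.normal(size=n)
    return A, np.column_stack([p, p]), Y
```

The naive mean of Y among always-treated units is biased upward (about 4.7 in expectation here) because units with L = 1 are treated more often; reweighting by the inverse treatment probabilities removes that imbalance. The report's estimators modify these weights using the time-varying IV when SRA fails.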

Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the use of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance of both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of the time needed for complete branch-and-bound style search.

Hardware-Aware Machine Learning: Modeling and Optimization

Recent breakthroughs in Deep Learning (DL) applications have made DL models a key component in almost every modern computing system. The increased popularity of DL applications deployed on a wide spectrum of platforms has resulted in a plethora of design challenges related to the constraints introduced by the hardware itself. What is the latency or energy cost of an inference made by a Deep Neural Network (DNN)? Is it possible to predict this latency or energy consumption before a model is trained? If so, how can machine learners take advantage of these models to design the hardware-optimal DNN for deployment? From lengthening the battery life of mobile devices to reducing the runtime requirements of DL models executing in the cloud, the answers to these questions have drawn significant attention. One cannot optimize what isn’t properly modeled. Therefore, it is important to understand the hardware efficiency of DL models at inference time, before even training the model. This key observation has motivated the use of predictive models to capture the hardware performance or energy efficiency of DL applications. Furthermore, DL practitioners are challenged with the task of designing the DNN model, i.e., of tuning the hyper-parameters of the DNN architecture, while optimizing for both the accuracy of the DL model and its hardware efficiency. Therefore, state-of-the-art methodologies have proposed hardware-aware hyper-parameter optimization techniques. In this paper, we provide a comprehensive assessment of state-of-the-art work and selected results on hardware-aware modeling and optimization for DL applications. We also highlight several open questions that are poised to give rise to novel hardware-aware designs in the next few years, as DL applications continue to significantly impact associated hardware systems and platforms.

Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

In Multi-Agent Reinforcement Learning, independent cooperative learners must overcome a number of pathologies in order to learn optimal joint policies. These pathologies include action-shadowing, stochasticity, the moving target problem, and the alter-exploration problem (Matignon, Laurent, and Le Fort-Piat 2012; Wei and Luke 2016). Numerous methods have been proposed to address these pathologies, but evaluations are predominantly conducted in repeated strategic-form games and stochastic games consisting of only a small number of state transitions. This raises the question of whether these methods scale to complex, temporally extended, partially observable domains with stochastic transitions and rewards. In this paper, we study such complex settings, which require reasoning over long time horizons and confront agents with the curse of dimensionality. To deal with the dimensionality, we adopt a Multi-Agent Deep Reinforcement Learning (MA-DRL) approach. We find that when the agents have to make critical decisions in seclusion, existing methods succumb to a combination of relative overgeneralisation (a type of action shadowing), the alter-exploration problem, and stochasticity. To address these pathologies, we introduce expanding negative update intervals that enable independent learners to establish near-optimal average utility values for higher-level strategies while largely discarding transitions from episodes that result in mis-coordination. We evaluate Negative Update Intervals Double-DQN (NUI-DDQN) within a temporally extended Climb Game, a normal-form game that has frequently been used to study relative overgeneralisation and other pathologies. We show that NUI-DDQN can converge towards optimal joint policies in deterministic and stochastic reward settings, overcoming relative overgeneralisation and the alter-exploration problem while mitigating the moving target problem.
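
The Climb Game's payoff matrix, in the standard form used in the cooperative multi-agent literature, makes relative overgeneralisation concrete. The sketch below is not NUI-DDQN; it only shows why averaging rewards over an exploring partner favours the "safe" action, while the optimal joint action pays 11. The helper names are illustrative.

```python
import numpy as np

# Standard Climb Game payoff matrix: rows are agent 1's actions, columns agent 2's.
# Both agents receive the same reward; the optimal joint action is (0, 0) with reward 11.
CLIMB = np.array([[11.0, -30.0, 0.0],
                  [-30.0, 7.0, 6.0],
                  [0.0, 0.0, 5.0]])


def average_utilities(payoff: np.ndarray, partner_policy: np.ndarray) -> np.ndarray:
    """Expected reward of each own action when the partner samples from partner_policy."""
    return payoff @ partner_policy


def optimistic_utilities(payoff: np.ndarray) -> np.ndarray:
    """Reward of each own action if the partner always picks the best response to it."""
    return payoff.max(axis=1)


uniform = np.full(3, 1.0 / 3.0)           # an exploring partner acting uniformly at random
avg = average_utilities(CLIMB, uniform)   # action 2 scores highest: relative overgeneralisation
best_case = optimistic_utilities(CLIMB)   # action 0 scores highest, matching the optimal joint action
```

With a uniformly exploring partner, the averages are -19/3, -17/3, and 5/3, so an independent learner drifts to action 2 even though coordinating on (0, 0) pays 11. The heavy -30 penalties for mis-coordination are what make discarding transitions from mis-coordinated episodes attractive.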

Choosing to Rank

Ranking data arises in a wide variety of application areas, generated both by complex algorithms and by human subjects, but remains difficult to model, learn from, and predict. Particularly when generated by humans, ranking datasets often feature multiple modes, intransitive aggregate preferences, or incomplete rankings, but popular probabilistic models such as the Plackett-Luce and Mallows models are too rigid to capture such complexities. In this work, we frame ranking as a sequence of discrete choices and then leverage recent advances in discrete choice modeling to build flexible and tractable models of ranking data. The basic building block of our connection between ranking and choice is the idea of repeated selection, first used to build the Plackett-Luce ranking model from the multinomial logit (MNL) choice model by repeatedly applying the choice model to a dwindling set of alternatives. We derive conditions under which repeated selection can be applied to other choice models to build new ranking models, addressing specific subtleties in modeling mixed-length top-k rankings as repeated selection. We translate several choice axioms through our framework, providing structure to our ranking models inherited from the underlying choice models. To train models from data, we transform ranking data into choice data and employ standard techniques for training choice models. We find that our ranking models provide higher out-of-sample likelihood when compared to Plackett-Luce and Mallows models on a broad collection of ranking tasks, including food preferences, ranked-choice elections, car racing, and search engine relevance ranking data.
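
Repeated selection can be sketched in a few lines: a ranking is turned into a sequence of choices from a shrinking set, and applying the MNL model to each choice gives the Plackett-Luce likelihood. Item utilities are assumed to be given here, and for a top-k ranking the sketch simply stops after k choices, ignoring the mixed-length subtleties the paper addresses. The function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mnl_choice_prob(utilities: Dict[str, float], choice: str, choice_set: Sequence[str]) -> float:
    """Multinomial logit probability of picking `choice` from `choice_set`."""
    denom = sum(math.exp(utilities[item]) for item in choice_set)
    return math.exp(utilities[choice]) / denom


def ranking_to_choices(ranking: Sequence[str], universe: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Repeated selection: a (possibly top-k) ranking becomes (chosen item, remaining set) pairs."""
    remaining = list(universe)
    choices = []
    for item in ranking:
        choices.append((item, list(remaining)))
        remaining.remove(item)  # the chosen item leaves the dwindling set of alternatives
    return choices


def plackett_luce_log_likelihood(utilities: Dict[str, float], ranking: Sequence[str],
                                 universe: Sequence[str]) -> float:
    """Log-likelihood of a ranking under Plackett-Luce, i.e. MNL applied by repeated selection."""
    return sum(math.log(mnl_choice_prob(utilities, c, s))
               for c, s in ranking_to_choices(ranking, universe))
```

The same `ranking_to_choices` output is exactly the choice data a standard choice-model trainer would consume; swapping `mnl_choice_prob` for another choice model yields a different ranking model.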

Defensive Dropout for Hardening Deep Neural Networks under Adversarial Attacks

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. That is, adversarial examples, obtained by adding delicately crafted distortions to original legitimate inputs, can mislead a DNN into classifying them as any target label. This work provides a solution for hardening DNNs under adversarial attacks through defensive dropout. Besides using dropout during training for the best test accuracy, we propose to use dropout also at test time to achieve strong defense effects. We consider the problem of building robust DNNs as an attacker-defender two-player game, where the attacker and the defender know each other’s strategies and try to optimize their own strategies towards an equilibrium. Based on observations of the effect of the test dropout rate on test accuracy and attack success rate, we propose a defensive dropout algorithm to determine an optimal test dropout rate given the neural network model and the attacker’s strategy for generating adversarial examples. We also investigate the mechanism behind the outstanding defense effects achieved by the proposed defensive dropout. Compared with stochastic activation pruning (SAP), another defense method that introduces randomness into the DNN model, we find that our defensive dropout achieves much larger variances of the gradients, which is the key to the improved defense effects (much lower attack success rate). For example, our defensive dropout can reduce the attack success rate from 100% to 13.89% under the currently strongest attack, i.e., the C&W attack, on the MNIST dataset.
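
A minimal sketch of keeping dropout active at test time in a small ReLU MLP, written with NumPy and hypothetical random weights. It only shows the mechanism that injects randomness into every inference; the defensive dropout algorithm's choice of the test dropout rate from the attacker's strategy is not shown.

```python
import numpy as np
from typing import List, Tuple


def dropout(x: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors by 1 / (1 - rate)."""
    if rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def mlp_forward(x: np.ndarray, layers: List[Tuple[np.ndarray, np.ndarray]],
                test_dropout_rate: float, rng: np.random.Generator) -> np.ndarray:
    """Forward pass of a ReLU MLP with dropout kept on at inference time (hidden layers only)."""
    h = x
    for idx, (W, b) in enumerate(layers):
        h = h @ W + b
        if idx < len(layers) - 1:
            h = np.maximum(h, 0.0)
            h = dropout(h, test_dropout_rate, rng)  # a fresh random mask on every call
    return h
```

Because every call draws a new mask, repeated queries on the same input give different logits, and gradients an attacker computes through the network vary from call to call; the rescaling by 1 / (1 - rate) keeps expected activations unchanged.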

CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning

We propose CM3, a new deep reinforcement learning method for cooperative multi-agent problems where agents must coordinate for joint success in achieving different individual goals. We restructure multi-agent learning into a two-stage curriculum, consisting of a single-agent stage for learning to accomplish individual tasks, followed by a multi-agent stage for learning to cooperate in the presence of other agents. These two stages are bridged by modular augmentation of neural network policy and value functions. We further adapt the actor-critic framework to this curriculum by formulating local and global views of the policy gradient and learning via a double critic, consisting of a decentralized value function and a centralized action-value function. We evaluated CM3 on a new high-dimensional multi-agent environment with sparse rewards: negotiating lane changes among multiple autonomous vehicles in the Simulation of Urban Mobility (SUMO) traffic simulator. Detailed ablation experiments show the positive contribution of each component in CM3, and the overall synthesis converges significantly faster to higher-performance policies than existing cooperative multi-agent methods.

Interpreting search result rankings through intent modeling

Given the recent interest in neural models for document ranking, which are arguably accurate but not interpretable, even when they use textual features, we try to answer questions about how to interpret rankings. In this paper, we take first steps towards a framework for the interpretability of retrieval models, with the aim of answering three main questions: ‘What is the intent of the query according to the ranker?’, ‘Why is a document ranked higher than another for the query?’, and ‘Why is a document relevant to the query?’ Our framework is predicated on the assumption that text-based retrieval model behavior can be estimated using query expansions in conjunction with a simpler retrieval model, irrespective of the underlying ranker. We conducted experiments with the Clueweb test collection. We show how our approach performs for both simpler models with a closed-form notation (which allows us to measure the accuracy of the interpretation) and neural ranking models. Our results indicate that we can indeed interpret more complex models with reasonable accuracy under certain simplifying assumptions. In a case study, we also show that our framework can be employed to interpret the results of the DRMM neural retrieval model in various scenarios.

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance as model-free methods. We propose Model-Based Meta-Policy-Optimization (MB-MPO), an approach that forgoes the strong reliance on accurate learned dynamics models. Using an ensemble of learned dynamics models, MB-MPO meta-learns a policy that can quickly adapt to any model in the ensemble with one policy gradient step. This steers the meta-policy towards internalizing consistent dynamics predictions among the ensemble while shifting the burden of behaving optimally w.r.t. the model discrepancies towards the adaptation step. Our experiments show that MB-MPO is more robust to model imperfections than previous model-based approaches. Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.

Network Recasting: A Universal Method for Network Architecture Transformation

This paper proposes network recasting as a general method for network architecture transformation. The primary goal of this method is to accelerate the inference process through the transformation, but it can have many other practical applications. The method is based on block-wise recasting: it recasts each source block in a pre-trained teacher network to a target block in a student network. For the recasting, a target block is trained such that its output activation approximates that of the source block. Such block-by-block recasting in a sequential manner transforms the network architecture while preserving accuracy. This method can be used to transform an arbitrary teacher network type into an arbitrary student network type. It can even generate a mixed-architecture network that consists of two or more types of blocks. Network recasting can generate a network with fewer parameters and/or activations, which reduces the inference time significantly. Naturally, it can be used for network compression by recasting a trained network into a smaller network of the same type. Our experiments show that it outperforms previous compression approaches in terms of actual speedup on a GPU.
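
A minimal sketch of sequential block-wise recasting, assuming each teacher block is linear → ReLU → linear and each student block is a single linear layer. Here the student block is "trained" in closed form by least squares rather than by gradient descent, and it is fitted on the output of the already-recast student blocks; these are simplifications of this sketch, not the paper's training procedure.

```python
import numpy as np
from typing import List, Tuple


def teacher_block(x: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Source block of the pre-trained teacher: linear -> ReLU -> linear."""
    return np.maximum(x @ W1, 0.0) @ W2


def recast_network(x: np.ndarray, teacher: List[Tuple[np.ndarray, np.ndarray]]) -> List[np.ndarray]:
    """Sequentially replace each teacher block by a single linear student block.

    Each student matrix V is fitted by least squares so that student_input @ V approximates the
    teacher block's output activation; the next student block is then fitted on the student's
    own output, so later blocks can compensate for earlier approximation error."""
    students = []
    teacher_in, student_in = x, x
    for W1, W2 in teacher:
        target = teacher_block(teacher_in, W1, W2)
        V, *_ = np.linalg.lstsq(student_in, target, rcond=None)
        students.append(V)
        teacher_in, student_in = target, student_in @ V
    return students
```

With a 16-dimensional input and 32 hidden units per teacher block, each student block has 256 weights instead of 1,024, which illustrates how recasting can reduce parameters; a real student would keep a nonlinearity and be trained on the activation-matching loss.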

Supervised Machine Learning for Extractive Query Based Summarisation of Biomedical Data

The automation of text summarisation of biomedical publications is a pressing need due to the plethora of information available online. This paper explores the impact of several supervised machine learning approaches for extracting multi-document summaries for given queries. In particular, we compare classification and regression approaches for query-based extractive summarisation using data provided by the BioASQ Challenge. We tackled the problem of annotating sentences for training classification systems and show that a simple annotation approach outperforms regression-based summarisation.

Dueling Bandits with Qualitative Feedback

We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem, where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms to solve the QDB problem, this reduction significantly worsens performance: in the QDB problem, the probability that one arm wins the duel over another can be directly estimated without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over the existing DB algorithms.
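
A minimal sketch of estimating a duel probability directly from qualitative feedback, assuming the feedback levels are ordinal and encoded as integers (e.g. bad = 0, ok = 1, good = 2) and that ties count as half a win. Every stored observation of one arm is compared with every stored observation of the other, so no explicit duels are played; the function name is illustrative, not one of the paper's algorithms.

```python
from collections import Counter
from typing import Sequence


def win_probability(feedback_i: Sequence[int], feedback_j: Sequence[int]) -> float:
    """Estimate P(arm i beats arm j) directly from ordinal feedback samples.

    Every observed level of arm i is compared with every observed level of arm j, so no
    explicit duels are needed. Higher levels are better; ties count as half a win (assumption).
    """
    counts_i, counts_j = Counter(feedback_i), Counter(feedback_j)
    wins = 0.0
    for level_i, n_i in counts_i.items():
        for level_j, n_j in counts_j.items():
            if level_i > level_j:
                wins += n_i * n_j
            elif level_i == level_j:
                wins += 0.5 * n_i * n_j
    return wins / (len(feedback_i) * len(feedback_j))
```

With m pulls of each arm, this compares m² observation pairs instead of the m pairs a duel-based reduction would see, which is the intuition behind why direct estimation can beat running DB algorithms on top of the qualitative feedback.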

Learning to Fingerprint the Latent Structure in Question Articulation

Machine understanding of questions is tightly related to recognition of articulation in the context of the computational capabilities of an underlying processing algorithm. In this paper, a mathematical model to capture and distinguish the latent structure in the articulation of questions is presented. We propose an objective-driven approach to represent this latent structure and show that such an approach is beneficial when examples of complementary objectives are not available. We show that the latent structure can be represented as a system that maximizes a cost function related to the underlying objective. Further, we show that the optimization formulation can be approximated by building a memory of patterns represented as a trained neural auto-encoder. Experimental evaluation using many clusters of questions, each related to an objective, shows 80% recognition accuracy and negligible false positives across these clusters of questions. We then extend the same memory to a related task where the goal is to iteratively refine a dataset of questions based on the latent articulation. We also demonstrate a refinement scheme called K-fingerprints that achieves nearly 100% recognition with negligible false positives across the different clusters of questions.

Approximate Query Processing over Static Sets and Sliding Windows

Indexing of static and dynamic sets is fundamental to a large set of applications such as information retrieval and caching. Denoting the characteristic vector of the set by B, we consider the problem of encoding sets and multisets to support approximate versions of the rank(i) query (i.e., computing sum_{j <= i} B[j]) and the select(i) query (i.e., finding min{p | rank(p) >= i}). We study multiple types of approximations (allowing an error in the query or the result) and present lower bounds and succinct data structures for several variants of the problem. We also extend our model to sliding windows, in which we process a stream of elements and compute suffix sums. This is a generalization of the window summation problem that allows the user to specify the window size at query time. Here, we provide an algorithm that supports updates and queries in constant time while requiring only a (1+o(1)) factor more space than fixed-window summation algorithms.
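
For reference, the exact versions of rank and select, and a naive exact baseline for suffix sums with a query-time window size, can be sketched as follows (0-indexed, which is an assumption). The baseline stores O(W) prefix sums for a maximum window W and answers updates and queries in constant time; it is neither approximate nor succinct like the structures in the paper, and the class name is hypothetical.

```python
from typing import List, Optional


def rank(B: List[int], i: int) -> int:
    """Exact rank(i) = sum of B[j] for j <= i (0-indexed)."""
    return sum(B[: i + 1])


def select(B: List[int], i: int) -> Optional[int]:
    """Exact select(i) = min{p | rank(p) >= i}, or None if no such position exists."""
    total = 0
    for p, bit in enumerate(B):
        total += bit
        if total >= i:
            return p
    return None


class SlidingWindowSums:
    """Exact suffix sums over the last w stream elements, for any w <= max_window chosen at query time.

    Keeps a ring buffer of the last max_window + 1 prefix sums (O(max_window) words), so both
    update and query take O(1) time."""

    def __init__(self, max_window: int) -> None:
        self.size = max_window + 1
        self.prefix = [0] * self.size  # slot t % size holds the prefix sum after t elements
        self.t = 0
        self.total = 0

    def update(self, value: int) -> None:
        self.t += 1
        self.total += value
        self.prefix[self.t % self.size] = self.total

    def query(self, w: int) -> int:
        if not 0 <= w < self.size:
            raise ValueError("window size must be between 0 and max_window")
        w = min(w, self.t)  # a short stream has fewer than w elements
        return self.total - self.prefix[(self.t - w) % self.size]
```

A window sum is the difference of two prefix sums, P[t] - P[t - w]; the ring buffer guarantees P[t - w] has not been overwritten as long as w <= max_window. The paper's contribution is achieving this functionality in far less space.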

Are screening methods useful in feature selection? An empirical study

Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are useful at all, or whether a learning algorithm could do a better job without them. For this purpose, in this paper many popular screening methods are paired with three regression learners and five classification learners and evaluated on ten real datasets to obtain accuracy criteria such as R-square and area under the ROC curve (AUC). The obtained results are compared through curve plots and comparison tables in order to find out whether screening methods help improve the performance of learning algorithms and how they fare with each other. Our findings reveal that the screening methods were only useful in one regression and three classification datasets out of the ten datasets evaluated.
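
The abstract does not name the screening methods, so as an illustration, here is one common filter, absolute Pearson correlation screening, which ranks features by their marginal correlation with the response and keeps the top k before any learner is trained. The function name and the choice of filter are assumptions of this sketch.

```python
import numpy as np


def correlation_screen(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k columns of X with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns get a score of 0 instead of dividing by zero.
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    return np.argsort(-scores)[:k]
```

Marginal filters like this one ignore interactions between features, which is one reason a downstream learner may do as well or better without them, as the study's findings suggest for most datasets.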

Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning

Fine-tuning distributed systems is considered a craft that relies on intuition and experience. This becomes even more challenging when the systems need to react in near real time, as streaming engines have to do to maintain pre-agreed service quality metrics. In this article, we present an automated approach that builds on a combination of supervised and reinforcement learning methods to recommend the most appropriate lever configurations based on previous load. With this, streaming engines can be automatically tuned without requiring a human to determine the right way and the proper time to deploy them. This opens the door to new configurations that are not applied today because the complexity of managing these systems has surpassed the abilities of human experts. We show how reinforcement learning systems can find substantially better configurations in less time than their human counterparts and adapt to changing workloads.

Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization

Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal, which is to make the best decisions possible. Hand-tuning the loss function to align with optimization is a difficult and error-prone process (which is often skipped entirely). We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions. Technically, our contribution is a means of integrating discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. Experimental results across a variety of domains show that decision-focused learning often leads to improved optimization performance compared to traditional methods. We find that standard measures of accuracy are not a reliable proxy for a predictive model’s utility in optimization, and our method’s ability to specify the true goal as the model’s training objective yields substantial dividends across a range of decision problems.
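
The paper relaxes linear programs and submodular problems; as a much smaller sketch of the same idea, assume the decision is to pick one of n items, and relax the hard argmax to a temperature-scaled softmax (an entropy-regularized relaxation substituted here for illustration, not the paper's construction). The relaxed decision quality Q = sum_i x_i v_i is then differentiable in the predicted scores theta, with dQ/dtheta_j = x_j (v_j - Q) / T, so a linear predictor can be trained by gradient ascent on decision quality rather than on prediction error. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Tuple

# An instance: (feature vector per item, true value per item).
Instance = Tuple[List[List[float]], List[float]]


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(w: List[float], features: List[List[float]]) -> List[float]:
    """Linear predictor: one score per item."""
    return [sum(wk * fk for wk, fk in zip(w, f)) for f in features]


def relaxed_quality_and_grad(w: List[float], inst: Instance,
                             temperature: float) -> Tuple[float, List[float]]:
    """Relaxed decision quality Q = sum_i x_i v_i with x = softmax(theta / T), and dQ/dw."""
    features, values = inst
    x = softmax(predict(w, features), temperature)
    quality = sum(xi * vi for xi, vi in zip(x, values))
    # Softmax Jacobian gives dQ/dtheta_j = x_j * (v_j - Q) / T.
    dtheta = [xj * (vj - quality) / temperature for xj, vj in zip(x, values)]
    # Chain rule through theta_j = w . f_j.
    grad = [sum(dtheta[j] * features[j][k] for j in range(len(features)))
            for k in range(len(w))]
    return quality, grad


def train_decision_focused(instances: List[Instance], n_features: int,
                           temperature: float = 0.5, lr: float = 0.1,
                           epochs: int = 200) -> List[float]:
    """Gradient ascent on relaxed decision quality (not on prediction error)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for inst in instances:
            _, grad = relaxed_quality_and_grad(w, inst, temperature)
            w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


def hard_decision(w: List[float], features: List[List[float]]) -> int:
    """At decision time, use the discrete argmax instead of the relaxation."""
    scores = predict(w, features)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: each item has features [signal, bias]; its true value equals the signal.
instances: List[Instance] = [
    ([[1.0, 1.0], [3.0, 1.0], [2.0, 1.0]], [1.0, 3.0, 2.0]),
    ([[0.5, 1.0], [0.2, 1.0], [0.9, 1.0]], [0.5, 0.2, 0.9]),
]
w = train_decision_focused(instances, n_features=2)
choices = [hard_decision(w, feats) for feats, _ in instances]
```

The relaxation is used only during training; the trained model still makes a discrete choice at decision time, which mirrors the train-with-relaxation, decide-discretely pattern the abstract describes.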

Extending Neural Generative Conversational Model using External Knowledge Sources

The use of connectionist approaches in conversational agents has been progressing rapidly due to the availability of large corpora. However, current generative dialogue models often lack coherence and are content-poor. This work proposes an architecture that incorporates unstructured knowledge sources to enhance next-utterance prediction in chit-chat generative dialogue models. We focus on Sequence-to-Sequence (Seq2Seq) conversational agents trained on the Reddit News dataset and consider incorporating external knowledge from Wikipedia summaries as well as from the NELL knowledge base. Our experiments show faster training and improved perplexity when leveraging external knowledge.

An FPGA Implementation of a Time Delay Reservoir Using Stochastic Logic
Probabilistic Proofs of Some Generalized Mertens’ Formulas Via Generalized Dickman Distributions
3-colored asymmetric bipartite Ramsey number of connected matchings and cycles
A New Secure Network Architecture to Increase Security Among Virtual Machines in Cloud Computing
Sidon sets and 2-caps in $\mathbb{F}_3^n$
Towards Secure Infrastructure-based Cooperative Adaptive Cruise Control
Enhanced Network Embeddings via Exploiting Edge Labels
Approximation of A Class of Non-Zero-Sum Investment and Reinsurance Games for Regime-Switching Jump-Diffusion Models
Balance in signed networks
A Deep Learning and Gamification Approach to Energy Conservation at Nanyang Technological University
Laplace Inference for Multi-fidelity Gaussian Process Classification
Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems
Deep Reinforcement Learning for Event-Triggered Control
Quantum Information Processing and Composite Quantum Fields
On the Strength of Character Language Models for Multilingual Named Entity Recognition
An Incentive Mechanism for Crowd Sensing with Colluding Agents
Deterministic Inequalities for Smooth M-estimators
Distinguishing Between Roles of Football Players in Play-by-play Match Event Data
Real-Time Model Predictive Control for Energy Management in Autonomous Underwater Vehicle
Spin-current driven spontaneous coupling of ferromagnets
Periodicity in Movement Patterns Shapes Epidemic Risk in Urban Environments
Independent Sets in Algebraic Hypergraphs
A Simple Mechanism for a Budget-Constrained Buyer
A Time Series Graph Cut Image Segmentation Scheme for Liver Tumors
An efficient algorithm for sampling from $\sin^k(x)$ for generating random correlation matrices
Enhanced Optic Disk and Cup Segmentation with Glaucoma Screening from Fundus Images using Position encoded CNNs
Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
Automatic Catchphrase Extraction from Legal Case Documents via Scoring using Deep Neural Networks
Learning L2 Continuous Regression Functionals via Regularized Riesz Representers
A Variational Observation Model of 3D Object for Probabilistic Semantic SLAM
Probabilistic Optimal Power Flow Considering Correlation of Wind Farms via Markov Chain Quasi-Monte Carlo Sampling
VoxelMorph: A Learning Framework for Deformable Medical Image Registration
Optimal Power Flow for AC/DC System Based on Cooperative Multi-objective Particle Swarm Optimization
Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder
In-Route Task Selection in Crowdsourcing
Follow Me at the Edge: Mobility-Aware Dynamic Service Placement for Mobile Edge Computing
Neural Network Topologies for Sparse Training
Random Fixed Points, Limits and Systemic risk
Distributed and Efficient Resource Balancing Among Many Suppliers and Consumers
Revisiting Random Binning Features: Fast Convergence and Strong Parallelizability
A General Framework for Bounding Approximate Dynamic Programming Schemes
CIMTDetect: A Community Infused Matrix-Tensor Coupled Factorization Based Method for Fake News Detection
New constructions of Hadamard matrices
Sharp conditions for the existence of an even $[a,b]$-factor in a graph
The degrees, number of edges, spectral radius and weakly Hamilton-connectedness of bipartite graphs
Detection-by-Localization: Maintenance-Free Change Object Detector
Macquarie University at BioASQ 6b: Deep learning and deep reinforcement learning for query-based multi-document summarisation
Variational Autoencoder with Implicit Optimal Priors
Keypoint Based Weakly Supervised Human Parsing
Deep CNN Frame Interpolation with Lessons Learned from Natural Language Processing
Dushnik-Miller dimension of d-dimensional tilings with boxes
Characterizing Variation in Crowd-Sourced Data for Training Neural Language Generators to Produce Stylistically Varied Outputs
Lyapunov Theory for Discrete Time Systems
Efficient Rank Minimization via Solving Non-convex Penalties by Iterative Shrinkage-Thresholding Algorithm
Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory
A Domain Agnostic Normalization Layer for Unsupervised Adversarial Domain Adaptation
An On-line Design of Physical Watermarks
Optimal Bayesian design for model discrimination via classification
Stochastic LQ and Associated Riccati equation of PDEs Driven by State- and Control-Dependent White Noise
On Plans With Loops and Noise
Reasoning about Discrete and Continuous Noisy Sensors and Effectors in Dynamical Systems
Spectral shape optimization for the Neumann traces of the Dirichlet-Laplacian eigenfunctions
Not all partial cubes are $\Theta$-graceful
Canonical spectral representation for exchangeable max-stable sequences
Fast Iterative Combinatorial Auctions via Bayesian Learning
Adaptive Sampling Towards Fast Graph Representation Learning
TED Talk Recommender Using Speech Transcripts
Numeral Understanding in Financial Tweets for Fine-grained Crowd-based Forecasting
A First Experimental Demonstration of Analog MIMO Radio-over-Copper
Multi-Kernel Diffusion CNNs for Graph-Based Learning on Point Clouds
Non-Gibbs states on a Bose-Hubbard Lattice
Particle system approach to wealth redistribution
Convergence properties of many parallel servers under power-of-D load balancing
Style Augmentation: Data Augmentation via Style Randomization
Regularity, matchings and Cameron-Walker graphs
Index-Based Policy for Risk-Averse Multi-Armed Bandit
Concentration of the empirical spectral distribution of random matrices with dependent entries
Energy Efficient Multi-User MISO Communication using Low Resolution Large Intelligent Surfaces
SCORES: Shape Composition with Recursive Substructure Priors
An invariance principle for one-dimensional random walks among dynamical random conductances
Socially Aware Kalman Neural Networks for Trajectory Prediction
Enhanced Multiuser Superposition Transmission through Structured Modulation
Fractional coloring of planar graphs of girth five
Complexity of energy barriers in mean-field glassy systems
Totally asymmetric exclusion process with site-wise dynamic disorder
Reconfiguration of graphs with connectivity constraints
Resilient Distributed Energy Management for Systems of Interconnected Microgrids
User preferences in Bayesian multi-objective optimization: the expected weighted hypervolume improvement criterion
Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities
Spectrum of complex networks
Real Time System for Facial Analysis
Leftover hashing from quantum error correction: Unifying the two approaches to the security proof of quantum key distribution
Multi-Modal Route Planning in Road and Transit Networks
A Multi-Stage Algorithm for Acoustic Physical Model Parameters Estimation
Blameworthiness in Strategic Games
MoSculp: Interactive Visualization of Shape and Time
Feature-specific inference for penalized regression using local false discovery rates
Elastic Registration of Geodesic Vascular Graphs
Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin
Mugeetion: Musical Interface Using Facial Gesture and Emotion
On the Choice of Instruments in Mixed Frequency Specification Tests
Feasibility and coordination of multiple mobile vehicles with mixed equality and inequality constraints
BPE and computer-extracted parenchymal enhancement for breast cancer risk, response monitoring, and prognosis
A Statistical Learning Approach to Ultra-Reliable Low Latency Communication
Identifying Quantum Phase Transitions using Artificial Neural Networks on Experimental Data
Defending Elections Against Malicious Spread of Misinformation
Deep Compressive Autoencoder for Action Potential Compression in Large-Scale Neural Recording
Robustness of Adaptive Quantum-Enhanced Phase Estimation
Secondary gradient descent in higher codimension