Multi-domain Neural Network Language Generation for Spoken Dialogue Systems

Moving from limited-domain natural language generation (NLG) to open-domain NLG is difficult because the number of semantic input combinations grows exponentially with the number of domains. It is therefore important to leverage existing resources and exploit similarities between domains to facilitate domain adaptation. In this paper, we propose a procedure to train multi-domain, recurrent neural network (RNN) based language generators via multiple adaptation steps. In this procedure, a model is first trained on counterfeited data synthesised from an out-of-domain dataset, and then fine-tuned on a small set of in-domain utterances with a discriminative objective function. Corpus-based evaluation results show that the proposed procedure can achieve competitive performance in terms of BLEU score and slot error rate while significantly reducing the data needed to train generators in new, unseen domains. In subjective testing, human judges confirm that the procedure greatly improves generator performance when only a small amount of data is available in the domain.
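
The abstract reports slot error rate without defining it. As a rough illustration only, the sketch below assumes a common definition: the number of missing plus redundant slots in a generated utterance, divided by the number of slots the dialogue act requires. The function name `slot_error_rate` and the slot labels are hypothetical, and the paper's exact metric may differ.

```python
from collections import Counter
from typing import Iterable


def slot_error_rate(required: Iterable[str], realised: Iterable[str]) -> float:
    """Assumed definition: (missing + redundant slots) / required slots."""
    req = Counter(required)
    real = Counter(realised)
    if not req:
        raise ValueError("the dialogue act must require at least one slot")
    # Counter subtraction keeps only positive counts.
    missing = sum((req - real).values())     # required but not realised
    redundant = sum((real - req).values())   # realised but not required
    return (missing + redundant) / sum(req.values())


# Hypothetical example: "area" is dropped and "food" is realised twice.
rate = slot_error_rate(["name", "area", "food"], ["name", "food", "food"])
```

Here one slot is missing and one is redundant out of three required, so `rate` is 2/3 under the assumed definition.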


Decision Forests, Convolutional Networks and the Models in-Between

This paper investigates the connections between two state-of-the-art classifiers: decision forests (DFs, including decision jungles) and convolutional neural networks (CNNs). Decision forests are computationally efficient thanks to their conditional computation property (computation is confined to only a small region of the tree, the nodes along a single branch). CNNs achieve state-of-the-art accuracy thanks to their representation learning capabilities. We present a systematic analysis of how to fuse conditional computation with representation learning and achieve a continuum of hybrid models with different ratios of accuracy vs. efficiency. We call this new family of hybrid models conditional networks. Conditional networks can be thought of as: i) decision trees augmented with data transformation operators, or ii) CNNs with block-diagonal sparse weight matrices and explicit data routing functions. Experimental validation is performed on the common task of image classification on both the CIFAR and ImageNet datasets. Compared to state-of-the-art CNNs, our hybrid models yield the same accuracy with a fraction of the compute cost and a much smaller number of parameters.
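
The second view of conditional networks (block-diagonal weights plus explicit routing) can be illustrated with a toy layer. This is a minimal sketch, not the paper's architecture: the hard sign-based router, the two-branch layout, and all function names are assumptions made for illustration. It shows that evaluating only the routed branch gives the same output as multiplying a block-diagonal matrix by an input placed in that branch's slot.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def block_diagonal(blocks: List[Matrix]) -> Matrix:
    """Assemble per-branch weight blocks into one block-diagonal matrix."""
    total_cols = sum(len(b[0]) for b in blocks)
    out: Matrix = []
    offset = 0
    for b in blocks:
        for row in b:
            pad_right = total_cols - offset - len(row)
            out.append([0.0] * offset + list(row) + [0.0] * pad_right)
        offset += len(b[0])
    return out


def route(x: Vector, router_w: Vector) -> int:
    """Hypothetical hard router: branch 0 if the linear score is negative, else 1."""
    score = sum(w * xi for w, xi in zip(router_w, x))
    return 0 if score < 0 else 1


def conditional_layer(x: Vector, blocks: List[Matrix], router_w: Vector) -> Vector:
    """Compute only the routed branch; the other branch contributes zeros."""
    k = route(x, router_w)
    out: Vector = []
    for i, b in enumerate(blocks):
        out.extend(matvec(b, x) if i == k else [0.0] * len(b))
    return out


def dense_equivalent(x: Vector, blocks: List[Matrix], router_w: Vector) -> Vector:
    """Same result via a full block-diagonal product (no compute saved)."""
    k = route(x, router_w)
    expanded: Vector = []
    for i, b in enumerate(blocks):
        expanded.extend(x if i == k else [0.0] * len(b[0]))
    return matvec(block_diagonal(blocks), expanded)


blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]  # two branches, 2-d input each
router_w = [1.0, -1.0]
y = conditional_layer([2.0, 1.0], blocks, router_w)  # routed to branch 1
```

In the conditional version only one block is multiplied per input, which is where the compute saving over the dense product would come from in this toy setting.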


The $K_{n+5}$ and $K_{3^2,1^n}$ families are obstructions to $n$-apex

Counter-fitting Word Vectors to Linguistic Constraints

Extremal results for random discrete structures

Uncovering Longitudinal Healthcare Utilization from Patient-Level Medical Claims Data

A simple tool for bounding the deviation of random matrices on geometric sets

Dispersion as a survival strategy

European and Asian Malliavin Monte Carlo Greeks for general Jump Diffusions with nonvanishing Brownian motion part

Doubly-nonparametric generalized linear models

Asymptotic joint distribution of the extremities of a random Young diagram and enumeration of graphical partitions

A Kernel Test for Three-Variable Interactions with Random Processes

Super Mario as a String: Platformer Level Generation Via LSTMs

A size-sensitive inequality for cross-intersecting families

$\{0,\pm1\}$-vectors and $s$-cross-intersecting families

Sparse model selection in the highly under-sampled regime

Training Input-Output Recurrent Neural Networks through Spectral Methods

Enhancing Freebase Question Answering Using Textual Evidence

Using Quadrilaterals to Compute the Shortest Path

Learning Tabletop Object Manipulation by Imitation

MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification

Tight Analysis of a Multiple-Swap Heuristic for Budgeted Red-Blue Median

Regression Analysis for Microbiome Compositional Data

Generation, Ranking and Unranking of Ordered Trees with Degree Bounds

Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Recurrent Neural Networks

Learning Real and Boolean Functions: When Is Deep Better Than Shallow

A review of 20 years of naive tests of significance for high-dimensional mean vectors and covariance matrices

Automatic learning of gait signatures for people identification

Nonlinear functions and difference sets on group actions

Analysis of the Packet Loss Probability in Energy Harvesting Cognitive Radio Networks

Convolutional Neural Networks using Logarithmic Data Representation

A stochastic model for speciation by mating preferences

Whitening-Free Least-Squares Non-Gaussian Component Analysis

Right Ideals of a Ring and Sublanguages of Science

Fractional Clique Decompositions of Dense Partite Graphs

Pluriassociative algebras I: The pluriassociative operad

Estimating Quantile Families of Loss Distributions for Non-Life Insurance Modelling via L-moments

Clique decompositions of multipartite graphs and completion of Latin squares

Rich square-free words

Yes-no Bloom filter: A way of representing sets with fewer false positives

What we look at in paintings: A comparison between experienced and inexperienced art viewers

Modeling the Sequence of Brain Volumes by Local Mesh Models for Brain Decoding

Beta-expansion and continued fraction expansion of real numbers

Minimization of Büchi Automata using Fair Simulation

If-Conversion Optimization using Neuro Evolution of Augmenting Topologies

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Relatedness of the Incidence Decay with Exponential Adjustment (IDEA) Model, ‘Farr’s Law’ and Compartmental Difference Equation SIR Models

Multilevel Sequential Monte Carlo Samplers for Normalizing Constants

A spectral sequence for stratified spaces and configuration spaces of points

Overdispersed Black-Box Variational Inference

Spectral Kurtosis Statistics of Transient Signals

L-Kuramoto-Sivashinsky SPDEs vs. time-fractional SPIDEs: exact continuity and gradient moduli, 1/2-derivative criticality, and laws

Permutation polynomials of the form x+c*Tr(x^k)

The Domination Game: Proving the 3/5 Conjecture on Isolate-Free Forests

Network Unfolding Map by Edge Dynamics Modeling

The Orlik-Terao algebra and the cohomology of configuration space

Completing partial schedules for Open Shop with unit processing times and routing

Joint scaling limit of a bipolar-oriented triangulation and its dual in the peanosphere sense

Typical behavior of the harmonic measure in critical Galton-Watson trees with infinite variance offspring distribution

Network modularity in the presence of covariates

Remarks on Frankl’s conjecture

GeoGebra Tools with Proof Capabilities

Fractional Fick’s Law for the Boundary Driven Exclusion Process with Long Jumps

Generating asymptotics for factorially divergent sequences