PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning

This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially ‘pack’ multiple tasks into a single network while ensuring minimal drop in performance and minimal storage overhead. Unlike prior work that uses proxy losses to maintain accuracy on older tasks, we always optimize for the task at hand. We perform extensive experiments on a variety of network architectures and large-scale datasets, and observe much better robustness against catastrophic forgetting than prior work. In particular, we are able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task.
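
A minimal sketch of the bookkeeping behind "iterative pruning and re-training", assuming magnitude-based pruning and a flattened list of weights in which each weight records the task that owns it. The names `prune_task`, `claim_free`, `trainable_mask`, `inference_mask` and the `FREE` marker are hypothetical illustrations, not the authors' code; a real implementation would work on per-layer tensors inside a training framework.

```python
FREE = 0  # hypothetical owner id for weights not yet assigned to any task


def prune_task(weights, owner, task, frac):
    """Free the lowest-magnitude `frac` of the weights owned by `task`.

    Returns new (weights, owner) lists; pruned weights are zeroed and
    marked FREE so that a later task can claim them.
    """
    mine = [i for i, t in enumerate(owner) if t == task]
    n_prune = int(len(mine) * frac)
    # Assumption: the smallest absolute values are the least important.
    to_free = sorted(mine, key=lambda i: abs(weights[i]))[:n_prune]
    new_w, new_owner = list(weights), list(owner)
    for i in to_free:
        new_w[i] = 0.0
        new_owner[i] = FREE
    return new_w, new_owner


def claim_free(owner, task):
    """Hand every currently free weight to a new task before training it."""
    return [task if t == FREE else t for t in owner]


def trainable_mask(owner, task):
    """Only the current task's weights get gradient updates; earlier tasks' weights stay frozen."""
    return [t == task for t in owner]


def inference_mask(owner, task):
    """Task k uses the weights of tasks 1..k; weights of later tasks are masked out."""
    return [FREE < t <= task for t in owner]


w = [0.9, -0.1, 0.5, 0.05, -0.7, 0.2]
owner = [1] * len(w)  # task 1 was trained on every weight
w, owner = prune_task(w, owner, task=1, frac=0.5)
# (task 1 would be briefly re-trained here under trainable_mask(owner, 1))
owner = claim_free(owner, task=2)
print(owner)  # [1, 2, 1, 2, 1, 2]
```

Because each weight has exactly one owner, storing the owner ids is the only per-task overhead in this sketch, which mirrors the "minimal storage overhead" claim.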

Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models

Deep generative neural networks have proven effective at both conditional and unconditional modeling of complex data distributions. Conditional generation enables interactive control, but creating new controls often requires expensive retraining. In this paper, we develop a method to condition generation without retraining the model. By learning latent constraints post hoc, that is, value functions that identify regions in latent space that generate outputs with desired attributes, we can conditionally sample from these regions with gradient-based optimization or amortized actor functions. Combining attribute constraints with a universal ‘realism’ constraint, which enforces similarity to the data distribution, we generate realistic conditional images from an unconditional variational autoencoder. Further, using gradient-based optimization, we demonstrate identity-preserving transformations that make the minimal adjustment in latent space to modify the attributes of an image. Finally, with discrete sequences of musical notes, we demonstrate zero-shot conditional generation, learning latent constraints in the absence of labeled data or a differentiable reward function. Code with a dedicated cloud instance has been made publicly available ( ).
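
A toy sketch of the gradient-based, identity-preserving transformation: ascend an attribute value function while penalizing distance from the original latent code. The quadratic value function, its center `c`, and the penalty weight `lam` are hypothetical; in the paper the value functions are learned, and a realism constraint would also be added to the objective.

```python
def transform_latent(z0, grad_value, lam=0.1, lr=0.1, steps=500):
    """Gradient ascent on value(z) - lam * ||z - z0||^2, starting at z0.

    grad_value(z) returns the gradient of a (hypothetical) attribute value
    function; the penalty keeps z close to the original code z0, i.e. it
    asks for a minimal adjustment in latent space.
    """
    z = list(z0)
    for _ in range(steps):
        g = grad_value(z)
        z = [zi + lr * (gi - 2 * lam * (zi - ai)) for zi, gi, ai in zip(z, g, z0)]
    return z


# Hypothetical value function value(z) = -||z - c||^2, highest at the center c
# of a "desired attribute" region.
c = [1.0, 2.0]


def grad_value(z):
    return [-2 * (zi - ci) for zi, ci in zip(z, c)]


z_new = transform_latent([0.0, 0.0], grad_value, lam=0.1)
print([round(v, 4) for v in z_new])  # [0.9091, 1.8182]
```

For this quadratic toy the optimum has the closed form (c + lam * z0) / (1 + lam), so the result stops short of `c`; a larger `lam` keeps the edited code closer to the original.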

Quantile Markov Decision Process

In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of Markov Decision Processes (MDPs), which we refer to as Quantile Markov Decision Processes (QMDPs). Traditionally, the goal of an MDP is to maximize the expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. We provide analytical results characterizing the optimal QMDP solution and present algorithms for solving the QMDP. We illustrate the model with two experiments: a grid game and an HIV optimal treatment experiment.
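
To make the difference between the expectation and quantile objectives concrete, the sketch below ranks two fixed policies by the empirical tau-quantile of their sampled cumulative rewards. It is plain Monte Carlo evaluation with hypothetical policy names and rollouts, not the QMDP solution algorithm from the paper.

```python
import math


def cumulative_reward(episode):
    return sum(episode)


def empirical_quantile(samples, tau):
    """Lower tau-quantile: smallest x whose empirical CDF is >= tau, for 0 < tau <= 1."""
    if not 0 < tau <= 1:
        raise ValueError("tau must be in (0, 1]")
    xs = sorted(samples)
    return xs[math.ceil(tau * len(xs)) - 1]


def best_policy_by_quantile(rollouts, tau):
    """rollouts: dict mapping a policy name to a list of episodes (lists of rewards)."""
    return max(
        rollouts,
        key=lambda p: empirical_quantile([cumulative_reward(e) for e in rollouts[p]], tau),
    )


# Both hypothetical policies have mean cumulative reward 10.
rollouts = {
    "safe": [[5, 5]] * 4,                          # returns 10, 10, 10, 10
    "risky": [[0, 0], [0, 0], [10, 10], [10, 10]],  # returns 0, 0, 20, 20
}
print(best_policy_by_quantile(rollouts, 0.25))  # safe
print(best_policy_by_quantile(rollouts, 1.0))   # risky
```

An expectation-maximizing agent is indifferent between the two policies, whereas a low quantile favours the safe one and the top quantile favours the risky one.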

Lagrange policy gradient

Most algorithms for reinforcement learning work by estimating action-value functions. Here we present a method that uses Lagrange multipliers, the costate equation, and multilayer neural networks to compute policy gradients. We show that this method can find solutions to time-optimal control problems, driving nonlinear mechanical systems quickly to a target configuration. On these tasks its performance is comparable to that of deep deterministic policy gradient, a recent action-value method.

Zero-Shot Learning via Class-Conditioned Deep Generative Models

We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, which represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers the corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen during training. We further extend our model to (1) a semi-supervised/transductive setting, by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) a few-shot learning setting, where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.

BoostJet: Towards Combining Statistical Aggregates with Neural Embeddings for Recommendations

Recommenders have become widely popular in recent years because of their broad applicability in e-commerce. These applications rely on recommenders to generate advertisements for various offers or to provide content recommendations. However, the quality of the generated recommendations depends on user features (such as demography and temporality), offer features (such as popularity and price), and user-offer features (such as implicit or explicit feedback). Current state-of-the-art recommenders do not consider such diverse features concurrently when generating recommendations. In this paper, we first introduce the notion of Trackers, which capture the above-mentioned features and thus incorporate users’ online behaviour through statistical aggregates of different features (demography, temporality, popularity, price). We also show how to capture offer-to-offer relations, based on their consumption sequence, by learning neural embeddings for offers with our Offer2Vec algorithm. We then introduce BoostJet, a novel recommender that integrates the Trackers with the neural embeddings using MatrixNet, an efficient distributed implementation of gradient-boosted decision trees, to significantly improve recommendation quality. We provide an in-depth evaluation of BoostJet on Yandex’s dataset, which captures the online behaviour of tens of millions of users, to demonstrate the practicality of BoostJet in terms of both recommendation quality and scalability.

On Communication Complexity of Classification Problems

This work introduces a model of distributed learning in the spirit of Yao’s communication complexity model. We consider a two-party setting, where each of the players gets a list of labelled examples and they communicate in order to jointly perform some learning task. To naturally fit into the framework of learning theory, we allow the players to send each other labelled examples, where each example costs one unit of communication. This model can also be thought of as a distributed version of sample compression schemes. We study several fundamental questions in this model. For example, we define the analogues of the complexity classes P, NP and coNP, and show that in this model P equals the intersection of NP and coNP. The proof does not seem to follow from the analogous statement in classical communication complexity; in particular, our proof uses different techniques, including boosting and metric properties of VC classes. This framework allows us to prove, in the context of distributed learning, unconditional separations between various learning contexts, like realizable versus agnostic learning, and proper versus improper learning. The proofs here are based on standard ideas from communication complexity as well as learning theory and geometric constructions in Euclidean space. As a corollary, we also obtain lower bounds that match the performance of algorithms from previous works on distributed classification.

Using Noisy Extractions to Discover Causal Knowledge

Knowledge bases (KBs) constructed through information extraction from text play an important role in query answering and reasoning. In this work, we study a particular reasoning task, the problem of discovering causal relationships between entities, known as causal discovery. There are two contrasting types of approaches to discovering causal knowledge. One approach attempts to identify causal relationships from text using automatic extraction techniques, while the other approach infers causation from observational data. However, extractions alone are often insufficient to capture complex patterns, and full observational data is expensive to obtain. We introduce a probabilistic method for fusing noisy extractions with observational data to discover causal knowledge. We propose a principled approach that uses the probabilistic soft logic (PSL) framework to encode well-studied constraints to recover long-range patterns and consistent predictions, while cheaply acquired extractions provide a proxy for unseen observations. We apply our method to gene regulatory networks and show the promise of exploiting KB signals in causal discovery, suggesting a critical, new area of research.
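
For readers unfamiliar with PSL, the sketch below shows how it scores soft truth values in [0, 1]: a conjunction of n atoms uses the Lukasiewicz form max(0, sum - (n - 1)), and a weighted rule body -> head incurs the linear hinge penalty max(0, body - head) (PSL also supports squared hinges). The two rules and their weights are hypothetical illustrations; the abstract does not list the paper's actual constraints.

```python
def luk_and(*vals):
    """Lukasiewicz conjunction used by PSL: max(0, sum(v) - (n - 1))."""
    return max(0.0, sum(vals) - (len(vals) - 1))


def distance_to_satisfaction(body, head):
    """Linear hinge penalty for a rule body -> head under Lukasiewicz semantics."""
    return max(0.0, body - head)


def weighted_penalty(rules):
    """rules: list of (weight, body_truth, head_truth); lower total means more consistent."""
    return sum(w * distance_to_satisfaction(b, h) for w, b, h in rules)


# Hypothetical rule 1 (weight 2): Extracted(a, b) -> Causes(a, b)
extracted_ab, causes_ab = 0.75, 0.25
# Hypothetical rule 2 (weight 1): Causes(a, b) & Causes(b, c) -> Causes(a, c)
causes_bc, causes_ac = 0.75, 0.25

rules = [
    (2.0, extracted_ab, causes_ab),
    (1.0, luk_and(0.75, causes_bc), causes_ac),
]
print(weighted_penalty(rules))  # 1.25
```

Inference in PSL then searches for truth values of the unobserved atoms (here the `Causes` atoms) that minimize this total weighted penalty, which is a convex problem.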

How Generative Adversarial Nets and its variants Work: An Overview of GAN

Generative Adversarial Networks (GANs) have received wide attention in the machine learning field because of their massive potential to learn high-dimensional, complex real data. Specifically, they do not require further distributional assumptions and can simply infer real-like samples from a latent space. This powerful property has led to GANs being used in various applications such as image synthesis, image attribute editing and semantic decomposition of images. In this review paper, we look into the details of GANs, first showing how they operate and the fundamental meaning of their objective functions, and then pointing to GAN variants applied to a vast number of tasks.
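
For reference, the standard minimax objective of the original GAN formulation, which reviews of this kind typically take as their starting point, is shown below together with the optimal discriminator for a fixed generator and the resulting value, where p_g denotes the generator's distribution. These are the textbook results; the review's own notation may differ.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

D^{*}_{G}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}

V(D^{*}_{G}, G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\mathrm{data}} \,\|\, p_g\big)
```

Since the Jensen-Shannon divergence is non-negative and zero only when p_g = p_data, the generator's optimum under a perfect discriminator is to reproduce the data distribution.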

Learning to Compare: Relation Network for Few-Shot Learning

We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples of each. Our method, called the Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting. Once trained, an RN is able to classify images of new classes by computing relation scores between query images and the few examples of each new class without further updating the network. Besides providing improved performance on few-shot learning, our framework is easily extended to zero-shot learning. Extensive experiments on four datasets demonstrate that our simple approach provides a unified and effective solution for both tasks.
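
The inference pipeline described above (embed the query and the support examples, pool each class's shots, score every class with a relation module, pick the highest score) can be sketched as follows. In the RN both modules are trained networks; here `embed` is the identity and `relation_module` is a fixed, hypothetical stand-in that scores a concatenated pair by exp(-squared distance) between its two halves. Pooling the shots by element-wise mean is also an assumption made to suit this distance-based stand-in.

```python
import math


def embed(x):
    # Hypothetical stand-in for the learned embedding module.
    return list(x)


def relation_module(pair):
    # Hypothetical stand-in for the learned relation module, which receives the
    # concatenated (class feature, query feature) pair and outputs a score.
    d = len(pair) // 2
    u, v = pair[:d], pair[d:]
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(query, support):
    """support: dict mapping a class label to its few (K-shot) examples."""
    q = embed(query)
    scores = {}
    for label, shots in support.items():
        feats = [embed(s) for s in shots]
        pooled = [sum(col) / len(col) for col in zip(*feats)]  # element-wise mean
        scores[label] = relation_module(pooled + q)  # list concatenation
    return max(scores, key=scores.get), scores


support = {"cat": [[0, 0], [0, 2]], "dog": [[5, 5], [5, 7]]}
label, scores = classify([0, 1], support)
print(label)  # cat
```

No parameters change at test time: new classes are handled just by supplying a new `support` dictionary, which is the property the abstract highlights.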

Deep Matching Autoencoders

An increasing number of real-world tasks involve data in multiple modalities or views. This has motivated the development of many effective algorithms for learning a common latent space to relate multiple domains. However, most existing cross-view learning algorithms assume access to paired data for training. Their applicability is thus limited, as the paired data assumption is often violated in practice: many tasks have only a small subset of data available with pairing annotation, or even no paired data at all. In this paper we introduce Deep Matching Autoencoders (DMAE), which learn a common latent space and pairing from unpaired multi-modal data. Specifically, we formulate this as a cross-domain representation learning and object matching problem. We simultaneously optimise the parameters of representation learning auto-encoders and the pairing of unpaired multi-modal data. This framework elegantly spans the full regime of fully supervised, semi-supervised, and unsupervised (no paired data) multi-modal learning. We show promising results in image captioning, and on a new task that is uniquely enabled by our methodology: unsupervised classifier learning.

An Encoder-Decoder Framework Translating Natural Language to Database Queries

Machine translation is undergoing a radical revolution, driven by the explosive development of deep learning techniques using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). In this paper, we consider a special case of machine translation: translating natural language into Structured Query Language (SQL) for data retrieval over relational databases. Although generic CNNs and RNNs can learn the grammar structure of SQL when trained with sufficient samples, the accuracy and training efficiency of the model could be dramatically improved when the translation model is deeply integrated with the grammar rules of SQL. We present a new encoder-decoder framework with a suite of new approaches, including new semantic features fed into the encoder and new grammar-aware states injected into the decoder's memory. These techniques help the neural network focus on understanding the semantics of the operations in natural language and save effort on learning SQL grammar. Empirical evaluation on real-world databases and queries shows that our approach outperforms the state-of-the-art solution by a significant margin.

A unified view of gradient-based attribution methods for Deep Neural Networks

Understanding the flow of information in Deep Neural Networks is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, only a few attempts to analyze them from a theoretical perspective have been made in the past. In this work we analyze various state-of-the-art attribution methods and prove unexplored connections between them. We also show how some methods can be reformulated and more conveniently implemented. Finally, we perform an empirical evaluation with six attribution methods on a variety of tasks and architectures and discuss their strengths and limitations.
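
As a small illustration of one gradient-based attribution method, the sketch below computes Gradient * Input for a tiny bias-free ReLU network, using a central finite difference in place of backpropagation. The network weights are hypothetical. Because a bias-free ReLU network is piecewise linear and positively homogeneous, the attributions sum exactly to the output; this sketch does not reproduce the paper's equivalence proofs between methods.

```python
def relu_net(x, W1, w2):
    """Tiny bias-free ReLU network: f(x) = w2 . relu(W1 x)."""
    h = [max(0.0, sum(a * b for a, b in zip(row, x))) for row in W1]
    return sum(a * b for a, b in zip(w2, h))


def numerical_grad(f, x, eps=1e-6):
    """Central-difference gradient (stands in for backpropagation here)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g


def gradient_times_input(f, x):
    return [gi * xi for gi, xi in zip(numerical_grad(f, x), x)]


W1 = [[1.0, -1.0], [0.5, 2.0]]  # hypothetical weights
w2 = [1.0, -1.0]


def f(x):
    return relu_net(x, W1, w2)


attr = gradient_times_input(f, [2.0, 1.0])
print([round(a, 6) for a in attr], f([2.0, 1.0]))  # [1.0, -3.0] -2.0
```

Adding biases breaks the exact summation, which is one reason attribution methods differ in how they treat reference inputs.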

An Abstractive approach to Question Answering

Question answering has come a long way, from answer sentence selection and relational QA to reading comprehension. We turn our attention to abstractive question answering, in which a machine reads passages and answers questions by generating the answers. We frame the problem as sequence-to-sequence learning in which the encoder is a network that models the relation between the question and the passage, thereby relying solely on passage and question content to form an abstraction of the answer. Failing to retain facts and repeating content are common mistakes that hurt the overall legibility of answers. To counter these issues, we employ a copying mechanism and maintain a coverage vector in our model, respectively. Our results on MS-MARCO demonstrate the model's superiority over baselines, and we also show qualitative examples where it improves correctness and readability.
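
A sketch of the two fixes named above, assuming the widely used pointer-generator formulation (the abstract does not give the exact equations). The coverage vector accumulates past attention, and the penalty sum_i min(a_i, c_i) discourages re-attending to the same source positions. The copy mechanism mixes the vocabulary distribution with attention mass on source tokens, weighted by a generation probability `p_gen`. All numbers and tokens are hypothetical.

```python
def coverage_step(coverage, attention):
    """Return (coverage penalty for this decoding step, updated coverage vector)."""
    penalty = sum(min(a, c) for a, c in zip(attention, coverage))
    new_cov = [c + a for c, a in zip(coverage, attention)]
    return penalty, new_cov


def final_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix generating from the vocabulary with copying tokens from the source."""
    out = {w: p_gen * p for w, p in vocab_probs.items()}
    for a, tok in zip(attention, source_tokens):
        out[tok] = out.get(tok, 0.0) + (1 - p_gen) * a
    return out


cov = [0.0, 0.0, 0.0]
loss1, cov = coverage_step(cov, [0.5, 0.25, 0.25])  # first step: nothing covered yet
loss2, cov = coverage_step(cov, [0.5, 0.25, 0.25])  # repeating the same attention
print(loss1, loss2)  # 0.0 1.0

dist = final_distribution(0.5, {"the": 0.6, "cat": 0.4}, [0.5, 0.5], ["cat", "zebra"])
print(dist)  # "zebra" is out of vocabulary but still receives probability by copying
```

Copying lets the model reproduce rare facts from the passage, while the coverage penalty grows whenever attention revisits positions it has already used.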

LDMNet: Low Dimensional Manifold Regularized Neural Networks

Deep neural networks have proved very successful on archetypal tasks for which large training sets are available, but when the training data are scarce, their performance suffers from overfitting. Many existing methods of reducing overfitting are data-independent, and their efficacy is often limited when the training set is very small. Data-dependent regularizations are mostly motivated by the observation that data of interest lie close to a manifold, which is typically hard to parametrize explicitly and often requires human input of tangent vectors. These methods typically only focus on the geometry of the input data, and do not necessarily encourage the networks to produce geometrically meaningful features. To resolve this, we propose a new framework, the Low-Dimensional-Manifold-regularized neural Network (LDMNet), which incorporates a feature regularization method that focuses on the geometry of both the input data and the output features. In LDMNet, we regularize the network by encouraging the combination of the input data and the output features to sample a collection of low dimensional manifolds, which are searched efficiently without explicit parametrization. To achieve this, we directly use the manifold dimension as a regularization term in a variational functional. The resulting Euler-Lagrange equation is a Laplace-Beltrami equation over a point cloud, which is solved by the point integral method without increasing the computational complexity. We demonstrate two benefits of LDMNet in the experiments. First, we show that LDMNet significantly outperforms widely-used network regularizers such as weight decay and DropOut. Second, we show that LDMNet can be designed to extract common features of an object imaged via different modalities, which proves to be very useful in real-world applications such as cross-spectral face recognition.

A New Method for Performance Analysis in Nonlinear Dimensionality Reduction

In this paper, we develop a local rank correlation measure that quantifies the performance of dimension reduction methods. The local rank correlation is easily interpretable and robust against the extreme skewness of nearest-neighbor distributions in high dimensions. We study several benchmark datasets and find that the local rank correlation closely corresponds to our visual interpretation of the quality of the output. In addition, we demonstrate that the local rank correlation is useful in estimating the intrinsic dimensionality of the original data and in selecting suitable values for the tuning parameters used in some algorithms.
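
The abstract does not define the measure, so the following is only one plausible, hypothetical reading of a "local rank correlation": for each point, take its k nearest neighbors in the original space, compute the Spearman rank correlation between their distances in the original and the reduced space, and average over points. Ties are broken by index order, so the no-ties Spearman formula is only approximate when distances tie.

```python
import math


def ranks(values):
    """Rank positions 0..n-1 (ties broken by index order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def spearman(x, y):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n (n^2 - 1)); needs n >= 2."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def local_rank_correlation(high, low, k):
    """Average, over points, of the rank agreement of k-NN distances (hypothetical definition)."""
    scores = []
    for i, p in enumerate(high):
        others = [j for j in range(len(high)) if j != i]
        nbrs = sorted(others, key=lambda j: math.dist(p, high[j]))[:k]
        d_high = [math.dist(p, high[j]) for j in nbrs]
        d_low = [math.dist(low[i], low[j]) for j in nbrs]
        scores.append(spearman(d_high, d_low))
    return sum(scores) / len(scores)


high = [(0, 0), (1, 0), (3, 0), (6, 0)]
faithful = [(0,), (1,), (3,), (6,)]   # preserves every neighbor ordering
scrambled = [(0,), (3,), (1,), (2,)]  # distorts local orderings
print(local_rank_correlation(high, faithful, k=2))   # 1.0
print(local_rank_correlation(high, scrambled, k=2))
```

Working with ranks rather than raw distances is what would make such a measure insensitive to the skewed distance distributions mentioned above.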

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition
Random gradient extrapolation for distributed and stochastic optimization
Online Allocation with Traffic Spikes: Mixing Adversarial and Stochastic Models
Fast Predictive Simple Geodesic Regression
Predicting vehicular travel times by modeling heterogeneous influences between arterial roads
End-to-end Training for Whole Image Breast Cancer Diagnosis using An All Convolutional Design
Large-scale Analysis of Opioid Poisoning Related Hospital Visits in New York State
Detecting Egregious Conversations between Customers and Virtual Agents
A Parameter Estimation Method Using Linear Response Statistics: Numerical Scheme
WebRelate: Integrating Web Data with Spreadsheets using Examples
CMU LiveMedQA at TREC 2017 LiveQA: A Consumer Health Question Answering System
Rigid Graph Compression: Motif-based rigidity analysis for disordered fiber networks
Finer Grained Entity Typing with TypeNet
ORBIT: Ordering Based Information Transfer Across Space and Time for Global Surface Water Monitoring
Avalanche precursors of failure in hierarchical fuse networks
Robust and Precise Vehicle Localization based on Multi-sensor Fusion in Diverse City Scenes
Set complexity of construction of a regular polygon
Hierarchical Modeling of Seed Variety Yields and Decision Making for Future Planting Plans
Global convergence rates of augmented Lagrangian methods for constrained convex programming
K3, L3, LP, RM3, A3, FDE: How to Make Many-Valued Logics Work for You
Fronthaul-Aware Group Sparse Precoding and Signal Splitting in SWIPT C-RAN
Understanding the Changing Roles of Scientific Publications via Citation Embeddings
Bootstrapped synthetic likelihood
The Mpemba index and anomalous relaxation
Cograph Editing in $O(3^n n)$ time and $O(2^n)$ space
Least informative distributions in Maximum q-log-likelihood estimation
On monotonicity of some functionals with variable exponent under symmetrization
AOGNets: Deep AND-OR Grammar Networks for Visual Recognition
Knowledge transfer for surgical activity prediction
Local eigenvalue statistics of one-dimensional random non-selfadjoint pseudo-differential operators
Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning
Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
An Optimal and Progressive Approach to Online Search of Top-k Important Communities
Categorical data analysis using a skewed Weibull regression model
Pricing Football Players using Neural Networks
Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling
A new characterization of the dual polar graphs
Packing nearly optimal Ramsey R(3,t) graphs
Solution Uniqueness of Convex Piecewise Affine Functions Based Optimization with Applications to Constrained $\ell_1$ Minimization
Crowdsourcing Question-Answer Meaning Representations
On Analyzing Job Hop Behavior and Talent Flow Networks
Occlusion Aware Unsupervised Learning of Optical Flow
Linear-Cost Covariance Functions for Gaussian Random Fields
Using experimental game theory to transit human values to ethical AI
NISP: Pruning Networks using Neuron Importance Score Propagation
Consistency of Hill Estimators in a Linear Preferential Attachment Model
On Channel Reciprocity to Activate Uplink Channel Training for Downlink Data Transmission
Optimal Load Balancing in Millimeter Wave Cellular Heterogeneous Networks
Priming Neural Networks
Learning Deeply Supervised Visual Descriptors for Dense Monocular Reconstruction
Enhanced Array Aperture using Higher Order Statistics for DoA Estimation
Budget-Constrained Multi-Armed Bandits with Multiple Plays
Defense against Universal Adversarial Perturbations
A Design-Time/Run-Time Application Mapping Methodology for Predictable Execution Time in MPSoCs
Enhanced Attacks on Defensively Distilled Deep Neural Networks
When Mobile Blockchain Meets Edge Computing: Challenges and Applications
Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition
Learning from Millions of 3D Scans for Large-scale 3D Face Recognition
HandSeg: A Dataset for Hand Segmentation from Depth Images
3D Face Reconstruction from Light Field Images: A Model-free Approach
Zero-Annotation Object Detection with Web Knowledge Transfer
HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation
Less-forgetful Learning for Domain Expansion in Deep Neural Networks
Physical-Layer Schemes for Wireless Coded Caching
Learning to Find Good Correspondences
Optimal Selection of Interconnections in Composite Systems for Structural Controllability
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Gamma-positivity in combinatorics and geometry
A kind of orthogonal polynomials and related identities II
On evolutionary selection of blackjack strategies
Superpixel clustering with deep features for unsupervised road segmentation
Bayesian uncertainty quantification in linear models for diffusion MRI
Remedies against the Vocabulary Gap in Information Retrieval
Hindsight policy gradients
A Law of Large Numbers in the Supremum Norm for a Multiscale Stochastic Spatial Gene Network
Parametric Manifold Learning Via Sparse Multidimensional Scaling
Stability of optimal spherical codes
A Revisit on Deep Hashings for Large-scale Content Based Image Retrieval
Global versus Localized Generative Adversarial Nets
Probabilities of incidence between lines and a plane curve over finite fields
Sub-committee Approval Voting and Generalised Justified Representation Axioms
Natural Language Guided Visual Relationship Detection
Utility maximization via decoupling fields
From Algorithmic Black Boxes to Adaptive White Boxes: Declarative Decision-Theoretic Ethical Programs as Codes of Ethics
Permutations sorted by a finite and an infinite stack in series
Frame Interpolation with Multi-Scale Deep Loss Functions and Generative Adversarial Networks
On the critical densities of minor-closed classes
Dynamical characterization of combinatorially rich sets near zero
Integrated Face Analytics Networks through Cross-Dataset Hybrid Training
Gaussian Process Decentralized Data Fusion Meets Transfer Learning in Large-Scale Distributed Cooperative Perception
The signature of robot action success in EEG signals of a human observer: Decoding and visualization using deep convolutional neural networks
Adjusting for selective non-participation with re-contact data in the FINRISK 2012 survey
The Perception-Distortion Tradeoff
Two Birds with One Stone: Iteratively Learn Facial Attributes with GANs
Extensions of the Hitsuda-Skorokhod integral
Sequences, Items And Latent Links: Recommendation With Consumed Item Packs
Improving Consistency and Correctness of Sequence Inpainting using Semantically Guided Generative Adversarial Network
On first exit times and their means for Brownian bridges
Robust Unsupervised Domain Adaptation for Neural Networks via Moment Alignment
SUPRA: Open Source Software Defined Ultrasound Processing for Real-Time Applications
Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification
3D Trajectory Reconstruction of Dynamic Objects Using Planarity Constraints
Switch chain mixing times through triangle counts
ConvAMR: Abstract meaning representation parsing
Learning Compositional Visual Concepts with Mutual Consistency
Reliable Video Streaming over mmWave with Multi Connectivity and Network Coding
Zero-Shot Learning via Category-Specific Visual-Semantic Mapping
Beyond Sparsity: Tree Regularization of Deep Models for Interpretability
A novel low-rank matrix completion approach to estimate missing entries in Euclidean distance matrices
Nonlinear dependencies on Brazilian equity network from mutual information minimum spanning trees
A Low-Rank Rounding Heuristic for Semidefinite Relaxation of Hydro Unit Commitment Problems
Neurology-as-a-Service for the Developing World
Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian
Stationary states of boundary driven exclusion processes with nonreversible boundary dynamics
Power Diagram Detection with Applications to Information Elicitation
On the probability of nonexistence in binomial subsets
Converting P-Values in Adaptive Robust Lower Bounds of Posterior Probabilities to increase the reproducible Scientific ‘Findings’
A Forward-Backward Approach for Visualizing Information Flow in Deep Networks
Boolean Extremes and Dagum Distributions
A Novel Framework for Robustness Analysis of Visual QA Models
Deceptiveness of internet data for disease surveillance
Uniform weak convergence of poverty measures with relative poverty lines