Swish: a Self-Gated Activation Function

The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance. Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU). Although various alternatives to ReLU have been proposed, none have managed to replace it due to inconsistent gains. In this work, we propose a new activation function, named Swish, which is simply f(x) = x \cdot \text{sigmoid}(x). Our experiments show that Swish tends to work better than ReLU on deeper models across a number of challenging datasets. For example, simply replacing ReLUs with Swish units improves top-1 classification accuracy on ImageNet by 0.9% for Mobile NASNet-A and 0.6% for Inception-ResNet-v2. The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
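
The definition above, f(x) = x \cdot \text{sigmoid}(x), can be sketched in a few lines. This is a minimal scalar illustration rather than the authors' implementation; the function names `sigmoid`, `swish`, and `relu` are chosen here for the example, and the numerically stable form of the sigmoid is an added assumption.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written so that math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Swish as stated in the abstract: f(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


def relu(x: float) -> float:
    """ReLU, shown only for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}")
```

Unlike ReLU, this function is smooth and takes small negative values for negative inputs, while approaching the identity for large positive inputs.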

Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent’s lifetime as it learns to drive in a realistic simulated environment.

Sparse Linear Isotonic Models

In machine learning and data mining, linear models have been widely used to model the response as parametric linear functions of the predictors. To relax such stringent assumptions made by parametric linear models, additive models consider the response to be a summation of unknown transformations applied on the predictors; in particular, additive isotonic models (AIMs) assume the unknown transformations to be monotone. In this paper, we introduce sparse linear isotonic models (SLIMs) for high-dimensional problems by hybridizing ideas in parametric sparse linear models and AIMs, which enjoy a few appealing advantages over both. In the high-dimensional setting, a two-step algorithm is proposed for estimating the sparse parameters as well as the monotone functions over predictors. Under mild statistical assumptions, we show that the algorithm can accurately estimate the parameters. Promising preliminary experiments are presented to support the theoretical results.

Linear Regression with Sparsely Permuted Data

In regression analysis of multivariate data, it is tacitly assumed that response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of ‘permuted data’ in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, the latter is often known to have additional structure that can be leveraged. Specifically, we herein consider the common scenario of ‘sparsely permuted data’ in which only a small fraction of the data is affected by a mismatch between response and predictors. However, an adverse effect already observed for sparsely permuted data is that the least squares estimator as well as other estimators not accounting for such partial mismatch are inconsistent. One approach studied in detail herein is to treat permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity given the general lack of procedures for the above problem that are both statistically sound and computationally appealing.
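
The idea of treating mismatched pairs as outliers can be illustrated with a toy example. The sketch below is not the paper's procedure: it assumes a one-parameter model through the origin, uses the Huber loss fitted by iteratively reweighted least squares as one possible robust formulation, and flags suspected mismatches with an arbitrary residual threshold. The data, the helper names, and the values of `delta` and the threshold are all hypothetical.

```python
def least_squares_slope(xs, ys, weights=None):
    """Weighted least-squares slope for the model y = beta * x."""
    if weights is None:
        weights = [1.0] * len(xs)
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs))
    return num / den


def huber_slope(xs, ys, delta=1.0, iterations=50):
    """Huber-loss slope via iteratively reweighted least squares."""
    beta = least_squares_slope(xs, ys)
    for _ in range(iterations):
        residuals = [y - beta * x for x, y in zip(xs, ys)]
        # Huber weights: 1 inside the threshold, delta / |r| outside it.
        weights = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
        beta = least_squares_slope(xs, ys, weights)
    return beta


# Toy data: y = 2 * x exactly, then two pairs of responses are swapped,
# so 4 of the 20 observations are mismatched.
true_beta = 2.0
xs = [float(i) for i in range(1, 21)]
ys = [true_beta * x for x in xs]
ys[0], ys[19] = ys[19], ys[0]
ys[1], ys[18] = ys[18], ys[1]

beta_ls = least_squares_slope(xs, ys)
beta_huber = huber_slope(xs, ys)

# Observations with large residuals under the robust fit are flagged as
# suspected mismatches (the threshold 3.0 is an arbitrary choice here).
suspects = [i for i, (x, y) in enumerate(zip(xs, ys)) if abs(y - beta_huber * x) > 3.0]

if __name__ == "__main__":
    print(f"least squares: {beta_ls:.3f}, Huber: {beta_huber:.3f}, suspects: {suspects}")
```

In this toy setting the plain least-squares slope is pulled well away from the true value by the four mismatched points, while the robust fit stays close to it and the large residuals point to the swapped observations.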

The Bayesian Sorting Hat: A Decision-Theoretic Approach to Size-Constrained Clustering

Size-constrained clustering (SCC) refers to the dual problem of using observations to determine latent cluster structure while at the same time assigning observations to the unknown clusters subject to an analyst-defined constraint on cluster sizes. While several approaches have been proposed, SCC remains a difficult problem due to the combinatorial dependency between observations introduced by the size constraints. Here we reformulate SCC as a decision problem and introduce a novel loss function to capture various types of size constraints. As opposed to prior work, our approach is uniquely suited to situations in which size constraints reflect an external limitation or desire rather than an internal feature of the data generation process. To demonstrate our approach, we develop a Bayesian mixture model for clustering respondents using both simulated and real categorical survey data. Our motivation for the development of this decision-theoretic approach to SCC was to determine optimal team assignments for a Harry Potter-themed scavenger hunt based on categorical survey data from participants.

Reply With: Proactive Recommendation of Email Attachments

Email responses often contain items (such as a file or a hyperlink to an external document) that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user’s past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response. In this paper, we propose a weakly supervised learning framework for recommending attachable items to the user. As email search systems are commonly available, we constrain the recommendation task to formulating effective search queries from the context of the conversations. The query is submitted to an existing IR system to retrieve relevant items for attachment. We also present a novel strategy for generating labels from an email corpus, without the need for manual annotations, that can be used to train and evaluate the query formulation model. In addition, we describe a deep convolutional neural network that demonstrates satisfactory performance on this query formulation task when evaluated on the publicly available Avocado dataset and a proprietary dataset of internal emails obtained through an employee participation program.

Constrained Factor Models for High-Dimensional Matrix-Variate Time Series

In many scientific fields, including economics, biology, and meteorology, high dimensional matrix-variate data are routinely collected over time. To incorporate the structural interrelations between columns and rows and to achieve significant dimension reduction when dealing with high-dimensional matrix-variate time series, \cite{Wang-Liu-Chen-2017} proposed a matrix factor model that is shown to be effective in analyzing such data. In this paper, we establish a general framework for incorporating domain or prior knowledge induced linear constraints in the matrix-variate factor model. The constraints can be used to achieve parsimony in parameterization, to facilitate interpretation of the latent matrix factor, and to target specific factors of interest based on domain theories. Fully utilizing the constraints results in more efficient and accurate modeling, inference, dimension reduction as well as a clear and better interpretation of the results. In this paper, constrained, multi-term, and partially constrained factor models for matrix-variate time series are developed, with efficient estimation procedures and their asymptotic properties. We show that the convergence rates of the constrained factor loading matrices are much faster than those of the conventional matrix factor analysis under many situations. Simulation studies are carried out to demonstrate the performance of the proposed method and the associated asymptotic properties. We demonstrate the proposed model with three applications, where the constrained matrix factor models outperform their unconstrained counterparts in the power of variance explanation under the out-of-sample 10-fold cross-validation setting.

On the challenges of learning with inference networks on sparse, high-dimensional data

We study parameter estimation in Nonlinear Factor Analysis (NFA) where the generative model is parameterized by a deep neural network. Recent work has focused on learning such models using inference (or recognition) networks; we identify a crucial problem when modeling large, sparse, high-dimensional datasets — underfitting. We study the extent of underfitting, highlighting that its severity increases with the sparsity of the data. We propose methods to tackle it via iterative optimization inspired by stochastic variational inference \citep{hoffman2013stochastic} and improvements in the sparse data representation used for inference. The proposed techniques drastically improve the ability of these powerful models to fit sparse data, achieving state-of-the-art results on a benchmark text-count dataset and excellent results on the task of top-N recommendation.

Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning

In order for robots to perform mission-critical tasks, it is essential that they are able to quickly adapt to changes in their environment as well as to injuries and/or other bodily changes. Deep reinforcement learning has been shown to be successful in training robot control policies for operation in complex environments. However, existing methods typically employ only a single policy. This can limit the adaptability since a large environmental modification might require a completely different behavior compared to the learning environment. To solve this problem, we propose Map-based Multi-Policy Reinforcement Learning (MMPRL), which aims to search and store multiple policies that encode different behavioral features while maximizing the expected reward in advance of the environment change. Thanks to these policies, which are stored in a multi-dimensional discrete map according to their behavioral features, adaptation can be performed within reasonable time without retraining the robot. An appropriate pre-trained policy from the map can be recalled using Bayesian optimization. Our experiments show that MMPRL enables robots to quickly adapt to large changes without requiring any prior knowledge on the type of injuries that could occur. A highlight of the learned behaviors can be found here: https://youtu.be/qcCepAKL32U .

Convolutional Recurrent Neural Networks for Electrocardiogram Classification

We propose two deep neural network architectures for classification of arbitrary-length electrocardiogram (ECG) recordings and evaluate them on the atrial fibrillation (AF) classification data set provided by the PhysioNet/CinC Challenge 2017. The first architecture is a deep convolutional neural network (CNN) with averaging-based feature aggregation across time. The second architecture combines convolutional layers for feature extraction with long short-term memory (LSTM) layers for temporal aggregation of features. As a key ingredient of our training procedure we introduce a simple data augmentation scheme for ECG data and demonstrate its effectiveness in the AF classification task at hand. The second architecture was found to outperform the first one, obtaining an F_1 score of 82.1% on the hidden challenge testing set.

Deep Gaussian Covariance Network

The correlation length-scale and the noise variance are the most commonly used hyperparameters of Gaussian processes. Typically, stationary covariance functions are used, which depend only on the distances between input points and are thus invariant to translations in the input space. The hyperparameters are commonly optimized by maximizing the log marginal likelihood. This works quite well if the distances are uniformly distributed. For a locally adapted or even sparse input space, the prediction at a test point can be worse depending on its position. A possible solution is to use a non-stationary covariance function whose hyperparameters are calculated by a deep neural network, so that the correlation length-scales and possibly the noise variance depend on the test point. Furthermore, different types of covariance functions are trained simultaneously, so that the Gaussian process prediction is an additive overlay of different covariance matrices. The right combination of covariance functions and its hyperparameters are learned by the deep neural network. Additionally, the Gaussian process can be trained in batches or online, so it can handle arbitrarily large data sets. We call this framework Deep Gaussian Covariance Network (DGCP). Further extensions to this framework are also possible, for example to sequentially dependent problems such as time series or to a local mixture of experts. The basic framework and some possible extensions are presented in this work. Moreover, a comparison with some recent state-of-the-art surrogate model methods is performed, including for a time-dependent problem.
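
The difference between a stationary and a non-stationary covariance function can be made concrete with a small one-dimensional sketch. The abstract does not state which non-stationary form is used, so the Gibbs kernel below is an assumption made purely for illustration, and `hypothetical_length_scale` is a hand-written stand-in for the length-scale that the deep neural network would output.

```python
import math


def squared_exponential(x1, x2, length_scale=1.0, signal_var=1.0):
    """Stationary kernel: depends on the inputs only through x1 - x2."""
    d = x1 - x2
    return signal_var * math.exp(-d * d / (2.0 * length_scale ** 2))


def hypothetical_length_scale(x):
    """Stand-in for a learned, input-dependent length-scale (always positive)."""
    return 0.5 + 0.1 * x * x


def gibbs_kernel(x1, x2, length_fn):
    """Non-stationary 1-D Gibbs kernel with length-scale function length_fn."""
    l1, l2 = length_fn(x1), length_fn(x2)
    s = l1 * l1 + l2 * l2
    return math.sqrt(2.0 * l1 * l2 / s) * math.exp(-((x1 - x2) ** 2) / s)


if __name__ == "__main__":
    # Same distance between points, different positions in input space.
    print(squared_exponential(0.0, 1.0), squared_exponential(5.0, 6.0))
    print(gibbs_kernel(0.0, 1.0, hypothetical_length_scale),
          gibbs_kernel(5.0, 6.0, hypothetical_length_scale))
```

The stationary kernel returns the same value for both pairs because only their distance matters, whereas the non-stationary kernel changes with position. With a constant length-scale function the Gibbs kernel reduces to the squared exponential.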

Iterative Supervised Principal Components

In high-dimensional prediction problems, where the number of features may greatly exceed the number of training instances, a fully Bayesian approach with a sparsifying prior is known to produce good results but is computationally challenging. To alleviate this computational burden, we propose to use a preprocessing step where we first apply a dimension reduction to the original data to reduce the number of features to a number that Bayesian methods can handle conveniently. To do this, we propose a new dimension reduction technique, called iterative supervised principal components (ISPC), which combines variable screening and dimension reduction and can be considered as an extension to the existing technique of supervised principal components (SPCs). Our empirical evaluations confirm that, although not foolproof, the proposed approach provides very good results on several microarray benchmark datasets with very affordable computation time, and can also be very useful for visualizing high-dimensional data.

Deep Spectral Descriptors: Learning the point-wise correspondence metric via Siamese deep neural networks

A robust and informative local shape descriptor plays an important role in mesh registration. In this regard, spectral descriptors that are based on the spectrum of the Laplace-Beltrami operator have attracted considerable attention from researchers over the last decade due to their desirable properties, such as isometry invariance. Despite this, spectral descriptors often fail to give a correct similarity measure for non-isometric cases where the metric distortion between the models is large. Hence, they are in general not suitable for registration problems, except for the special cases when the models are near-isometric. In this paper, we investigate a way to develop shape descriptors for non-isometric registration tasks by embedding the spectral shape descriptors into a different metric space where the Euclidean distance between the elements directly indicates the geometric dissimilarity. We design and train a Siamese deep neural network to find such an embedding, where the embedded descriptors are promoted to rearrange based on the geometric similarity. We found that our approach can significantly enhance the performance of the conventional spectral descriptors for non-isometric registration tasks, and that it outperforms a recent state-of-the-art method reported in the literature.

Embedding an Edge-colored $K(a^{(p)};λ,μ)$ into a Hamiltonian Decomposition of $K(a^{(p+r)};λ,μ)$
Embedding factorizations for 3-uniform hypergraphs
Dominating 2-broadcast in graphs: complexity, bounds and extremal graphs
HyperDense-Net: A hyper-densely connected CNN for multi-modal image semantic segmentation
Bisected theta series, least $r$-gaps in partitions, and polygonal numbers
A Short Note on Improved ROSETA
Asymptotic distribution of least square estimators for linear models with dependent errors : regular designs
Derivation of the Chapman-Kolmogorov type equation from a stochastic hybrid system
Selection of calibrated subaction when temperature goes to zero in the discounted problem
Convolutional Neural Networks for Sentiment Classification on Business Reviews
Safe Medicine Recommendation via Medical Knowledge Graph Embedding
SpecWatch: A Framework for Adversarial Spectrum Monitoring with Unknown Statistics
Pushing the envelope in deep visual recognition for mobile platforms
An operational characterization of mutual information in algorithmic information theory
Free mutual information for two projections
Contributed Discussion to Uncertainty Quantification for the Horseshoe by Stéphanie van der Pas, Botond Szabó and Aad van der Vaart
Volumetric Data Exploration with Machine Learning-Aided Visualization in Neutron Science
The Sandpile Group of a Thick Cycle Graph
When Do Birds of a Feather Flock Together? $K$-Means, Proximity, and Conic Programming
The quantum adjacency algebra and subconstituent algebra of a graph
VAMPnets: Deep learning of molecular kinetics
Estimating reducible stochastic differential equations by conversion to a least-squares problem
Global exact controllability of the bilinear of Schroedinger potential type models on quantum graphs
Quantum query complexity of entropy estimation
Targeting Interventions in Networks
Large Scale Replication Projects in Contemporary Psychological Research
Checking the Soundness of Statistical Tests for Random Number Generators by Using a Three-Level Test
Stochastic Variance Reduction for Policy Gradient Estimation
On Hamilton Decompositions of Line Graphs of Non-Hamiltonian Graphs and Graphs without Separating Transitions
Renormalized Solutions to Stochastic Continuity Equations with Rough Coefficients
Repetition in Colored Sequences of Balls
Evolution in Virtual Worlds
Asymptotically Optimal Sequential Design for Rank Aggregation
CancerLinker: Explorations of Cancer Study Network
Data analysis recipes: Using Markov Chain Monte Carlo
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts
Incremental Subgradient Methods for Minimizing The Sum of Quasi-convex Functions
Estimate exponential memory decay in Hidden Markov Model and its applications
Optimal Actuator Location of the Minimum Norm Controls for Stochastic Heat Equations
Discovering Adversarial Examples with Momentum
Box-Cox elliptical distributions with application
Matroids and Canonical Forms: Theory and Applications
Hierarchical Fog-Cloud Computing for IoT Systems: A Computation Offloading Game
Face Transfer with Generative Adversarial Network
Multi-Tenant C-RAN With Spectrum Pooling: Downlink Optimization Under Privacy Constraints
Spontaneous Symmetry Breaking in Neural Networks
Primal-Dual $π$ Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems
Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55
CASICT Tibetan Word Segmentation System for MLWS2017
A tightness property of relatively smooth permutations
Countable infinitary theories admitting an invariant measure
Higher Nerves of Simplicial Complexes
Scalable Dense Monocular Surface Reconstruction
Saddle representations of positively homogeneous functions by linear functions
Operational thermal load forecasting in district heating networks using machine learning and expert advice
Universal-homogeneous structures are generic
Planck-scale distribution of nodal length of arithmetic random waves
Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks
Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection
Detecting Bias in Black-Box Models Using Transparent Model Distillation
Learning to Learn Image Classifiers with Informative Visual Analogy
Accretion-induced spin-wandering effects on the neutron star in Scorpius X-1: Implications for continuous gravitational wave searches
Bits through queues with feedback
Strong Consistency of Spectral Clustering for Stochastic Block Models
Hybrid Precoder and Combiner Design with Low Resolution Phase Shifters in mmWave MIMO Systems
A New Coherence-Penalized Minimal Path Model with Application to Retinal Vessel Centerline Delineation
Partition C*-algebras
Continuants, run lengths, and Barry’s modified Pascal triangle
Projective reconstruction in algebraic vision
Learning to Transfer Initializations for Bayesian Hyperparameter Optimization
Extremes of $2d$ Coulomb gas: universal intermediate deviation regime
Fusion of LiDAR and Camera Sensor Data for Environment Sensing in Driverless Vehicles
3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
Analysis of feature detector and descriptor combinations with a localization experiment for various performance metrics
Nonlinear Interference Mitigation via Deep Neural Networks
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
Single Shot Temporal Action Detection
Integrated mmWave Access and Backhaul in 5G: Bandwidth Partitioning and Downlink Analysis
Stochastic reaction networks with input processes: Analysis and applications to reporter gene systems
Convergence Rate of Riemannian Hamiltonian Monte Carlo and Faster Polytope Volume Computation
Procedural Modeling and Physically Based Rendering for Synthetic Data Generation in Automotive Applications
Combinatorial Penalties: Which structures are preserved by convex relaxations?
Smooth and Sparse Optimal Transport
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Existence and uniqueness of reflecting diffusions in cusps
A tight Erdős-Pósa function for wheel minors
Preliminary steps toward a universal economic dynamics for monetary and fiscal policy
On the skeleton of the pyramidal tours polytope
A Deep Learning Approach for Reconstruction Filter Kernel Discretization
VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition
A generative model for sparse, evolving digraphs
Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance
Towards CT-quality Ultrasound Imaging using Deep Learning
A group version of stable regularity
Paying Attention to Multi-Word Expressions in Neural Machine Translation
DASHMM Accelerated Adaptive Fast Multipole Poisson-Boltzmann Solver on Distributed Memory Architecture
Beat by Beat: Classifying Cardiac Arrhythmias with Recurrent Neural Networks
Symbol Erasure Correction Capability of Spread Codes
Factor Models for High-Dimensional Dynamic Networks: with Application to International Trade Flow Time Series 1981-2015
Distributed algorithm for empty vehicles management in personal rapid transit (PRT) network
Hemisystems of the Hermitian Surface
Understanding the Correlation Gap for Matchings
Compound Poisson approximation of subgraph counts in stochastic block models with multiple edges
Reflection local times of diffusions at elastic boundaries
Multivariate Spatio-temporal Kriging on Latent Low-dimensional Functional Structures with Non-stationarity
Spectra of Wishart Matrices with size-dependent entries
Wigner functions for the pair angle and orbital angular momentum: Possible applications in quantum information theories
Good Arm Identification via Bandit Feedback
On the spectrum of directed uniform and non-uniform hypergraphs
Specialising Word Vectors for Lexical Entailment
The Hard Problems Are Almost Everywhere For Random CNF-XOR Formulas
Containment problem and combinatorics
Convergence diagnostics for stochastic gradient descent with constant step size
Projective planes and set multipartite Ramsey numbers for $C_4$ versus star
Efficient Neighbor-Finding on Space-Filling Curves
Fishing for Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network Models
RETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and CNN
Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue
Multi-task Domain Adaptation for Deep Learning of Instance Grasping from Simulation
Domain Randomization and Generative Models for Robotic Grasping