A Unified Framework for Training Neural Networks

The lack of mathematical tractability of Deep Neural Networks (DNNs) has hindered progress towards a unified convergence analysis of training algorithms in the general setting. We propose a unified optimization framework for training different types of DNNs, and establish its convergence for arbitrary loss, activation, and regularization functions, assumed to be smooth. We show that the framework generalizes well-known first- and second-order training methods, and thus allows us to show the convergence of these methods for various DNN architectures and learning tasks as a special case of our approach. We discuss some of its applications in training various DNN architectures (e.g., feed-forward, convolutional, linear networks) for regression and classification tasks.

Optimal Transport for structured data

Optimal transport has recently gained a lot of interest in the machine learning community thanks to its ability to compare probability distributions while respecting the underlying space’s geometry. The Wasserstein distance handles feature information through its metric or cost function, but fails to exploit structural information, i.e., the specific relations existing among the components of the distribution. Recently adapted to a machine learning context, the Gromov-Wasserstein distance defines a metric well suited for comparing distributions that live in different metric spaces by exploiting their inner structural information. In this paper we propose a new optimal transport distance, called the Fused Gromov-Wasserstein distance, capable of leveraging both structural and feature information by combining both views, and prove its metric properties over very general manifolds. We also define the barycenter of structured objects as their Fréchet mean, leveraging both feature and structural information. We illustrate the versatility of the method for problems where structured objects are involved, computing barycenters in graph and time series contexts. We also use this new distance for graph classification, where we obtain results comparable or superior to those of state-of-the-art graph kernel methods and an end-to-end graph CNN approach.
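
As a small illustration of how the plain Wasserstein distance uses only feature information through a cost function, the following sketch computes the p-Wasserstein distance between two one-dimensional empirical distributions of equal size. It is not the Fused Gromov-Wasserstein distance of the abstract; the function name `wasserstein_1d` and the restriction to equal-size, uniformly weighted samples on the real line are assumptions made for brevity. In that setting the optimal coupling simply matches sorted samples.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size 1D samples.

    Assumes uniform weights and cost |x - y|**p with p >= 1, in which case
    the optimal transport plan matches the i-th smallest point of xs to the
    i-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)


# Two point clouds that differ by a shift of one unit.
print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))
```

Note that the result depends only on the point positions (the features); any relation among the points, such as graph edges, is ignored, which is the limitation the abstract points out.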

Hyperbolic Neural Networks

Hyperbolic spaces have recently gained momentum in the context of machine learning due to their high capacity and tree-likeliness properties. However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, mostly because of the absence of corresponding hyperbolic neural network layers. This makes it hard to use hyperbolic embeddings in downstream tasks. Here, we bridge this gap in a principled manner by combining the formalism of Möbius gyrovector spaces with the Riemannian geometry of the Poincaré model of hyperbolic spaces. As a result, we derive hyperbolic versions of important deep learning tools: multinomial logistic regression, feed-forward networks, and recurrent neural networks such as gated recurrent units. This allows us to embed sequential data and perform classification in hyperbolic space. Empirically, we show that, even if hyperbolic optimization tools are limited, hyperbolic sentence embeddings either outperform or are on par with their Euclidean variants on textual entailment and noisy-prefix recognition tasks.
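
To make the Möbius gyrovector formalism concrete, here is a minimal sketch of Möbius addition and the induced geodesic distance in the Poincaré ball. The function names `mobius_add` and `poincare_distance`, the list-based vectors, and the curvature parameter `c` (curvature -c, default 1) are illustrative assumptions, not the paper's code; the formulas are the standard ones for the Poincaré ball model.

```python
import math


def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def mobius_add(x, y, c=1.0):
    """Moebius addition of x and y in the Poincare ball of curvature -c."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance (2 / sqrt(c)) * artanh(sqrt(c) * ||(-x) (+) y||)."""
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(dot(diff, diff))
    return 2.0 / math.sqrt(c) * math.atanh(math.sqrt(c) * norm)


x, y = [0.3, -0.2], [0.1, 0.4]
print(mobius_add(x, y))          # stays inside the unit ball
print(poincare_distance(x, y))   # hyperbolic distance between x and y
```

Möbius addition plays the role that vector addition plays in Euclidean layers (for example when adding a bias), which is why it is a natural building block for the hyperbolic layers the abstract describes.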

Model Selection in Time Series Analysis: Using Information Criteria as an Alternative to Hypothesis Testing

The issue of model selection in applied research is of vital importance. Since the true model in such research is not known, which model should be used from among various potential ones is an empirical question. There might exist several competitive models. A typical approach to dealing with this is classic hypothesis testing using an arbitrarily chosen significance level based on the underlying assumption that a true null hypothesis exists. In this paper we investigate how successful this approach is in determining the correct model for different data generating processes using time series data. An alternative approach based on more formal model selection techniques using an information criterion or cross-validation is suggested and evaluated in the time series environment via Monte Carlo experiments. This paper also explores the effectiveness of deciding what type of general relation exists between two variables (e.g. relation in levels or relation in first differences) using various strategies based on hypothesis testing and on information criteria with the presence or absence of unit roots.
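
The idea of selecting a time series model with an information criterion can be sketched as follows. This is only an illustration under stated assumptions, not the paper's Monte Carlo design: the data generating process is a hypothetical AR(2), candidate models are AR(p) with intercept fitted by least squares, and the helper names `solve` and `ar_criteria` are invented for this example. All candidates are fitted on the same observations so that the criteria are comparable.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def ar_criteria(y, p, max_p):
    """Fit AR(p) with intercept by least squares and return (AIC, BIC)."""
    rows = [[1.0] + [y[t - j] for j in range(1, p + 1)]
            for t in range(max_p, len(y))]
    target = y[max_p:]
    k = p + 1  # number of regression coefficients
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((v - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, v in zip(rows, target))
    n = len(target)
    # Gaussian log-likelihood up to constants shared by all candidate orders.
    fit = n * math.log(rss / n)
    return fit + 2 * k, fit + k * math.log(n)


# Hypothetical data generating process: y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t
rng = random.Random(0)
y = [0.0, 0.0]
for _ in range(1000):
    y.append(0.5 * y[-1] + 0.3 * y[-2] + rng.gauss(0.0, 1.0))

max_p = 4
bic = [ar_criteria(y, p, max_p)[1] for p in range(max_p + 1)]
best_p = min(range(max_p + 1), key=lambda p: bic[p])
print("BIC by order:", [round(v, 1) for v in bic], "selected order:", best_p)
```

Unlike a sequence of hypothesis tests, this procedure needs no significance level: every candidate is scored on the same scale and the lowest score wins.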

Approximate Random Dropout

The training phase of a deep neural network (DNN) consumes enormous processing time and energy. Compression techniques that accelerate inference by leveraging the sparsity of DNNs, however, can hardly be used in the training phase, because training involves dense matrix multiplication on GPGPUs, which favor regular and structured data layouts. In this paper, we exploit the sparsity of DNNs resulting from the random dropout technique to eliminate the unnecessary computation and data access for the dropped neurons or synapses in the training phase. Experimental results for MLP and LSTM on standard benchmarks show that the proposed Approximate Random Dropout can reduce the training time by half on average with negligible accuracy loss.

Distribution Aware Active Learning

Discriminative learning machines often need a large set of labeled samples for training. Active learning (AL) settings assume that the learner has the freedom to ask an oracle to label its desired samples. Traditional AL algorithms heuristically choose query samples about which the current learner is uncertain. This strategy does not make good use of the structure of the dataset at hand and is prone to being misguided by outliers. To alleviate this problem, we propose to distill the structural information into a probabilistic generative model which acts as a teacher in our model. The active learner uses this information effectively at each cycle of active learning. The proposed method is generic and does not depend on the type of learner and teacher. We then suggest a query criterion for active learning that is aware of the distribution of the data and is more robust against outliers. Our method can be combined readily with several other query criteria for active learning. We provide the formulation and empirically demonstrate our idea on toy and real examples.

Amortized Inference Regularization

The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests preferring an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.

Multi-task Maximum Entropy Inverse Reinforcement Learning

Multi-task Inverse Reinforcement Learning (IRL) is the problem of inferring multiple reward functions from expert demonstrations. Prior work, built on Bayesian IRL, is unable to scale to complex environments due to computational constraints. This paper contributes the first formulation of multi-task IRL in the more computationally efficient Maximum Causal Entropy (MCE) IRL framework. Experiments show our approach can perform one-shot imitation learning in a gridworld environment that single-task IRL algorithms require hundreds of demonstrations to solve. Furthermore, we outline how our formulation can be applied to state-of-the-art MCE IRL algorithms such as Guided Cost Learning. This extension, based on meta-learning, could enable multi-task IRL to be performed for the first time in high-dimensional, continuous state MDPs with unknown dynamics as commonly arise in robotics.

Adversarial Labeling for Learning without Labels

We consider the task of training classifiers without labels. We propose a weakly supervised method—adversarial label learning—that trains classifiers to perform well against an adversary that chooses labels for training data. The weak supervision constrains what labels the adversary can choose. The method therefore minimizes an upper bound of the classifier’s error rate using projected primal-dual subgradient descent. Minimizing this bound protects against bias and dependencies in the weak supervision. Experiments on three real datasets show that our method can train without labels and outperforms other approaches for weakly supervised learning.

Counterfactual Mean Embedding: A Kernel Method for Nonparametric Causal Inference

This paper introduces a novel Hilbert space representation of a counterfactual distribution, called the counterfactual mean embedding (CME), with applications in nonparametric causal inference. Counterfactual prediction has become a ubiquitous tool in machine learning applications, such as online advertisement, recommendation systems, and medical diagnosis, whose performance relies on certain interventions. To infer the outcomes of such interventions, we propose to embed the associated counterfactual distribution into a reproducing kernel Hilbert space (RKHS) endowed with a positive definite kernel. Under appropriate assumptions, the CME allows us to perform causal inference over the entire landscape of the counterfactual distribution. The CME can be estimated consistently from observational data without requiring any parametric assumption about the underlying distributions. We also derive a rate of convergence which depends on the smoothness of the conditional mean and the Radon-Nikodym derivative of the underlying marginal distributions. Our framework can deal not only with real-valued outcomes, but potentially also with more complex and structured outcomes such as images, sequences, and graphs. Lastly, our experimental results on off-policy evaluation tasks demonstrate the advantages of the proposed estimator.

Optimal Record and Replay under Causal Consistency

We investigate the minimum record needed to replay executions of processes that share causally consistent memory. For a version of causal consistency, we identify optimal records under both offline and online recording settings. Under the offline setting, a central authority has information about every process’ view of the execution and can decide what information to record for each process. Under the online setting, each process has to decide on the record at runtime as the operations are observed.

Expectation propagation: a probabilistic view of Deep Feed Forward Networks

We present a statistical mechanics model of deep feed forward neural networks (FFN). Our energy-based approach naturally explains several known results and heuristics, providing a solid theoretical framework and new instruments for a systematic development of FFN. We infer that FFN can be understood as performing three basic steps: encoding, representation validation and propagation. We obtain a set of natural activations (such as sigmoid, tanh and ReLU) together with a state-of-the-art one, recently obtained by Ramachandran et al. (arXiv:1710.05941) using an extensive search algorithm. We term this activation ESP (Expected Signal Propagation), explain its probabilistic meaning, and study the eigenvalue spectrum of the associated Hessian on classification tasks. We find that ESP allows for faster training and more consistent performance over a wide range of network architectures.
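
For reference, the activation found by Ramachandran et al. (arXiv:1710.05941) is commonly written as x · sigmoid(βx) and known as Swish. The sketch below assumes that this is the functional form the abstract calls ESP; the function names and the default β = 1 are illustrative choices, not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def esp(x, beta=1.0):
    """x * sigmoid(beta * x): the input weighted by a sigmoid gate."""
    return x * sigmoid(beta * x)


def relu(x):
    """Rectified linear unit, shown for comparison."""
    return max(0.0, x)


for v in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(v, round(esp(v), 4), relu(v))
```

Unlike ReLU, this activation is smooth and non-monotonic: it dips slightly below zero for moderately negative inputs and approaches ReLU for inputs of large magnitude.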

Semi-supervised learning: When and why it works

Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of unclassified data to perform classification in situations where, typically, there is little labelled data. Even though this is not always possible (it depends on how useful knowing the distribution of the unlabelled data is for inferring the labels), several algorithms have been proposed recently. A new algorithm is proposed that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the amount of unlabelled data tends to infinity. The set of necessary assumptions, although reasonable, shows that semi-parametric classification only works for very well-conditioned problems. The performance of the algorithm is assessed on the well-known ‘Isolet’ real data set of phonemes, where a strong dependence on the choice of the initial training sample is shown.

The Topology ToolKit

This system paper presents the Topology ToolKit (TTK), a software platform designed for topological data analysis in scientific visualization. TTK provides a unified, generic, efficient, and robust implementation of key algorithms for the topological analysis of scalar data, including: critical points, integral lines, persistence diagrams, persistence curves, merge trees, contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots, Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due to a tight integration with ParaView. It is also easily accessible to developers through a variety of bindings (Python, VTK/C++) for fast prototyping or through direct, dependence-free C++, to ease integration into pre-existing complex systems. While developing TTK, we faced several algorithmic and software engineering challenges, which we document in this paper. In particular, we present an algorithm for the construction of a discrete gradient that complies with the critical points extracted in the piecewise-linear setting. This algorithm guarantees combinatorial consistency across the topological abstractions supported by TTK and, importantly, a unified implementation of topological data simplification for multi-scale exploration and analysis. We also present a cached triangulation data structure that supports time-efficient and generic traversals, self-adjusts its memory usage on demand for input simplicial meshes, and implicitly emulates a triangulation for regular grids with no memory overhead. Finally, we describe an original software architecture, which guarantees memory-efficient and direct access to TTK features, while still offering researchers powerful and easy bindings and extensions. TTK is open source (BSD license) and its code, online documentation and video tutorials are available on TTK’s website.

Communication Algorithms via Deep Learning
A Graphical Measure of Aggregate Flexibility for Energy-Constrained Distributed Resources
Operator Dynamics in Brownian Quantum Circuit
Depth versus Breadth in Convolutional Polar Codes
Input and Weight Space Smoothing for Semi-supervised Learning
SNIPER: Efficient Multi-Scale Training
Learning towards Minimum Hyperspherical Energy
Likelihood-free inference with emulator networks
Interior Point Methods with Adversarial Networks
On self-play computation of equilibrium in poker
Variational Inference for Data-Efficient Model Learning in POMDPs
Dungeons and Dragons: Combinatorics for the $dP_3$ Quiver
WisenetMD: Motion Detection Using Dynamic Background Region Analysis
Computational Complexity of Enumerative 3-Manifold Invariants
Existence and Uniqueness For Variational Data Assimilation in Continuous Time
Reinforcement Learning for Heterogeneous Teams with PALO Bounds
Collective Online Learning via Decentralized Gaussian Processes in Massive Multi-Agent Systems
Learning Illuminant Estimation from Object Recognition
On the Secrecy Capacity of Fisher-Snedecor F Fading Channels
Cleaning up the neighborhood: A full classification for adversarial partial monitoring
Concentric ESN: Assessing the Effect of Modularity in Cycle Reservoirs
Subspace Clustering by Block Diagonal Representation
Highway State Gating for Recurrent Highway Networks: improving information flow through time
Differentially Private Uniformly Most Powerful Tests for Binomial Data
Cramer-Wold AutoEncoder
Segmentation of Liver Lesions with Reduced Complexity Deep Models
SymmSLIC: Symmetry Aware Superpixel Segmentation and its Applications
Monte Carlo Tree Search for Asymmetric Trees
Tight Bounds for Collaborative PAC Learning via Multiplicative Weights
Spatio-temporal modelling of forest monitoring data: Modelling German tree defoliation data collected between 1989 and 2015 for trend estimation and survey grid examination using GAMMs
Learning latent variable structured prediction models with Gaussian perturbations
Constructions of LOCC indistinguishable set of generalized Bell states
How much does a word weigh? Weighting word embeddings for word sense induction
Pushing the bounds of dropout
The first passage sets of the 2D Gaussian free field: convergence and isomorphisms
Attributes in Multiple Facial Images
ASR-based Features for Emotion Recognition: A Transfer Learning Approach
On the nature of the Swiss cheese in dimension 3
Robust Perception through Analysis by Synthesis
Improved Bounds for Pencils of Lines
Alternating Randomized Block Coordinate Descent
A New Approach to the Statistical Analysis of Non-Central Complex Gaussian Quadratic Forms with Applications
The upper bounds on the edge numbers of flag odd dimensional normal pseudomanifolds
Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics
Detecting SNPs with interactive effects on a quantitative trait
Efficient online algorithms for fast-rate regret bounds under sparsity
Border Avoidance: Necessary Regularity for Coefficients and Viscosity Approach
Matrix Co-completion for Multi-label Classification with Missing Features and Labels
Agilité de développement des SI informatisés et outils MDE : démarche pédagogique dans un cours de conception de systèmes d’information informatisés
RDF2Vec-based Classification of Ontology Alignment Changes
Longest increasing paths with gaps
Approximation of Sweeping Processes and Controllability for a Set Valued Evolution
Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models
A logical representation of Arabic questions toward automatic passage extraction from the Web
Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System
Optimal Load Ensemble Control in Chance-Constrained Optimal Power Flow
Maize Haploid Identification via LSTM-CNN and Hyperspectral Imaging Technology
Algorithms and Performance Analysis for Stochastic Wiener System Identification
Image Restoration by Estimating Frequency Distribution of Local Patches
Excitation Dropout: Encouraging Plasticity in Deep Neural Networks
Neural networks for post-processing ensemble weather forecasts
Volunteers in the Smart City: Comparison of Contribution Strategies on Human-Centered Measures
Unequal Sized Stable Marriage Problem
Local Tomography of Large Networks under the Low-Observability Regime
Supplement to the article ‘irreducible polynomials with bounded height’
Accelerating the Fast Gradient Method
Constrained Graph Variational Autoencoders for Molecule Design
Asymptotic Performance Analysis of GSVD-NOMA Systems with a Large-Scale Antenna Array
Local time for lattice paths and the associated limit laws
Dynamical generalizations of the Drake equation: the linear and non-linear theories
A Simple Re-Derivation of Onsager’s Solution of the 2D Ising Model using Experimental Mathematics
Perspectives of Using Oscillators for Computing and Signal Processing
Grounding the Semantics of Part-of-Day Nouns Worldwide using Twitter
Quantum error-correcting codes: the unit-derived strategy
Guessing with a Bit of Help
When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms
Representation Balancing MDPs for Off-Policy Policy Evaluation
Generalisation of structural knowledge in the Hippocampal-Entorhinal system
Amortized Context Vector Inference for Sequence-to-Sequence Networks
Trans-Gaussian Kriging in a Bayesian framework: a case study
Saliency deep embedding for aurora image search
Evidence of nanoscale Anderson localization induced by intrinsic compositional disorder in InGaN/GaN quantum wells by scanning tunneling luminescence spectroscopy
Efficient Relaxations for Dense CRFs with Sparse Higher Order Potentials
Joint String Complexity for Markov Sources: Small Data Matters
Addressing the Item Cold-start Problem by Attribute-driven Active Learning
CNN+CNN: Convolutional Decoders for Image Captioning
Cloud Brokerage: A Systematic Survey
Rectangular Young tableaux with local decreases and the density method for uniform random generation (short version)
Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages
Networked Control Systems Secured by Quantum Key Distribution
Concentration of dynamic risk measures in a Brownian filtration
A Transition-based Algorithm for Unrestricted AMR Parsing
Minimum number of non-zero-entries in a $7\times 7$ stable matrix
On the Relation of Impulse Propagation to Synaptic Strength
Hydrodynamic limit for an activated exclusion process
Lower bounds and asymptotics of real double Hurwitz numbers
GPU Accelerated Cascade Hashing Image Matching for Large Scale 3D Reconstruction
Eigenvector correlations in the complex Ginibre ensemble
Propriety of the reference posterior distribution in Gaussian Process regression
Some degree and distance-based invariants of wreath products of graphs
On the Formal Model for IEC 61499 Composite Function Blocks
Self-Attention-Based Message-Relevant Response Generation for Neural Conversation Model
RGB-T Object Tracking: Benchmark and Baseline
Game of Coins
Particle Filter Networks: End-to-End Probabilistic Localization From Visual Observations
Do Better ImageNet Models Transfer Better?
DRPose3D: Depth Ranking in 3D Human Pose Estimation
Toward a Thinking Microscope: Deep Learning in Optical Microscopy and Image Reconstruction
Neural Network Interpretation via Fine Grained Textual Summarization
Análisis estadístico ex post del conteo rápido institucional de la elección de gobernador del Estado de México en el año 2017
Dynamics of Kuramoto oscillators with time-delayed positive and negative couplings
Discovering Blind Spots in Reinforcement Learning
3D Human Pose Estimation with Relational Networks
ICADx: Interpretable computer aided diagnosis of breast masses
A Brand-level Ranking System with the Customized Attention-GRU Model
Semi-Supervised Learning with GANs: Revisiting Manifold Regularization
Hypergraph Spectral Clustering in the Weighted Stochastic Block Model
Coded Caching via Line Graphs of Bipartite Graphs
Maximal and maximum transitive relation contained in a given binary relation
Dictionary Learning by Dynamical Neural Networks
Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow
Scalable Coordinated Exploration in Concurrent Reinforcement Learning
Building Extraction at Scale using Convolutional Neural Network: Mapping of the United States
Gamma expansions of $q$-Narayana polynomials, pattern avoidance and the $(-1)$-phenomenon
Opportunistic Scheduling in Underlay Cognitive Radio based MIMO-RF/FSO Networks
Congruences modulo powers of 3 for 2-color partition triples
AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference
Monochromatic Hilbert cubes and arithmetic progressions
Predicting football tables by a maximally parsimonious model
Large-Scale Neuromorphic Spiking Array Processors: A quest to mimic the brain
On The Estimation of the Hurst Exponent Using Adjusted Rescaled Range Analysis, Detrended Fluctuation Analysis and Variance Time Plot: A Case of Exponential Distribution
Analysis of Thompson Sampling for Graphical Bandits Without the Graphs
Determining the Number of Samples Required to Estimate Entropy in Natural Sequences
Efficient estimation of stable Levy process with symmetric jumps
Covert Transmission with Harvested Energy by a Wireless Powered Relay
Approximate Newton-based statistical inference using only stochastic gradients
A Psychopathological Approach to Safety Engineering in AI and AGI
Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques
MmWave Beam Prediction with Situational Awareness: A Machine Learning Approach
Estimating forest biodiversity in airborne laser scanning assisted inventories using spatial measures
Joint Optimal Design for Outage Minimization in DF Relay-assisted Underwater Acoustic Networks
AffinityNet: semi-supervised few-shot learning for disease type prediction
Community Detection with Side Information: Exact Recovery under the Stochastic Block Model
A new bound on Erdős distinct distances problem in the plane over prime fields
EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization
Energy Sustainable IoT with Individual QoS Constraints Through MISO SWIPT Multicasting
Teacher’s Perception in the Classroom
On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing
Step Size Matters in Deep Learning
Spiking Linear Dynamical Systems on Neuromorphic Hardware for Low-Power Brain-Machine Interfaces
Sensitivity of Regular Estimators
ARiA: Utilizing Richard’s Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets
Soteria: Automated IoT Safety and Security Analysis
Unsupervised Domain Adaptation using Regularized Hyper-graph Matching
Superconducting receiver arrays for magnetic resonance imaging
Langevin Markov Chain Monte Carlo with stochastic gradients
Deep Denoising: Rate-Optimal Recovery of Structured Signals with a Deep Prior
Distribution Matching Losses Can Hallucinate Features in Medical Image Translation
Tiling the plane with equilateral triangles
Clustering – What Both Theoreticians and Practitioners are Doing Wrong
Quantum classification of the MNIST dataset via Slow Feature Analysis
Nonparametric Density Estimation under Adversarial Losses
The Impact of Uncle Rewards on Selfish Mining in Ethereum
A note on Stein’s method on the third and fourth Wiener chaoses
Rapid seismic domain transfer: Seismic velocity inversion and modeling using deep generative neural networks
Estimates for functionals of solutions to higher-order heat-type equations with random initial conditions
Global-and-local attention networks for visual recognition
copMEM: Finding maximal exact matches via sampling both genomes
From Dissipativity Theory to Compositional Abstractions of Interconnected Stochastic Hybrid Systems
A uniform $L^1$ law of large numbers for functions of i.i.d. random variables that are translated by a consistent estimator
Infinite-Task Learning with Vector-Valued RKHSs
Deformable Part Networks
Lévy-driven causal CARMA random fields
Resource Aware Person Re-identification across Multiple Resolutions
Multi-View Graph Convolutional Network and Its Applications on Neuroimage Analysis for Parkinson’s Disease
A scene perception system for visually impaired based on object detection and classification using multi-modal DCNN
Non-saturating large magnetoresistance in semimetals
Stability of the centers of group algebras of $GL_n(q)$
A distinct approach to diagnose Dengue Fever with the help of Soft Set Theory
Early Cancer Detection in Blood Vessels Using Mobile Nanosensors
Online shortest paths with confidence intervals for routing in a time varying random network
Scalable Centralized Deep Multi-Agent Reinforcement Learning via Policy Gradients
AdGraph: A Machine Learning Approach to Automatic and Effective Adblocking
A New Finitely Controllable Class of Tuple Generating Dependencies: The Triangularly-Guarded Class
Approach-Level Real-Time Crash Risk Analysis for Signalized Intersections
CUDACLAW: A high-performance programmable GPU framework for the solution of hyperbolic PDEs
Characterization of graphs with exactly two positive eigenvalues
Distributed Regularized Dual Gradient Algorithm for Constrained Convex Optimization over Time-Varying Directed Graphs
Immersive Virtual Reality Serious Games for Evacuation Training and Research: A Systematic Literature Review
Image Captioning