Implicit Regularization in Deep Learning

In an attempt to better understand generalization in deep learning, we study several possible explanations. We show that implicit regularization induced by the optimization method plays a key role in the generalization and success of deep learning models. Motivated by this view, we study how different complexity measures can ensure generalization and explain how optimization algorithms can implicitly regularize complexity measures. We empirically investigate the ability of these measures to explain different observed phenomena in deep learning. We further study the invariances in neural networks, suggest complexity measures and optimization algorithms that have invariances similar to those of neural networks, and evaluate them on a number of learning tasks.
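
A textbook illustration of implicit regularization (not an experiment from this work): on an underdetermined least-squares problem, plain gradient descent started at zero only ever moves within the row space of the data matrix. Among the infinitely many exact fits it therefore converges to the one with minimum Euclidean norm, although no norm penalty appears in the loss. A minimal NumPy sketch, assuming a random 5-by-20 system:

```python
import numpy as np


def gd_least_squares(A, b, steps=5000):
    """Unregularized gradient descent on 0.5 * ||A w - b||^2, started at w = 0."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2  # step size 1/L, safely below 2/L
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        w -= lr * (A.T @ (A @ w - b))  # each update lies in the row space of A
    return w


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 20))  # 5 equations, 20 unknowns: infinitely many exact fits
b = rng.standard_normal(5)

w_gd = gd_least_squares(A, b)
w_min_norm = np.linalg.pinv(A) @ b  # minimum-norm interpolating solution

# Another exact fit: shift the minimum-norm solution along the null space of A.
null_direction = np.linalg.svd(A)[2][-1]
w_other = w_min_norm + null_direction

print(np.allclose(A @ w_gd, b, atol=1e-6))        # gradient descent fits the data
print(np.allclose(w_gd, w_min_norm, atol=1e-6))   # and lands on the min-norm fit
print(np.linalg.norm(w_gd) < np.linalg.norm(w_other))
```

Which solution is selected is decided entirely by the optimizer and its initialization, which is the sense in which the optimization method regularizes.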

Artificial Intelligence and Data Science in the Automotive Industry

Data science and machine learning are the key technologies for the processes and products with automatic learning and optimization that the automotive industry of the future will use. This article defines the terms ‘data science’ (also referred to as ‘data analytics’) and ‘machine learning’ and explains how they are related. In addition, it defines the term ‘optimizing analytics’ and illustrates the role of automatic optimization as a key technology in combination with data analytics. It also uses examples to explain how these technologies are currently being used in the automotive industry, organized by the major subprocesses of the automotive value chain (development, procurement, logistics, production, marketing, sales and after-sales, connected customer). Since the industry is only starting to explore the broad range of potential uses for these technologies, visionary application examples are used to illustrate the revolutionary possibilities they offer. Finally, the article demonstrates how these technologies can make the automotive industry more efficient and enhance its customer focus throughout all its operations and activities, from the product and its development process to the customers and their connection to the product.

CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training

We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels, where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a two-stage procedure for this problem. First, we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Then, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions that do not naturally occur in the dataset.
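
The label stage can be pictured as a structural causal model: each label is produced from its parents in the causal graph plus its own independent noise, and an intervention replaces a node's mechanism with a constant. The sketch below assumes a hypothetical two-label graph Male -> Mustache with hand-written mechanisms standing in for the neural networks that CausalGAN trains adversarially. It contrasts conditioning on Mustache = 1 with intervening do(Mustache = 1):

```python
import numpy as np


def sample_labels(n, rng, do=None):
    """Sample (Male, Mustache) from a hypothetical causal graph Male -> Mustache.

    `do` maps a node name to a fixed value, overriding that node's mechanism."""
    do = do or {}
    # Independent exogenous noise for each node, as in a structural causal model.
    u_male = rng.random(n)
    u_mustache = rng.random(n)
    if "Male" in do:
        male = np.full(n, do["Male"], dtype=int)
    else:
        male = (u_male < 0.5).astype(int)
    if "Mustache" in do:
        mustache = np.full(n, do["Mustache"], dtype=int)
    else:
        # Hand-written mechanism: P(Mustache = 1 | Male) is 0.3 for men, 0.01 for women.
        p_mustache = np.where(male == 1, 0.3, 0.01)
        mustache = (u_mustache < p_mustache).astype(int)
    return {"Male": male, "Mustache": mustache}


rng = np.random.default_rng(0)
obs = sample_labels(200_000, rng)                       # observational samples
intv = sample_labels(200_000, rng, do={"Mustache": 1})  # interventional samples

p_cond = obs["Male"][obs["Mustache"] == 1].mean()  # P(Male = 1 | Mustache = 1)
p_do = intv["Male"].mean()                         # P(Male = 1 | do(Mustache = 1))
print(p_cond, p_do)
```

Conditioning returns mostly men, because mustaches are rare among women in this toy model; intervening leaves the Male marginal near 0.5, so it produces label combinations such as women with mustaches that rarely occur in the observational data.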

Geometry of Information Integration

Information geometry is used to quantify the amount of information integration within multiple terminals of a causal dynamical system. Integrated information quantifies how much information is lost when a system is split into parts and information transmission between the parts is removed. Multiple measures of integrated information have been proposed. Here, we analyze four of the previously proposed measures and elucidate their relations from the viewpoint of information geometry. Two of them use dually flat manifolds and the other two use curved manifolds to define a split model. We show that there are hierarchical structures among the measures. We provide explicit expressions for these measures.
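
To make "information lost when a system is split into parts" concrete, the sketch below computes one measure of this kind, stochastic interaction, for a hypothetical system of two binary units; whether it matches the exact definitions analyzed in this work is an assumption. It is the Kullback-Leibler divergence from the full transition distribution p(x_t, x_{t+1}) to a split model in which each unit's next state depends only on its own current state, which simplifies to H(Y1|X1) + H(Y2|X2) - H(Y|X) in bits:

```python
import numpy as np


def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def cond_entropy_bits(joint_2d):
    """H(B | A) for a joint probability table joint_2d[a, b]."""
    return entropy_bits(joint_2d.ravel()) - entropy_bits(joint_2d.sum(axis=1))


def stochastic_interaction(joint):
    """joint[x1, x2, y1, y2] = p(X_t = (x1, x2), X_{t+1} = (y1, y2)) for two binary units.

    KL(p || q) for the split model q = p(x1, x2) * p(y1 | x1) * p(y2 | x2),
    which equals H(Y1 | X1) + H(Y2 | X2) - H(Y | X)."""
    h_full = cond_entropy_bits(joint.reshape(4, 4))      # rows (x1, x2), columns (y1, y2)
    h_unit1 = cond_entropy_bits(joint.sum(axis=(1, 3)))  # marginal table p(x1, y1)
    h_unit2 = cond_entropy_bits(joint.sum(axis=(0, 2)))  # marginal table p(x2, y2)
    return h_unit1 + h_unit2 - h_full


def deterministic_system(update):
    """Uniformly distributed current state; next state given by update(x1, x2) -> (y1, y2)."""
    joint = np.zeros((2, 2, 2, 2))
    for x1 in (0, 1):
        for x2 in (0, 1):
            joint[(x1, x2) + update(x1, x2)] = 0.25
    return joint


swap = deterministic_system(lambda x1, x2: (x2, x1))  # each unit copies the other
copy = deterministic_system(lambda x1, x2: (x1, x2))  # units ignore each other
print(stochastic_interaction(swap), stochastic_interaction(copy))
```

When the units only talk to each other, cutting the connection destroys both bits of predictive information; when each unit only reads itself, the split loses nothing.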

Translating Domain-Specific Expressions in Knowledge Bases with Neural Machine Translation

Our work presented in this paper focuses on the translation of domain-specific expressions represented in semantically structured resources, such as ontologies or knowledge graphs. To make knowledge accessible beyond language borders, these resources need to be translated into different languages. The challenge of translating labels or terminological expressions represented in ontologies lies in the highly specific vocabulary and the lack of contextual information that could guide a machine translation system to translate ambiguous words into the targeted domain. To address these challenges, we train statistical and neural machine translation systems and use them to translate terminological expressions in the medical and financial domains. We evaluate the translation quality of domain-specific expressions with translation systems trained on a generic dataset and experiment with domain adaptation using terminological expressions. Furthermore, we perform experiments on injecting external knowledge into the translation systems. Through these experiments, we observed a clear advantage of NMT methods over SMT in domain adaptation and terminology injection. Nevertheless, owing to the specific and unique terminological expressions, subword segmentation within NMT does not outperform a word-based neural translation model.
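
Subword segmentation in NMT is commonly done with byte-pair encoding (BPE); the abstract does not say which segmentation method was used, so BPE here is an assumption. The sketch learns merge operations from a tiny hypothetical word-frequency list and then shows how an unseen medical term is broken into subword fragments:

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a {word: frequency} dictionary."""
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent symbol pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Hypothetical training vocabulary from a medical glossary.
merges = learn_bpe({"cardiac": 8, "cardiology": 5, "card": 3}, num_merges=6)
print(segment("cardiac", merges))
print(segment("cardiomyopathy", merges))  # unseen term: one learned prefix, then characters
```

Frequent training words collapse into whole units, while a rare specialized term keeps only a short learned prefix and is otherwise spelled out character by character.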

Discrete-Time Statistical Inference for Multiscale Diffusions in the Averaging and Homogenization Regime

We study statistical inference for small-noise-perturbed multiscale dynamical systems under the assumption that we observe a single time series from the slow process. We study both averaging and homogenization regimes, constructing statistical estimators which we prove to be consistent, asymptotically normal (with explicit characterization of the limiting variance), and, in certain cases, asymptotically efficient. In the case of a fixed number of observations, the proposed methods produce consistent and asymptotically normal estimates, making the results readily applicable. For high-frequency observations, we prove consistency and asymptotic normality under a condition restricting the rate at which the number of observations may grow vis-à-vis the separation of scales. The estimators are based on an appropriate misspecified model motivated by a second-order stochastic Taylor expansion of the slow component with respect to a function of the time-scale separation parameter. Numerical simulations illustrate the theoretical results.
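
The basic idea of fitting a simpler, misspecified model to discretely observed slow data can be illustrated in the averaging regime. The sketch assumes a hypothetical slow-fast system in which the fast Ornstein-Uhlenbeck process Y has mean zero, so the averaged slow dynamics are dX = -theta X dt + noise. It simulates the full system with Euler-Maruyama, keeps only Delta-spaced values of X, and estimates theta by least squares under the averaged model. It does not implement the paper's estimators, which rely on a second-order stochastic Taylor expansion:

```python
import numpy as np


def simulate_slow_fast(theta, eps, delta, T, dt, x0, seed=0):
    """Euler-Maruyama for a hypothetical slow-fast system:
        dX = (-theta * X + Y) dt + sqrt(eps) dW     (slow component)
        dY = -(Y / delta) dt + sqrt(1 / delta) dB   (fast Ornstein-Uhlenbeck component)
    Y has mean zero, so the averaged drift of X is -theta * X."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    dw = (rng.standard_normal(n) * np.sqrt(dt)).tolist()
    db = (rng.standard_normal(n) * np.sqrt(dt)).tolist()
    sx, sy = eps ** 0.5, (1.0 / delta) ** 0.5
    x, y = x0, 0.0
    path = [x]
    for k in range(n):
        # Both updates use the values from the previous step.
        x, y = x + (-theta * x + y) * dt + sx * dw[k], y - (y / delta) * dt + sy * db[k]
        path.append(x)
    return np.array(path)


def estimate_theta(obs, Delta):
    """Least-squares drift estimate under the averaged (misspecified) model dX = -theta X dt."""
    x, dx = obs[:-1], np.diff(obs)
    return -np.sum(x * dx) / (Delta * np.sum(x * x))


dt, Delta = 1e-3, 0.1
path = simulate_slow_fast(theta=1.0, eps=0.1, delta=0.01, T=200.0, dt=dt, x0=1.0)
obs = path[:: int(round(Delta / dt))]  # only the slow process, observed every Delta
theta_hat = estimate_theta(obs, Delta)
print(len(obs), theta_hat)
```

For a pure Ornstein-Uhlenbeck process this Euler-type estimator converges to (1 - e^{-theta Delta}) / Delta, about 0.95 here, rather than to theta, which is one reason the spacing of the observations matters alongside the separation of scales.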

Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis

Modern software systems provide many configuration options that significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often at significant cost to cover the high-dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising for learning more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that for small environmental changes (e.g., a homogeneous workload change), applying a linear transformation to the performance model lets us understand the performance behavior of the target environment, while for severe environmental changes (e.g., a drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.
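
A minimal sketch of the linear-transformation case, assuming a hypothetical source performance model and a target environment whose measurements are an exact affine shift of it (real measurements would be noisy): a handful of target measurements suffice to fit the slope and intercept by least squares, after which the source model predicts the target for every configuration.

```python
import itertools

import numpy as np

# Hypothetical source-environment performance model: linear in six binary options.
OPTION_EFFECTS = np.array([3.0, 0.5, 7.0, 0.0, 1.2, 4.0])


def source_model(configs):
    return 10.0 + configs @ OPTION_EFFECTS


def measure_target(configs):
    # Stand-in for benchmarking in the new environment; assumed to be an affine shift.
    return 1.8 * source_model(configs) + 5.0


def fit_linear_transfer(source_pred, target_measured):
    """Least-squares fit of target ~ a * source + b from a few target measurements."""
    X = np.column_stack([source_pred, np.ones_like(source_pred)])
    (a, b), *_ = np.linalg.lstsq(X, target_measured, rcond=None)
    return a, b


# Only four configurations are measured in the target environment.
probe = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
])
a, b = fit_linear_transfer(source_model(probe), measure_target(probe))

all_configs = np.array(list(itertools.product([0, 1], repeat=6)))  # all 64 configurations
predicted = a * source_model(all_configs) + b
print(a, b)
```

The probes only need to produce distinct source predictions; under a severe environmental change no such affine map exists, and the transferable knowledge shrinks to things like which options matter.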

Basic Filters for Convolutional Neural Networks: Training or Design?

When convolutional neural networks are used to tackle learning problems based on time series, e.g., audio data, raw one-dimensional data are commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients, which are then used as input to the actual neural network. In this contribution, we investigate, both theoretically and experimentally, the influence of this pre-processing step on the network’s performance and ask whether replacing it with adaptive or learned filters applied directly to the raw data can improve learning success. The theoretical results show that approximately reproducing mel-spectrogram coefficients by applying adaptive filters and subsequent time-averaging is in principle possible. On the other hand, extensive experimental work leads to the conclusion that the invariance induced by mel-spectrogram coefficients is both desirable and hard to infer by the learning process. Thus, the results achieved by adaptive end-to-end learning approaches are close to, but slightly worse than, results achieved by state-of-the-art reference architectures using standard input coefficients derived from the spectrogram.
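
For reference, the fixed pre-processing that the learned filters compete with can be sketched as follows: frame the signal, take the power spectrum, and project it onto triangular filters spaced evenly on the mel scale. The HTK-style mel formula, the Hann window, and the frame sizes are assumptions for illustration; the adaptive-filter alternative studied in the paper is not shown.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sample_rate, n_fft=512, hop=256, n_filters=40):
    windows = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    frames = windows * np.hanning(n_fft)                           # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2               # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_filters, n_fft, sample_rate).T  # (n_frames, n_filters)
    return np.log(mel + 1e-10)


sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz sine
log_mel = log_mel_spectrogram(tone, sr)
print(log_mel.shape)
```

Each mel coefficient pools a band of neighbouring frequency bins, so small shifts in frequency barely change the output; that pooling is one source of the invariance the abstract describes as hard for a learned front end to infer.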

Clustering and Model Selection via Penalized Likelihood for Different-sized Categorical Data Vectors

In this study, we consider unsupervised clustering of categorical vectors that can be of different sizes using mixture models. We use likelihood maximization to estimate the parameters of the underlying mixture model and a penalization technique to select the number of mixture components. Regardless of the true distribution that generated the data, we show that an explicit penalty, known up to a multiplicative constant, leads to a non-asymptotic oracle inequality with the Kullback-Leibler divergence on the two sides of the inequality. This theoretical result is illustrated by a document clustering application. To this end, a novel robust expectation-maximization algorithm is proposed to estimate the mixture parameters that best represent the different topics. Slope heuristics are used to calibrate the penalty and to select the number of clusters.
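
A sketch of the two ingredients under simplifying assumptions: each vector is summarized by its category counts, so vectors of different sizes simply have different row totals; EM fits a mixture of categorical distributions; and the number of components K minimizes -log-likelihood + kappa * D_K, where D_K counts free parameters. The fixed BIC-style kappa stands in for the slope-heuristic calibration, and this plain EM is not the paper's robust variant.

```python
import numpy as np


def log_sum_exp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def log_joint(counts, weights, probs):
    # log pi_k + sum_v counts[i, v] * log probs[k, v]; the multinomial coefficient is
    # omitted because it does not depend on the parameters.
    return np.log(np.maximum(weights, 1e-300)) + counts @ np.log(probs).T


def fit_mixture(counts, K, n_iter=200, seed=0, smoothing=1e-6):
    """EM for a mixture of categorical distributions; returns (log-likelihood, weights, probs).
    counts[i, v] = occurrences of category v in vector i; row totals may differ."""
    rng = np.random.default_rng(seed)
    weights = np.full(K, 1.0 / K)
    probs = rng.dirichlet(np.ones(counts.shape[1]), size=K)  # (K, V) category probabilities
    for _ in range(n_iter):
        lj = log_joint(counts, weights, probs)               # E-step
        resp = np.exp(lj - log_sum_exp_rows(lj)[:, None])    # responsibilities (n, K)
        weights = resp.mean(axis=0)                          # M-step
        probs = resp.T @ counts + smoothing                  # smoothing keeps logs finite
        probs /= probs.sum(axis=1, keepdims=True)
    return log_sum_exp_rows(log_joint(counts, weights, probs)).sum(), weights, probs


def select_K(counts, K_max=4, restarts=3):
    """Minimize -loglik(K) + kappa * D_K over K = 1..K_max."""
    n, V = counts.shape
    kappa = 0.5 * np.log(n)  # BIC-style constant instead of slope-heuristic calibration
    crit = {}
    for K in range(1, K_max + 1):
        best_ll = max(fit_mixture(counts, K, seed=s)[0] for s in range(restarts))
        D_K = (K - 1) + K * (V - 1)  # mixing weights + category probabilities
        crit[K] = -best_ll + kappa * D_K
    return min(crit, key=crit.get), crit


# Hypothetical data: two "topics" over six categories, vectors of 20 to 60 draws.
rng = np.random.default_rng(42)
topics = np.array([[0.3, 0.3, 0.3, 0.04, 0.03, 0.03],
                   [0.03, 0.03, 0.04, 0.3, 0.3, 0.3]])
labels = rng.integers(0, 2, size=200)
lengths = rng.integers(20, 61, size=200)
counts = np.array([rng.multinomial(m, topics[z]) for m, z in zip(lengths, labels)])

best_K, crit = select_K(counts)
print(best_K)
```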

Representation Learning for Visual-Relational Knowledge Graphs

Much progress has been made towards the goal of developing ML systems that are able to recognize and interpret visual scenes. In this paper, we propose query answering in visual-relational knowledge graphs (KGs) as a novel and important reasoning problem. A visual-relational KG is a KG whose entities are associated with image data. We introduce ImageGraph, a publicly available KG with 1330 relation types, 14,870 entities, and 829,931 images. Visual-relational KGs naturally lead to several novel query types treating images as first-class citizens. We approach the query answering problems by combining ideas from the areas of KG embedding learning and deep learning for computer vision. The resulting ML models can answer queries such as ‘How are these two unseen images related to each other?’ We also explore a novel zero-shot learning scenario where an image of an entirely new entity is linked to entities of an existing visual-relational KG. An extensive set of experiments shows that the proposed deep neural networks are able to answer the visual-relational queries efficiently and accurately.
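
One way to answer ‘How are these two images related?’ with a KG embedding model is to score every relation type for the pair and rank them. The sketch uses the DistMult scoring function as an example (an assumption; the abstract does not name the scoring function), with random vectors standing in for the embeddings an image encoder would produce:

```python
import numpy as np


def distmult_scores(head, tail, relations):
    """DistMult plausibility sum_d head[d] * r[d] * tail[d] for every relation vector r (rows)."""
    return relations @ (head * tail)


def rank_relations(head, tail, relations):
    """Relation indices ordered from most to least plausible for the pair (head, tail)."""
    return np.argsort(-distmult_scores(head, tail, relations))


rng = np.random.default_rng(0)
dim, n_relations = 32, 10
# Stand-ins for the embeddings an image encoder would produce for two unseen images.
head_img = rng.standard_normal(dim)
tail_img = rng.standard_normal(dim)
# Hypothetical learned relation vectors, normalized to unit length.
relations = rng.standard_normal((n_relations, dim))
relations /= np.linalg.norm(relations, axis=1, keepdims=True)
# Plant relation 3 as the unit vector that best fits this pair (by Cauchy-Schwarz).
relations[3] = head_img * tail_img / np.linalg.norm(head_img * tail_img)

ranking = rank_relations(head_img, tail_img, relations)
print(ranking[:3])
```

Because the entity side is computed from pixels rather than looked up in a table, the same scoring applies to images of entities never seen during training, which is what the zero-shot scenario requires.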

Feature selection in high-dimensional dataset using MapReduce

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
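
A single-machine sketch of the greedy mRMR criterion, in its "difference" form (relevance minus mean redundancy) with plug-in mutual information on discrete features. The MapReduce distribution, which is the paper's contribution, is not shown; the per-feature mutual-information computations inside each greedy step are the independent units one would distribute. The data are hypothetical:

```python
from collections import Counter

import numpy as np


def mutual_information(x, y):
    """Plug-in mutual information (nats) between two discrete 1-D arrays."""
    n = len(x)
    pxy = Counter(zip(x.tolist(), y.tolist()))
    px, py = Counter(x.tolist()), Counter(y.tolist())
    return sum(c / n * np.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(X, y, k):
    """Greedily pick k columns maximizing I(f; y) - mean_{s in selected} I(f; s)."""
    n_features = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):  # independent per-feature work
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


rng = np.random.default_rng(0)
n = 2000
f0 = rng.integers(0, 2, n)
f1 = rng.integers(0, 2, n)
y = f0 + 2 * f1        # target depends on f0 and f1
f2 = f0.copy()         # redundant duplicate of f0
f3 = rng.integers(0, 2, n)  # irrelevant noise
X = np.column_stack([f0, f1, f2, f3])
selected = mrmr(X, y, k=2)
print(selected)
```

The duplicate f2 is exactly as relevant as f0 but is never chosen alongside it, because its redundancy with f0 cancels its relevance.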

A Deep Reinforcement Learning Chatbot

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
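
The response-selection step can be pictured as a policy over the candidates proposed by the ensemble. The sketch below is a generic REINFORCE update for a linear softmax policy, not MILABOT's actual model; the two candidate features and the simulated reward are hypothetical:

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


class ResponseSelector:
    """Linear softmax policy over candidate responses, updated with REINFORCE."""

    def __init__(self, n_features, lr=0.1, seed=0):
        self.w = np.zeros(n_features)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def act(self, features):
        """features: (n_candidates, n_features) -> (chosen index, action probabilities)."""
        probs = softmax(features @ self.w)
        return int(self.rng.choice(len(probs), p=probs)), probs

    def update(self, features, action, probs, reward):
        # Gradient of log pi(action) for a linear softmax policy: phi(action) - E_pi[phi].
        grad_log_pi = features[action] - probs @ features
        self.w += self.lr * reward * grad_log_pi


env_rng = np.random.default_rng(1)
selector = ResponseSelector(n_features=2)
for _ in range(2000):
    # Four candidates from the ensemble, each with two hypothetical features
    # (e.g. [topical relevance, length]); the simulated user rewards relevance only.
    feats = env_rng.random((4, 2))
    action, probs = selector.act(feats)
    selector.update(feats, action, probs, reward=feats[action, 0])

print(selector.w)
```

The learned weight on the rewarded feature grows steadily while the other weight only drifts, so the policy comes to prefer the relevant candidate; more interaction data sharpens this further.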

Adaptive PCA for Time-Varying Data

In this paper, we present an online adaptive PCA algorithm that is able to compute the full-dimensional eigenspace per new time-step of sequential data. The algorithm is based on a one-step update rule that considers all second-order correlations between previous samples and the new time-step. Our algorithm has O(n) complexity per new time-step in its deterministic mode and O(1) complexity per new time-step in its stochastic mode. We test our algorithm on a number of time-varying datasets of different physical phenomena. Explained variance curves indicate that our technique provides an excellent approximation to the original eigenspace computed using standard PCA in batch mode. In addition, our experiments show that the stochastic mode, despite its much lower computational complexity, converges to the same eigenspace computed using the deterministic mode.
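
A simple baseline for tracking the eigenspace of streaming data keeps a running mean and covariance and eigendecomposes on demand. This sketch is not the paper's one-step update rule and does not achieve its per-time-step cost: the covariance update is O(d^2) and each eigendecomposition O(d^3) in the dimension d. With equal weights it reproduces batch PCA exactly, and an optional forgetting factor alpha (an assumption) lets it follow time-varying data:

```python
import numpy as np


class StreamingPCA:
    """Running mean/covariance updated one sample at a time, eigendecomposed on demand.

    alpha=None weights all samples equally (Welford-style update); a value in (0, 1)
    applies exponential forgetting so the eigenspace can track time-varying data."""

    def __init__(self, dim, alpha=None):
        self.n = 0
        self.alpha = alpha
        self.mean = np.zeros(dim)
        self.cov = np.zeros((dim, dim))

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        if self.n == 1:
            self.mean = x.copy()  # first sample: mean is the sample, covariance stays zero
            return
        delta = x - self.mean
        if self.alpha is None:
            self.mean += delta / self.n
            # Biased covariance: C_n = C_{n-1} + (delta (x - mean_n)^T - C_{n-1}) / n
            self.cov += (np.outer(delta, x - self.mean) - self.cov) / self.n
        else:
            self.mean += self.alpha * delta
            self.cov = (1.0 - self.alpha) * (self.cov + self.alpha * np.outer(delta, delta))

    def eigenspace(self):
        vals, vecs = np.linalg.eigh(self.cov)  # ascending eigenvalues
        order = np.argsort(vals)[::-1]
        return vals[order], vecs[:, order]


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
pca = StreamingPCA(dim=5)
for x in X:
    pca.update(x)
vals, vecs = pca.eigenspace()
explained = np.cumsum(vals) / vals.sum()  # explained-variance curve
print(explained)
```

Comparing `explained` with the same curve from batch PCA is the kind of check the abstract reports; with forgetting enabled the two deliberately differ, since older samples are down-weighted.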

Feedback Synthesis for Controllable Underactuated Systems using Sequential Second Order Actions
‘Having 2 hours to write a paper is fun!’: Detecting Sarcasm in Numerical Portions of Text
Learning Dilation Factors for Semantic Segmentation of Street Scenes
Constructing Light Spanners Deterministically in Near-Linear Time
A Quasi-isometric Embedding Algorithm
Compression Driven Jamming of Athermal Frictionless Spherocylinders in Two Dimensions
Stabilizing Weighted Graphs
Coded Aperture Ptychography: Uniqueness and Reconstruction
Proof of Northshield’s conjecture concerning an analogue of Stern’s sequence for $\mathbb{Z}[\sqrt{2}]$
Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images
Regularly log-periodic functions and some applications
Koopman-based lifting techniques for nonlinear systems identification
Size scaling of failure strength with fat-tailed disorder in a fiber bundle model
On Fairness and Calibration
On conditional fault tolerance of hierarchical cubic networks
Image Splicing Localization Using A Multi-Task Fully Convolutional Network (MFCN)
Additive structures on $f$-vector sets of polytopes
Towards high-throughput 3D insect capture for species discovery and diagnostics
Detecting Patterns Can Be Hard: Circuit Lower Bounds for the String Matching Problem
Capturing natural-colour 3D models of insects for species discovery
On Distributed Linear Estimation With Observation Model Uncertainties
The Mating Rituals of Deep Neural Networks: Learning Compact Feature Representations through Sexual Evolutionary Synthesis
A remark on Gibbs-type measures for Hamiltonian PDE
Properties of Kinetic Transition Networks for Atomic Clusters and Glassy Solids
Focusing Attention: Towards Accurate Text Recognition in Natural Images
On the quasi-Latin hypercube property of some lattice-based designs
Ranking ideas for diversity and quality
Formulation of Deep Reinforcement Learning Architecture Toward Autonomous Driving for On-Ramp Merge
An Alternative Approach to Functional Linear Partial Quantile Regression
Lozenge tilings of a halved hexagon with an array of triangles removed from the boundary, part II
Deep Embedding Convolutional Neural Network for Synthesizing CT Image from T1-Weighted MR Image
Composition by Conversation
An unsupervised long short-term memory neural network for event detection in cell videos
A deep generative model for gene expression profiles from single-cell RNA sequencing
Sampling for approximating $R$-limited functions
Sharp Bounds for Generalized Uniformity Testing
Asynchronous COMID: the theoretic basis for sparse gradient tricks on Parameter Server
Probabilistic Analysis Based On Symbolic Game Semantics and Model Counting
On the Complexity of Model Checking for Syntactically Maximal Fragments of the Interval Temporal Logic HS with Regular Expressions
An Existence Theorem of Nash Equilibrium in Coq and Isabelle
Robust Exponential Worst Cases for Divide-et-Impera Algorithms for Parity Games
Dynamics and Coalitions in Sequential Games
ParaPlan: A Tool for Parallel Reachability Analysis of Planar Polygonal Differential Inclusion Systems
Smoothness of Flow and Path-by-Path Uniqueness in Stochastic Differential Equations
Document similarity measures can support semi-automated identification of unreported links between trial registrations and published reports
Metric methods for heteroclinic connections in infinite dimensional spaces
Efficient implementation and performance evaluation of the Wigner branching random walk
Integrating Specialized Classifiers Based on Continuous Time Markov Chain
Beyond 16GB: Out-of-Core Stencil Computations
Proceedings First Workshop on Formal Verification of Autonomous Vehicles
Data Aggregation and Packet Bundling of Uplink Small Packets for Monitoring Applications in LTE
Ruin probability for discrete risk processes
Continuous time random walk as a random walk in a random environment
Rotational Subgroup Voting and Pose Clustering for Robust 3D Object Recognition
Improving Sonar Image Patch Matching via Deep Learning
Real-time convolutional networks for sonar image classification in low-power embedded systems
An iterative approximate method of solving boundary value problems using dual Bernstein polynomials
Functional Insights into Google AdWords
Wyner’s Common Information under Rényi Divergence Measures
Bayesian Optimisation for Safe Navigation under Localisation Uncertainty
On the stability and instability of finite dynamical systems with prescribed interaction graphs
Frameless ALOHA with Reliability-Latency Guarantees
Grant-Free Radio Access for Short-Packet Communications over 5G Networks
Structurally Parameterized $d$-Scattered Set
Extreme Value Estimates using Vibration Energy Harvesting
Approximating meta-heuristics with homotopic recurrent neural networks
Uniqueness of codes using semidefinite programming
The speed of a general random walk reinforced by its recent history
Enhancing KiWi – Scalable Concurrent Key-Value Map – Niv Gabso and Assaf Yifrach
FingerNet: An Unified Deep Network for Fingerprint Minutiae Extraction
Do physicists stop searches too early? A remote-science, optimization landscape investigation
RNN-based Early Cyber-Attack Detection for the Tennessee Eastman Process
Universality in Random Moment Problems
On the Divergence and Vorticity of Vector Ambit Fields
Leveraging Discourse Information Effectively for Authorship Attribution
Optical sensing with Anderson-localised light
Duality and free measures in vector spaces, the spectral theory of actions of non-locally compact groups
Extended Laplace Principle for Empirical Measures of a Markov Chain
Cynical Selection of Language Model Training Data
Monocular Navigation in Large Scale Dynamic Environments
The face of crystals: insightful classification using deep learning
Adaptive restart of accelerated gradient methods under local quadratic growth condition
A Tight Lower Bound for Counting Hamiltonian Cycles via Matrix Rank
Computing optimal experimental designs with respect to a compound Bayes risk criterion
An Efficient Calculation Method for the Expected Value of Sample Information: Can we do it? Yes, we can
Nearest Embedded and Embedding Self-Nested Trees
Optimal velocity control of a viscous Cahn-Hilliard system with convection and dynamic boundary conditions
Secure Full-Duplex Device-to-Device Communication
TIPS: Mining Top-K Locations to Minimize User-Inconvenience for Trajectory-Aware Services
Boltzmann-type models with uncertain binary interactions
Learning from lions: inferring the utility of agents from their trajectories
Stopping Times of Random Walks on a Hypercube
Transversal magnetoresistance and Shubnikov-de Haas oscillations in Weyl semimetals
Random Coin Tossing with unknown bias
Content Analysis of Items: A Statistical Approach
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
Rank-Select Indices Without Tears