Focusing on What is Relevant: Time-Series Learning and Understanding using Attention

This paper contributes to the interpretability of deep learning models across different time-series applications. We propose a temporal attention layer that selects the information relevant to performing various tasks, including data completion, key-frame detection and classification. The method uses the whole input sequence to compute an attention value for each time step. This yields more focused attention values and more plausible visualisations than previous methods. We apply the proposed method to three different tasks. Experimental results show that the proposed network achieves results comparable to the state of the art. In addition, the network provides better interpretability of its decisions: compared with similar techniques attempted in the past, it assigns larger attention weights to the relevant frames.
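
A generic softmax temporal-attention layer can make the idea of one weight per time step concrete. This is a minimal sketch, not the authors' layer: the scoring rule (a dot product between each frame and a hypothetical query formed by averaging all frames) and the names `temporal_attention` and `temperature` are assumptions, chosen so that every weight depends on the whole input sequence.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(
    frames: List[List[float]], temperature: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Return (attention weight per time step, attention-pooled feature vector)."""
    n, dim = len(frames), len(frames[0])
    # Hypothetical global query: the mean frame, so each score depends on the whole sequence
    query = [sum(x[d] for x in frames) / n for d in range(dim)]
    # One relevance score per time step: scaled dot product with the query
    scores = [sum(q * xd for q, xd in zip(query, x)) / temperature for x in frames]
    # Normalise over all time steps so the weights sum to one
    alpha = softmax(scores)
    # Weighted sum of frames gives a fixed-size summary for a downstream task
    context = [sum(a * x[d] for a, x in zip(alpha, frames)) for d in range(dim)]
    return alpha, context
```

The returned weights sum to one and can be plotted against time to visualise which frames the layer attends to.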

A configuration model for correlation matrices

Correlation matrices are a major type of multivariate data. To examine the properties of a given correlation matrix, a common practice is to compare the same quantity between the original correlation matrix and reference correlation matrices, such as those derived from random matrix theory, that partially preserve properties of the original matrix. We propose a model to generate such reference correlation and covariance matrices for a given matrix. Correlation matrices are often analysed as networks, in which nodes are heterogeneous in their total connectivity to the other nodes. Given this background, the present algorithm generates random networks that preserve the expected total connectivity of each node to the other nodes, akin to configuration models for conventional networks. Our algorithm is derived from the maximum entropy principle. We apply the proposed algorithm to the measurement of clustering coefficients and to community detection, both of which require a null model to assess the statistical significance of the obtained results.

Data-driven Spatiotemporal Modal Decomposition for Time Frequency Analysis

We propose a new solution to the blind source separation problem that factors mixed time-series signals into a sum of spatiotemporal modes, with the constraint that the temporal components are intrinsic mode functions (IMFs). The key motivation is that IMFs allow the computation of meaningful Hilbert transforms of non-stationary data, from which instantaneous time-frequency representations may be derived. Our spatiotemporal intrinsic mode decomposition (STIMD) method leverages spatial correlations to generalize the extraction of IMFs from one-dimensional signals, commonly performed using the empirical mode decomposition (EMD), to multi-dimensional signals. Further, this data-driven method enables future-state prediction. We demonstrate STIMD on several synthetic examples, comparing it to common matrix factorization techniques, namely singular value decomposition (SVD), independent component analysis (ICA), and dynamic mode decomposition (DMD). We show that STIMD outperforms these methods in reconstruction and in extracting interpretable modes. Next, we apply STIMD to two real-world datasets: gravitational-wave data and neural recordings from the rodent hippocampus.

Visualizing and Understanding Deep Neural Networks in CTR Prediction

Although deep learning techniques have been successfully applied to many tasks, interpreting deep neural network models remains a major challenge. Recently, much work has been done on visualizing and analyzing the mechanisms of deep neural networks in image processing and natural language processing. In this paper, we present our approaches to visualizing and understanding deep neural networks for an important commercial task: click-through rate (CTR) prediction. We conduct experiments on production data from our online advertising system, whose distribution varies daily. To understand the mechanism and the performance of the model, we inspect the model’s inner status at the neuron level. We also implement a probe approach to measure the layer-wise performance of the model. Moreover, to measure the influence of the input features, we calculate saliency scores based on the back-propagated gradients. Practical applications are also discussed, for example in understanding, monitoring, diagnosing and refining models and algorithms.
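
Gradient-based saliency can be illustrated on a toy model. The sketch below assumes a hypothetical one-hidden-layer ReLU network with a sigmoid CTR output (not the production model described above) and back-propagates by hand from the predicted click probability to the input features; taking the absolute value of each input gradient as that feature's saliency score is likewise an assumption, one common choice among several.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_and_saliency(
    x: List[float], W1: List[List[float]], b1: List[float], w2: List[float], b2: float
) -> Tuple[float, List[float]]:
    """Return the predicted click probability and d(prob)/d(input) for each feature."""
    # Forward pass: hidden pre-activations, ReLU, sigmoid output
    h_pre = [sum(wji * xi for wji, xi in zip(row, x)) + bj for row, bj in zip(W1, b1)]
    h = [max(0.0, z) for z in h_pre]
    p = sigmoid(sum(wj * hj for wj, hj in zip(w2, h)) + b2)
    # Backward pass: the sigmoid's derivative w.r.t. its input is p * (1 - p)
    g_out = p * (1.0 - p)
    # ReLU passes gradient only where its pre-activation was positive
    g_pre = [g_out * wj if z > 0 else 0.0 for wj, z in zip(w2, h_pre)]
    # Chain rule back to each input feature
    g_x = [sum(g_pre[j] * W1[j][i] for j in range(len(W1))) for i in range(len(x))]
    return p, g_x


def saliency_scores(grad: List[float]) -> List[float]:
    # Gradient magnitude: how strongly a small change in each feature moves the CTR
    return [abs(g) for g in grad]
```

A finite-difference check on the predicted probability is a quick way to confirm that the hand-written backward pass matches the forward pass.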

Deploying Deep Neural Networks in the Embedded Space

Recently, Deep Neural Networks (DNNs) have emerged as the dominant model across various AI applications. In the era of IoT and mobile systems, the efficient deployment of DNNs on embedded platforms is vital to enable the development of intelligent applications. This paper summarises our recent work on the optimised mapping of DNNs onto embedded platforms. Covering topics as diverse as DNN-to-accelerator toolflows, high-throughput cascaded classifiers and domain-specific model design, the presented works aim to enable the deployment of sophisticated deep learning models on cutting-edge mobile and embedded systems.

Learning Qualitatively Diverse and Interpretable Rules for Classification

There has been growing interest in developing accurate models that can also be explained to humans. Unfortunately, if there exist multiple distinct but accurate models for some dataset, current machine learning methods are unlikely to find them: standard techniques will likely recover a complex model that combines them. In this work, we introduce a way to identify a maximal set of distinct but accurate models for a dataset. We demonstrate empirically that, in situations where the data supports multiple accurate classifiers, we tend to recover simpler, more interpretable classifiers rather than more complex ones.

Jack the Reader – A Machine Reading Framework

Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling and make model comparison and replication easier. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows for quick model prototyping through component reuse, evaluation of new models on existing datasets, and the integration of new datasets and their application to a growing set of implemented baseline models. Jack currently supports three tasks (though it is not limited to them): Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse.

Deep Orthogonal Representations: Fundamental Properties and Applications

Several representation learning and, more broadly, dimensionality reduction techniques seek to produce representations of the data that are orthogonal (uncorrelated). Examples include PCA, CCA, Kernel/Deep CCA, the ACE algorithm and correspondence analysis (CA). For a fixed data distribution, all finite-variance representations belong to the same function space, regardless of how they are derived. In this work, we present a theoretical framework for analyzing this function space and demonstrate how a basis for this space can be found using neural networks. We show that this framework (i) underlies recent multi-view representation learning methods, (ii) enables classical exploratory statistical techniques such as CA to be scaled via neural networks, and (iii) can be used to derive new methods for comparing black-box models. We illustrate these applications empirically on several datasets.

Probabilistic Natural Language Generation with Wasserstein Autoencoders

Probabilistic generation of natural language sentences is an important task in NLP. Existing models such as variational autoencoders (VAE) for sequence generation are extremely difficult to train due to the issues associated with the Kullback-Leibler (KL) loss collapsing to zero. One has to implement various heuristics such as KL weight annealing and word dropout in a carefully engineered manner to successfully train a text VAE. In this paper, we propose the use of Wasserstein autoencoders (WAE) for probabilistic natural language sentence generation. We show that sequence-to-sequence WAEs are more robust towards hyperparameters and can be trained in a straightforward manner without the need for any weight annealing. Empirical evidence shows that the latent space learned by WAEs exhibits properties of continuity and smoothness as in VAEs, while simultaneously achieving much higher BLEU scores for sentence reconstruction.
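
One common way to train a WAE is to penalise the discrepancy between the distribution of encoded codes and the prior with a maximum mean discrepancy (MMD) term, as in the WAE-MMD variant. The abstract does not state which penalty or kernel this paper uses, so the following is a sketch under assumptions: a Gaussian (RBF) kernel with a hypothetical bandwidth `sigma2` and the standard unbiased MMD² estimator. In training, `codes` would be a minibatch of encoder outputs and `prior_samples` draws from the prior, with the estimate added (suitably weighted) to the reconstruction loss.

```python
import math
import random
from typing import List, Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], sigma2: float = 1.0) -> float:
    # Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma2))
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma2))


def mmd2_unbiased(
    codes: List[Sequence[float]], prior_samples: List[Sequence[float]], sigma2: float = 1.0
) -> float:
    """Unbiased estimate of MMD^2 between two samples (it can be slightly negative)."""
    n, m = len(codes), len(prior_samples)
    # Within-sample terms exclude the diagonal i == j, which removes the bias
    k_zz = sum(rbf_kernel(codes[i], codes[j], sigma2)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_pp = sum(rbf_kernel(prior_samples[i], prior_samples[j], sigma2)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    # Cross term averages over all pairs
    k_zp = sum(rbf_kernel(z, p, sigma2) for z in codes for p in prior_samples) / (n * m)
    return k_zz + k_pp - 2.0 * k_zp


def gaussian_sample(n: int, dim: int, mean: float, rng: random.Random) -> List[List[float]]:
    # Isotropic Gaussian draws, used here only to stand in for codes and prior samples
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]
```

Codes drawn from the same distribution as the prior give an estimate near zero, while codes from a shifted distribution give a clearly positive value, which is what the penalty pushes the encoder away from.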

Analysis of Evolutionary Algorithms in Dynamic and Stochastic Environments

Many real-world optimization problems occur in environments that change dynamically or involve stochastic components. Evolutionary algorithms and other bio-inspired algorithms have been widely applied to dynamic and stochastic problems. This survey gives an overview of major theoretical developments in the area of runtime analysis for these problems. We review recent theoretical studies of evolutionary algorithms and ant colony optimization for problems where the objective functions or the constraints change over time. Furthermore, we consider stochastic problems under various noise models and point out some directions for future research.

Learning-to-Ask: Knowledge Acquisition via 20 Questions

Almost all knowledge-empowered applications rely on accurate knowledge, which must be either collected manually at high cost or extracted automatically with non-negligible errors. In this paper, we study 20 Questions, an online interactive game in which each question-response pair corresponds to a fact about the target entity, as a way to acquire highly accurate knowledge effectively at nearly zero labor cost. Knowledge acquisition via 20 Questions presents two main challenges to an intelligent agent playing the game with human players. The first is to seek enough information to identify the target entity with as few questions as possible; the second is to use the remaining questioning opportunities to acquire valuable knowledge effectively. Both depend on good questioning strategies. To address these challenges, we propose the Learning-to-Ask (LA) framework, within which the agent learns smart questioning strategies for information seeking and knowledge acquisition by means of deep reinforcement learning and generalized matrix factorization, respectively. In addition, a Bayesian approach to representing knowledge is adopted to ensure robustness to noisy user responses. Simulation experiments on real data show that LA equips the agent with effective questioning strategies, which result in high winning rates and rapid knowledge acquisition. Moreover, the questioning strategies for information seeking and knowledge acquisition boost each other's performance, allowing the agent to start with a relatively small knowledge set and quickly improve its knowledge base without constant human supervision.
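
The abstract does not specify the Bayesian representation LA adopts. A minimal illustration of why a Bayesian treatment tolerates noisy answers, assuming (hypothetically) that each fact, i.e. an (entity, question) pair, is modelled as a Bernoulli variable with a uniform Beta prior, is the conjugate update below: once several consistent answers have been collected, a single wrong answer shifts the posterior only slightly. The class name `FactBelief` is illustrative.

```python
from dataclasses import dataclass


@dataclass
class FactBelief:
    """Beta(alpha, beta) belief that the answer to a question about an entity is 'yes'."""
    alpha: float = 1.0  # prior pseudo-count of 'yes' answers (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of 'no' answers

    def update(self, answered_yes: bool) -> None:
        # Conjugate Beta-Bernoulli update: each response adds one pseudo-count
        if answered_yes:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        # Posterior probability that the fact holds
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        # Posterior variance shrinks as more responses arrive
        total = self.alpha + self.beta
        return self.alpha * self.beta / (total * total * (total + 1.0))
```

For example, eight "yes" and two "no" answers give a posterior mean of 9/12 = 0.75, so the two contradictory (possibly noisy) responses lower confidence without flipping the fact.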

Towards safe deep learning: accurately quantifying biomarker uncertainty in neural network predictions

Automated medical image segmentation, specifically using deep learning, has shown outstanding performance in semantic segmentation tasks. However, these methods rarely quantify their uncertainty, which may lead to errors in downstream analysis. In this work we propose to use Bayesian neural networks to quantify uncertainty within the domain of semantic segmentation. We also propose a method to convert voxel-wise segmentation uncertainty into volumetric uncertainty, and calibrate the accuracy and reliability of confidence intervals of derived measurements. When applied to a tumour volume estimation application, we demonstrate that by using such modelling of uncertainty, deep learning systems can be made to report volume estimates with well-calibrated error-bars, making them safer for clinical use. We also show that the uncertainty estimates extrapolate to unseen data, and that the confidence intervals are robust in the presence of artificial noise. This could be used to provide a form of quality control and quality assurance, and may permit further adoption of deep learning tools in the clinic.
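
The step from voxel-wise to volumetric uncertainty can be sketched under an assumption the abstract leaves open: that the Bayesian network yields several sampled segmentations per scan (for example from Monte Carlo dropout). Each sample's volume is its foreground voxel count times the voxel volume, and an empirical interval over the sampled volumes gives an error bar. The function name `volume_interval` and the percentile-interval choice are hypothetical, and the calibration step described in the abstract is not shown.

```python
import math
from typing import Sequence, Tuple


def volume_interval(
    sampled_masks: Sequence[Sequence[int]], voxel_volume: float, level: float = 0.9
) -> Tuple[float, float, float]:
    """Return (mean volume, lower bound, upper bound) from sampled binary masks."""
    # Volume of each sampled segmentation = number of foreground voxels * voxel size
    volumes = sorted(sum(mask) * voxel_volume for mask in sampled_masks)
    mean = sum(volumes) / len(volumes)

    def quantile(q: float) -> float:
        # Linear interpolation between neighbouring order statistics
        pos = q * (len(volumes) - 1)
        i = math.floor(pos)
        j = min(i + 1, len(volumes) - 1)
        frac = pos - i
        return volumes[i] * (1.0 - frac) + volumes[j] * frac

    # Equal-tailed interval: drop (1 - level) / 2 of the samples on each side
    tail = (1.0 - level) / 2.0
    return mean, quantile(tail), quantile(1.0 - tail)
```

With five flattened masks containing 1 to 5 foreground voxels and a voxel volume of 2.0, the sampled volumes are 2, 4, 6, 8 and 10, so the mean is 6.0 and the central 50% interval is [4.0, 8.0].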

On the Spectral Bias of Deep Neural Networks

It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even random data with 100% training accuracy. This raises the question of why they do not easily overfit real data. To answer this question, we study deep networks using Fourier analysis. We show that deep networks with finite weights (or trained for a finite number of steps) are inherently biased towards representing smooth functions over the input space. Specifically, the magnitude of a particular frequency component k of a deep ReLU network function decays at least as fast as O(k^{-2}), with width and depth helping polynomially and exponentially (respectively) in modeling higher frequencies. This shows, for instance, why DNNs cannot perfectly memorize peaky delta-like functions. We also show that DNNs can exploit the geometry of low-dimensional data manifolds to approximate complex functions that exist along the manifold with simple functions when seen with respect to the input space. As a consequence, we find that all samples (including adversarial samples) classified by a network as belonging to a certain class are connected by a path along which the prediction of the network does not change. Finally, we find that DNN parameters corresponding to functions with higher frequency components occupy a smaller volume in the parameter space.
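
A one-dimensional toy check of the k^{-2} decay (an illustration only, not the paper's multi-dimensional result or proof): |x| = ReLU(x) + ReLU(-x) is a ReLU network with two hidden units, and its periodic extension on [-π, π] has cosine coefficients a_k = -4/(πk²) for odd k and a_k = 0 for even k ≥ 2. The sketch estimates a_k numerically with a midpoint rule; the grid size `n` is an arbitrary choice.

```python
import math
from typing import Callable


def relu(z: float) -> float:
    return max(0.0, z)


def two_unit_relu_net(x: float) -> float:
    # |x| expressed as a one-hidden-layer ReLU network with output weights (1, 1)
    return relu(x) + relu(-x)


def cosine_coefficient(f: Callable[[float], float], k: int, n: int = 20000) -> float:
    """Midpoint-rule estimate of a_k = (1/pi) * integral_{-pi}^{pi} f(x) cos(kx) dx."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        # Midpoints never land on the kink at x = 0 when n is even
        x = -math.pi + (i + 0.5) * h
        total += f(x) * math.cos(k * x)
    return total * h / math.pi
```

Evaluating `k * k * abs(cosine_coefficient(two_unit_relu_net, k))` for k = 1, 3, 5, 7 should stay close to 4/π (about 1.273), which is the k^{-2} envelope in this simplest case.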

Combination of Domain Knowledge and Deep Learning for Sentiment Analysis

The emerging technique of deep learning has been widely applied in many different areas. However, when adopted in a specific domain, this technique should be combined with domain knowledge to improve efficiency and accuracy. In particular, when analyzing the applications of deep learning in sentiment analysis, we found that current approaches suffer from the following drawbacks: (i) existing works have paid little attention to the importance of different types of sentiment terms, an important concept in this area; and (ii) the loss function currently employed does not adequately reflect the degree of error in sentiment misclassification. To overcome these problems, we propose to combine domain knowledge with deep learning. Our proposal includes using sentiment scores, learnt by regression, to augment the training data, and introducing a penalty matrix to enhance the cross-entropy loss function. In experiments, this approach achieved a significant improvement in classification results.
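
The abstract does not give the exact form of the penalty-augmented loss. One hypothetical way to make cross-entropy reflect how severe a misclassification is, assumed here purely for illustration, adds λ times the expected penalty Σ_j P[y][j]·p_j to the usual −log p_y term, where P is a penalty matrix with zeros on its diagonal and λ (`lam`) is a weighting hyperparameter. With classes (negative, neutral, positive), the example matrix `PENALTY` charges more for confusing negative with positive than for confusing either with neutral.

```python
import math
from typing import List, Sequence

# Hypothetical penalty matrix for classes (negative, neutral, positive):
# PENALTY[true][predicted]; confusing the two polar classes costs twice as much.
PENALTY: List[List[float]] = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]


def penalized_cross_entropy(
    probs: Sequence[float],
    true_class: int,
    penalty: List[List[float]] = PENALTY,
    lam: float = 1.0,
) -> float:
    """Cross-entropy plus lam times the expected misclassification penalty."""
    ce = -math.log(probs[true_class])
    # Probability mass on each class, weighted by how bad predicting that class would be
    expected_penalty = sum(penalty[true_class][j] * p for j, p in enumerate(probs))
    return ce + lam * expected_penalty
```

For a negative example, the predictions [0.5, 0.4, 0.1] and [0.5, 0.1, 0.4] have the same cross-entropy (ln 2), but the penalised loss is larger for the second (ln 2 + 0.9 versus ln 2 + 0.6) because its leftover mass sits on the opposite polarity.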

Simulation Study on a New Peer Review Approach
Matrix Completion and Performance Guarantees for Single Individual Haplotyping
Francy – An Interactive Discrete Mathematics Framework for GAP
DeepAffinity: Interpretable Deep Learning of Compound-Protein Affinity through Unified Recurrent and Convolutional Neural Networks
Addressing Class Imbalance in Classification Problems of Noisy Signals by using Fourier Transform Surrogates
Bounds on current fluctuations in periodically driven systems
The Natural Language Decathlon: Multitask Learning as Question Answering
Team dynamics during the delivery of a large-scale, engineered system
Nonparametric Inference for Location Parameters of Veronese Whitney means and antimeans on Kendall Shape Spaces
Is the 1-norm the best convex sparse regularization?
Learning to Rank from Samples of Variable Quality
Proving Linearizability Using Reduction
Quasiconvex risk measures on variable exponent Bochner-Lebesgue spaces
Determining water mass flow control strategies for a turbocharged SI engine using a two-stage calculation method
Mining Gravitational-wave Catalogs To Understand Binary Stellar Evolution: A New Hierarchical Bayesian Framework
Solving the Buyer and Seller’s Dilemma: A Dual-Deposit Escrow Smart Contract for Provably Cheat-Proof Delivery and Payment for a Digital Good without a Trusted Mediator
Novel Selectivity Estimation Strategy for Modern DBMS
Critical slowing down associated with critical transition and risk of collapse in cryptocurrency
Using Nonlinear Normal Modes for Execution of Efficient Cyclic Motions in Soft Robots
On the weight distribution of random binary linear codes
Stanley’s non-Ehrhart-positive order polytopes
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Explicit formulae for all higher order exponential lacunary generating functions of Hermite polynomials
A pattern search bound constrained optimization method with a nonmonotone line search strategy
Sparse Sensing with Semi-Coprime Arrays
Emulating the coherent Ising machine with a mean-field algorithm
S-maup: Statistic test to measure the sensitivity to the Modifiable Areal Unit Problem
Star Shape Prior in Fully Convolutional Networks for Skin Lesion Segmentation
Truncation Error Estimation in the p-Anisotropic Discontinuous Galerkin Spectral Element Method
What Makes An Asset Useful
Quantitative quenched Voronoi percolation and applications
Stochastic Persistence
Annealed scaling relations for Voronoi percolation
Bayesian hierarchical models for SNP discovery from genome-wide association studies, a semi-supervised machine learning approach
Hypothesis testing near singularities and boundaries
TriResNet: A Deep Triple-stream Residual Network for Histopathology Grading
A Novel ECOC Algorithm with Centroid Distance Based Soft Coding Scheme
Paragraph-based complex networks: application to document classification and authenticity verification
Personalized Thread Recommendation for MOOC Discussion Forums
Finite-time stability and stabilization of linear discrete time-varying stochastic systems
Learning a High Fidelity Pose Invariant Model for High-resolution Face Frontalization
Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning
Video Inpainting by Jointly Learning Temporal Structure and Spatial Details
Shape-from-Mask: A Deep Learning Based Human Body Shape Reconstruction from Binary Mask Images
Optimal Design of Virtual Inertia and Damping Coefficients for Virtual Synchronous Machines
On the range of lattice models in high dimensions – extended version
Visual-Inertial Object Detection and Mapping
Global Semantic Consistency for Zero-Shot Learning
Grouped Mixture of Regressions
Virtual Codec Supervised Re-Sampling Network for Image Compression
Efficient Semantic Segmentation using Gradual Grouping
Upgrade of the Analog Integrator for EAST Device
Second order stochastic target problems with generalized market impact
Distributed Average Consensus under Quantized Communication via Event-Triggered Mass Summation
Subgradient-Free Stochastic Optimization Algorithm for Non-smooth Convex Functions over Time-Varying Networks
Removing the Curse of Superefficiency: an Effective Strategy For Distributed Computing in Isotonic Regression
Game AI Research with Fast Planet Wars Variants
Basic invariants of the Hopf monoid of hypergraphs and its sub-monoids
New Exact Algorithm and Solution Properties for the Vehicle Routing Problem with Stochastic Demands
Multivariable Iterative Learning Control Design Procedures: from Decentralized to Centralized, Illustrated on an Industrial Printer
The Temporal Singularity: time-accelerated simulated civilizations and their implications
Deep Spectral Convolution Network for HyperSpectral Unmixing
An accurate retrieval through R-MAC+ descriptors for landmark recognition
Coloring hypergraphs of low connectivity
Continuous Learning in Single-Incremental-Task Scenarios
Point cloud segmentation using hierarchical tree for architectural models
A new look at the interfaces in percolation
Tensor Monte Carlo: particle methods for the GPU era
KinshipGAN: Synthesizing of Kinship Faces From Family Photos by Regularizing a Deep Face Network
On generalized ARCH model with stationary liquidity
Ad-Net: Audio-Visual Convolutional Neural Network for Advertisement Detection In Videos
Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions
On the Robustness and Scalability of Semidefinite Relaxation for Optimal Power Flow Problems
Weakly Supervised Training of Speaker Identification Models
Cross-layer framework and optimization for efficient use of the energy budget of IoT Nodes
Computation over Wide-Band MAC: Improved Achievable Rate through Sub-Function Allocation
A probabilistic atlas of the human thalamic nuclei combining ex vivo MRI and histology
Compact Deep Neural Networks for Computationally Efficient Gesture Classification From Electromyography Signals
Optimal size of linear matrix inequalities in semidefinite approaches to polynomial optimization
Acyclicity in finite groups and groupoids
Graph-counting polynomials for oriented graphs
Variational Bi-domain Triplet Autoencoder
Replica Symmetry and Replica Symmetry Breaking for the Traveling Salesperson Problem
A Predictive Model for Music Based on Learned Interval Representations
Generic Unlabeled Global Rigidity
Improved bounds for multipass pairing heaps and path-balanced binary search trees
To Skip or to Switch? Minimizing Age of Information under Link Capacity Constraint
Quantum computing cryptography: Unveiling cryptographic Boolean functions with quantum annealing
Schelling Segregation with Strategic Agents
Fully Connected Networks and Generative Neural Networks Applied to Sclera Segmentation
Keypoint Transfer for Fast Whole-Body Segmentation
Theorems of Carathéodory, Helly, and Tverberg without dimension
POMDP Structural Results for Controlled Sensing using Lehmann Precision
Scalable Simple Linear Iterative Clustering (SSLIC) Using a Generic and Parallel Approach
A Universal Hypercomputer
Persistent Hidden States and Nonlinear Transformation for Long Short-Term Memory
Fluctuations for linear eigenvalue statistics of sample covariance random matrices
Perfect 3-Colorings on 4-Regular Graph of Order 8
Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation
Synchronization of Singularly Perturbed Systems with Time Scales
Election Score Can Be Harder Than Winner
Learning Traffic Flow Dynamics using Random Fields
Wireless Channel Dynamics and Robustness for Ultra-Reliable Low-Latency Communications
Quantum Codes from Neural Networks
Finding Local Minima via Stochastic Nested Variance Reduction