An Introductory Survey on Attention Mechanisms in NLP Problems

The attention mechanism was first derived from human intuition and later adapted to machine translation for automatic token alignment. It is a simple method for encoding sequence data according to the importance score assigned to each element, and it has been widely applied to, and brought significant improvements in, many natural language processing tasks, including sentiment classification, text summarization, question answering, and dependency parsing. In this paper, we survey recent work and give an introductory summary of the attention mechanism in different NLP problems. We aim to provide readers with basic knowledge of this widely used method, discuss its variants for different tasks, explore its connections to other machine learning techniques, and examine methods for evaluating its performance.
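As a rough illustration of "encoding sequence data based on the importance score each element is assigned", the sketch below scores each element against a query, normalizes the scores with a softmax, and returns the weighted sum of the elements' values. It assumes scaled dot-product scoring, which is only one of several scoring functions (additive scoring is another common one), so treat it as one variant rather than the survey's definition; the function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query: Vector, keys: List[Vector],
                          values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Encode a sequence as an importance-weighted sum of its values.

    Each element's score is the dot product of its key with the query,
    scaled by sqrt(d) (an assumed, commonly used choice)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # importance weights, summing to 1
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights
```

Elements whose keys align with the query receive larger weights and therefore dominate the context vector.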

Interactive dimensionality reduction using similarity projections

Recent advances in machine learning allow us to analyze and describe the content of high-dimensional data such as text, audio, images, or other signals. To visualize such data in 2D or 3D, Dimensionality Reduction (DR) techniques are usually employed. Most of these techniques, e.g., PCA or t-SNE, produce static projections that do not take into account corrections from humans or other data exploration scenarios. In this work, we propose the interactive Similarity Projection (iSP), a novel interactive DR framework based on similarity embeddings, in which we form a differentiable objective from the user interactions and learn with gradient descent, using an end-to-end trainable architecture. Two interaction scenarios are evaluated. First, a common methodology in multidimensional projection is to project a subset of the data, arrange it into classes or clusters, and project the rest of the (unseen) dataset based on that manipulation, in a kind of semi-supervised interpolation. We report results that outperform competitive baselines on a wide range of metrics and datasets. Second, we explore the scenario of manipulating some classes while enriching the optimization with high-dimensional neighbor information. Apart from improving classification precision and clustering on images and text documents, the newly emerging structure of the projection unveils semantic manifolds. For example, on the Head Pose dataset, just dragging the faces looking far left to the left and those looking far right to the right re-arranges all faces on a continuum, even along the vertical axis (face up and down). This end-to-end framework can be used for fast, visual semi-supervised learning, manifold exploration, interactive domain adaptation of neural embeddings, and transfer learning.
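The first scenario above (the user places a subset of points, the rest are projected from that manipulation) can be sketched as turning the user's placements into a differentiable loss and minimizing it by gradient descent. This is a deliberately simplified sketch, not iSP: a plain linear 2 x d map stands in for iSP's similarity-embedding architecture, and `fit_projection` and `project` are hypothetical names.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def fit_projection(points: List[Vector], targets: List[Vector],
                   lr: float = 0.05, steps: int = 2000, seed: int = 0) -> Matrix:
    """Fit a 2 x d linear map W so that W @ x_i lands near the user-placed
    2D position target_i, by gradient descent on the mean squared error."""
    d, n = len(points[0]), len(points)
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(2)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(2)]
        for x, t in zip(points, targets):
            for r in range(2):
                err = sum(W[r][j] * x[j] for j in range(d)) - t[r]
                for j in range(d):
                    grad[r][j] += 2.0 * err * x[j] / n
        for r in range(2):
            for j in range(d):
                W[r][j] -= lr * grad[r][j]
    return W

def project(W: Matrix, x: Vector) -> Vector:
    """Place an unseen high-dimensional point using the fitted map."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
```

A linear map cannot reproduce the nonlinear manifolds reported for iSP; it only shows the loop of converting user placements into a loss and then projecting the remaining points.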

Cross-lingual Short-text Matching with Deep Learning

The problem of short-text matching is formulated as follows: given a pair of sentences or questions, a matching model determines whether the two inputs have the same meaning. Models that can automatically identify questions with the same meaning have a wide range of applications in question-answering sites and modern chatbots. In this article, we describe the approach taken by team hahu to solve this problem in the context of the ‘CIKM AnalytiCup 2018 – Cross-lingual Short-text Matching of Question Pairs’, sponsored by Alibaba. Our solution is an end-to-end system based on current advances in deep learning; it avoids heavy feature engineering and improves on traditional machine-learning approaches. Our log-loss scores for the first and second rounds of the contest were 0.35 and 0.39, respectively. The organizers ranked the team 7th of 1027 teams in an overall ranking that combined the two contest scores with assessments of innovation and system integrity, data understanding, and the practicality of the solution for business.
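The contest scores above are log-loss values, where lower is better. A minimal sketch of the standard binary log-loss follows, assuming label 1 means "same meaning"; the clipping constant is an assumption, since the organizers' exact evaluation code is not given. For reference, predicting 0.5 for every pair scores ln 2 ≈ 0.693.

```python
import math
from typing import Sequence

def binary_log_loss(labels: Sequence[int], probs: Sequence[float],
                    eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; labels are 1 (same meaning) or 0 (different)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```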

Aequitas: A Bias and Fairness Audit Toolkit

Recent work has raised concerns about the risk of unintended bias in the algorithmic decision-making systems in use today, which can treat individuals unfairly based on race, gender, religion, or other characteristics. Although many bias metrics and fairness definitions have been proposed in recent years, there is no consensus on which metric or definition should be used, and very few resources are available to operationalize them. As a result, despite growing awareness, auditing for bias and fairness is not yet standard practice when developing and deploying algorithmic decision-making systems. We present Aequitas, an open-source bias and fairness audit toolkit that is an intuitive, easy-to-use addition to the machine learning workflow, enabling users to test models against several bias and fairness metrics across multiple population sub-groups. We believe Aequitas will help data scientists, machine learning researchers, and policymakers make informed and equitable decisions about developing and deploying algorithmic decision-making systems.
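To make "testing models for bias metrics in relation to population sub-groups" concrete, the sketch below computes one such metric, the false positive rate per group, and its disparity relative to a chosen reference group. This does not use Aequitas's API; the function names are hypothetical, and Aequitas covers many more metrics than this one.

```python
from collections import defaultdict
from typing import Dict, List

def group_false_positive_rates(groups: List[str], labels: List[int],
                               preds: List[int]) -> Dict[str, float]:
    """False positive rate FP / (FP + TN) for each population sub-group."""
    fp: Dict[str, int] = defaultdict(int)
    negatives: Dict[str, int] = defaultdict(int)
    for g, y, p in zip(groups, labels, preds):
        if y == 0:  # only true negatives can become false positives
            negatives[g] += 1
            if p == 1:
                fp[g] += 1
    return {g: fp[g] / n for g, n in negatives.items()}

def disparity(rates: Dict[str, float], reference: str) -> Dict[str, float]:
    """Ratio of each group's metric to the reference group's metric.
    Assumes the reference group's rate is non-zero."""
    ref = rates[reference]
    return {g: r / ref for g, r in rates.items()}
```

A disparity far from 1.0 flags a group that the model treats differently from the reference group on this metric.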

Emergence of Addictive Behaviors in Reinforcement Learning Agents

This paper presents a novel approach to the technical analysis of wireheading in intelligent agents. Inspired by natural analogues of wireheading and their prevalent manifestations, we propose modeling such phenomena in Reinforcement Learning (RL) agents as psychological disorders. As a preliminary step towards evaluating this proposal, we study the feasibility and dynamics of emergent addictive policies in Q-learning agents in the tractable environment of the game of Snake. We consider a slightly modified setting for this game, in which the environment provides a ‘drug’ seed alongside the original ‘healthy’ seed for the snake to consume. We adopt an RL-based model of natural addiction, extend it to Q-learning agents in this setting, and derive sufficient parametric conditions for the emergence of addictive behaviors in such agents. Furthermore, we evaluate our theoretical analysis with three sets of simulation-based experiments. The results demonstrate the feasibility of addictive wireheading in RL agents and open promising avenues for further research on the psychopathological modeling of complex AI safety problems.
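For readers new to Q-learning, the sketch below shows only the standard tabular update the agents above start from, Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)). It does not include the paper's addiction model; the state and action names and the α, γ defaults are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]

def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, next_actions: Iterable[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step; returns the temporal-difference error."""
    # Greedy bootstrap target over the actions available in the next state.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error

# Usage: q = defaultdict(float); q_update(q, "s0", "eat_drug", 1.0, "s1", ["left", "right"])
```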

Controllability, Multiplexing, and Transfer Learning in Networks using Evolutionary Learning

Networks are fundamental building blocks for representing data and computations. Remarkable progress has recently been achieved in learning with structurally defined (shallow or deep) networks. Here we introduce an evolutionary exploratory search and learning method for topologically flexible networks, under the constraint that they produce elementary computational steady-state input-output operations. Our results include the following. (1) We identify networks, over four orders of magnitude, that compute steady-state input-output functions such as a band-pass filter, a threshold function, and an inverse band-pass function. (2) The learned networks are technically controllable, as only a small number of driver nodes are required to move the system to a new state. Furthermore, we find that the fraction of required driver nodes stays constant during evolutionary learning, suggesting a stable system design. (3) Our framework allows multiplexing of different computations using the same network; for example, using a binary representation of the inputs, the network can readily compute three different input-output functions. (4) The proposed evolutionary learning demonstrates transfer learning: if the system has learned a function A, then learning a function B requires, on average, fewer steps than learning B from tabula rasa. We conclude that constrained evolutionary learning produces large, robust, controllable circuits capable of multiplexing and transfer learning. Our study suggests that network-based computations of steady-state functions, representing either cellular modules of cell-to-cell communication networks or internal molecular circuits communicating within a cell, could be a powerful model for biologically inspired computing. This complements conceptualizations such as attractor-based models or reservoir computing.

An Introduction to Fuzzy & Annotated Semantic Web Languages

We present the state of the art in representing and reasoning with fuzzy knowledge in Semantic Web languages, such as the triple languages RDF/RDFS, the conceptual languages of the OWL 2 family, and rule languages. We further show how these can be generalised to so-called annotation domains, which also cover, e.g., temporal and provenance extensions.
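As a small illustration of reasoning with fuzzy triples, the sketch below annotates RDFS subclass triples with degrees in [0, 1] and computes the closure of subclass transitivity. It assumes Gödel semantics (conjunction is min, alternative derivations are combined with max); other t-norms such as Łukasiewicz or product are equally possible, and annotation domains replace [0, 1] with other structures (e.g. time intervals). Only this one RDFS rule is modeled, and the function name is hypothetical.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]
SUB = "rdfs:subClassOf"

def fuzzy_subclass_closure(kb: Dict[Triple, float]) -> Dict[Triple, float]:
    """Fixpoint of subclass transitivity over degree-annotated triples."""
    result = dict(kb)  # do not mutate the input knowledge base
    changed = True
    while changed:
        changed = False
        sub = [(t, d) for t, d in result.items() if t[1] == SUB]  # snapshot
        for (a, _, b), d1 in sub:
            for (b2, _, c), d2 in sub:
                if b != b2:
                    continue
                key = (a, SUB, c)
                degree = min(d1, d2)  # Gödel conjunction of the two premises
                if degree > result.get(key, 0.0):  # keep the best derivation
                    result[key] = degree
                    changed = True
    return result
```

The loop terminates because degrees only increase and are always drawn from the finite set of input degrees.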

Deep Bayesian Inversion

Characterizing the statistical properties of solutions to inverse problems is essential for decision making. Bayesian inversion offers a tractable framework for this purpose, but current approaches are computationally infeasible for most realistic imaging applications in the clinic. We introduce two novel deep-learning-based methods for solving large-scale inverse problems using Bayesian inversion: a sampling-based method using a WGAN with a novel mini-discriminator, and a direct approach that trains a neural network using a novel loss function. The performance of both methods is demonstrated on image reconstruction in ultra-low-dose 3D helical CT. We compute the posterior mean and standard deviation of the 3D images, followed by a hypothesis test to assess whether a ‘dark spot’ in the liver of a cancer-stricken patient is present. Both methods are computationally efficient, and our evaluation shows very promising performance that clearly supports the claim that Bayesian inversion is usable for 3D imaging in time-critical applications.
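Once a sampling-based method such as the WGAN approach above has produced posterior samples, the posterior mean and standard deviation are computed pointwise across samples. The sketch below assumes the samples are already available as flattened images (one float per voxel); the generator itself and the direct, non-sampling approach are not shown, and the function name is hypothetical.

```python
import statistics
from typing import List, Tuple

def posterior_summary(samples: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Pointwise posterior mean and standard deviation from posterior samples.

    Each sample is a flattened image; at least two samples are required."""
    per_voxel = list(zip(*samples))  # regroup values by voxel
    means = [statistics.fmean(v) for v in per_voxel]
    stds = [statistics.stdev(v) for v in per_voxel]  # sample standard deviation
    return means, stds
```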

Multiscale change point detection for dependent data

In this paper we study the theoretical properties of the simultaneous multiscale change point estimator (SMUCE) proposed by Frick et al. (2014) in regression models with dependent error processes. Empirical studies show that in this case the change point estimate is inconsistent, but it is not known whether alternatives suggested in the literature for correlated data are consistent. We propose a modification of SMUCE that scales the basic statistic by the long-run variance of the error process, estimated by a difference-type variance estimator computed from local means of different blocks. For this modification we prove model consistency for physically dependent error processes and illustrate its finite-sample performance by means of a simulation study.
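The abstract does not give the exact estimator, so the sketch below assumes one standard difference-type estimator built from block means: split the series into k blocks of length m with means A_j, and estimate σ² ≈ m / (2(k − 1)) · Σ_{j=2..k} (A_j − A_{j−1})². The factor m/2 comes from Var(A_j − A_{j−1}) ≈ 2σ²/m for long blocks, and differencing cancels a piecewise-constant mean except for the few block pairs that straddle a change point. The function name is hypothetical, and the SMUCE scaling step itself is not shown.

```python
from typing import Sequence

def block_difference_lrv(x: Sequence[float], block_len: int) -> float:
    """Difference-type long-run variance estimate from block means."""
    m = block_len
    k = len(x) // m  # number of complete blocks; a trailing remainder is dropped
    if k < 2:
        raise ValueError("need at least two complete blocks")
    means = [sum(x[j * m:(j + 1) * m]) / m for j in range(k)]
    diffs = sum((means[j] - means[j - 1]) ** 2 for j in range(1, k))
    return m * diffs / (2 * (k - 1))
```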

Age of Information Scaling in Large Networks

We study the age of information in a multiple source-multiple destination setting, with a focus on its scaling in large wireless networks. There are n nodes, randomly paired with each other within a fixed area to form n source-destination (S-D) pairs. We propose a three-phase transmission scheme that utilizes local cooperation between the nodes by forming what we call mega update packets, which serve multiple S-D pairs at once. We show that under the proposed scheme, the average age of an S-D pair scales as $O(n^{1/4})$ as the number of users, n, in the network grows. To the best of our knowledge, this is the best age scaling result for a multiple source-multiple destination setting.
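The quantity being scaled above is the time-average age of an S-D pair. The sketch below computes it from a list of (generation time, reception time) pairs using the standard definition: the age at time t is t minus the generation time of the freshest update received by t, averaged over [0, T]. It does not model the three-phase scheme or mega update packets; the input format and function name are assumptions.

```python
from typing import List, Tuple

def average_age(updates: List[Tuple[float, float]], horizon: float,
                initial_gen: float = 0.0) -> float:
    """Time-average age over [0, horizon] for one source-destination pair.

    updates: (generation_time, reception_time) pairs. Before any reception,
    the freshest information is assumed to date from initial_gen."""
    def area(a: float, b: float, g: float) -> float:
        # Integral of (t - g) dt from a to b: one piece of the age sawtooth.
        return ((b - g) ** 2 - (a - g) ** 2) / 2.0

    t_prev, latest_gen, total = 0.0, initial_gen, 0.0
    for gen, rec in sorted(updates, key=lambda u: u[1]):
        if rec > horizon:
            break
        total += area(t_prev, rec, latest_gen)
        t_prev = rec
        latest_gen = max(latest_gen, gen)  # a stale update does not reduce age
    total += area(t_prev, horizon, latest_gen)
    return total / horizon
```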

Composing Modeling and Inference Operations with Probabilistic Program Combinators

Probabilistic programs with dynamic computation graphs can define measures over sample spaces with unbounded dimensionality, and thereby constitute programmatic analogues to Bayesian nonparametrics. Owing to the generality of this model class, inference relies on ‘black-box’ Monte Carlo methods that generally cannot take advantage of conditional independence and exchangeability, which have historically been the cornerstones of efficient inference. Here we seek to develop a ‘middle ground’ between probabilistic models with fully dynamic and fully static computation graphs. To this end, we introduce a combinator library for the Probabilistic Torch framework. Combinators are functions that accept models and return transformed models. We assume that models are dynamic, but that model composition is static, in the sense that combinator application takes place prior to evaluating the model on data. Combinators provide primitives for both model and inference composition. Model combinators take the form of classic functional programming constructs such as map and reduce. These constructs define a computation graph at a coarsened level of representation, in which nodes correspond to models rather than individual variables. Inference combinators – such as enumeration, importance resampling, and Markov chain Monte Carlo operators – assume a sampling semantics for model evaluation, in which the application of combinators preserves proper weighting. Owing to this property, models defined using combinators can be trained using stochastic methods that optimize either variational or wake-sleep style objectives. As a validation of this principle, we use combinators to implement black-box inference for hidden Markov models.
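To illustrate "functions that accept models and return transformed models" while preserving proper weighting, the sketch below implements an importance-resampling combinator in plain Python. It does not use the Probabilistic Torch API: it assumes a hypothetical interface in which a model is a callable taking a random generator and returning a (sample, log_weight) pair. After resampling, every survivor gets the average weight, which is the usual way resampling keeps a weighted sample properly weighted.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps an RNG to (sample, log_weight).
Model = Callable[[random.Random], Tuple[object, float]]
Particles = Tuple[List[object], List[float]]

def importance_resample(model: Model, num_particles: int) -> Callable[[random.Random], Particles]:
    """Combinator: accept a model, return a transformed model.

    The returned model draws num_particles weighted samples, resamples them in
    proportion to their weights, and gives every survivor the average weight
    (in log space), so the output stays properly weighted."""
    def resampled(rng: random.Random) -> Particles:
        draws = [model(rng) for _ in range(num_particles)]
        samples = [s for s, _ in draws]
        log_ws = [lw for _, lw in draws]
        top = max(log_ws)  # shift by the max for numerical stability
        rel = [math.exp(lw - top) for lw in log_ws]
        log_mean = top + math.log(sum(rel) / num_particles)
        survivors = rng.choices(samples, weights=rel, k=num_particles)
        return survivors, [log_mean] * num_particles
    return resampled
```

Because `importance_resample` is applied before the model is ever run on data, it fits the static-composition assumption stated above.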

Char2char Generation with Reranking for the E2E NLG Challenge
Parser Extraction of Triples in Unstructured Text
Construction of an algebra corresponding to a statistical model of the square ladder (square lattice with two lines)
Internal Wiring of Cartesian Verbs and Prepositions
Native Language Identification using i-vector
Non-intrusive model reduction of static parametric non-linear systems and application to global optimization and uncertainty quantification
Conformal Bootstrap Analysis for Localization: Symplectic Case
Towards Neural Machine Translation for African Languages
Jointly identifying opinion mining elements and fuzzy measurement of opinion intensity to analyze product features
Learning to Compensate Photovoltaic Power Fluctuations from Images of the Sky by Imitating an Optimal Policy
Two-stream convolutional networks for end-to-end learning of self-driving cars
Few-shot Learning for Named Entity Recognition in Medical Text
Optimal Scalar Linear Index Codes for Symmetric and Neighboring Side-information Problems
ML-Net: multi-label classification of biomedical texts with deep neural networks
On the number of sets with a given doubling constant
ROMAN: Reduced-Order Modeling with Artificial Neurons
Sampling from manifold-restricted distributions using tangent bundle projections
Phase transition for the frog model on biregular trees
Novel Inter-file Coded Placement and D2D Delivery for a Cache-aided Fog-RAN Architecture
A combinatorial $\mathfrak{sl}_2$-action and the Sperner property for the weak order
Random periodic solutions and ergodicity for stochastic differential equations
Many cusped hyperbolic 3-manifolds do not bound geometrically
A geometric study of Strassen’s asymptotic rank conjecture and its variants
Evaluating GANs via Duality
A survey on graphs with convex quadratic stability number
Wavelet Based Dictionaries for Dimensionality Reduction of ECG Signals
Deep Q learning for fooling neural networks
Semi-dual Regularized Optimal Transport
Heuristic Voting as Ordinal Dominance Strategies
Towards Characterising Bayesian Network Models under Selection
Robust Dynamic CPU Resource Provisioning in Virtualized Servers
Staging Human-computer Dialogs: An Application of the Futamura Projections
Data Driven Governing Equations Approximation Using Deep Neural Networks
Discourse in Multimedia: A Case Study in Information Extraction
An Analysis of the Semantic Annotation Task on the Linked Data Cloud
Neural Wavetable: a playable wavetable synthesizer using neural networks
Corpus Phonetics Tutorial
Identification of semiparametric discrete outcome models with bounded covariates
What is really needed to justify ignoring the response mechanism for modelling purposes?
A New SVDD-Based Multivariate Non-parametric Process Capability Index
Text Assisted Insight Ranking Using Context-Aware Memory Network
Region-Referenced Spectral Power Dynamics of EEG Signals: A Hierarchical Modeling Approach
Estimation of High-Dimensional Seemingly Unrelated Regression Models
Consensus and Sectioning-based ADMM with Norm-1 Regularization for Imaging with a Compressive Reflector Antenna
An Overview of Semiparametric Extensions of Finite Mixture Models
Improving constant in end-point Poincaré inequality on Hamming cube
YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers
TrolleyMod v1.0: An Open-Source Simulation and Data-Collection Platform for Ethical Decision Making in Autonomous Vehicles
Transform Methods for Heavy-Traffic Analysis
Extractive Summary as Discrete Latent Variables
Boundary Braids
Fokker-Planck equations for nonlinear dynamical systems driven by multiplicative $α$-stable Lévy motions
Central limit theorem and moderate deviations for a class of semilinear SPDES
Bayesian Reinforcement Learning in Factored POMDPs
SepNE: Bringing Separability to Network Embedding
Improving Distantly Supervised Relation Extraction with Neural Noise Converter and Conditional Optimal Selector
Style and Content Disentanglement in Generative Adversarial Networks
A Game Theoretic Approach for Dynamic Information Flow Tracking to Detect Multi-Stage Advanced Persistent Threats
Multi-Winner Contests for Strategic Diffusion in Social Networks
How Drones Look: Crowdsourced Knowledge Transfer for Aerial Video Saliency Prediction
A framework for covert and secret key expansion over quantum channels
Memory-Efficient Quantum Circuit Simulation by Using Lossy Data Compression
Translating a Math Word Problem to an Expression Tree
On the Capacity of MISO Channels with One-Bit ADCs and DACs
Gaussian Reciprocal Sequences from the Viewpoint of Conditionally Markov Sequences
Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization
Fast Distribution Grid Line Outage Identification with $μ$PMU
Analysis of Gaussian Spatial Models with Covariate Measurement Error
Submodular Optimization Over Streams with Inhomogeneous Decays
Sample complexity of partition identification using multi-armed bandits
MT-CGCNN: Integrating Crystal Graph Convolutional Neural Network with Multitask Learning for Material Property Prediction
Cutting resilient networks — complete binary trees
Saddlepoint adjusted inversion of characteristic functions
Off-grid Variational Bayesian Inference of Line Spectral Estimation from One-bit Samples
Modeling Coherence for Discourse Neural Machine Translation
Layout Design for Intelligent Warehouse by Evolution with Fitness Approximation
Melodic Phrase Segmentation By Deep Neural Networks
Leveraging Aspect Phrase Embeddings for Cross-Domain Review Rating Prediction
Efficient and Scalable Multi-task Regression on Massive Number of Tasks
Generating Multiple Diverse Responses for Short-Text Conversation
Preventive Equipment Repair Planning Model
A Radiomics Approach to Traumatic Brain Injury Prediction in CT Scans
Plan-And-Write: Towards Better Automatic Storytelling
AMGCL: an Efficient, Flexible, and Extensible Algebraic Multigrid Implementation
An analysis of a fair division protocol for drawing legislative districts
Plateau Polycubes and Lateral Area
From Free Text to Clusters of Content in Health Records: An Unsupervised Graph Partitioning Approach
A Deterministic Algorithm for Bridging Anaphora Resolution
Universal Polarization for Processes with Memory
Acyclic subgraphs with high chromatic number
Lattice paths and submonoids of $\mathbb Z^2$
Optimal stopping of Brownian motion with broken drift
Neural Based Statement Classification for Biased Language
Stochastic Algorithmic Differentiation of (Expectations of) Discontinuous Functions (Indicator Functions)
Measuring Road Network Topology Vulnerability by Ricci Curvature
Rice-Marlin Codes: Tiny and Efficient Variable-to-Fixed Codes
Space-time localisation for the dynamic $Φ^4_3$ model
SLIM: Simultaneous Logic-in-Memory Computing Exploiting Bilayer Analog OxRAM Devices
LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks
Creatures great and SMAL: Recovering the shape and motion of animals from video
Robustness of spectral methods for community detection
Groups with few maximal sum-free sets
ProstateGAN: Mitigating Data Bias via Prostate Diffusion Imaging Synthesis with Generative Adversarial Networks
Distortion Robust Image Classification with Deep Convolutional Neural Network based on Discrete Cosine Transform
Statistical post-processing of dual-resolution ensemble forecasts
Revisiting Projection-Free Optimization for Strongly Convex Constraint Sets
Time-Varying Isotropic Vector Random Fields on Compact Two-Point Homogeneous Spaces
A Learning-Based Framework for Line-Spectra Super-resolution
A combinatorial classification of 2-regular simple modules for Nakayama algebras
A structural characterization of tree-based phylogenetic networks
Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization
Predicting the time-evolution of multi-physics systems with sequence-to-sequence models
Robust low-rank multilinear tensor approximation for a joint estimation of the multilinear rank and the loading matrices
Pitfalls of Graph Neural Network Evaluation
Large-scale Interactive Recommendation with Tree-structured Policy Gradient
Design of Spectrally Shaped Binary Sequences via Randomized Convex Relaxation
Experimental 3D Coherent Diffractive Imaging from photon-sparse random projections
Dependency Grammar Induction with a Neural Variational Transition-based Parser
Data-Enabled Predictive Control: In the Shallows of the DeePC
Reduced Order Controller Design for Robust Output Regulation of Parabolic Systems
Development of Real-time ADAS Object Detector for Deployment on CPU
QUENN: QUantization Engine for low-power Neural Networks
A Simulated Cyberattack on Twitter: Assessing Partisan Vulnerability to Spear Phishing and Disinformation ahead of the 2018 U.S. Midterm Elections
The ADAPT System Description for the IWSLT 2018 Basque to English Translation Task
The Greedy Dirichlet Process Filter – An Online Clustering Multi-Target Tracker
Structural and temporal heterogeneities on networks
Applications of mesoscopic CLTs in random matrix theory
A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest
Matrix rigidity and the ill-posedness of Robust PCA and matrix completion
Bandana: Using Non-volatile Memory for Storing Deep Learning Models
SCORE+ for Network Community Detection
Evolving intrinsic motivations for altruistic behavior
Streaming Network Embedding through Local Actions
Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems
Opinion dynamics with Lotka-Volterra type interactions
Domain Randomization for Scene-Specific Car Detection and Pose Estimation
Virtual Net: a Decentralized Architecture for Interaction in Mobile Virtual Worlds
Mayall: A Framework for Desktop JavaScript Auditing and Post-Exploitation Analysis
EdgeBench: Benchmarking Edge Computing Platforms
Jointly Learning to Label Sentences and Tokens
The exchange-driven growth model: basic properties and longtime behavior
Pulse radar with FPGA range compression for real time displacement and vibration monitoring
Strong Feller property for SDEs driven by multiplicative cylindrical stable noise
Tower Cranes and Supply Points Locating Problem Using CBO, ECBO, and VPS
No-Frills Human-Object Interaction Detection: Factorization, Appearance and Layout Encodings, and Training Techniques
Geometry of Gaussian free field sign clusters and random interlacements