Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models

Learning a Bayesian network (BN) from data can be useful for decision-making or discovering causal relationships. However, traditional methods often fail in modern applications, which exhibit a larger number of observed variables than data points. The resulting uncertainty about the underlying network as well as the desire to incorporate prior information recommend a Bayesian approach to learning the BN, but the highly combinatorial structure of BNs poses a striking challenge for inference. The current state-of-the-art methods such as order MCMC are faster than previous methods but prevent the use of many natural structural priors and still have running time exponential in the maximum indegree of the true directed acyclic graph (DAG) of the BN. We here propose an alternative posterior approximation based on the observation that, if we incorporate empirical conditional independence tests, we can focus on a high-probability DAG associated with each order of the vertices. We show that our method allows the desired flexibility in prior specification, removes timing dependence on the maximum indegree and yields provably good posterior approximations; in addition, we show that it achieves superior accuracy, scalability, and sampler mixing on several datasets.
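The idea of associating one DAG with each vertex order can be sketched with the standard minimal I-MAP construction: a predecessor becomes a parent unless a conditional independence test separates it from the child given the remaining predecessors. This is a minimal sketch, not the paper's implementation; the `is_independent` oracle and the toy chain X -> Y -> Z are hypothetical stand-ins for empirical tests on data.

```python
from typing import Callable, FrozenSet, Sequence, Set, Tuple

# Hypothetical oracle signature: True when a is independent of b given cond.
CITest = Callable[[str, str, FrozenSet[str]], bool]


def minimal_imap(order: Sequence[str], is_independent: CITest) -> Set[Tuple[str, str]]:
    """Return the edge set of the minimal I-MAP consistent with `order`."""
    edges: Set[Tuple[str, str]] = set()
    for j, child in enumerate(order):
        predecessors = list(order[:j])
        for parent in predecessors:
            # Condition on every other predecessor of the child.
            cond = frozenset(predecessors) - {parent}
            if not is_independent(parent, child, cond):
                edges.add((parent, child))
    return edges


def chain_oracle(a: str, b: str, cond: FrozenSet[str]) -> bool:
    # Toy oracle for the chain X -> Y -> Z (an assumption for illustration):
    # the only independence is X _||_ Z given Y.
    return {a, b} == {"X", "Z"} and "Y" in cond


sparse = minimal_imap(["X", "Y", "Z"], chain_oracle)  # order agrees with the chain
dense = minimal_imap(["X", "Z", "Y"], chain_oracle)   # order disagrees with the chain
```

In this toy case the order that agrees with the chain recovers its two edges, while the other order yields a fully connected DAG, which illustrates why a sampler over orders would favor the former.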

Event Correlation and Forecasting over Multivariate Streaming Sensor Data

Event management in sensor networks is a multidisciplinary field involving several steps across the processing chain. In this paper, we discuss the major steps that should be performed in real-time or near-real-time event handling, including event detection, correlation, prediction and filtering. First, we discuss existing univariate and multivariate change detection schemes for online event detection over sensor data. Next, we propose an online event correlation scheme that aims to unveil the internal dynamics that govern the operation of a system and are responsible for the generation of various types of events. We show that the representation of event dependencies can be accommodated within a probabilistic temporal knowledge representation framework that allows the formulation of rules. We also address the important issue of identifying outdated dependencies among events by setting up a time-dependent framework for filtering the extracted rules over time. The proposed theory is applied to the maritime domain and is validated through extensive experimentation with real sensor streams originating from large-scale sensor networks deployed in ships.

Theory and Algorithms for Forecasting Time Series

We present data-dependent learning bounds for the general scenario of non-stationary non-mixing stochastic processes. Our learning guarantees are expressed in terms of a data-dependent measure of sequential complexity and a discrepancy measure that can be estimated from data under some mild assumptions. We also provide a novel analysis of stable time series forecasting algorithms using the new notion of discrepancy that we introduce. We use our learning bounds to devise new algorithms for non-stationary time series forecasting, for which we report some preliminary experimental results.

Capturing Structure Implicitly from Time-Series having Limited Data

Scientific fields such as insider-threat detection and highway-safety planning often lack sufficient amounts of time-series data to estimate statistical models for the purpose of scientific discovery. Moreover, the limited data that are available are quite noisy. This presents a major challenge when estimating time-series models that are robust to overfitting and have well-calibrated uncertainty estimates. Most of the current literature in these fields involves visualizing the time series for noticeable structure and hard-coding that structure into pre-specified parametric functions. This approach is associated with two limitations. First, given that such trends may not be easily noticeable in small data, it is difficult to explicitly incorporate expressive structure into the models during formulation. Second, it is difficult to know a priori the most appropriate functional form to use. To address these limitations, a nonparametric Bayesian approach was proposed to implicitly capture hidden structure from time series having limited data. The proposed model, a Gaussian process with a spectral mixture kernel, precludes the need to pre-specify a functional form and hard-code trends, is robust to overfitting and has well-calibrated uncertainty estimates.
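For readers unfamiliar with the spectral mixture kernel, its usual one-dimensional form is a weighted sum of Gaussian-windowed cosines, k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q). The sketch below evaluates that form; the parameter names and the example values are illustrative assumptions, not settings taken from the abstract above.

```python
import math
from typing import Sequence


def spectral_mixture_kernel(
    tau: float,
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Evaluate a 1-D spectral mixture kernel at lag `tau`.

    Each component q has a weight w_q, a spectral mean mu_q (a frequency)
    and a spectral variance v_q (an inverse squared length scale).
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * tau * m)
        for w, m, v in zip(weights, means, variances)
    )


# Illustrative two-component mixture (values are assumptions).
weights, means, variances = [0.5, 1.5], [0.1, 0.4], [0.01, 0.02]
k_zero = spectral_mixture_kernel(0.0, weights, means, variances)
k_lag = spectral_mixture_kernel(1.3, weights, means, variances)
```

At zero lag the kernel equals the sum of the weights, so the weights set the signal variance, while the spectral means encode periodicities that would otherwise have to be hard-coded.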

SentEval: An Evaluation Toolkit for Universal Sentence Representations

We introduce SentEval, a toolkit for evaluating the quality of universal sentence representations. SentEval encompasses a variety of tasks, including binary and multi-class classification, natural language inference and sentence similarity. The set of tasks was selected based on what appears to be the community consensus regarding the appropriate evaluations for universal sentence representations. The toolkit comes with scripts to download and preprocess datasets, and an easy interface to evaluate sentence encoders. The aim is to provide a fairer, less cumbersome and more centralized way for evaluating sentence representations.

Advancing Connectionist Temporal Classification With Attention Modeling

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector-based attention weights that are applied on context vectors across both time and their individual components. We evaluate our system on a 3400-hour Microsoft Cortana voice assistant task and demonstrate that our proposed model consistently outperforms the baseline model, achieving about a 20% relative reduction in word error rate.

Large Margin Deep Networks for Classification

We present a formulation of deep learning that aims at producing a large margin classifier. The notion of margin, minimum distance to a decision boundary, has served as the foundation of several theoretically profound and empirically successful results for both classification and regression tasks. However, most large margin algorithms are applicable only to shallow models with a preset feature representation; and conventional margin methods for neural networks only enforce margin at the output layer. Such methods are therefore not well suited for deep networks. In this work, we propose a novel loss function to impose a margin on any chosen set of layers of a deep network (including input and hidden layers). Our formulation allows choosing any norm on the metric measuring the margin. We demonstrate that the decision boundary obtained by our loss has nice properties compared to standard classification loss functions. Specifically, we show improved empirical results on the MNIST, CIFAR-10 and ImageNet datasets on multiple tasks: generalization from small training sets, corrupted labels, and robustness against adversarial perturbations. The resulting loss is general and complementary to existing data augmentation (such as random/adversarial input transform) and regularization techniques (such as weight decay, dropout, and batch norm).
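The notion of margin is easiest to see in the linear case, where the distance from a point x to the boundary w.x + b = 0 is |w.x + b| divided by a norm of w. When the distance is measured in an l_p norm, the denominator uses the dual norm l_q with 1/p + 1/q = 1. The sketch below covers only this linear case as an illustration; it is not the loss proposed in the abstract, and the function and parameter names are assumptions.

```python
import math
from typing import Sequence


def linear_margin(
    w: Sequence[float],
    b: float,
    x: Sequence[float],
    dual_norm_order: float = 2.0,
) -> float:
    """Distance from x to the hyperplane w.x + b = 0.

    `dual_norm_order` is q, the dual of the norm p used to measure the
    distance (q = 2 for Euclidean distance, q = 1 for l_inf distance,
    q = inf for l_1 distance).
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if math.isinf(dual_norm_order):
        norm = max(abs(wi) for wi in w)
    else:
        norm = sum(abs(wi) ** dual_norm_order for wi in w) ** (1.0 / dual_norm_order)
    return abs(score) / norm


euclidean = linear_margin([3.0, 4.0], 0.0, [3.0, 4.0])
l_inf = linear_margin([3.0, 4.0], 0.0, [3.0, 4.0], dual_norm_order=1.0)
```

For a deep network the boundary is nonlinear, so any such formula can only be applied approximately at a chosen layer.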

Sylvester Normalizing Flows for Variational Inference

Variational inference relies on flexible approximate posterior distributions. Normalizing flows provide a general recipe to construct flexible variational posteriors. We introduce Sylvester normalizing flows, which can be seen as a generalization of planar flows. Sylvester normalizing flows remove the well-known single-unit bottleneck from planar flows, making a single transformation much more flexible. We compare the performance of Sylvester normalizing flows against planar flows and inverse autoregressive flows and demonstrate that they compare favorably on several datasets.
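To make the single-unit bottleneck concrete, a planar flow maps z to z + u h(w.z + b), so every input is shifted along the one direction u by a scalar amount. Its log-determinant follows from the matrix determinant lemma: log|1 + h'(w.z + b) u.w|. The sketch below implements one planar step with h = tanh; it illustrates the baseline being generalized, not the Sylvester flow itself, and the example values are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def planar_flow(
    z: Sequence[float],
    u: Sequence[float],
    w: Sequence[float],
    b: float,
) -> Tuple[List[float], float]:
    """Apply one planar flow step and return (z_new, log|det Jacobian|)."""
    activation = sum(wi * zi for wi, zi in zip(w, z)) + b
    h = math.tanh(activation)
    # Every coordinate moves along u, scaled by the single scalar h.
    z_new = [zi + ui * h for zi, ui in zip(z, u)]
    h_prime = 1.0 - h * h  # derivative of tanh
    u_dot_w = sum(ui * wi for ui, wi in zip(u, w))
    log_det = math.log(abs(1.0 + h_prime * u_dot_w))
    return z_new, log_det


z_new, log_det = planar_flow([0.2, -0.4], [0.5, -0.3], [1.0, 2.0], 0.1)
```

Replacing the vectors u and w with matrices gives a transformation with several hidden units; Sylvester's determinant identity, det(I + AB) = det(I + BA), is what keeps the corresponding determinant cheap to compute.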

Word2Bits – Quantized Word Vectors

Word vectors require significant amounts of memory and storage, posing issues for resource-limited devices like mobile phones and GPUs. We show that high-quality quantized word vectors using 1-2 bits per parameter can be learned by introducing a quantization function into Word2Vec. We furthermore show that training with the quantization function acts as a regularizer. We train word vectors on English Wikipedia (2017) and evaluate them on standard word similarity and analogy tasks and on question answering (SQuAD). Our quantized word vectors not only take 8-16x less space than full-precision (32-bit) word vectors but also outperform them on word similarity tasks and question answering.

A Study of Recent Contributions on Information Extraction

This paper reports on modern approaches in Information Extraction (IE) and its two main sub-tasks of Named Entity Recognition (NER) and Relation Extraction (RE). Basic concepts and the most recent approaches in this area are reviewed, which mainly include Machine Learning (ML) based approaches and the more recent trend to Deep Learning (DL) based methods.

Neural Network Quine

Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network’s ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection.

GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent

In this paper, we present GossipGraD – a Stochastic Gradient Descent (SGD) algorithm based on a gossip communication protocol for scaling Deep Learning (DL) algorithms on large-scale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from Θ(log(p)) for p compute nodes in well-studied SGD to O(1), 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent over-fitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights Landing (KNL) connected with the Aries network. We evaluate GossipGraD using the well-studied ImageNet-1K dataset (~250GB) and widely studied neural network topologies such as GoogLeNet and ResNet50 (current winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)). Our performance evaluation using both KNL and Pascal GPUs indicates that GossipGraD can achieve perfect efficiency for these datasets and their associated neural network topologies. Specifically, for ResNet50, GossipGraD is able to achieve ~100% compute efficiency using 128 NVIDIA Pascal P100 GPUs – while matching the top-1 classification accuracy published in the literature.
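Features 1) to 3) can be illustrated with a toy simulation in which each node talks to exactly one partner per step and the partner rotates, so that updates reach every node indirectly within log2(p) steps. The hypercube (XOR) schedule below is an assumption chosen for illustration and is not claimed to be GossipGraD's actual partner-selection rule; it also assumes p is a power of two.

```python
from typing import List, Set


def gossip_partner(rank: int, step: int, p: int) -> int:
    """Partner of `rank` at `step` under a hypothetical hypercube schedule.

    Assumes p is a power of two and p >= 2.
    """
    dims = p.bit_length() - 1  # log2(p)
    return rank ^ (1 << (step % dims))


def simulate_diffusion(p: int, steps: int) -> List[Set[int]]:
    """Track which nodes' updates each node has seen after `steps` exchanges."""
    known: List[Set[int]] = [{r} for r in range(p)]
    for step in range(steps):
        snapshot = [set(s) for s in known]  # exchanges within a step are simultaneous
        for r in range(p):
            known[r] |= snapshot[gossip_partner(r, step, p)]
    return known


after_two = simulate_diffusion(8, 2)
after_three = simulate_diffusion(8, 3)
```

Each node sends and receives one message per step, which is the O(1) per-step cost, and with p = 8 every node has seen every other node's update after three steps.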

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Review of Multi-Agent Algorithms for Collective Behavior: a Structural Taxonomy
Subexponential-Time and FPT Algorithms for Embedded Flat Clustered Planarity
Cake-Cutting with Different Entitlements: How Many Cuts are Needed?
Computer-aided diagnosis of lung carcinoma using deep learning – a pilot study
SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping
Targeted change detection in remote sensing images
Removing Skill Bias from Gaming Statistics
Braess paradox in a network with stochastic dynamics and fixed strategies
CLT for supercritical branching processes with heavy-tailed branching law
Improving Object Counting with Heatmap Regulation
Challenges in Discriminating Profanity from Hate Speech
Logical Gates via Gliders Collisions
Computational complexity of the avalanche problem on one dimensional Kadanoff sandpiles
A Distributed Architecture for Edge Service Orchestration with Guarantees
Max-Min Greedy Matching
Probing the non-Debye low frequency excitations in glasses through random pinning
Sequential and exact formulae for the subdifferential of nonconvex integral functionals
Unpaired Image Captioning by Language Pivoting
Lowering the Upper Bounds of the Cost of Robust Distributed Controllers Beyond Quadratic Invariance
Self-Supervised Monocular Image Depth Learning and Confidence Estimation
Low coherence unit norm tight frames
Evaluation of Dense 3D Reconstruction from 2D Face Images in the Wild
Hovering stochastic oscillations in self-organized critical systems
Tutte Invariants for Alternating Dimaps
Context-Aware Mixed Reality: A Framework for Ubiquitous Interaction
A Game-Theoretic Framework for the Virtual Machines Migration Timing Problem
Testing the homogeneity of risk differences with sparse count data
Geometric duality and parametric duality for multiple objective linear programs are equivalent
A Simple and Effective Approach to the Story Cloze Test
Object Detection in Video with Spatiotemporal Sampling Networks
Advancing Acoustic-to-Word CTC Model
Achieving Human Parity on Automatic Chinese to English News Translation
Improving GANs Using Optimal Transport
Global Stabilization for Causally Consistent Partial Replication
Facelet-Bank for Fast Portrait Manipulation
The Penetration Effect of Connected Automated Vehicles in Urban Traffic: An Energy Impact Study
$\texttt{A2BCD}$: An Asynchronous Accelerated Block Coordinate Descent Algorithm With Optimal Complexity
On the Underspread/Overspread Classification of Random Processes
Micky: A Cheaper Alternative for Selecting Cloud Instances
Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment
Variational Message Passing with Structured Inference Networks
On the insufficiency of existing momentum schemes for Stochastic Optimization
Deriving constant coefficient linear recurrences for enumerating standard Young tableaux of periodic shape
Tuning a random field mechanism in a frustrated magnet
Reconstructing Gaussian sources by spatial sampling
Hydrodynamic Limit of the Inhomogeneous $\ell$-TASEP with Open Boundaries: Derivation and Solution
Generalized Proximal Smoothing for Phase Retrieval
Demyanov-Ryabova conjecture is false
A generalized projection-based scheme for solving convex constrained optimization problems
Fast End-to-End Trainable Guided Filter
Proximal SCOPE for Distributed Sparse Learning: Better Data Partition Implies Faster Convergence Rate
Measure-valued branching processes associated with Neumann nonlinear semiflows
Optimal Weight Allocation of Dynamic Distribution Networks and Positive Semi-definiteness of Signed Laplacians
Resource Allocation in NOMA based Fog Radio Access Networks
Graph codes and local systems
Existence of (Markovian) solutions to martingale problems associated with Lévy-type operators
LEGO: Learning Edge with Geometry all at Once by Watching Videos
Relaxed Locally Correctable Codes in Computationally Bounded Channels
Laws of large numbers for Hayashi-Yoshida-type functionals
Kolmogorov equations associated to the stochastic 2D Euler equations
HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension
Fast Subspace Clustering Based on the Kronecker Product
Does agricultural subsidies foster Italian southern farms? A Spatial Quantile Regression Approach
Mm-wave specific challenges in designing 5G transceiver architectures and air-interfaces
Structure Regularized Neural Network for Entity Relation Classification for Chinese Literature Text
Conditional Model Selection in Mixed-Effects Models with cAIC4
Performance and Impairment Modelling for Hardware Components in Millimetre-wave Transceivers
Motion optimization and parameter identification for a human and lower-back exoskeleton model
Spectral radii of asymptotic mappings and the convergence speed of the standard fixed point algorithm
The Hot Hand in Professional Darts
Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation
Efficient First-order Methods for Convex Minimization: a Constructive Approach
Harmonic functions of random walks in a semigroup via ladder heights
Some insights into the behaviour of millimetre wave spectrum on key 5G cellular KPIs
Achieving Spatial Scalability for Coded Caching over Wireless Networks
Combinatorial analogs of topological zeta functions
i-HUMO: An Interactive Human and Machine Cooperation Framework for Entity Resolution with Quality Guarantees
Approximating Max-Cut under Graph-MSO Constraints
Exploring Linear Relationship in Feature Map Subspace for ConvNets Compression
Diverse M-Best Solutions by Dynamic Programming
Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning
What Catches the Eye? Visualizing and Understanding Deep Saliency Models
FDD Massive MIMO via UL/DL Channel Covariance Extrapolation and Active Channel Sparsification
A two-stage method for estimating the association between blood pressure variability and cardiovascular disease: An application using the ARIC Study
Salient Region Segmentation
Unmanned Aerial Vehicle Assisted Cellular Communication
PAC-Reasoning in Relational Domains
$\mathfrak{q}$-crystal structure on primed tableaux and on signed unimodal factorizations of reduced words of type $B$
Gaussian Processes Over Graphs
Minimax optimal rates for Mondrian trees and forests
Aggregated Sparse Attention for Steering Angle Prediction
Temporal Human Action Segmentation via Dynamic Clustering
Hierarchical Species Sampling Models
RUSSE’2018: A Shared Task on Word Sense Induction for the Russian Language
Deep architectures for learning context-dependent ranking functions
Hypergraph Saturation Irregularities
A geometric model for the module category of a gentle algebra
Stability analysis by dynamic dissipation inequalities: On merging frequency-domain techniques with time-domain conditions
Hyperbolic Geometry and Amplituhedra in 1+2 dimensions
On a General Dynamic Programming Approach for Decentralized Stochastic Control
OFDM-Autoencoder for End-to-End Learning of Communications Systems
2D Reconstruction of Small Intestine’s Interior Wall
RUSSE: The First Workshop on Russian Semantic Similarity
Dynamic Approximate Matchings with an Optimal Recourse Bound
Local Spectral Graph Convolution for Point Set Feature Learning
Enriching Frame Representations with Distributionally Induced Senses
A policy iteration algorithm for nonzero-sum stochastic impulse games
Statistical harmonization and uncertainty assessment in the comparison of satellite and radiosonde climate variables
Asymptotic theory for longitudinal data with missing responses adjusted by inverse probability weights
Effective Connectivity from Single Trial fMRI Data by Sampling Biologically Plausible Models
Joint Turbo Receiver for LDPC-Coded MIMO Systems Based on Semi-definite Relaxation
Pseudo Mask Augmented Object Detection
Learned Iterative Decoding for Lossy Image Compression Systems
The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics
Distributed Data Vending on Blockchain
Virtual CNN Branching: Efficient Feature Ensemble for Person Re-Identification
Deep Structure Inference Network for Facial Action Unit Recognition
Strategies to facilitate access to detailed geocoding information using synthetic data
Maxiset point of view for signal detection in inverse problems
The Laplace transform of the lognormal distribution
Identifiability of dynamical networks with partial node measurements
Coulomb-gas electrostatics controls large fluctuations of the KPZ equation
Blow-up results for space-time fractional stochastic partial differential equations
Contrasting information theoretic decompositions of modulatory and arithmetic interactions in neural information processing systems
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions