Tensor Valued Common and Individual Feature Extraction: Multi-dimensional Perspective

A novel method for common and individual feature analysis from exceedingly large-scale data is proposed, in order to ensure the tractability of both the computation and storage and thus mitigate the curse of dimensionality, a major bottleneck in modern data science. This is achieved by making use of the inherent redundancy in so-called multi-block data structures, which represent multiple observations of the same phenomenon taken at different times, angles or recording conditions. Upon providing an intrinsic link between the properties of the outer vector product and extracted features in tensor decompositions (TDs), the proposed common and individual information extraction from multi-block data is performed through imposing physical meaning to otherwise unconstrained factorisation approaches. This is shown to dramatically reduce the dimensionality of search spaces for subsequent classification procedures and to yield greatly enhanced accuracy. Simulations on a multi-class classification task of large-scale extraction of individual features from a collection of partially related real-world images demonstrate the advantages of the ‘blessing of dimensionality’ associated with TDs.

A Universal Marginalizer for Amortized Inference in Generative Models

We consider the problem of inference in a causal generative model where the set of available observations differs between data instances. We show how combining samples drawn from the graphical model with an appropriate masking function makes it possible to train a single neural network to approximate all the corresponding conditional marginal distributions and thus amortize the cost of inference. We further demonstrate that the efficiency of importance sampling may be improved by basing proposals on the output of the neural network. We also outline how the same network can be used to generate samples from an approximate joint posterior via a chain decomposition of the graph.
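
The masking idea described in this abstract can be illustrated with a minimal sketch. It assumes binary variables, an independent per-variable masking rule, and a sentinel value for "unobserved"; the names `UNOBSERVED`, `mask_sample`, and `make_training_pair` are hypothetical and not taken from the paper.

```python
import random

# Hypothetical sentinel marking a variable as "not observed" in the network input.
UNOBSERVED = -1


def mask_sample(sample, keep_prob, rng):
    """Hide each variable independently with probability 1 - keep_prob."""
    return [v if rng.random() < keep_prob else UNOBSERVED for v in sample]


def make_training_pair(sample, keep_prob, rng):
    """Return (masked input, full target) for one sample drawn from the model.

    A network trained on many such pairs sees a different observation
    pattern each time, so a single network can be asked for the marginal
    of any variable given any subset of observed ones.
    """
    return mask_sample(sample, keep_prob, rng), list(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    full_sample = [1, 0, 1, 1, 0, 0, 1, 0]  # stand-in for a draw from the graphical model
    masked, target = make_training_pair(full_sample, keep_prob=0.5, rng=rng)
    print(masked, target)
```

The masking distribution is a design choice; the independent rule above is only one possibility.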

Efficient Training of Convolutional Neural Nets on Large Distributed Systems

Deep Neural Networks (DNNs) have achieved impressive accuracy in many application domains including image classification. Training of DNNs is an extremely compute-intensive process and is solved using variants of the stochastic gradient descent (SGD) algorithm. A lot of recent research has focussed on improving the performance of DNN training. In this paper, we present optimization techniques to improve the performance of the data parallel synchronous SGD algorithm using the Torch framework: (i) we maintain data in-memory to avoid file I/O overheads, (ii) we present a multi-color based MPI Allreduce algorithm to minimize communication overheads, and (iii) we propose optimizations to the Torch data parallel table framework that handles multi-threading. We evaluate the performance of our optimizations on a Power 8 Minsky cluster with 64 nodes and 256 NVidia Pascal P100 GPUs. With our optimizations, we are able to train 90 epochs of the ResNet-50 model on the Imagenet-1k dataset using 256 GPUs in just 48 minutes. This significantly improves on the previously best known performance of training 90 epochs of the ResNet-50 model on the same dataset using 256 GPUs in 65 minutes. To the best of our knowledge, this is the best known training performance demonstrated for the Imagenet-1k dataset.
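
As a rough illustration of data parallel synchronous SGD, the sketch below simulates several workers in a single process. Each worker computes a gradient on its own data shard, the gradients are averaged (the role an allreduce plays on a real cluster), and every worker applies the same update. It does not implement the multi-color MPI Allreduce algorithm from the abstract; the one-parameter linear model and all function names are assumptions made for the example.

```python
def local_gradient(w, shard):
    """Gradient of the mean of 0.5 * (w * x - y)**2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an allreduce: every worker ends up with the same average."""
    return sum(values) / len(values)


def sync_sgd_step(w, shards, lr):
    """One synchronous step: local gradients, global average, identical update."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * allreduce_mean(grads)


def train(shards, lr=0.1, steps=200, w=0.0):
    """Run a fixed number of synchronous steps from the initial weight w."""
    for _ in range(steps):
        w = sync_sgd_step(w, shards, lr)
    return w


if __name__ == "__main__":
    # Two simulated workers, each holding a shard of data generated by y = 3x.
    shards = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
    print(train(shards))
```

Because the update waits for every worker's gradient, the communication step sits on the critical path of each iteration, which is why the abstract singles out the allreduce as an optimization target.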

Channel masking for multivariate time series shapelets

Time series shapelets are discriminative sub-sequences and their similarity to time series can be used for time series classification. Initial shapelet extraction algorithms searched for shapelets by complete enumeration of all possible data sub-sequences. Research on shapelets for univariate time series proposed a mechanism called shapelet learning which parameterizes the shapelets and learns them jointly with a prediction model in an optimization procedure. A trivial extension of this method to multivariate time series does not yield very good results due to the presence of noisy channels which lead to overfitting. In this paper we propose a shapelet learning scheme for multivariate time series in which we introduce channel masks that discount noisy channels and serve as an implicit regularization.
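
One way to picture a channel mask is as a per-channel weight inside the shapelet-to-series distance. The sketch below assumes a sliding-window minimum of the mean squared difference, with each channel's contribution scaled by a mask weight in [0, 1]. The abstract does not give the paper's exact formulation, so the function name and the distance definition here are assumptions.

```python
import math


def masked_shapelet_distance(series, shapelet, mask):
    """Minimum channel-weighted distance between a shapelet and any window.

    series:   list of channels, each a list of floats of equal length
    shapelet: list of channels, each a list of floats of equal (shorter) length
    mask:     one weight per channel; 0 ignores the channel entirely
    """
    n, length = len(series[0]), len(shapelet[0])
    best = math.inf
    for start in range(n - length + 1):
        dist = 0.0
        for c, weight in enumerate(mask):
            sq = sum((series[c][start + t] - shapelet[c][t]) ** 2 for t in range(length))
            dist += weight * sq / length
        best = min(best, dist)
    return best


if __name__ == "__main__":
    series = [[0.0, 1.0, 2.0, 1.0, 0.0],   # informative channel
              [5.0, -3.0, 4.0, -2.0, 6.0]]  # noisy channel
    shapelet = [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
    print(masked_shapelet_distance(series, shapelet, [1.0, 0.0]))
    print(masked_shapelet_distance(series, shapelet, [1.0, 1.0]))
```

With the noisy channel masked out, the shapelet matches the informative channel exactly; with both channels active, the noise contributes to the distance at every window.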

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents’ policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.

Learning with Latent Language
Spatial Statistical Downscaling for Constructing High-Resolution Nature Runs in Global Observing System Simulation Experiments
Don’t Decay the Learning Rate, Increase the Batch Size
Post-selection estimation and testing following aggregated association tests
Widening siamese architectures for stereo matching
Learning One-hidden-layer Neural Networks with Landscape Design
User Scheduling for Millimeter Wave MIMO Communications with Low-Resolution ADCs
Evaluating Discourse Phenomena in Neural Machine Translation
Random walks on primitive lattice points
Uncovering Latent Style Factors for Expressive Speech Synthesis
Text Annotation Graphs: Annotating Complex Natural Language Phenomena
Solving the school bus routing problem by maximizing trip compatibility
An iterative school decomposition algorithm for solving the multi-school bus routing and scheduling problem
Beautiful and damned. Combined effect of content quality and social ties on user engagement
Analysis of the Communication Traffic for Blockchain Synchronization of IoT Devices
TasNet: time-domain audio separation network for real-time, single-channel speech separation
Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding
Exploiting Apache Spark platform for CMS computing analytics
Spatio-Temporal Analysis of Surveillance Data
Recognizing Textures with Mobile Cameras for Pedestrian Safety Applications
Anomalous Diffusion and the Generalized Langevin Equation
Percent Change Estimation in Large Scale Online Experiments
Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs?
Typically-Correct Derandomization for Small Time and Space
Efficient $\widetilde{O}(n/ε)$ Spectral Sketches for the Laplacian and its Pseudoinverse
Consistent estimation of the spectrum of trace class data augmentation algorithms
On Optimization over Tail Distributions
Random Subspace Two-dimensional LDA for Face Recognition
Universality of the least singular value for sparse random matrices
Grant-free Radio Access IoT Networks: Scalability Analysis in Coexistence Scenarios
Deep Learning from Noisy Image Labels with Quality Embedding
A Bio-Inspired Multi-Exposure Fusion Framework for Low-light Image Enhancement
Refining glass structure in two dimensions
Optimal Parametric Search for Path and Tree Partitioning
Efficient Constrained Tensor Factorization by Alternating Optimization with Primal-Dual Splitting
Deep learning for evaluating the effects of a layout of photon sensors on event reconstructions
An alternative approach for compatibility of two discrete conditional distributions
Security Against Impersonation Attacks in Distributed Systems
A Multimodal Anomaly Detector for Robot-Assisted Feeding Using an LSTM-based Variational Autoencoder
ThrottleBot – Performance without Insight
Decentralized Deep Scheduling for Interference Channels
Sleep Stage Classification Based on Multi-level Feature Learning and Recurrent Neural Networks via Wearable Device
Bayesian Recurrent Neural Network Models for Forecasting and Quantifying Uncertainty in Spatial-Temporal Data
A topology for Team Policies and Existence of Optimal Team Policies in Stochastic Team Theory
Data Augmentation in Classification using GAN
A bound for the shortest reset words for semisimple synchronizing automata via the packing number
Semi-Robust Communications over a Broadcast Channel
Candidates v.s. Noises Estimation for Large Multi-Class Classification Problem
Concave losses for robust dictionary learning
Erdős-Pósa property of chordless cycles and its applications
On the Isoperimetric constant, covariance inequalities and $L_p$-Poincaré inequalities in dimension one
Development and validation of a novel dementia of Alzheimer’s type (DAT) score based on metabolism FDG-PET imaging
Fast Information-theoretic Bayesian Optimisation
Understanding and Predicting The Attractiveness of Human Action Shot
Collapsibility of marginal models for categorical data
Extracting an English-Persian Parallel Corpus from Comparable Corpora
On the Complexity of Random Quantum Computations and the Jones Polynomial
Statistical evaluation of visual quality metrics for image denoising
Interpretable and Pedagogical Examples
Approximating quantum channels by completely positive maps with small Kraus rank
Adaptive coordination of working-memory and reinforcement learning in non-human primates performing a trial-and-error problem solving task
Output feedback control of general linear heterodirectional hyperbolic PDE-ODE systems with spatially-varying coefficients
On Game-Theoretic Risk Management (Part Three) – Modeling and Applications
Directed path-decompositions
Implementation of the AC-based economic dispatch in the Russian electricity market
Estimating Historical Hourly Traffic Volumes via Machine Learning and Vehicle Probe Data: A Maryland Case Study
A mating-of-trees approach to graph distances in random planar maps
Asymptotic Signal Detection Rates with 1-bit Array Measurements
Stochastic Routing and Scheduling Policies for Energy Harvesting Communication Networks
Geometric k-nearest neighbor estimation of entropy and mutual information
Network-size independent covering number bounds for deep networks
REAP: An Efficient Incentive Mechanism for Reconciling Aggregation Accuracy and Individual Privacy in Crowdsensing
Improved Lower Bounds for the Fourier Entropy/Influence Conjecture via Lexicographic Functions
Approximation of Functions over Manifolds: A Moving Least-Squares Approach
SRL4ORL: Improving Opinion Role Labelling using Multi-task Learning with Semantic Role Labeling
Scientific co-authorship networks
A Paradox about Likelihood Ratios?
A non-local problem for the Fokker-Planck equation related to the Becker-Döring model
A variational method for analyzing stochastic limit cycle oscillators
A Comprehensive Low and High-level Feature Analysis for Early Rumor Detection on Twitter
On the complexity of optimal homotopies
Partition mixture of 1D wavelets for multi-dimensional data
Limit shapes for cube groves with periodic conductances
3D Mobile Localization Using Distance-only Measurements
Deep Recurrent Gaussian Process with Variational Sparse Spectrum Approximation
Estimating Under Five Mortality in Space and Time in a Developing World Context
Linear Programming Based Optimality Conditions and Approximate Solution of a Deterministic Infinite Horizon Discounted Optimal Control Problem in Discrete Time
Framework for evaluation of sound event detection in web videos
The dimension-free structure of nonhomogeneous random matrices
An Optimal Choice Dictionary
Expressive power of recurrent neural networks
Bootstrapping Exchangeable Random Graphs
Measuring Quantum Entropy
Johari-Goldstein relaxation far below Tg: Experimental evidence for the Gardner transition in structural glasses?
Medoids in almost linear time via multi-armed bandits
Minor-free graphs have light spanners
Adaptive Network Flow with $k$-Arc Destruction
Random walk on random planar maps: spectral dimension, resistance, and displacement
Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Lower Bounds for Finding Stationary Points II: First-Order Methods
Generalized Probabilistic Bisection for Stochastic Root-Finding
Variational Inference of Disentangled Latent Concepts from Unlabeled Observations
Provable defenses against adversarial examples via the convex outer adversarial polytope
Quenched invariance principles for the maximal particle in branching random walk in random environment and the parabolic Anderson model