Long-term Forecasting using Tensor-Train RNNs

We present Tensor-Train RNN (TT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed tensor recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher-order moments and high-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train (TT) decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation properties of Tensor-Train RNNs for general sequence inputs; such guarantees are not available for usual RNNs. We also demonstrate significant long-term prediction improvements over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well as on real-world climate and traffic data.

Accelerated Sparse Subspace Clustering

State-of-the-art algorithms for sparse subspace clustering perform spectral clustering on a similarity matrix typically obtained by representing each data point as a sparse combination of other points using either basis pursuit (BP) or orthogonal matching pursuit (OMP). BP-based methods are often computationally prohibitive in practice, while the performance of OMP-based schemes is unsatisfactory, especially in settings where data points are highly similar. In this paper, we propose a novel algorithm that exploits an accelerated variant of orthogonal least-squares to efficiently find the underlying subspaces. We show that under certain conditions the proposed algorithm returns a subspace-preserving solution. Simulation results illustrate that the proposed method compares favorably with BP-based methods in terms of running time while being significantly more accurate than OMP-based schemes.

Pomegranate: fast and flexible probabilistic modeling in python

We present pomegranate, an open source machine learning package for probabilistic modeling in Python. Probabilistic modeling encompasses a wide range of methods that explicitly describe uncertainty using probability distributions. Three widely used probabilistic models implemented in pomegranate are general mixture models, hidden Markov models, and Bayesian networks. A primary focus of pomegranate is to abstract away the complexities of training models from their definition. This allows users to focus on specifying the correct model for their application instead of being limited by their understanding of the underlying algorithms. An aspect of this focus involves the collection of additive sufficient statistics from data sets as a strategy for training models. This approach trivially enables many useful learning strategies, such as out-of-core learning, minibatch learning, and semi-supervised learning, without requiring the user to consider how to partition data or modify the algorithms to handle these tasks themselves. pomegranate is written in Cython to speed up calculations and releases the global interpreter lock to allow for built-in multithreaded parallelism, making it competitive with, or faster than, other implementations of similar algorithms. This paper presents an overview of the design choices in pomegranate, and how they have enabled complex features to be supported by simple code.
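
The idea of additive sufficient statistics can be illustrated with a minimal sketch. It does not use pomegranate's API; the class `GaussianStats` and its methods are hypothetical names chosen for illustration, and the model is assumed to be a univariate Gaussian. Because the statistics from disjoint chunks simply add, a fit over chunks processed one at a time matches the full-batch fit.

```python
from dataclasses import dataclass
import math


@dataclass
class GaussianStats:
    """Additive sufficient statistics for a univariate Gaussian (illustrative only)."""
    n: float = 0.0       # number of observations seen
    sum_x: float = 0.0   # running sum of observations
    sum_x2: float = 0.0  # running sum of squared observations

    def update(self, batch):
        """Fold one batch of observations into the running statistics."""
        for x in batch:
            self.n += 1
            self.sum_x += x
            self.sum_x2 += x * x

    def __add__(self, other):
        """Statistics collected on disjoint batches combine by addition."""
        return GaussianStats(self.n + other.n,
                             self.sum_x + other.sum_x,
                             self.sum_x2 + other.sum_x2)

    def estimate(self):
        """Maximum-likelihood mean and standard deviation from the statistics."""
        mean = self.sum_x / self.n
        var = max(self.sum_x2 / self.n - mean * mean, 0.0)
        return mean, math.sqrt(var)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Full-batch fit: one pass over all the data.
full = GaussianStats()
full.update(data)

# Out-of-core style fit: summarize each chunk separately, then add the summaries.
chunks = [data[:3], data[3:6], data[6:]]
merged = GaussianStats()
for chunk in chunks:
    part = GaussianStats()
    part.update(chunk)
    merged = merged + part

full_mean, full_std = full.estimate()
merged_mean, merged_std = merged.estimate()
```

Only the three running totals need to stay in memory, which is why the same training code can serve minibatch and out-of-core settings in this sketch.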

A multitask deep learning model for real-time deployment in embedded systems

We propose an approach to Multitask Learning (MTL) to make deep learning models faster and lighter for applications in which multiple tasks need to be solved simultaneously, which is particularly useful in embedded, real-time systems. We develop a multitask model for both Object Detection and Semantic Segmentation and analyze the challenges that appear during its training. Our multitask network is 1.6x faster, lighter and uses less memory than deploying the single-task models in parallel. We conclude that MTL has the potential to give superior performance in exchange for a more complex training process that introduces challenges not present in single-task models.

Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples

Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results.

Deep Neural Networks as Gaussian Processes

A deep fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP) in the limit of infinite network width. This correspondence enables exact Bayesian inference for neural networks on regression tasks by means of straightforward matrix computations. For single hidden-layer networks, the covariance function of this GP has long been known. Recently, kernel functions for multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified the correspondence between using these kernels as the covariance function for a GP and performing fully Bayesian prediction with a deep neural network. In this work, we derive this correspondence and develop a computationally efficient pipeline to compute the covariance functions. We then use the resulting GP to perform Bayesian inference for deep neural networks on MNIST and CIFAR-10. We find that the GP-based predictions are competitive and can outperform neural networks trained with stochastic gradient descent. We observe that the trained neural network accuracy approaches that of the corresponding GP-based computation with increasing layer width, and that the GP uncertainty is strongly correlated with prediction error. We connect our observations to the recent development of signal propagation in random neural networks.

Minimum Energy Quantized Neural Networks

This work targets the automated minimum-energy optimization of Quantized Neural Networks (QNNs) – networks using low precision weights and activations. These networks are trained from scratch at an arbitrary fixed point precision. At iso-accuracy, QNNs using fewer bits require deeper and wider network architectures than networks using higher precision operators, while they require less complex arithmetic and fewer bits per weight. This fundamental trade-off is analyzed and quantified to find the minimum energy QNN for any benchmark and hence optimize energy-efficiency. To this end, the energy consumption of inference is modeled for a generic hardware platform. This allows drawing several conclusions across different benchmarks. First, energy consumption varies by orders of magnitude at iso-accuracy depending on the number of bits used in the QNN. Second, in a typical system, BinaryNets or int4 implementations lead to the minimum energy solution, outperforming int8 networks by up to 2-10x at iso-accuracy. All code used for QNN training is available from https://…/BertMoons.

Stochastic Variational Inference for Fully Bayesian Sparse Gaussian Process Regression Models

This paper presents a novel variational inference framework for deriving a family of Bayesian sparse Gaussian process regression (SGPR) models whose approximations are variationally optimal with respect to the full-rank GPR model enriched with various corresponding correlation structures of the observation noises. Our variational Bayesian SGPR (VBSGPR) models jointly treat both the distributions of the inducing variables and hyperparameters as variational parameters, which enables the decomposability of the variational lower bound that in turn can be exploited for stochastic optimization. Such a stochastic optimization involves iteratively following the stochastic gradient of the variational lower bound to improve its estimates of the optimal variational distributions of the inducing variables and hyperparameters (and hence the predictive distribution) of our VBSGPR models and is guaranteed to achieve asymptotic convergence to them. We show that the stochastic gradient is an unbiased estimator of the exact gradient and can be computed in constant time per iteration, hence achieving scalability to big data. We empirically evaluate the performance of our proposed framework on two real-world, massive datasets.

The multiset dimension of graphs

We introduce a variation of the metric dimension, called the multiset dimension. The representation multiset of a vertex v with respect to W (which is a subset of the vertex set of a graph G), r_m(v|W), is defined as a multiset of distances between v and the vertices in W together with their multiplicities. If r_m(u|W) \neq r_m(v|W) for every pair of distinct vertices u and v, then W is called a resolving set of G. If G has a resolving set, then the cardinality of a smallest resolving set is called the multiset dimension of G, denoted by md(G). If G does not contain a resolving set, we write md(G) = \infty. We present basic results on the multiset dimension. We also study graphs of given diameter and give some sufficient conditions for a graph to have an infinite multiset dimension.
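
The definition above can be checked by brute force on small graphs. The following is a minimal sketch, assuming a connected, unweighted graph given as an adjacency dictionary; the function names are hypothetical, and the exhaustive search over subsets is only practical for very small graphs.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def representation_multiset(dist, v, W):
    """r_m(v|W): distances from v to the vertices of W, as a sorted tuple
    so that equal multisets compare equal."""
    return tuple(sorted(dist[w][v] for w in W))


def is_resolving(adj, W, dist):
    """W is resolving if all vertices have distinct representation multisets."""
    seen = set()
    for v in adj:
        r = representation_multiset(dist, v, W)
        if r in seen:
            return False
        seen.add(r)
    return True


def multiset_dimension(adj):
    """Size of a smallest resolving set, or infinity if none exists."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for W in combinations(vertices, k):
            if is_resolving(adj, W, dist):
                return k
    return float("inf")


# Path on four vertices: a single endpoint already separates all vertices.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Triangle: every choice of W leaves two vertices with equal multisets.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

md_path = multiset_dimension(path)          # 1
md_triangle = multiset_dimension(triangle)  # inf
```

The triangle shows why md(G) can be infinite: unlike ordered distance vectors in the usual metric dimension, multisets forget which vertex of W each distance refers to.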

Efficient Inferencing of Compressed Deep Neural Networks

The large number of weights in deep neural networks makes the models difficult to deploy in low-memory environments such as mobile phones, IoT edge devices, and ‘inferencing as a service’ environments in the cloud. Prior work has considered reducing the size of the models through compression techniques such as pruning, quantization, and Huffman encoding. However, efficient inferencing using the compressed models has received little attention, especially with Huffman encoding in place. In this paper, we propose efficient parallel algorithms for inferencing of single images and batches under various memory constraints. Our experimental results show that our approach of using a variable batch size for inferencing achieves a 15-25% improvement in inference throughput for AlexNet, while maintaining memory and latency constraints.

Smooth Neighbors on Teacher Graphs for Semi-supervised Learning

The paper proposes an inductive semi-supervised learning method, called Smooth Neighbors on Teacher Graphs (SNTG). At each iteration during training, a graph is dynamically constructed based on predictions of the teacher model, i.e., the implicit self-ensemble of models. The graph then serves as a similarity measure with respect to which the representations of ‘similar’ neighboring points are learned to be smooth on the low-dimensional manifold. We achieve state-of-the-art results on semi-supervised learning benchmarks. The error rates are 9.89% for CIFAR-10 with 4000 labels and 3.99% for SVHN with 500 labels. In particular, the improvements are significant when the labels are scarce. For non-augmented MNIST with only 20 labels, the error rate is reduced from the previous 4.81% to 1.36%. Our method is also effective under noisy supervision and shows robustness to incorrect labels.

Determination of Checkpointing Intervals for Malleable Applications

Selecting optimal checkpointing intervals for an application is important for minimizing the run time of the application in the presence of system failures. Most of the existing efforts on checkpointing interval selection were developed for sequential applications, while a few deal with parallel applications that are executed on the same number of processors for the entire duration of execution. Some checkpointing systems support parallel applications where the number of processors on which the applications execute can be changed during the execution. We refer to these kinds of parallel applications as {\em malleable} applications. In this paper, we develop a performance model for malleable parallel applications that estimates the amount of useful work performed in unit time (UWT) by a malleable application in the presence of failures as a function of the checkpointing interval. We evaluate this performance model function with different intervals and select the interval that maximizes the UWT value. By conducting a large number of simulations with the traces obtained on real supercomputing systems, we show that the checkpointing intervals determined by our model can lead to high efficiency of applications in the presence of failures.
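
The selection step, evaluating a performance function over candidate intervals and keeping the maximizer, can be sketched as follows. The abstract does not give the malleable model itself, so this sketch swaps in Young's classical first-order approximation for a fixed processor count as a stand-in; the function names, the checkpoint cost, and the mean time between failures are all assumed values for illustration.

```python
import math


def useful_work_per_unit_time(interval, checkpoint_cost, mtbf):
    """Stand-in UWT model (Young's first-order approximation), NOT the
    malleable model from the abstract: fraction of time spent on useful
    work for a given checkpointing interval."""
    checkpoint_overhead = checkpoint_cost / interval  # time spent writing checkpoints
    expected_rework = interval / (2.0 * mtbf)         # work lost and redone after failures
    return 1.0 - checkpoint_overhead - expected_rework


def best_interval(candidates, checkpoint_cost, mtbf):
    """Evaluate the model on each candidate and keep the one with highest UWT."""
    return max(candidates,
               key=lambda t: useful_work_per_unit_time(t, checkpoint_cost, mtbf))


checkpoint_cost = 60.0   # seconds per checkpoint (assumed)
mtbf = 24 * 3600.0       # mean time between failures in seconds (assumed)
candidates = [float(t) for t in range(300, 20001, 100)]

chosen = best_interval(candidates, checkpoint_cost, mtbf)

# For this stand-in model the maximizer has the closed form sqrt(2 * C * M),
# which gives a sanity check on the grid search.
closed_form = math.sqrt(2.0 * checkpoint_cost * mtbf)
```

With a more detailed model there may be no closed form, in which case the same search over candidate intervals still applies.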

Paraphrase Generation with Deep Reinforcement Learning

Automatic generation of paraphrases for a given sentence is an important yet challenging task in natural language processing (NLP), and plays a key role in a number of applications such as question answering, information retrieval and dialogue. In this paper we present a deep reinforcement learning approach to paraphrase generation. Specifically, we propose a new model for the task, which consists of a \textit{generator} and a \textit{teacher}. The generator, built on the sequence-to-sequence learning framework, can generate paraphrases given a sentence. The teacher, modeled as a deep neural network, can decide whether two sentences are paraphrases of each other. After construction of the generator and teacher, the generator is further fine-tuned by reinforcement learning in which the reward is given by the teacher. Empirical study shows that the teacher can provide precise supervision to the generator, and guide the generator to produce more accurate paraphrases. Experimental results demonstrate that the proposed model outperforms the state-of-the-art methods in paraphrase generation in both automatic evaluation and human evaluation.

Orthogonal Machine Learning: Power and Limitations

Double machine learning provides \sqrt{n}-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an n^{-1/4} rate. The key is to employ \emph{Neyman-orthogonal} moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the n^{-1/4} requirement can be improved to n^{-1/(2k+2)} by employing a k-th order notion of orthogonality that grants robustness to more complex or higher-dimensional nuisance parameters. In the partially linear model setting popular in causal inference, we use Stein’s lemma to show that we can construct second-order orthogonal moments if and only if the treatment residual is not normally distributed. We conclude by demonstrating the robustness benefits of an explicit doubly-orthogonal estimation procedure for treatment effect.
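
The rate arithmetic behind the n^{-1/(2k+2)} requirement can be made explicit with a heuristic calculation. It assumes that, under k-th order orthogonality, the nuisance estimation error enters the moment equations only at order k+1; the symbols \hat{h} (nuisance estimate) and h_0 (true nuisance) are notation introduced here, not taken from the abstract.

```latex
\[
\|\hat{h} - h_0\|^{k+1}
= o\!\left(\left(n^{-\frac{1}{2k+2}}\right)^{k+1}\right)
= o\!\left(n^{-1/2}\right),
\qquad
k = 1:\ n^{-1/4}, \quad k = 2:\ n^{-1/6}.
\]
```

So k = 1 recovers the usual n^{-1/4} condition of first-order (Neyman) orthogonality, while second-order orthogonality tolerates nuisance estimates converging only at an n^{-1/6} rate.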

Universal gradient descent

In this small book we collect many different and useful facts about the gradient descent method. First, we consider gradient descent with an inexact oracle. We build a general model of the optimized function that includes composite optimization approaches, level methods, proximal methods, etc. Then we investigate primal-dual properties of gradient descent in the general model set-up. Finally, we generalize the method to a universal one.

Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR

There has been much discussion of the right to explanation in the EU General Data Protection Regulation, and its existence, merits, and disadvantages. Implementing a right to explanation that opens the black box of algorithmic decision-making faces major legal and technical barriers. Explaining the functionality of complex algorithmic decision-making systems and their rationale in specific cases is a technically challenging problem. Some explanations may offer little meaningful information to data subjects, raising questions around their value. Explanations of automated decisions need not hinge on the general public understanding how algorithmic systems function. Even though such interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the black box. Looking at explanations as a means to help a data subject act rather than merely understand, one could gauge the scope and content of explanations according to the specific goal or action they are intended to support. From the perspective of individuals affected by automated decision-making, we propose three aims for explanations: (1) to inform and help the individual understand why a particular decision was reached, (2) to provide grounds to contest the decision if the outcome is undesired, and (3) to understand what would need to change in order to receive a desired result in the future, based on the current decision-making model. We assess how each of these goals finds support in the GDPR. We suggest data controllers should offer a particular type of explanation, unconditional counterfactual explanations, to support these three aims. These counterfactual explanations describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the closest possible world, without needing to explain the internal logic of the system.
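
A counterfactual explanation of this kind can be sketched as a search for the closest input that the model accepts, using only queries to the model. The example below is a brute-force illustration under stated assumptions: the decision rule `black_box`, the feature names, the candidate step sizes, and the scaled L1 distance are all hypothetical choices, not the procedure of the paper.

```python
from itertools import product


def black_box(applicant):
    """Hypothetical opaque decision rule; the search only calls it,
    it never inspects how the decision is computed."""
    return applicant["income"] - 0.5 * applicant["debt"] >= 40.0


def counterfactual(model, applicant, steps, scale):
    """Return the accepted candidate with the smallest scaled L1 change,
    together with that change, by trying every combination of steps."""
    best, best_cost = None, float("inf")
    names = list(steps)
    for deltas in product(*(steps[name] for name in names)):
        candidate = dict(applicant)
        for name, delta in zip(names, deltas):
            candidate[name] += delta
        if not model(candidate):
            continue  # still rejected, so not a counterfactual
        cost = sum(abs(delta) / scale[name] for name, delta in zip(names, deltas))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


applicant = {"income": 30.0, "debt": 10.0}   # currently rejected
steps = {"income": [0, 5, 10, 15, 20],       # allowed changes per feature (assumed)
         "debt": [0, -5, -10]}
scale = {"income": 10.0, "debt": 10.0}       # puts features on a comparable scale (assumed)

best, best_cost = counterfactual(black_box, applicant, steps, scale)
# The resulting statement reads: "had your income been 45 instead of 30,
# the application would have been accepted".
```

The explanation consists only of the original and counterfactual inputs, so nothing about the internal logic of the system is disclosed.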

Attacking Binarized Neural Networks

Neural networks with low-precision weights and activations offer compelling efficiency advantages over their full-precision equivalents. The two most frequently discussed benefits of quantization are reduced memory consumption, and a faster forward pass when implemented with efficient bitwise operations. We propose a third benefit of very low-precision neural networks: improved robustness against some adversarial attacks, and in the worst case, performance that is on par with full-precision models. We focus on the very low-precision case where weights and activations are both quantized to \pm1, and note that stochastically quantizing weights in just one layer can sharply reduce the impact of iterative attacks. We observe that non-scaled binary neural networks exhibit a similar effect to the original defensive distillation procedure that led to gradient masking, and a false notion of security. We address this by conducting both black-box and white-box experiments with binary models that do not artificially mask gradients.

Piecewise Linear Neural Network verification: A comparative study

The success of Deep Learning and its potential use in many important safety-critical applications has motivated research on formal verification of Neural Network (NN) models. Despite the reputation of learned NN models to behave as black boxes and the theoretical hardness of proving their properties, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure. Unfortunately, most of these approaches test their algorithms without comparison with other approaches. As a result, the pros and cons of the different algorithms are not well understood. Motivated by the need to accelerate progress in this very important area, we investigate the trade-offs of a number of different approaches based on Mixed Integer Programming, Satisfiability Modulo Theory, as well as a novel method based on the Branch-and-Bound framework. We also propose a new data set of benchmarks, in addition to a collection of previously released testcases that can be used to compare existing methods. Our analysis not only allows a comparison to be made between different strategies; comparing results from different solvers also revealed implementation bugs in published methods. We expect that the availability of our benchmark and the analysis of the different approaches will allow researchers to develop and evaluate promising approaches for making progress on this important topic.

An Information-Theoretic Analysis of Deep Latent-Variable Models

We present an information-theoretic framework for understanding trade-offs in unsupervised learning of deep latent-variable models using variational inference. This framework emphasizes the need to consider latent-variable models along two dimensions: the ability to reconstruct inputs (distortion) and the communication cost (rate). We derive the optimal frontier of generative models in the two-dimensional rate-distortion plane, and show how the standard evidence lower bound objective is insufficient to select between points along this frontier. However, by performing targeted optimization to learn generative models with different rates, we are able to learn many models that can achieve similar generative performance but make vastly different trade-offs in terms of the usage of the latent variable. Through experiments on MNIST and Omniglot with a variety of architectures, we show how our framework sheds light on many recently proposed extensions to the variational autoencoder family.

Topological Floquet-Thouless energy pump
Machine learning out-of-equilibrium phases of matter
Quantile Functional Regression using Quantlets
Unsupervised Machine Translation Using Monolingual Corpora Only
$K$-User Symmetric M$\times$N MIMO Interference Channel under Finite Precision CSIT: A GDoF perspective
Replace or Retrieve Keywords In Documents at Scale
Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction
Medical Image Segmentation Based on Multi-Modal Convolutional Neural Network: Study on Image Fusion Schemes
Every exponential group supports a positive harmonic function
Empirical likelihood inference for partial functional linear regression models based on B spline
Abnormal Spatial-Temporal Pattern Analysis for Niagara Frontier Border Wait Times
Calibration for Stratified Classification Models
Fraternal Dropout
Spatially Adaptive Colocalization Analysis in Dual-Color Fluorescence Microscopy
Ranking Median Regression: Learning to Order through Local Consensus
Approximating the $2$-Machine Flow Shop Problem with Exact Delays Taking Two Values
Synth-Validation: Selecting the Best Causal Inference Method for a Given Dataset
Theory of Activated Glassy Dynamics in Randomly Pinned Fluids
Semantic Image Retrieval via Active Grounding of Visual Situations
A note on the simultaneous Waring rank of monomials
Summarizing Dialogic Arguments from Social Media
Pattern Recognition Techniques for the Identification of Activities of Daily Living using Mobile Device Accelerometer
Bayesian Markov Switching Tensor Regression for Time-varying Networks
Functional data approaches for mixed longitudinal studies, with applications in midlife women’s health
Scheduling Monotone Moldable Jobs in Linear Time
A Multiple Source Framework for the Identification of Activities of Daily Living Based on Mobile Device Data
DCN+: Mixed Objective and Deep Residual Coattention for Question Answering
Beyond Shared Hierarchies: Deep Multitask Learning through Soft Layer Ordering
Countering Adversarial Images using Input Transformations
Statistical Modeling of FSO Fronthaul Channel for Drone-based Networks
Dynamical SimRank Search on Time-Varying Networks
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
User Environment Detection with Acoustic Sensors Embedded on Mobile Devices for the Recognition of Activities of Daily Living
A remark for meeting times in large random regular graphs
Automata Guided Hierarchical Reinforcement Learning for Zero-shot Skill Composition
Diffusive Molecular Communications with Reactive Signaling
Bayesian model comparison with the Hyvärinen score: computation and consistency
Visualizing and Understanding Atari Agents
Segmentation-by-Detection: A Cascade Network for Volumetric Medical Image Segmentation
Training GANs with Optimism
Sampling and Reconstruction of Graph Signals via Weak Submodularity and Semidefinite Relaxation
The Unreasonable Rigidity of Ulam Sets
A tableau formula of double Grothendieck polynomials for $321$-avoiding permutations
Statistical Inference of Kumaraswamy distribution under imprecise information
Link prediction in drug-target interactions network using similarity indices
Spectral correlations in Anderson insulating wires
Bayesian Variable Selection for Multivariate Zero-Inflated Models: Application to Microbiome Count Data
On the Ristic-Balakrishnan distribution: bivariate extension and characterizations
Dynamic quantile linear model: a Bayesian approach
Cluster Algebras, Invariant Theory, and Kronecker Coefficients II
Improving Object Localization with Fitness NMS and Bounded IoU Loss
Outdoor to Indoor Penetration Loss at 28 GHz for Fixed Wireless Access
Dynamic Double Directional Propagation Channel Measurements at 28 GHz
28 GHz Microcell Measurement Campaign for Residential Environment
On some further properties and application of Weibull-R family of distributions
On $AP_3$ – covering sequences
On a problem of Nathanson
The number of spanning trees in circulant graphs, its arithmetic properties and asymptotic
Bandwidth selection for nonparametric modal regression
Keyword-based Query Comprehending via Multiple Optimized-Demand Augmentation
On the denominators of harmonic numbers
On additive representation functions
Domino tilings of the expanded Aztec diamond
Stochastic Modeling and Forecast of Hydrological Contributions in the Colombian Electric System
PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes
Towards Effective Low-bitwidth Convolutional Neural Networks
Learning deep features for source color laser printer identification based on cascaded learning
On the complete weight enumerators of some linear codes with a few weights
Large induced acyclic and outerplanar subgraphs of 2-outerplanar graph
A Coalition Formation Approach to Coordinated Task Allocation in Heterogeneous UAV Networks
Limiting Laws for Divergent Spiked Eigenvalues and Largest Non-spiked Eigenvalue of Sample Covariance Matrices
Cumulants, Spreadability and the Campbell-Baker-Hausdorff Series
The Hardness of Synthesizing Elementary Net Systems from Highly Restricted Inputs
Vertex-Context Sampling for Weighted Network Embedding
Dynamic Load Balancing Strategies for Graph Applications on GPUs
Group-Average and Convex Clustering for Partially Heterogeneous Linear Regression
Single Multi-feature detector for Amodal 3D Object Detection in RGB-D Images
Secure Classification With Augmented Features
Output Consensus of Networked Hammerstein and Wiener Systems
Improved Text Language Identification for the South African Languages
Query-free Clothing Retrieval via Implicit Relevance Feedback
Unifying different interpretations of the nonlinear response in glass-forming liquids
Adversarial Learning of Structure-Aware Fully Convolutional Networks for Landmark Localization
Rarity of extremal edges in random surfaces and other theoretical applications of cluster algorithms
The geometry of optimally transported meshes on the sphere
Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning
Flux large deviations of weakly interacting jump processes via well-posedness of an associated Hamilton-Jacobi equation
Fast Dynamic Arrays
Inapproximability of the independent set polynomial in the complex plane
Improved Approximation Schemes for the Restricted Shortest Path Problem
Personalized Schedules for Surveillance of Low Risk Prostate Cancer Patients
Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems
Towards Automatic Generation of Entertaining Dialogues in Chinese Crosstalks
Assessing the reliability polynomial based on percolation theory
Multi-View Data Generation Without View Supervision
Pricing of commodity derivatives on processes with memory
Improving Neural Machine Translation through Phrase-based Forced Decoding
On Search Powered Navigation
Avoiding Your Teacher’s Mistakes: Training Neural Networks with Controlled Weak Supervision
Strengthening the Group – Aggregated Frequency Reserve Bidding with ADMM
Robust Saliency Detection via Fusing Foreground and Background Priors
Ontological states and dynamics of discrete (pre-)quantum systems
The quoter model: a paradigmatic model of the social flow of written information
Semantic Structure and Interpretability of Word Embeddings
Totally bipartite tridiagonal pairs
Automatic calcium scoring in low-dose chest CT using deep neural networks with dilated convolutions
Detection for 5G-NOMA: An Online Adaptive Machine Learning Approach
Polyhedral characteristics of balanced and unbalanced bipartite subgraph problems
Complex-valued image denosing based on group-wise complex-domain sparsity
A Large Dimensional Analysis of Regularized Discriminant Analysis Classifiers
Analyzing the Approximation Error of the Fast Graph Fourier Transform
Active Tolerant Testing
Excursion Processes Associated with Elliptic Combinatorics
Minimal Exploration in Structured Stochastic Bandits
Transitions from trees to cycles in adaptive flow networks
Successive Cancellation Soft Output Detector For Uplink MU-MIMO Systems With One-bit ADCs
Building Data-driven Models with Microstructural Images: Generalization and Interpretability
The Price of Information in Combinatorial Optimization
Combinatorial cost: a coarse setting
Intelligent Parameter Tuning in Optimization-based Iterative CT Reconstruction via Deep Reinforcement Learning
Performance Analysis for Massive MIMO Downlink with Low Complexity Approximate Zero-Forcing Precoding
Snow Queen is Evil and Beautiful: Experimental Evidence for Probabilistic Contextuality in Human Choices
Super RSK correspondence with symmetry
On the variance of radio interferometric calibration solutions: Quality-based Weighting Schemes
Fractional Brownian motion with zero Hurst parameter: a rough volatility viewpoint
On the width of FAC orders, a somewhat rediscovered notion
A counterexample to Stein’s Equi-n-square Conjecture
Statistical Challenges in Modeling Big Brain Signals
Block-modified Wishart matrices: the easy case
Hierarchical Representations for Efficient Architecture Search
Geostatistical inference in the presence of geomasking: a composite-likelihood approach
Sampling and multilevel coarsening algorithms for fast matrix approximations
Data, Depth, and Design: Learning Reliable Models for Melanoma Screening
Bases for cluster algebras from orbifolds with one marked point
Sufficient Conditions for the Controllability of Wave Equations with a Transmission Condition at the Interface
Almost instant brain atlas segmentation for large-scale studies
Locally stationary spatio-temporal interpolation of Argo profiling float data
Optimizing quantum optimization algorithms via faster quantum gradient computation