Universal Transformers

Self-attentive feed-forward sequence models have been shown to achieve impressive results on sequence modeling tasks, thereby presenting a compelling alternative to recurrent neural networks (RNNs), which have remained the de facto standard architecture for many sequence modeling problems to date. Despite these successes, however, feed-forward sequence models like the Transformer fail to generalize in many tasks that recurrent models handle with ease (e.g. copying when the string lengths exceed those observed at training time). Moreover, and in contrast to RNNs, the Transformer model is not computationally universal, limiting its theoretical expressivity. In this paper we propose the Universal Transformer, which addresses these practical and theoretical shortcomings, and we show that it leads to improved performance on several tasks. Instead of recurring over the individual symbols of sequences like RNNs, the Universal Transformer repeatedly revises its representations of all symbols in the sequence with each recurrent step. In order to combine information from different parts of a sequence, it employs a self-attention mechanism in every recurrent step. Assuming sufficient memory, its recurrence makes the Universal Transformer computationally universal. We further employ an adaptive computation time (ACT) mechanism to allow the model to dynamically adjust the number of times the representation of each position in a sequence is revised. Beyond saving computation, we show that ACT can improve the accuracy of the model. Our experiments show that the Universal Transformer generalizes significantly better on various algorithmic tasks and a diverse set of large-scale language understanding tasks, outperforms both a vanilla Transformer and an LSTM in machine translation, and achieves a new state of the art on the bAbI linguistic reasoning task and the challenging LAMBADA language modeling task.
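
The recurrence described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes single-head self-attention, a one-layer ReLU transition function, residual connections without layer normalization, and a simplified per-position halting rule in the spirit of ACT. The names `ut_encode`, `max_steps` and `threshold` are hypothetical.

```python
import numpy as np

# Illustrative sketch only; function and parameter names are hypothetical.

def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all positions of h (n, d)."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def ut_encode(x, params, max_steps=8, threshold=0.99):
    """Apply the SAME attention + transition step repeatedly to every position.

    Each position accumulates a halting probability; once it would reach
    `threshold` (or at the last step) the position stops being revised, and its
    output is the halting-weighted average of its intermediate states.
    """
    wq, wk, wv, w1, w2, wh = params
    n = x.shape[0]
    h = x.copy()
    halting_sum = np.zeros(n)
    n_updates = np.zeros(n, dtype=int)
    output = np.zeros_like(h)
    for step in range(max_steps):
        running = halting_sum < threshold
        if not running.any():
            break
        p = 1.0 / (1.0 + np.exp(-(h @ wh)))  # per-position halting probability
        if step == max_steps - 1:
            halts_now = running                # force halting at the step limit
        else:
            halts_now = running & (halting_sum + p >= threshold)
        keep_going = running & ~halts_now
        # Weight of this step: p while still running, the remaining mass on halting.
        step_w = np.where(keep_going, p, 0.0) + np.where(halts_now, 1.0 - halting_sum, 0.0)
        halting_sum += step_w
        n_updates += running
        # One recurrent step; the weights are shared across all steps.
        a = h + self_attention(h, wq, wk, wv)
        h_new = a + np.maximum(a @ w1, 0.0) @ w2
        h = np.where(running[:, None], h_new, h)  # halted positions stay frozen
        output += step_w[:, None] * h
    return output, n_updates
```

Here `params` holds `wq`, `wk`, `wv` of shape (d, d), a transition `w1` of shape (d, d_ff) and `w2` of shape (d_ff, d), and a halting vector `wh` of shape (d,); `x` should be a float array. Reusing the same weights at every step is what distinguishes this recurrence from stacking distinct layers, and `n_updates` reports how many revisions each position received.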

Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

In this paper, we propose a novel Convolutional Neural Network (CNN) architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. This is achieved by using a multi-branch network, which has different computational complexity at different branches. Through frequent merging of features from branches at distinct scales, our model obtains multi-scale features while using less computation. The proposed approach improves model efficiency and performance on both object recognition and speech recognition tasks, using popular architectures including ResNet and ResNeXt. For object recognition, our approach reduces computation by 33% while improving accuracy by 0.9%. Furthermore, our model surpasses state-of-the-art CNN acceleration approaches by a large margin in accuracy and FLOPs reduction. On the task of speech recognition, our proposed multi-scale CNNs save 30% FLOPs with slightly better word error rates, showing good generalization across domains.
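
To make the branch idea concrete, here is a toy NumPy sketch; it is not the paper's architecture. It assumes a "big" branch that processes a 2× average-pooled copy of the feature map at full channel width, a "little" branch that stays at full resolution but uses only `C // alpha` channels, and a merge by nearest-neighbour upsampling plus addition. Convolutions are replaced by 1×1 channel-mixing matrices so that multiply-accumulate (MAC) counts are easy to read off; `big_little_block` and `alpha` are hypothetical names.

```python
import numpy as np

# Toy illustration; names and branch design are assumptions, not the paper's code.

def conv1x1(x, w):
    """1x1 convolution of an (H, W, C_in) map with a (C_in, C_out) matrix; also returns MACs."""
    h, wd, c_in = x.shape
    return x @ w, h * wd * c_in * w.shape[1]

def avg_pool2(x):
    """2x2 average pooling (H and W assumed even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def big_little_block(x, w_big, w_in, w_out):
    """Merge a low-resolution 'big' branch with a thin full-resolution 'little' branch."""
    big, macs_big = conv1x1(avg_pool2(x), w_big)              # all C channels, 1/4 of the pixels
    thin, macs_in = conv1x1(x, w_in)                          # C -> C // alpha channels
    little, macs_out = conv1x1(np.maximum(thin, 0.0), w_out)  # C // alpha -> C channels
    return upsample2(big) + little, macs_big + macs_in + macs_out
```

With H = W = 8, C = 16 and alpha = 4, this block uses 12,288 MACs against 16,384 for a single full-resolution 1×1 layer with 16 input and output channels. The saving comes from running the wide branch at reduced resolution and the full-resolution branch with fewer channels.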

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees

While model-based reinforcement learning has empirically been shown to significantly reduce the sample complexity that hinders model-free RL, the theoretical understanding of such methods has been rather limited. In this paper, we introduce a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees, and a practical algorithm, Optimistic Lower Bounds Optimization (OLBO). In particular, we derive a theoretical guarantee of monotone improvement for model-based RL with our framework. We iteratively build a lower bound of the expected reward based on the estimated dynamical model and sample trajectories, and maximize it jointly over the policy and the model. Assuming the optimization in each iteration succeeds, the expected reward is guaranteed to improve. The framework also incorporates an optimism-driven perspective and reveals the intrinsic measure of the model prediction error. Preliminary simulations demonstrate that our approach outperforms the standard baselines on continuous control benchmark tasks.

Automatic Gradient Boosting

Automatic machine learning performs predictive modeling with high-performing machine learning tools without human intervention. This is achieved by making machine learning applications parameter-free, i.e. only a dataset is provided while the complete model selection and model building process is handled internally through (often meta) optimization. Projects like Auto-WEKA and auto-sklearn aim to solve the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem, resulting in huge configuration spaces. However, for most real-world applications, optimizing over only a few key learning algorithms can be not only sufficient but also potentially beneficial. The latter becomes apparent when one considers that models have to be validated, explained, deployed and maintained. Here, less complex models are often preferred for validation or efficiency reasons, or are even a strict requirement. Automatic gradient boosting pushes this simplification one step further, using only gradient boosting as a single learning algorithm in combination with model-based hyperparameter tuning, threshold optimization and encoding of categorical features. We introduce this general framework as well as a concrete implementation called autoxgboost. It is compared to current AutoML projects on 16 datasets and, despite its simplicity, achieves comparable results on about half of the datasets and performs best on two.
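
Threshold optimization, one of the components listed above, can be illustrated independently of the booster. The following Python sketch assumes a binary task, accuracy as the target measure, and predicted probabilities already produced by some gradient boosting model; `best_threshold` is a hypothetical helper, not part of autoxgboost.

```python
import numpy as np

# Hypothetical helper for illustration; not autoxgboost's API.

def best_threshold(probs, labels):
    """Pick the cut-off t (predict positive when prob >= t) that maximises accuracy."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Every observed score is a candidate; +inf means "predict everything negative".
    candidates = np.unique(np.append(probs, np.inf))
    accuracies = np.array([np.mean((probs >= t) == labels) for t in candidates])
    best = int(np.argmax(accuracies))  # lowest threshold among ties
    return float(candidates[best]), float(accuracies[best])
```

For example, `best_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns `(0.35, 0.75)`, whereas the default cut-off of 0.5 would give an accuracy of 0.5. In practice the cut-off should be chosen on held-out predictions rather than on the data used to fit the model.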

Node-specific effects in latent space modelling of multidimensional networks

Observed multidimensional network data can have different levels of complexity, as nodes may be characterized by heterogeneous individual-specific features. Also, such characteristics may vary across the networks. This article discusses a novel class of models for multidimensional networks, able to deal with different levels of heterogeneity within and between networks. The proposed framework is developed within the family of latent space models, in order to distinguish recurrent symmetrical relations between the nodes from node-specific features in the different views. Model parameters are estimated via a Markov Chain Monte Carlo algorithm. Simulated data and FAO fruit import/export data are analysed to illustrate the performance of the proposed models.

Deep Structured Generative Models

Deep generative models have shown promising results in generating realistic images, but it is still non-trivial to generate images with complicated structures. The main reason is that most current generative models fail to explore the structures in the images, including spatial layout and semantic relations between objects. To address this issue, we propose a novel deep structured generative model which boosts generative adversarial networks (GANs) with the aid of structure information. In particular, the layout or structure of the scene is encoded by a stochastic and-or graph (sAOG), in which the terminal nodes represent single objects and edges represent relations between objects. With the sAOG appropriately harnessed, our model can successfully capture the intrinsic structure in the scenes and generate images of complicated scenes accordingly. Furthermore, a detection network is introduced to infer scene structures from an image. Experimental results demonstrate the effectiveness of our proposed method in both modeling the intrinsic structures and generating realistic images.

Limits to Surprise in Recommender Systems

In this study, we address the challenge of measuring the ability of a recommender system to make surprising recommendations. Although current evaluation methods make it possible to determine if two algorithms can make recommendations with a significant difference in their average surprise measure, it could be of interest to our community to know how competent an algorithm is at embedding surprise in its recommendations, without having to resort to making a direct comparison with another algorithm. We argue that a) surprise is a finite resource in a recommender system, b) there is a limit to how much surprise any algorithm can embed in a recommendation, and c) this limit can provide us with a scale against which the performance of any algorithm can be measured. By exploring these ideas, it is possible to define the concepts of maximum and minimum potential surprise and design a surprise metric called ‘normalised surprise’ that employs these limits to potential surprise. Two experiments were conducted to test the proposed metric. The aim of the first was to validate the quality of the estimates of minimum and maximum potential surprise produced by a greedy algorithm. The purpose of the second experiment was to analyse the behaviour of the proposed metric using the MovieLens dataset. The results confirmed the behaviour that was expected, and showed that the proposed surprise metric is both effective and consistent for differing choices of recommendation algorithms, data representations and distance functions.
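
The idea of scaling a recommendation list's surprise between its minimum and maximum potential values can be sketched as follows. The sketch assumes a distance-based surprise (an item's distance to the closest item in the user's profile), a list score equal to the mean of its items' surprises, and item feature vectors as the data representation; these are assumptions, and `normalised_surprise` is a hypothetical name. Because this list score is additive, selecting the k most (or least) surprising candidates one at a time, as a greedy procedure would, gives the exact limits here; the paper's measure and greedy estimator may differ.

```python
import numpy as np

# Illustrative sketch; the surprise definition and names are assumptions.

def item_surprise(item_vec, profile_vecs):
    """Assumed definition: distance from the item to the closest item in the user's profile."""
    return float(np.min(np.linalg.norm(profile_vecs - item_vec, axis=1)))

def normalised_surprise(rec_ids, candidate_ids, vectors, profile_ids):
    """Scale a list's mean surprise to [0, 1] using the least and most surprising possible lists."""
    profile = vectors[profile_ids]
    surprise = {i: item_surprise(vectors[i], profile) for i in candidate_ids}
    k = len(rec_ids)
    ranked = sorted(surprise.values())
    s_min = float(np.mean(ranked[:k]))    # minimum potential surprise for a list of size k
    s_max = float(np.mean(ranked[-k:]))   # maximum potential surprise for a list of size k
    s_rec = float(np.mean([surprise[i] for i in rec_ids]))
    if s_max == s_min:
        return 0.0
    return (s_rec - s_min) / (s_max - s_min)
```

With one-dimensional items at 0, 1, 2, 3 and 5, a profile containing only the item at 0, and lists of size 2, the limits are 1.5 and 4.0, so the list {2, 3} scores (2.5 − 1.5) / 2.5 = 0.4 regardless of which algorithm produced it.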

Seq2Seq2Sentiment: Multimodal Sequence to Sequence Models for Sentiment Analysis

Multimodal machine learning is a core research area spanning the language, visual and acoustic modalities. The central challenge in multimodal learning involves learning representations that can process and relate information from multiple modalities. In this paper, we propose two methods for unsupervised learning of joint multimodal representations using sequence to sequence (Seq2Seq) methods: a Seq2Seq Modality Translation Model and a Hierarchical Seq2Seq Modality Translation Model. We also explore several variations on the multimodal inputs and outputs of these Seq2Seq models. Our experiments on multimodal sentiment analysis using the CMU-MOSI dataset indicate that our methods learn informative multimodal representations that outperform the baselines and achieve improved performance on multimodal sentiment analysis, specifically in the bimodal case, where our model improves the F1 score by twelve points. We also discuss future directions for multimodal Seq2Seq methods.

Discovering Interesting Plots in Production Yield Data Analytics

An analytic process is an iteration between two agents, an analyst and an analytic toolbox. Each iteration comprises three main steps: preparing a dataset, running an analytic tool, and evaluating the result, where dataset preparation and result evaluation, conducted by the analyst, are largely domain-knowledge driven. In this work, the focus is on automating the result evaluation step. The underlying problem is to identify plots that are deemed interesting by an analyst. We propose a methodology to learn such an analyst’s intent based on Generative Adversarial Networks (GANs) and demonstrate its applications in the context of production yield optimization using data collected from several product lines.

Adaptive Learning Method of Recurrent Temporal Deep Belief Network to Analyze Time Series Data

Deep learning uses a hierarchical network architecture to represent the complicated features of input patterns. Such an architecture is well known to have higher learning capability than some conventional models if the best set of parameters in the optimal network structure is found. We have been developing an adaptive learning method that can discover the optimal network structure of a Deep Belief Network (DBN). The learning method can construct the network structure with the optimal number of hidden neurons in each Restricted Boltzmann Machine (RBM) and with the optimal number of layers in the DBN during the learning phase. The network structure produced by the learning method can be self-organized according to the input patterns of a given big data set. In this paper, we embed the adaptive learning method into the recurrent temporal RBM and the self-generated layer into the DBN. To verify the effectiveness of the proposed method, we conducted experiments, and the results show higher classification capability than the conventional methods.

Learning Neural Models for End-to-End Clustering

We propose a novel end-to-end neural network architecture that, once trained, directly outputs a probabilistic clustering of a batch of input examples in one pass. It estimates a distribution over the number of clusters k, and for each 1 \leq k \leq k_\mathrm{max}, a distribution over the individual cluster assignment for each data point. The network is trained in advance in a supervised fashion on separate data to learn grouping by any perceptual similarity criterion based on pairwise labels (same/different group). It can then be applied to different data containing different groups. We demonstrate promising performance on high-dimensional data like images (COIL-100) and speech (TIMIT). We call this “learning to cluster” and show its conceptual difference from deep metric learning, semi-supervised clustering and other related approaches, while having the advantage of performing learnable clustering fully end-to-end.
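
One way to see how pairwise same/different labels can supervise a clustering output: if a network emits, for a fixed k, a soft assignment matrix P whose rows sum to 1, then the probability that points i and j share a cluster is (P P^T)_ij, which can be scored against the pairwise labels with binary cross-entropy. This NumPy sketch assumes that formulation; it omits the network and the distribution over k, and `pairwise_cluster_loss` is a hypothetical name.

```python
import numpy as np

# Illustrative sketch of a pairwise loss; the formulation and name are assumptions.

def pairwise_cluster_loss(assign_probs, group_ids, eps=1e-9):
    """Binary cross-entropy between P(i, j in same cluster) and same/different labels."""
    p = np.asarray(assign_probs, dtype=float)      # (n, k), each row sums to 1
    same_prob = p @ p.T                            # sum_c p[i, c] * p[j, c]
    g = np.asarray(group_ids)
    same_label = (g[:, None] == g[None, :]).astype(float)
    iu = np.triu_indices(len(g), k=1)              # each unordered pair once, no self-pairs
    s = np.clip(same_prob[iu], eps, 1.0 - eps)
    y = same_label[iu]
    return float(-np.mean(y * np.log(s) + (1.0 - y) * np.log(1.0 - s)))
```

Because the loss depends only on P P^T, it is invariant to permuting cluster indices, which is why pairwise labels suffice and no fixed cluster identities are needed at training time.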

Clustering Macroeconomic Time Series

The data mining technique of time series clustering is well established in many fields. However, as an unsupervised learning method, it requires making choices that are nontrivially influenced by the nature of the data involved. The aim of this paper is to verify the usefulness of the time series clustering method for macroeconomics research, and to develop the most suitable methodology. By extensively testing various possibilities, we arrive at a choice of a dissimilarity measure (compression-based dissimilarity measure, or CDM) which is particularly suitable for clustering macroeconomic variables. We check that the results are stable in time and reflect large-scale phenomena such as crises. We also successfully apply our findings to the analysis of national economies, specifically to identifying their structural relations.
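
The compression-based dissimilarity measure is usually defined as CDM(x, y) = C(xy) / (C(x) + C(y)), where C(·) is the compressed length and xy is the concatenation of the two inputs. This sketch assumes zlib as the compressor and a simple rise/fall symbolisation of each series; both are illustrative choices, not necessarily the paper's setup, and the helper names are hypothetical.

```python
import zlib
import numpy as np

# Illustrative sketch; compressor choice and symbolisation are assumptions.

def symbolise(series):
    """Turn a numeric series into bytes: b'u' for a rise, b'd' for a fall, b'f' for no change."""
    diffs = np.diff(np.asarray(series, dtype=float))
    return b"".join(b"u" if d > 0 else b"d" if d < 0 else b"f" for d in diffs)

def compressed_size(data):
    return len(zlib.compress(data, 9))

def cdm(x, y):
    """Compression-based dissimilarity: close to 0.5 for identical inputs, near 1 for unrelated ones."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))
```

A matrix of pairwise `cdm` values can then be passed to any clustering method that accepts precomputed dissimilarities, such as hierarchical clustering.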

Causal discovery in the presence of missing data

Missing data are ubiquitous in many domains such as healthcare. Depending on how they are missing, the (conditional) independence relations in the observed data may be different from those for the complete data generated by the underlying causal process and, as a consequence, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. It is then essential to extend existing causal discovery approaches to find the true underlying causal structure from such incomplete data. In this paper, we aim at solving this problem for data that are missing with different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). With missingness mechanisms represented by a missingness graph (m-Graph), we analyze conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Based on our analysis, we propose missing value PC (MVPC), which combines additional corrections with a traditional causal discovery algorithm, in particular PC. Our proposed MVPC is shown in theory to give asymptotically correct results even using data that are MAR and MNAR. Experimental results illustrate that the proposed algorithm can correct conditional independence testing for values that are MCAR, MAR, and, in rather general cases, MNAR, on both synthetic data and a real-life healthcare application.

Improved SVD-based Initialization for Nonnegative Matrix Factorization using Low-Rank Correction

Due to the iterative nature of most nonnegative matrix factorization (NMF) algorithms, initialization is a key aspect, as it significantly influences both the convergence and the final solution obtained. Many initialization schemes have been proposed for NMF, among which one of the most popular classes of methods is based on the singular value decomposition (SVD). However, these SVD-based initializations do not satisfy a rather natural condition, namely that the error should decrease as the rank of factorization increases. In this paper, we propose a novel SVD-based NMF initialization that specifically addresses this shortcoming by taking into account the SVD factors that were discarded to obtain a nonnegative initialization. This method, referred to as nonnegative SVD with low-rank correction (NNSVD-LRC), allows us to significantly reduce the initial error at a negligible additional computational cost, using the low-rank structure of the discarded SVD factors. NNSVD-LRC has two other advantages compared to previous SVD-based initializations: (1) it provably generates sparse initial factors, and (2) it is faster, as it only requires computing a truncated SVD of rank \lceil r/2 + 1 \rceil, where r is the factorization rank of the sought NMF decomposition (as opposed to a rank-r truncated SVD for other methods). We show on several standard dense and sparse data sets that our new method competes favorably with state-of-the-art SVD-based initializations for NMF.
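
To illustrate the splitting step that makes SVD factors nonnegative, here is a simplified NumPy sketch. It is an illustration under stated assumptions, not NNSVD-LRC: it keeps the rank \lceil r/2 + 1 \rceil from the abstract, takes the leading singular pair as one nonnegative term, splits every later pair (u, v) into the two nonnegative rank-one terms max(u, 0) max(v, 0)^T and max(-u, 0) max(-v, 0)^T, keeps the first r resulting components, and omits the low-rank correction entirely. For simplicity it truncates a full SVD; `split_svd_init` is a hypothetical name.

```python
import math
import numpy as np

# Simplified illustration; not NNSVD-LRC (no low-rank correction). Name is hypothetical.

def split_svd_init(X, r):
    """Nonnegative (W, H) with r components built from a rank ceil(r/2 + 1) SVD."""
    p = math.ceil(r / 2 + 1)
    if p > min(X.shape):
        raise ValueError("factorization rank too large for this matrix")
    # Full SVD truncated to p terms for simplicity; a truncated solver would be used at scale.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W_cols, H_rows = [], []
    for k in range(p):
        u = U[:, k] * np.sqrt(s[k])
        v = Vt[k] * np.sqrt(s[k])
        if k == 0:
            # For nonnegative X the leading singular pair can be taken nonnegative: one term.
            sign = 1.0 if u.sum() >= 0 else -1.0
            pairs = [(sign * u, sign * v)]
        else:
            # u v^T = (-u)(-v)^T, so both positive-part products are nonnegative rank-one terms.
            pairs = [(u, v), (-u, -v)]
        for a, b in pairs:
            W_cols.append(np.maximum(a, 0.0))
            H_rows.append(np.maximum(b, 0.0))
    W = np.stack(W_cols[:r], axis=1)
    H = np.stack(H_rows[:r], axis=0)
    return W, H
```

Since the first pair contributes one component and each later pair contributes two, p pairs yield 2p − 1 ≥ r + 1 candidates, enough for r columns. Taking positive parts zeroes many entries, which is where the sparsity of such initial factors comes from.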

Modeling Data Lake Metadata with a Data Vault

With the rise of big data, business intelligence had to find solutions for managing even greater data volumes and variety than data warehouses could handle, as these proved ill-adapted. Data lakes answer these needs from a storage point of view, but require managing adequate metadata to guarantee efficient access to data. Starting from a multidimensional metadata model designed for an industrial heritage data lake that lacks schema evolvability, we propose in this paper to use ensemble modeling, and more precisely a data vault, to address this issue. To illustrate the feasibility of this approach, we instantiate our conceptual metadata model into relational and document-oriented logical and physical models. We also compare the physical models in terms of metadata storage and query response time.
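
A data vault separates business keys (hubs), relationships between them (links) and descriptive, time-stamped attributes (satellites). The following `sqlite3` sketch shows what a tiny relational metadata vault for a data lake might look like, and why it helps schema evolution: new metadata attributes go into a new satellite instead of altering existing tables. The table and column names (`hub_dataset`, `sat_dataset_meta`, and so on) and the helper `add_satellite` are assumptions for illustration, not the schema from the paper, and the document-oriented variant is not shown.

```python
import sqlite3

# Hypothetical schema for illustration; not the paper's metadata model.
SCHEMA = """
CREATE TABLE hub_dataset (
    dataset_hk TEXT PRIMARY KEY,      -- surrogate/hash key
    dataset_id TEXT NOT NULL,         -- business key
    load_date TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_user (
    user_hk TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    load_date TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_dataset_user (     -- relationship between two hubs
    link_hk TEXT PRIMARY KEY,
    dataset_hk TEXT NOT NULL REFERENCES hub_dataset,
    user_hk TEXT NOT NULL REFERENCES hub_user,
    load_date TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_dataset_meta (      -- descriptive, historized attributes
    dataset_hk TEXT NOT NULL REFERENCES hub_dataset,
    load_date TEXT NOT NULL,
    format TEXT,
    description TEXT,
    PRIMARY KEY (dataset_hk, load_date)
);
"""

def build_vault():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def add_satellite(conn, name, columns):
    """Schema evolution: new metadata goes into a new satellite; existing tables are not altered.

    `name` and `columns` are interpolated into SQL, so they must come from trusted code.
    """
    cols = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(
        f"CREATE TABLE {name} (dataset_hk TEXT NOT NULL REFERENCES hub_dataset, "
        f"load_date TEXT NOT NULL, {cols}, PRIMARY KEY (dataset_hk, load_date))"
    )
```

Adding, say, a data-quality attribute later then means calling `add_satellite(conn, "sat_dataset_quality", ["completeness"])` and joining the new table on `dataset_hk`; queries written against the existing hubs and satellites keep working unchanged.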

DeSTNet: Densely Fused Spatial Transformer Networks

Modern Convolutional Neural Networks (CNNs) are extremely powerful on a range of computer vision tasks. However, their performance may degrade when the data is characterised by large intra-class variability caused by spatial transformations. The Spatial Transformer Network (STN) is currently the method of choice for providing CNNs with the ability to remove those transformations and improve performance in an end-to-end learning framework. In this paper, we propose the Densely Fused Spatial Transformer Network (DeSTNet), which, to the best of our knowledge, is the first dense fusion pattern for combining multiple STNs. Specifically, we show how changing the connectivity pattern of multiple STNs from sequential to dense leads to more powerful alignment modules. Extensive experiments on three benchmarks, namely MNIST, GTSRB, and IDocDB, show that the proposed technique outperforms related state-of-the-art methods (i.e., STNs and CSTNs) in terms of both accuracy and robustness.

UniParse: A universal graph-based parsing toolkit

This paper describes the design and use of the graph-based parsing framework and toolkit UniParse, released as an open-source Python software package. As a framework, UniParse streamlines research prototyping, development and evaluation of graph-based dependency parsing architectures in a novel way. It does this by enabling highly efficient, sufficiently independent, easily readable, and easily extensible implementations for all dependency parser components. We distribute the toolkit with ready-made configurations as re-implementations of all current state-of-the-art first-order graph-based parsers, including even more efficient Cython implementations of both encoders and decoders, as well as the required specialised loss functions.
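
First-order graph-based parsers score each head-to-dependent arc independently and then decode the highest-scoring tree; for projective trees, Eisner's algorithm is the standard O(n^3) dynamic-programming decoder. The following plain-Python sketch of that decoder is for illustration only; it is not UniParse's API or its Cython implementation, and `eisner` is a hypothetical function name. Position 0 is assumed to be an artificial root.

```python
import numpy as np

# Illustrative Eisner decoder; not UniParse code. Function name is hypothetical.

def eisner(scores):
    """Highest-scoring projective dependency tree under first-order arc scores.

    scores[h, m] is the score of an arc from head h to modifier m; index 0 is an
    artificial ROOT. Returns heads, where heads[m] is the head of word m (heads[0] = -1).
    """
    n = scores.shape[0]
    # Span tables indexed [s, t, d]: d = 0 means the head is t, d = 1 means the head is s.
    complete = np.zeros((n, n, 2))
    incomplete = np.zeros((n, n, 2))
    complete_bp = np.zeros((n, n, 2), dtype=int)
    incomplete_bp = np.zeros((n, n, 2), dtype=int)
    for length in range(1, n):
        for s in range(n - length):
            t = s + length
            # Incomplete span: join two complete halves and add an arc between s and t.
            vals = [complete[s, r, 1] + complete[r + 1, t, 0] for r in range(s, t)]
            best = int(np.argmax(vals))
            incomplete[s, t, 0] = vals[best] + scores[t, s]
            incomplete[s, t, 1] = vals[best] + scores[s, t]
            incomplete_bp[s, t, 0] = incomplete_bp[s, t, 1] = s + best
            # Complete span headed at t: complete [s, r] + incomplete [r, t].
            vals = [complete[s, r, 0] + incomplete[r, t, 0] for r in range(s, t)]
            best = int(np.argmax(vals))
            complete[s, t, 0] = vals[best]
            complete_bp[s, t, 0] = s + best
            # Complete span headed at s: incomplete [s, r] + complete [r, t].
            vals = [incomplete[s, r, 1] + complete[r, t, 1] for r in range(s + 1, t + 1)]
            best = int(np.argmax(vals))
            complete[s, t, 1] = vals[best]
            complete_bp[s, t, 1] = s + 1 + best

    heads = [-1] * n

    def backtrack(s, t, d, is_complete):
        if s == t:
            return
        if is_complete:
            r = int(complete_bp[s, t, d])
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = int(incomplete_bp[s, t, d])
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads
```

This version allows ROOT to take several children; parsers that require a single root word add a constraint on top of it.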

A Recurrent Neural Network Survival Model: Predicting Web User Return Time

The size of a website’s active user base directly affects its value. Thus, it is important to monitor and influence a user’s likelihood of returning to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but cannot be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state-of-the-art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset, with a better ability to discriminate between returning and non-returning users than either method applied in isolation.
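
The key difficulty, training on users who have not (yet) returned, is what censoring in survival analysis handles. A minimal sketch, assuming an exponential return-time distribution whose rate λ would be produced by the RNN for each user: returning users contribute the log-density log λ − λt of their observed return time, while non-returning users contribute only the log-survival −λt up to the end of the observation window. The RNN is omitted, and both the exponential choice and the name `censored_nll` are assumptions; the paper's distribution may differ.

```python
import numpy as np

# Illustrative censored likelihood; distribution choice and name are assumptions.

def censored_nll(rates, times, returned):
    """Mean negative log-likelihood of an exponential return-time model with right-censoring.

    rates:    predicted return rate per user (lambda > 0), e.g. an RNN output
    times:    observed return gap for returning users, or time until the end of
              the observation window for users who have not returned
    returned: True if the return was observed, False if censored
    """
    rates = np.asarray(rates, dtype=float)
    times = np.asarray(times, dtype=float)
    returned = np.asarray(returned, dtype=bool)
    log_density = np.log(rates) - rates * times   # log f(t) = log(lambda) - lambda * t
    log_survival = -rates * times                 # log S(t) = -lambda * t
    return float(-np.mean(np.where(returned, log_density, log_survival)))
```

For a constant rate, this loss is minimised at the number of observed returns divided by the total observed time: with times 1, 2, 3 and only the last user censored, the optimum is λ = 2/6 = 1/3. Non-returning users still pull the rate down, which is exactly the information a plain regression on return times would discard.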

Exploiting statistical dependencies of time series with hierarchical correlation reconstruction

While we are usually focused on predicting future values of time series, it is often valuable to additionally predict their entire probability distributions, for example for risk evaluation or Monte Carlo simulations. Using the example of a time series of about 30000 Dow Jones Industrial Average values, we show how hierarchical correlation reconstruction can be applied for this purpose: a polynomial is fitted in the mean-square sense as the joint density of (current value, context), where the context is, for example, a few previous values. Substituting the currently observed context and normalizing the density to integrate to 1 then gives the predicted probability distribution for the current value. In contrast to standard machine learning approaches like neural networks, the optimal coefficients here can be calculated directly and inexpensively, are unique and independent, and each has a specific cumulant-like interpretation; such an approximation can approach a complete description of any joint distribution, providing a perfect tool to quantitatively describe and exploit statistical dependencies in time series.
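
The procedure described above can be sketched in a few lines of NumPy. This is an illustrative assumption, not the author's code: it normalises values to (0, 1) with their empirical CDF (ranks), uses the first four orthonormal Legendre polynomials on [0, 1] as the basis, takes a single previous value as context, and estimates each coefficient as the sample mean of a product of basis functions, which is the mean-square-optimal choice for an orthonormal basis. Because a polynomial fit can dip below zero, the sketch clips the density at zero before normalising; that fix, and all function names, are assumptions.

```python
import numpy as np

# Illustrative sketch; basis size, context length and names are assumptions.

def legendre01(x):
    """First four orthonormal Legendre polynomials on [0, 1]; returns shape (len(x), 4)."""
    x = np.asarray(x, dtype=float)
    return np.stack([
        np.ones_like(x),
        np.sqrt(3) * (2 * x - 1),
        np.sqrt(5) * (6 * x**2 - 6 * x + 1),
        np.sqrt(7) * (20 * x**3 - 30 * x**2 + 12 * x - 1),
    ], axis=-1)

def to_uniform(series):
    """Empirical-CDF normalisation of a series to (0, 1)."""
    ranks = np.argsort(np.argsort(series))
    return (ranks + 0.5) / len(series)

def fit_hcr(u):
    """Coefficients a[j, k] of the joint density of (current, previous) normalised values."""
    cur, prev = legendre01(u[1:]), legendre01(u[:-1])
    return cur.T @ prev / len(cur)   # a[j, k] = mean of f_j(current) * f_k(previous)

def predict_density(a, context, grid_size=200):
    """Conditional density of the current value given one previous normalised value."""
    x = (np.arange(grid_size) + 0.5) / grid_size          # midpoint grid on [0, 1]
    dens = legendre01(x) @ a @ legendre01([context])[0]
    dens = np.clip(dens, 0.0, None)                        # remove negative fit artefacts
    return x, dens / dens.mean()   # grid mean approximates the integral over [0, 1]
```

Each coefficient is a plain average, so fitting costs a single pass over the data, and a[0, 0] is always 1. Coefficients such as a[1, 1] measure the dependence between the current and previous value, while a[j, 0] with j > 0 describe the marginal distribution, which is close to uniform after the rank normalisation.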

TherML: Thermodynamics of Machine Learning

In this work we offer a framework for reasoning about a wide class of existing objectives in machine learning. We develop a formal correspondence between this work and thermodynamics and discuss its implications.

The importance of being dissimilar in Recommendation

Similarity measures play a fundamental role in memory-based nearest neighbors approaches. They recommend items to a user based on the similarity of either items or users in a neighborhood. In this paper we argue that, although similarity between users or items remains of primary importance in computing recommendations, it should be paired with a value of dissimilarity (computed not just as the complement of the similarity value). We formally model and inject this notion into some of the most widely used similarity measures and evaluate our approach, showing its effectiveness in terms of accuracy.

Measuring abstract reasoning in neural networks

Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation ‘regimes’ in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model’s ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction.

Limits of Treewidth-based tractability in Optimization
Vertex and Edge connectivity of the zero divisor graph $\Gamma[\mathbb{Z}_n]$
Real-time clustering and multi-target tracking using event-based sensors
A Filter of Minhash for Image Similarity Measures
Dynamic Objects Segmentation for Visual Localization in Urban Environments
Emergence of Altruism Behavior for Multi Feeding Areas in Army Ant Social Evolutionary System
Multi-stage splitting integrators for sampling with modified Hamiltonian Monte Carlo methods
Soft-TTL: Time-Varying Fractional Caching
The Introduction of the Mean Field Approximation to Psychology: Combining Dynamical Systems Theory and Network Theory in Major Depressive Disorder
Fooling the classifier: Ligand antagonism and adversarial examples
A multi-sensor data-driven methodology for all-sky passive microwave inundation retrieval
Verification of Uncertain POMDPs Using Barrier Certificates
Improved Time and Space Bounds for Dynamic Range Mode
A theoretical framework of the scaled Gaussian stochastic process in prediction and calibration
Online Facility Location with Deletions
Model-based free-breathing cardiac MRI reconstruction using deep learned & STORM priors: MoDL-STORM
Scalable Katz Ranking Computation in Large Static and Dynamic Graphs
No percolation at criticality
A Cautionary Tail: A Framework and Case Study for Testing Predictive Model Validity
Analyzing Highly Volatile Driving Trips Taken by Alternative Fuel Vehicles
Learning Implicit Generative Models by Teaching Explicit Ones
‘Factual’ or ‘Emotional’: Stylized Image Captioning with Adaptive Learning and Attention
Effects of Some Lattice Reductions on the Success Probability of the Zero-Forcing Decoder
Using deep learning for comprehensive, personalized forecasting of Alzheimer’s Disease progression
DeepDiff: Deep-learning for predicting Differential gene expression from histone modifications
A Practical Reconstruction Method for Three-Dimensional Phase Contrast Atomic Electron Tomography
Vision System for AGI: Problems and Directions
A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
Estimators of the proportion of false null hypotheses: I ‘universal construction via Lebesgue-Stieltjes integral equations and uniform consistency under independence’
Decay of correlations and uniqueness of the infinite-volume Gibbs measure of the canonical ensemble of 1d-lattice systems
Fluctuation and Rate of Convergence for the Stochastic Heat Equation in Weak Disorder
The asymptotic behaviors of Hawkes information diffusion processes for a large number of individuals
Deep Imbalanced Attribute Classification using Visual Attention Aggregation
The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization
Emotion Recognition from Speech based on Relevant Feature and Majority Voting
Neural networks as ‘hidden’ variable models for quantum systems
Decoding Reed-Muller and Polar Codes by Successive Factor Graph Permutations
Bin Decompositions
A Driver Behavior Modeling Structure Based on Non-parametric Bayesian Stochastic Hybrid Architecture
Outage Correlation in Finite and Clustered Wireless Networks
Harnack inequalities for a class of semilinear stochastic partial differential equations
Generative Adversarial Networks with Decoder-Encoder Output Noise
Attacks and alignments: rooks, set partitions, and permutations
Quantification under prior probability shift: the ratio estimator and its extensions
Robust Beamforming Design in a NOMA Cognitive Radio Network Relying on SWIPT
A Hierarchical Bayesian Linear Regression Model with Local Features for Stochastic Dynamics Approximation
Instance-based entropy fuzzy support vector machine for imbalanced data
Pollution State Modeling for Mexico City
Power Flow Solvers for Direct Current Networks
Normalized Laplacian spectrum of some generalized subdivision-corona of two regular graphs
Recurrent neural networks running on quantum spins: memory accuracy and capacity
A Dialogue Annotation Scheme for Weight Management Chat using the Trans-Theoretical Model of Health Behavior Change
Towards Understanding End-of-trip Instructions in a Taxi Ride Scenario
Linear relations on LLT polynomials and their k-Schur positivity for k=2
Shortening Time Required for Adaptive Structural Learning Method of Deep Belief Network with Multi-Modal Data Arrangement
Knowledge Extracted from Recurrent Deep Belief Network for Real Time Deterministic Control
An improved neural network model for joint POS tagging and dependency parsing
Deep attention-based classification network for robust depth prediction
Complete results for a numerical evaluation of interior point solvers for large-scale optimal power flow problems
Approximation of The Constrained Joint Spectral Radius via Algebraic Lifting
The Minkowski Property and Reflexivity of Marked Poset Polytopes
Graph Operations and Neighborhood Polynomials
Testing Global Constraints
Sequential Voting with Confirmation Network
Paradoxes in Sequential Voting
A note on the distribution of the product of zero mean correlated normal random variables
A knowledge based system approach in securing distributed wireless sensor networks
Prevention of Metro Rail Accidents and Incidents in Stations Using RFID Technology
On van Hamme’s (A.2) and (H.2) supercongruences
Nonasymptotic control of the MLE for misspecified nonparametric hidden Markov models
Interfering Channel Estimation in Radar-Cellular Coexistence: How Much Information Do We Need
A Multidimensional Hierarchical Framework for Modeling Speed and Ability in Computer-based Multidimensional Tests
Faster FISTA
Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces
On catastrophic forgetting and mode collapse in Generative Adversarial Networks
Collisions of several walkers in recurrent random environments
On Bayesian Estimation And Proximity Operators
Variable metric algorithms driven by averaged operators
On Error Bounds and Multiplier Methods for Variational Problems in Banach Spaces
Learning Singularity Avoidance
On nested code pairs from the Hermitian curve
Stabilization of the non-homogeneous Navier-Stokes equations in a 2d channel
CG-DIQA: No-reference Document Image Quality Assessment Based on Character Gradient
DCNN-based Human-Interpretable Post-mortem Iris Recognition
A Robust Nonlinear RLS Type Adaptive Filter for Second-Order-Intermodulation Distortion Cancellation in FDD LTE and 5G Direct Conversion Transceivers
Temporal Convolution Networks for Real-Time Abdominal Fetal Aorta Analysis with Ultrasound
Presentation Attack Detection for Cadaver Irises
Cross-spectral Iris Recognition for Mobile Applications using High-quality Color Images
Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions
MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network
A Security Index for Actuators Based on Perfect Undetectability: Properties and Approximation
Self-Estimation of Path-Loss Exponent in Wireless Networks and Applications
A punishment voting algorithm based on super categories construction for acoustic scene classification
Recognising Cardiac Abnormalities in Wearable Device Photoplethysmography (PPG) with Deep Learning
Proactive Intervention to Downtrend Employee Attrition using Artificial Intelligence Techniques
Random motion on finite rings, II: Noncommutative rings
FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs
Factor models with many assets: strong factors, weak factors, and the two-pass procedure
Variational Capsules for Image Analysis and Synthesis
VFunc: a Deep Generative Model for Functions
The Human Geography of Twitter
Puncturing maximum rank distance codes
Modeling and Soft-fault Diagnosis of Underwater Thrusters with Recurrent Neural Networks
On reproducing kernels, and analysis of measures
The Weighted Davenport Constant of a group and a related extremal problem
Robust relative error estimation
Metastable Markov chains
Fast and exact simulation of isotropic Gaussian random fields on $\mathbb{S}^{2}$ and $\mathbb{S}^{2}\times \mathbb{R}$
JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion
Co-KV: A Collaborative Key-Value Store Using Near-Data Processing to Improve Compaction for the LSM-tree
Integrality Gap of the Configuration LP for the Restricted Max-Min Fair Allocation
Data-Driven Segmentation of Post-mortem Iris Images
Optimal Exploitation of a Resource with Stochastic Population Dynamics and Delayed Renewal
Heterogeneous Effects of Unconventional Monetary Policy on Loan Demand and Supply. Insights from the Bank Lending Survey
Fully Polynomial-Time Approximation Schemes for Fair Rent Division
Using Recursive Partitioning to Find and Estimate Heterogenous Treatment Effects In Randomized Clinical Trials
Underwater Image Haze Removal and Color Correction with an Underwater-ready Dark Channel Prior
Decision method choice in a human posture recognition context
Linear Transformations for Cross-lingual Semantic Textual Similarity
Statistical analysis of chiral structured ensembles: role of matrix constraints
Cross-lingual Word Analogies using Linear Transformations between Semantic Spaces
A probabilistic gridded product for daily precipitation extremes over the United States
Explainable Security
Unusual area-law violation in random inhomogeneous systems
On error representation in exact-decisions number types
Optimization over Continuous and Multi-dimensional Decisions with Observational Data
A Fixed-Parameter Linear-Time Algorithm for Maximum Flow in Planar Flow Networks
VTA: An Open Hardware-Software Stack for Deep Learning
A Computational Method for Evaluating UI Patterns
From Hawkes-type processes to stochastic volatility
Distributed Variational Representation Learning
Bounded-Excess Flows in Cubic Graphs
Cut-and-join equation for monotone Hurwitz numbers revisited
Optimal control problems with oscillations, concentrations and discontinuities
With Friends Like These, Who Needs Adversaries?
simode: R Package for statistical inference of ordinary differential equations using separable integral-matching
Time and Local Popularity in top-N Recommendation
Cross-layer Interference Modeling for 5G MmWave Networks in the Presence of Blockage
Rank of weighted digraphs with blocks
Differentially Private False Discovery Rate Control
A Collaborative Ranking Model with Multiple Location-based Similarities for Venue Suggestion
Statistical estimation of superhedging prices
Tuning the orbital-lattice fluctuations in the mixed spin-dimer system Ba$_{3-x}$Sr$_{x}$Cr$_{2}$O$_{8}$
A Two-Stage Auction Mechanism for Cloud Resource Allocation
How Local is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization
Modified Regularized Dual Averaging Method for Training Sparse Convolutional Neural Networks
Path-by-path well-posedness of nonlinear diffusion equations with multiplicative noise
Geometric comparison of phylogenetic trees with different leaf sets
Koopman Performance Analysis of Nonlinear Consensus Networks
Spectra of Hadamard matrices
Morse Code Datasets for Machine Learning
DeepMove: Learning Place Representations through Large Scale Movement Data
Genome-scale estimation of cellular objectives
Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization
On the Fundamental Limits of Coded Data Shuffling for Distributed Learning Systems
Phase Retrieval Under a Generative Prior
Knowledge Compilation, Width and Quantification