Elastic Neural Networks for Classification

In this work we propose a framework for improving the performance of any deep neural network that may suffer from vanishing gradients. To address the vanishing gradient issue, we insert an intermediate output branch after each layer in the computational graph and use the corresponding prediction loss to feed the gradient to the early layers. The framework – which we name Elastic network – is tested with several well-known networks on the CIFAR10 and CIFAR100 datasets, and the experimental results show that the proposed framework improves the accuracy of both shallow networks (e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We also identify the types of networks where the framework does not improve the performance and discuss the reasons. Finally, as a side product, the computational complexity of the resulting networks can be adjusted in an elastic manner by selecting the output branch according to the current computational budget.

Taming VAEs

In spite of remarkable progress in deep latent variable generative modeling, training still remains a challenge due to a combination of optimization and generalization issues. In practice, a combination of heuristic algorithms (such as hand-crafted annealing of KL-terms) is often used in order to achieve the desired results, but such solutions are not robust to changes in model architecture or dataset. The best settings can often vary dramatically from one problem to another, which requires doing expensive parameter sweeps for each new case. Here we build on the idea of training VAEs with additional constraints as a way to control their behaviour. We first present a detailed theoretical analysis of constrained VAEs, expanding our understanding of how these models work. We then introduce and analyze a practical algorithm termed Generalized ELBO with Constrained Optimization, GECO. The main advantage of GECO for the machine learning practitioner is a more intuitive, yet principled, process of tuning the loss. This involves defining a set of constraints, which typically have an explicit relation to the desired model performance, in contrast to tweaking abstract hyper-parameters which implicitly affect the model behavior. Encouraging experimental results on several standard datasets indicate that GECO is a very robust and effective tool to balance reconstruction and compression constraints.

Predicted Variables in Programming

We present Predicted Variables (PVars), an approach to making machine learning (ML) a first class citizen in programming languages. There is a growing divide in approaches to building systems: using human experts (e.g. programming) on the one hand, and using behavior learned from data (e.g. ML) on the other hand. PVars aim to make ML in programming as easy as ‘if’ statements and thereby hybridize ML with programming. We leverage the existing concept of variables and create a new type, a predicted variable. PVars are akin to native variables with one important distinction: PVars determine their value using ML when evaluated. We describe PVars and their interface, how they can be used in programming, and demonstrate the feasibility of our approach on three algorithmic problems: binary search, Quicksort, and caches. We show experimentally that PVars are able to improve over the commonly used heuristics and lead to better performance than the original algorithms. As opposed to previous work applying ML to algorithmic problems, PVars have the advantage that they can be used within the existing frameworks and do not require the existing domain knowledge to be replaced. PVars allow for a seamless integration of ML into existing systems and algorithms. Our PVars implementation currently relies on standard Reinforcement Learning (RL) methods. To learn faster, PVars use the heuristic function, which they are replacing, as an initial function. We show that PVars quickly pick up the behavior of the initial function and then improve performance beyond that without ever performing substantially worse, allowing for a safe deployment in critical applications.

Fusion Hashing: A General Framework for Self-improvement of Hashing

Hashing has been widely used for efficient similarity search based on its query and storage efficiency. To obtain better precision, most studies focus on designing different objective functions with different constraints or penalty terms that consider neighborhood information. In this paper, in contrast to existing hashing methods, we propose a novel generalized framework called fusion hashing (FH) to improve the precision of existing hashing methods without adding new constraints or penalty terms. In the proposed FH, given an existing hashing method, we first execute it several times to get several different hash codes for a set of training samples. We then propose two novel fusion strategies that combine these different hash codes into one set of final hash codes. Based on the final hash codes, we learn a simple linear hash function for the samples that can significantly improve model precision. In general, the proposed FH can be applied to any existing hashing method and achieves more precise and stable performance than the original hashing method with little extra expenditure in terms of time and space. Extensive experiments were performed on three benchmark datasets and the results demonstrate the superior performance of the proposed framework.

Perfect Match: A Simple Method for Learning Representations For Counterfactual Inference With Neural Networks

Learning representations for counterfactual inference from observational data is of high practical relevance for many domains, such as healthcare, public policy and economics. Counterfactual inference enables one to answer ‘What if…?’ questions, such as ‘What would be the outcome if we gave this patient treatment t_1?’. However, current methods for training neural networks for counterfactual inference on observational data are either overly complex, limited to settings with only two available treatment options, or both. Here, we present Perfect Match (PM), a method for training neural networks for counterfactual inference that is easy to implement, compatible with any architecture, does not add computational complexity or hyperparameters, and extends to any number of treatments. PM is based on the idea of augmenting samples within a minibatch with their propensity-matched nearest neighbours. Our experiments demonstrate that PM outperforms a number of more complex state-of-the-art methods in inferring counterfactual outcomes across several real-world and semi-synthetic datasets.
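
The minibatch augmentation described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes each sample already carries a precomputed scalar `propensity` score and an integer `treatment` label, and the function name `augment_with_matches` and the dictionary layout are hypothetical.

```python
def augment_with_matches(batch, pool):
    """Return the batch with, for every sample, its nearest neighbour
    (by propensity score) from each *other* treatment group appended.

    batch, pool: lists of dicts with keys "id", "treatment", "propensity".
    The scalar propensity score is an assumption made for this sketch.
    """
    treatments = sorted({s["treatment"] for s in pool})
    augmented = []
    for sample in batch:
        augmented.append(sample)
        for t in treatments:
            if t == sample["treatment"]:
                continue  # only match against the other treatment groups
            candidates = [c for c in pool if c["treatment"] == t]
            # Nearest neighbour in propensity-score space.
            match = min(
                candidates,
                key=lambda c: abs(c["propensity"] - sample["propensity"]),
            )
            augmented.append(match)
    return augmented
```

With k treatments, every original sample contributes k samples to the augmented minibatch, one per treatment, which is how the sketch extends beyond the two-treatment setting.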

Rare event simulation for stochastic dynamics in continuous time

Large deviations for additive path functionals and convergence properties for numerical approaches based on population dynamics have attracted recent research interest. The aim of this paper is twofold. First, extending results from the literature on particle filters and sequential Monte Carlo methods, we establish rigorous bounds on convergence properties of the cloning algorithm in continuous time; these are reported in this paper, with details of the proofs given in a further publication. Second, the tilted generator characterizing the large deviation rate function can be associated with non-linear processes which give rise to several representations of the dynamics and additional freedom for associated particle approximations. We discuss these choices in detail, and combine insights from the filtering literature and the cloning algorithm to suggest a more efficient version of the algorithm.

Counterfactually Fair Prediction Using Multiple Causal Models

In this paper we study the problem of making predictions using multiple structural causal models defined by different agents, under the constraint that the prediction satisfies the criterion of counterfactual fairness. Relying on the frameworks of causality, fairness and opinion pooling, we build upon and extend previous work focusing on the qualitative aggregation of causal Bayesian networks and causal models. In order to complement previous qualitative results, we devise a method based on Monte Carlo simulations. This method enables a decision-maker to aggregate the outputs of the causal models provided by different experts while guaranteeing the counterfactual fairness of the result. We demonstrate our approach on a simple, yet illustrative, toy case study.

Classification Using Link Prediction

Link prediction in a graph is the problem of detecting the missing links that would be formed in the near future. Using a graph representation of the data, we can convert the problem of classification to the problem of link prediction, which aims at finding the missing links between the unlabeled data (unlabeled nodes) and their classes. To our knowledge, despite the fact that numerous algorithms use the graph representation of the data for classification, none use link prediction as the heart of their classification procedure. In this work, we propose a novel algorithm called CULP (Classification Using Link Prediction), which uses a new structure, the Label Embedded Graph (LEG), and a link predictor to find the class of the unlabeled data. Different link predictors, along with Compatibility Score – a new link predictor we propose that is designed specifically for our setting – have been used and show promising results for classifying different datasets. This paper further improves CULP by designing an extension called CULM, which uses a majority vote (hence the M in the acronym) procedure with weights proportional to the predictions’ confidences to use the predictive power of multiple link predictors and also exploits the low-level features of the data. Extensive experimental evaluations show that both CULP and CULM are highly accurate and competitive with cutting-edge graph classifiers and general classifiers.

The Profiling Machine: Active Generalization over Knowledge

The human mind is a powerful multifunctional knowledge storage and management system that performs generalization, type inference, anomaly detection, stereotyping, and other tasks. A dynamic KR system that appropriately profiles over sparse inputs to provide complete expectations for unknown facets can help with all these tasks. In this paper, we introduce the task of profiling, inspired by theories and findings in social psychology about the potential of profiles for reasoning and information processing. We describe two generic state-of-the-art neural architectures that can be easily instantiated as profiling machines to generate expectations and applied to any kind of knowledge to fill gaps. We evaluate these methods against Wikidata and crowd expectations, and compare the results to gain insight into the nature of knowledge captured by various profiling methods. We make all code and data available to facilitate future research.

On Theory for BART

Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying the foundations for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into branching process theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes exploiting their connection to random walks. We conclude with a result stating the optimal rate of posterior convergence for BART.

Graph Diffusion-Embedding Networks

We present a novel graph diffusion-embedding network (GDEN) for graph-structured data. GDEN is motivated by our closed-form formulation of regularized feature diffusion on graphs. GDEN integrates both regularized feature diffusion and low-dimensional embedding simultaneously in a unified network model. Moreover, based on GDEN, we can naturally deal with structured data with multiple graph structures. Experiments on semi-supervised learning tasks on several benchmark datasets demonstrate the better performance of the proposed GDEN compared with traditional GCN models.

How Powerful are Graph Neural Networks?

Graph Neural Networks (GNNs) for representation learning of graphs broadly follow a neighborhood aggregation framework, where the representation vector of a node is computed by recursively aggregating and transforming feature vectors of its neighboring nodes. Many GNN variants have been proposed and have achieved state-of-the-art results on both node and graph classification tasks. However, despite GNNs revolutionizing graph representation learning, there is limited understanding of their representational properties and limitations. Here, we present a theoretical framework for analyzing the expressive power of GNNs in capturing different graph structures. Our results characterize the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, and show that they cannot learn to distinguish certain simple graph structures. We then develop a simple architecture that is provably the most expressive among the class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test. We empirically validate our theoretical findings on a number of graph classification benchmarks, and demonstrate that our model achieves state-of-the-art performance.
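
One way to see why the choice of neighborhood aggregator matters for discriminative power is the toy comparison below. The abstract does not say which structures the popular variants fail on; this example is an assumption that illustrates the general point that mean and max aggregation are not injective on multisets of neighbor features, with sum used as the contrasting aggregator. The function `aggregate` is hypothetical and uses scalar features for brevity.

```python
def aggregate(neighbor_features, how):
    """Combine a multiset of scalar neighbor features into one value."""
    if how == "sum":
        return sum(neighbor_features)
    if how == "mean":
        return sum(neighbor_features) / len(neighbor_features)
    if how == "max":
        return max(neighbor_features)
    raise ValueError(f"unknown aggregator: {how}")


# Two different neighborhoods: a node with two identical neighbors
# and a node with four identical neighbors.
two_neighbors = [1.0, 1.0]
four_neighbors = [1.0, 1.0, 1.0, 1.0]

for how in ("mean", "max", "sum"):
    same = aggregate(two_neighbors, how) == aggregate(four_neighbors, how)
    print(how, "cannot distinguish" if same else "distinguishes")
```

Mean and max both map the two neighborhoods to 1.0, so any network built on them alone sees the two nodes as identical, while sum yields 2.0 versus 4.0.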

Integrated Principal Components Analysis

Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that may be hidden in individualistic analyses of a single data source. We develop a new statistical data integration method named Integrated Principal Components Analysis (iPCA), which is a model-based generalization of PCA and serves as a practical tool to find and visualize common patterns that occur in multiple datasets. The key idea driving iPCA is the matrix-variate normal model, whose Kronecker product covariance structure captures both individual patterns within each dataset and joint patterns shared by multiple datasets. Building upon this model, we develop several penalized (sparse and non-sparse) covariance estimators for iPCA and study their theoretical properties. We show that our sparse iPCA estimator consistently estimates the underlying joint subspace, and using geodesic convexity, we prove that our non-sparse iPCA estimator converges to the global solution of a non-convex problem. We also demonstrate the practical advantages of iPCA through simulations and a case study application to integrative genomics for Alzheimer’s Disease. In particular, we show that the joint patterns extracted via iPCA are highly predictive of a patient’s cognition and Alzheimer’s diagnosis.

Challenges of Using Text Classifiers for Causal Inference

Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets. While text classifiers produce low-dimensional outputs, their use in causal inference has not previously been studied. To facilitate causal analyses based on language data, we consider the role that text classifiers can play in causal inference through established modeling mechanisms from the causality literature on missing data and measurement error. We demonstrate how to conduct causal analyses using text classifiers on simulated and Yelp data, and discuss the opportunities and challenges of future work that uses text data in causal inference.

Neural Regression Trees

Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one. Current approaches for RvC use ad-hoc discretization strategies and are suboptimal. We propose a neural regression tree model for RvC. In this model, we employ a joint optimization framework where we learn optimal discretization thresholds while simultaneously optimizing the features for each node in the tree. We empirically show the validity of our model by testing it on two challenging regression tasks where we establish the state of the art.
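
For readers unfamiliar with RvC, the basic conversion can be sketched as below. This shows only the baseline idea with hand-picked thresholds, which is the kind of ad-hoc discretization the abstract criticizes; it is not the proposed neural regression tree, which learns the thresholds. The function names and the choice of the bin mean as the decoded value are assumptions.

```python
from bisect import bisect_right


def to_class(y, thresholds):
    """Map a real-valued target to the index of the bin it falls into."""
    return bisect_right(thresholds, y)


def bin_means(targets, thresholds):
    """Mean training target per bin, used to turn a class back into a number."""
    sums = [0.0] * (len(thresholds) + 1)
    counts = [0] * (len(thresholds) + 1)
    for y in targets:
        k = to_class(y, thresholds)
        sums[k] += y
        counts[k] += 1
    # Empty bins have no representative value.
    return [s / c if c else None for s, c in zip(sums, counts)]


def from_class(k, means):
    """Decode a predicted class index into a regression estimate."""
    return means[k]
```

A classifier trained on `to_class` labels would then predict a bin index, and `from_class` would map that index back to a numeric estimate.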

plsRglm: Partial least squares linear and generalized linear regression for processing incomplete datasets by cross-validation and bootstrap techniques with R

The aim of the plsRglm package is to deal with complete and incomplete datasets through several new techniques or, at least, some which were not yet implemented in R. Indeed, not only does it make available the extension of the PLS regression to the generalized linear regression models, but also bootstrap techniques, leave-one-out and repeated k-fold cross-validation. In addition, graphical displays help the user to assess the significance of the predictors when using bootstrap techniques. Biplots (Fig. 4) can be used to delve into the relationship between individuals and variables.

Feature Selection Approach with Missing Values Conducted for Statistical Learning: A Case Study of Entrepreneurship Survival Dataset

In this article, we investigate the features that best discriminate survival among micro and small enterprises (MSE) using a data mining approach with feature selection. Given the complexity of the data set, we compare three data imputation methods, namely mean imputation (MI), k-nearest neighbor (KNN) and expectation maximization (EM), each combined with variable selection via the t-test. The data mining process then applies logistic regression, the naive Bayes algorithm, linear discriminant analysis and support vector machine classifiers, and compares their respective performances. The experimental results will be used to develop a model to predict MSE survival, providing a better understanding of the topic, since MSEs are a significant part of the Brazilian GDP and macroeconomy.

A flexible sequential Monte Carlo algorithm for shape-constrained regression

We propose an algorithm that is capable of imposing shape constraints on regression curves, without requiring the constraints to be written as closed-form expressions, nor assuming the functional form of the loss function. Our algorithm, which is based on Sequential Monte Carlo-Simulated Annealing, only relies on an indicator function that assesses whether or not the constraints are fulfilled, thus allowing us to enforce various complex constraints by specifying an appropriate indicator function without altering other parts of the algorithm. We demonstrate our algorithm by fitting rational function models subject to monotonicity and continuity constraints. The algorithm was implemented using R (R Core Team, 2018) and the code is freely available on GitHub.

Target Aware Network Adaptation for Efficient Representation Learning

This paper presents an automatic network adaptation method that finds a ConvNet structure well-suited to a given target task, e.g., image classification, for efficiency as well as accuracy in transfer learning. We call the concept target-aware transfer learning. Given only small-scale labeled data, and starting from an ImageNet pre-trained network, we exploit a scheme of removing its potential redundancy for the target task through iterative operations of filter-wise pruning and network optimization. The basic motivation is that compact networks are on one hand more efficient and should also be more tolerant, being less complex, against the risk of overfitting which would hinder the generalization of learned representations in the context of transfer learning. Further, unlike existing methods involving network simplification, we also let the scheme identify redundant portions across the entire network, which automatically results in a network structure adapted to the task at hand. We achieve this with a few novel ideas: (i) cumulative sum of activation statistics for each layer, and (ii) a priority evaluation of pruning across multiple layers. Experimental results by the method on five datasets (Flower102, CUB200-2011, Dog120, MIT67, and Stanford40) show favorable accuracies over the related state-of-the-art techniques while enhancing the computational and storage efficiency of the transferred model.

AI Benchmark: Running Deep Neural Networks on Android Smartphones

Over the last few years, the computational power of mobile devices such as smartphones and tablets has grown dramatically, reaching the level of desktop computers available not long ago. While standard smartphone apps are no longer a problem for them, there is still a group of tasks that can easily challenge even high-end devices, namely running artificial intelligence algorithms. In this paper, we present a study of the current state of deep learning in the Android ecosystem and describe available frameworks, programming models and the limitations of running AI on smartphones. We give an overview of the hardware acceleration resources available on four main mobile chipset platforms: Qualcomm, HiSilicon, MediaTek and Samsung. Additionally, we present real-world performance results for different mobile SoCs, collected with AI Benchmark, that cover all main existing hardware configurations.

Sinkhorn AutoEncoders

Optimal Transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show how this principle dictates the minimization of the Wasserstein distance between the encoder aggregated posterior and the prior, plus a reconstruction error. We prove that in the non-parametric limit the autoencoder generates the data distribution if and only if the two distributions match exactly, and that the optimum can be obtained by deterministic autoencoders. We then introduce the Sinkhorn AutoEncoder (SAE), which casts the problem into Optimal Transport on the latent space. The resulting Wasserstein distance is minimized by backpropagating through the Sinkhorn algorithm. SAE models the aggregated posterior as an implicit distribution and therefore does not need a reparameterization trick for gradients estimation. Moreover, it requires virtually no adaptation to different prior distributions. We demonstrate its flexibility by considering models with hyperspherical and Dirichlet priors, as well as a simple case of probabilistic programming. SAE matches or outperforms other autoencoding models in visual quality and FID scores.
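
The Sinkhorn iteration at the core of this approach can be sketched in plain Python as below. This is only the forward computation of an entropy-regularized transport plan between two uniform discrete distributions; the method described above additionally backpropagates through these iterations, which would need an autodiff framework. The regularization strength `epsilon`, the iteration count, and the function names are assumptions.

```python
import math


def sinkhorn_plan(cost, epsilon=0.1, iterations=200):
    """Entropy-regularized transport plan between uniform marginals.

    cost: n x m list of lists with pairwise costs.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform source weights
    b = [1.0 / m] * m  # uniform target weights
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of moving mass according to the plan."""
    return sum(p * c for p_row, c_row in zip(plan, cost) for p, c in zip(p_row, c_row))
```

Smaller values of `epsilon` bring the result closer to the unregularized optimal transport cost but can make the kernel entries underflow, so the value is a trade-off rather than a fixed recommendation.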

Robust Optimization through Neuroevolution

We propose a method for evolving solutions that are robust with respect to variations of the environmental conditions (i.e. that can operate effectively in new conditions immediately, without the need to adapt to variations). The obtained results show that the proposed method is effective and computationally tractable. It improves performance on an extended version of the double-pole balancing problem, outperforms the best available human-designed controllers on a car racing problem, and generates rather effective solutions for a swarm robotic problem. The comparison of different algorithms indicates that the CMA-ES and xNES methods, which operate by optimizing a distribution of parameters, represent the best options for the evolution of robust neural network controllers.

Adversarial Examples – A Complete Characterisation of the Phenomenon

We provide a complete characterisation of the phenomenon of adversarial examples – inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the existence of adversarial examples, (2) the security, safety and robustness implications, (3) the methods used to generate and (4) protect against adversarial examples and (5) the ability of adversarial examples to transfer between different machine learning models. We provide ample background information in an effort to make this document self-contained. Therefore, this document can be used as a survey, a tutorial or a catalog of attacks and defences using adversarial examples.

Inference Over Programs That Make Predictions

This abstract extends previous work (arXiv:1407.2646, arXiv:1606.00075) on program induction using probabilistic programming. It describes possible further steps to extend that work, such that, ultimately, automatic probabilistic program synthesis can generalise over any reasonable set of inputs and outputs, in particular with regard to text, image and video data.

High-dimensional functional time series forecasting: An application to age-specific mortality rates

We address the problem of forecasting high-dimensional functional time series through a two-fold dimension reduction procedure. The difficulty of forecasting high-dimensional functional time series lies in the curse of dimensionality. In this paper, we propose a novel method to solve this problem. Dynamic functional principal component analysis is first applied to reduce each functional time series to a vector. We then use the factor model as a further dimension reduction technique so that only a small number of latent factors are preserved. Classic time series models can be used to forecast the factors and conditional forecasts of the functions can be constructed. Asymptotic properties of the approximated functions are established, including both estimation error and forecast error. The proposed method is easy to implement especially when the dimension of the functional time series is large. We show the superiority of our approach by both simulation studies and an application to Japanese age-specific mortality rates.

Near-Linear Approximation Algorithms for Scheduling Problems with Batch Setup Times

We investigate the scheduling of n jobs divided into c classes/batches on m identical parallel machines. For every class there is a sequence-independent setup time. This setup is required whenever a machine switches from the processing of one class to another class. The objective is to find a schedule that minimizes the makespan. We give near-linear approximation algorithms for the following problem variants: the non-preemptive problem context where jobs may not be preempted, the preemptive context where jobs may be preempted but not parallelized, as well as the splittable context where jobs may be preempted and parallelized. We present the first algorithm improving the previously best approximation ratio of 2 to a better ratio of 3/2 in the preemptive case. In more detail, for all three flavors we present an approximation ratio 2 with running time \mathcal{O}(n), ratio 3/2+\epsilon in time \mathcal{O}(n\log 1/\epsilon) as well as a ratio of 3/2. The (3/2)-approximate algorithms have different running times. For the non-preemptive case we get time \mathcal{O}(n\log (n+\Delta)) where \Delta is the largest value of the input. The splittable approximation needs a running time of \mathcal{O}(n+c\log(c+m)) whereas the algorithm for the preemptive context has a running time \mathcal{O}(n \log (c+m)) \leq \mathcal{O}(n \log n). So far, no PTAS is known for the preemptive problem without restrictions, so we make progress towards that question. Recently Jansen et al. found an EPTAS for the splittable and non-preemptive case but with impractical running times exponential in 1/\epsilon.

Causal inference under over-simplified longitudinal causal models

Most causal models of interest involve longitudinal exposures, confounders and mediators. However, in practice, repeated measurements are rarely available. Then, practitioners tend to overlook the time-varying nature of exposures and work under over-simplified causal models. In this work, we investigate whether, and how, the quantities estimated under these simplified models can be related to the true longitudinal causal effects. We focus on two common situations regarding the type of available data for exposures: when they correspond to (i) ‘instantaneous’ levels measured at inclusion in the study or (ii) summary measures of their levels up to inclusion in the study. Our results state that inference based on either ‘instantaneous’ levels or summary measures usually returns quantities that do not directly relate to any causal effect of interest and should be interpreted with caution. They raise the need for the availability of repeated measurements and/or the development of sensitivity analyses when such data is not available.

Learning with Random Learning Rates

Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the ‘All Learning Rates At Once’ (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems. Alrao could save time when testing deep learning models: a range of models could be quickly assessed with Alrao, and the most promising models could then be trained more extensively. This text comes with a PyTorch implementation of the method, which can be plugged into an existing PyTorch model.
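
The per-unit learning rate idea can be sketched without any deep learning framework, as below. This is not the PyTorch implementation mentioned above. The abstract only says the rates come from a distribution spanning several orders of magnitude; the log-uniform distribution, the default range, and the function names here are assumptions.

```python
import math
import random


def sample_learning_rates(num_units, lr_min=1e-5, lr_max=10.0, seed=0):
    """Draw one learning rate per unit, uniformly in log space (assumed)."""
    rng = random.Random(seed)
    log_min, log_max = math.log(lr_min), math.log(lr_max)
    return [math.exp(rng.uniform(log_min, log_max)) for _ in range(num_units)]


def sgd_step(weights, grads, unit_lrs):
    """Plain SGD update where row i of the weights uses learning rate unit_lrs[i].

    weights, grads: lists of rows, one row of incoming weights per unit.
    """
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row, lr in zip(weights, grads, unit_lrs)
    ]
```

The rates are sampled once and then kept fixed for each unit, so the update itself costs the same as ordinary SGD.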

Generative Ensembles for Robust Anomaly Detection

Deep generative models are capable of learning probability distributions over large, high-dimensional datasets such as images, video and natural language. Generative models trained on samples from p(x) ought to assign low likelihoods to out-of-distribution (OoD) samples from q(x), making them suitable for anomaly detection applications. We show that in practice, likelihood models are themselves susceptible to OoD errors, and even assign large likelihoods to images from other natural datasets. To mitigate these issues, we propose Generative Ensembles, a model-independent technique for OoD detection that combines density-based anomaly detection with uncertainty estimation. Our method outperforms ODIN and VIB baselines on image datasets, and achieves comparable performance to a classification model on the Kaggle Credit Fraud dataset.

Optimal Completion Distillation for Sequence Learning

We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves state-of-the-art performance on end-to-end speech recognition, on both Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER respectively.
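
The two steps described above, finding the first tokens of the optimal suffixes and spreading probability equally over them, can be sketched as follows. This is a minimal illustration that assumes a standard Levenshtein recurrence with unit costs; the function names and the end-of-sequence marker `"</s>"` are hypothetical.

```python
def optimal_next_tokens(prefix, target, eos="</s>"):
    """First tokens of all target suffixes that minimize total edit distance.

    dist[j] holds the edit distance between the generated prefix and
    target[:j]; appending target[j:] then adds no further edits.
    """
    dist = list(range(len(target) + 1))  # empty prefix vs. target[:j]
    for token in prefix:
        new = [dist[0] + 1]
        for j in range(1, len(target) + 1):
            substitute = dist[j - 1] + (token != target[j - 1])
            delete = dist[j] + 1
            insert = new[j - 1] + 1
            new.append(min(substitute, delete, insert))
        dist = new
    best = min(dist)
    tokens = set()
    for j, d in enumerate(dist):
        if d == best:
            # The optimal suffix target[j:] starts with target[j], or ends here.
            tokens.add(target[j] if j < len(target) else eos)
    return tokens


def target_distribution(prefix, target, eos="</s>"):
    """Equal probability on every optimal next token."""
    tokens = optimal_next_tokens(prefix, target, eos)
    return {tok: 1.0 / len(tokens) for tok in sorted(tokens)}
```

For the target `list("SUNDAY")` and the generated prefix `list("SAT")`, the optimal next tokens are `U`, `N` and `D`, each receiving probability 1/3.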

GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression

We propose an algorithm called GLAD (GLocalized Anomaly Detection) that allows end-users to retain the use of simple and understandable global anomaly detectors by automatically learning their local relevance to specific data instances using label feedback. The key idea is to place a uniform prior over the input feature space for each member of the anomaly detection ensemble via a neural network trained on unlabeled instances, and tune the weights of the neural network to adjust the local relevance of each ensemble member using all labeled instances. Our experiments on synthetic and real-world data show the effectiveness of GLAD in learning the local relevance of ensemble members and discovering anomalies via label feedback.

Optimal Adaptive and Accelerated Stochastic Gradient Descent
Probabilistic Meta-Representations Of Neural Networks
The information-theoretic meaning of Gagliardo–Nirenberg type inequalities
Performance Evaluation for LTE-V based Vehicle-to-Vehicle Platooning Communication
Adaptive Polling in Hierarchical Social Networks using Blackwell Dominance
On valid descriptive inference from non-probability sample
Slaying Hydrae: Improved Bounds for Generalized k-Server in Uniform Metrics
Spectra of networks
Selberg integrals in 1D random Euclidean optimization problems
Improved Ramsey-type results in comparability graphs
Detecting Satire in the News with Machine Learning
Composite optimization for the resource allocation problem
Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication
Unsupervised Trajectory Segmentation and Promoting of Multi-Modal Surgical Demonstrations
Orbital Stabilization of Nonlinear Systems via the Immersion and Invariance Technique
Privado: Practical and Secure DNN Inference
One-Click Annotation with Guided Hierarchical Object Detection
Phase resetting and intermittent control at critical edge of stability as major mechanisms of fractality in human gait cycle variability
Central Values for Clebsch-Gordan coefficients
Design and simulation of 1.28 Tbps dense wavelength division multiplex system suitable for long haul backbone
Explicit solutions of the kinetic and potential matching conditions of the energy shaping method
Polyline Simplification has Cubic Complexity
Approximation bounds on maximum edge 2-coloring of dense graphs
Neighborhood complexes, homotopy test graphs and a contribution to a conjecture of Hedetniemi
Role of time scales and topology on the dynamics of complex networks
Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment
Random Finite Set Theory and Optimal Control for Large Spacecraft Swarms
Data-driven Discovery of Cyber-Physical Systems
Well-posedness and Stability for Interconnection Structures of Port-Hamiltonian Type
On the existence of O’Nan configurations in Buekenhout unitals in PG(2,q^2)
TOP: Time-to-Event Bayesian Optimal Phase II Trial Design for Cancer Immunotherapy
Complete intersection Jordan types in height two
Joint Activity Detection and Channel Estimation for IoT Networks: Phase Transition and Computation-Estimation Tradeoff
SurfelMeshing: Online Surfel-Based Mesh Reconstruction
Towards Cereceda’s conjecture for planar graphs
Smeared phase transitions in percolation on real complex networks
Benchmark Analysis of Representative Deep Neural Network Architectures
Risk-Averse Stochastic Convex Bandit
Bayesian inference in high-dimensional linear models using an empirical correlation-adaptive prior
Game-Theoretic Choice of Curing Rates Against Networked SIS Epidemics by Human Decision-Makers
Improving the Generalization of Adversarial Training with Domain Adaptation
Quantum asymptotic spectra of graphs and non-commutative graphs, and quantum Shannon capacities
Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
CBPF: leveraging context and content information for better recommendations
Analysis of the shortest relay queue policy in a cooperative random access network with collisions
Riemannian Adaptive Optimization Methods
Bisectors and pinned distances
Geometric Constellation Shaping for Fiber Optic Communication Systems via End-to-end Learning
Computing Dynamic User Equilibria on Large-Scale Networks: From Theory to Software Implementation
Mean Field Control and Mean Field Game Models with Several Populations
Enumerating minimal dominating sets in triangle-free graphs
Optimal Pricing For MHR Distributions
Accelerated Training of Large-Scale Gaussian Mixtures by a Merger of Sublinear Approaches
Solving 3SAT By Reduction To Testing For Odd Hole
Graph Isomorphism by Conversion to Chordal (6, 3) Graphs
Multilevel Adaptive Sparse Grid Quadrature for Monte Carlo models
Caterpillars in Erdős-Hajnal
Hypergraph polynomials and the Bernardi process
Two types of slow waves in anesthetized and sleeping brains
On the maximum number of copies of H in graphs with given size and order
RGB-D Object Detection and Semantic Segmentation for Autonomous Manipulation in Clutter
Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Approximating mixed Hölder functions using random samples
Set Transformer
Singularity, Misspecification, and the Convergence Rate of EM
Adaptive Game-Theoretic Decision Making for Autonomous Vehicle Control at Roundabouts
Network Modeling and Pathway Inference from Incomplete Data (‘PathInf’)
CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs
Classification from Positive, Unlabeled and Biased Negative Data
Augmented Mitotic Cell Count using Field Of Interest Proposal
Truncated Laplacian Mechanism for Approximate Differential Privacy
Dynamic Sparse Graph for Efficient Deep Learning
ProxQuant: Quantized Neural Networks via Proximal Operators
Geometry of quadratic maps via convex relaxation
Dealing with State Estimation in Fractional-Order Systems under Artifacts
A Statistical Exploration of Duckworth-Lewis Method Using Bayesian Inference
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
Degree versions of theorems on intersecting families via stability
Handling Nominals and Inverse Roles using Algebraic Reasoning
Robust multivariate and functional archetypal analysis with application to financial time series analysis
Structure and properties of large intersecting families
Secrecy Analysis of Random MIMO Wireless Networks over $α$-$μ$ Fading Channels
Joint On-line Learning of a Zero-shot Spoken Semantic Parser and a Reinforcement Learning Dialogue Manager
Wikidata: A New Paradigm of Human-Bot Collaboration?
Few $T$ copies in $H$-saturated graphs
Modified diagonals and linear relations between small diagonals
Improved robustness to adversarial examples using Lipschitz regularization of the loss
The competitive exclusion principle in stochastic environments
On the density of sets of the Euclidean plane avoiding distance 1
Rigidity of the saddle connection complex
Natural measures of alignment
Efficient and Accurate Abnormality Mining from Radiology Reports with Customized False Positive Reduction
Utilizing a Transparency-driven Environment toward Trusted Automatic Genre Classification: A Case Study in Journalism History
On the discovery of the seed in uniform attachment trees
Abstract convex approximations of nonsmooth functions
A sampling framework for counting temporal motifs
Inertial-aided Motion Deblurring with Deep Networks
Ballot Permutations, Odd Order Permutations, and a New Permutation Statistic
A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption
Critical groups of van Lint-Schrijver Cyclotomic Strongly Regular Graphs
Learning Hash Codes via Hamming Distance Targets
Power domination throttling
CNN-SVO: Improving the Mapping in Semi-Direct Visual Odometry Using Single-Image Depth Prediction
Distributional Semantics Approach to Detect Intent in Twitter Conversations on Sexual Assaults
AI for Trustworthiness! Credible User Identification on Social Web for Disaster Response Agencies
Bayesian Policy Optimization for Model Uncertainty
Convergence Rates for Empirical Estimation of Binary Classification Bounds
Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation
Large batch size training of neural networks with adversarial training and second-order information
Reinforcement Learning with Perturbed Rewards
Heterogeneous Replica for Query on Cassandra
Implementing the Lexicographic Maxmin Bargaining Solution
Nondegenerate spheres in four dimensions
A Unified Framework for Clustering Constrained Data without Locality Property
Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture
A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods
ChainQueen: A Real-Time Differentiable Physical Simulator for Soft Robotics
Super-Resolution Blind Channel-and-Signal Estimation for Massive MIMO with Arbitrary Array Geometry
Improving Sentence Representations with Multi-view Frameworks
Lattice points in vector-dilated quadratic irrational polytopes
PDE Acceleration: A convergence rate analysis and applications to obstacle problems
Cloud Chaser: Real Time Deep Learning Computer Vision on Low Computing Power Devices
Fully dynamic $3/2$ approximate maximum cardinality matching in $O(\sqrt{n})$ update time
NU-LiteNet: Mobile Landmark Recognition using Convolutional Neural Networks
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
On the Interference from VDE-SAT Downlink to the Incumbent Land Mobile System
A framework for generalized group testing with inhibitors and its potential application in neuroscience
Statistical learning with Lipschitz and convex loss functions
Ancient Coin Classification Using Graph Transduction Games
Relating Metric Distortion and Fairness of Social Choice Rules
Quantization-Aware Phase Retrieval
Cramér type moderate deviations for self-normalized $ψ$-mixing sequences
Video Imitation GAN: Learning control policies by imitating raw videos using generative adversarial reward estimation
Reconfiguring Graph Homomorphisms on the Sphere
The Dreaming Variational Autoencoder for Reinforcement Learning Environments
Who is Addressed in this Comment? Automatically Classifying Meta-Comments in News Comments
Inverse Gaussian quadrature and finite normal-mixture approximation of generalized hyperbolic distribution
Non-linear Model Predictive Control of Conically Shaped Liquid Storage Tanks
Predicate learning in neural systems: Discovering latent generative structures
Time Reversal as Self-Supervision
A Gini approach to spatial CO2 emissions
A deterministic polynomial kernel for Odd Cycle Transversal and Vertex Multiway Cut in planar graphs
Avoiding Burst-like Error Patterns in Windowed Decoding of Spatially Coupled LDPC Codes
Training compact deep learning models for video classification using circulant matrices
Retrofit Control with Approximate Environment Modeling
Coupled McKean-Vlasov diffusions: wellposedness, propagation of chaos and invariant measures
On characterizations of the covariance matrix
Know What Your Neighbors Do: 3D Semantic Segmentation of Point Clouds
Learning Discriminators as Energy Networks in Adversarial Learning
On Learning How to Communicate Over Noisy Channels for Collaborative Tasks
Sharp bounds for the chromatic number of random Kneser graphs
An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images
Advanced Simulation of Droplet Microfluidics
Semi-supervised Text Regression with Conditional Generative Adversarial Networks
Variations on the CSC model
Findings of the E2E NLG Challenge
Note on the Estimation of Embedded Hermitian Gaussian Graphical Models for MEEG Source Activity and Connectivity Analysis in the Frequency Domain. Part I: Single Frequency Component and Subject
EMI: Exploration with Mutual Information Maximizing State and Action Embeddings
A variational formula for risk-sensitive control of diffusions in $\mathbb{R}^d$
Stochastic maximal regularity for rough time-dependent problems
Thompson Sampling for Cascading Bandits
Large deviations for the largest eigenvalue of Rademacher matrices
Sharp spectral bounds for the edge-connectivity of a regular graph
Characterization of Visual Object Representations in Rat Primary Visual Cortex
Consistent Maximum Likelihood Estimation Using Subsets with Applications to Multivariate Mixed Models
Bayesian approach to SETI
On non-repetitive sequences of arithmetic progressions: the cases $k \in \{4,5,6,7,8\}$
Rough infection fronts in a random medium
Approximation and sampling of multivariate probability distributions in the tensor train decomposition
All-optical Nonlinear Activation Function for Photonic Neural Networks
Sparse Gaussian Process Temporal Difference Learning for Marine Robot Navigation
CEM-RL: Combining evolutionary and gradient-based methods for policy search
The largest projective cube-free subsets of $\mathbb{Z}_{2^n}$
Long term behaviour of a reversible system of interacting random walks
Combinatorial Algorithms for General Linear Arrow-Debreu Markets
Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS
A Deep Autoencoder System for Differentiation of Cancer Types Based on DNA Methylation State
Fragility and anomalous susceptibility of weakly interacting networks
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
From phase to amplitude oscillators
Bias Reduced Peaks over Threshold Tail Estimation
Sampling-based Estimation of In-degree Distribution with Applications to Directed Complex Networks
Strong solutions of SDEs with random and unbounded drifts
Landmine Detection Using Autoencoders on Multi-polarization GPR Volumetric Data
Moderate-Dimensional Inferences on Quadratic Functionals in Ordinary Least Squares
Hypocoercivity in Wasserstein-1 for the kinetic Fokker-Planck equation via Malliavin Calculus
FutureGAN: Anticipating the Future Frames of Video Sequences using Spatio-Temporal 3d Convolutions in Progressively Growing Autoencoder GANs
Finite Codimensional Controllability, and Optimal Control Problems with Endpoint State Constraints
The gap in Pure Traction Problems between Linear Elasticity and Variational Limit of Finite Elasticity
Majorization by Hemispheres & Quadratic Isoperimetric Constants
Unsupervised Emergence of Spatial Structure from Sensorimotor Prediction
Efficient Detectors for MIMO-OFDM Systems under Spatial Correlation Antenna Arrays
The structure of normal lattice supercharacter theories
Energy-Based Hindsight Experience Prioritization
On Self Modulation for Generative Adversarial Networks
FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models
Speed-Gradient Control of the Brockett Integrator
Semi-dense Stereo Matching using Dual CNNs
Covariate Distribution Balance via Propensity Scores
Efficient Dialog Policy Learning via Positive Memory Retention
Multi-scale Convolution Aggregation and Stochastic Feature Reuse for DenseNets
A Knowledge Hunting Framework for Common Sense Reasoning
Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation
Archimedean toroidal maps and their minimal almost regular covers
Approximating the Existential Theory of the Reals
Phasebook and Friends: Leveraging Discrete Representations for Source Separation
Sketching for Latent Dirichlet-Categorical Models
Attention Models with Random Features for Multi-layered Graph Embeddings
Super-Resolution via Conditional Implicit Maximum Likelihood Estimation
Can Adversarially Robust Learning Leverage Computational Hardness?