k-nearest neighbors prediction and classification for spatial data

We propose a nonparametric predictor and a supervised classification method based on the regression function estimate of a spatial real variable using the k-nearest neighbors (k-NN) method. Under some assumptions, we establish almost complete or sure convergence of the proposed estimates, which incorporate a spatial proximity between observations. Numerical results on simulated and real fish data illustrate the behavior of the given predictor and classification method.
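
As a rough illustration of the k-NN idea this abstract builds on, the sketch below does plain k-NN regression and majority-vote classification using only the distance between site coordinates. It is not the paper's estimator, which also accounts for covariates and a spatial proximity kernel; the function names `knn_predict` and `knn_classify` and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k training items whose site coordinates are closest to query."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    return ranked[:k]


def knn_predict(train, query, k=3):
    """Plain k-NN regression: average the values observed at the k nearest sites."""
    nearest = k_nearest(train, query, k)
    return sum(value for _, value in nearest) / len(nearest)


def knn_classify(train, query, k=3):
    """Plain k-NN classification: majority label among the k nearest sites."""
    nearest = k_nearest(train, query, k)
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: ((x, y) site coordinates, observed value or label).
values = [((0, 0), 1.0), ((1, 0), 2.0), ((0, 1), 3.0), ((5, 5), 10.0)]
labels = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "b"), ((5, 5), "b")]

prediction = knn_predict(values, (0, 0), k=3)
predicted_label = knn_classify(labels, (0, 0), k=3)
```

With k=3 the distant site at (5, 5) is ignored, so only the three sites near the origin contribute to the prediction.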

Nonlinear Acceleration of CNNs

The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG. Until now, its analysis has been limited to convex problems, but empirical observations show that RNA may be extended to wider settings. In this paper, we further investigate the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of existing frameworks, RNA slightly improves the optimization process of CNNs, after training.

A Survey of Domain Adaptation for Neural Machine Translation

Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.

TAPAS: Train-less Accuracy Predictor for Architecture Search

In recent years, an increasing number of researchers and practitioners have been suggesting algorithms for large-scale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, demonstrated high performance without training new experiments in the presence of unseen datasets. We propose a new deep neural network accuracy predictor that estimates, in fractions of a second, classification performance for unseen input datasets without training. In contrast to previously proposed approaches, our prediction is calibrated not only on the topological network information but also on the characterization of the dataset difficulty, which allows us to re-tune the prediction without any training. Our predictor achieves a performance which exceeds 100 networks per second on a single GPU, thus creating the opportunity to perform large-scale architecture search within a few minutes. We present results of two searches performed in 400 seconds on a single GPU. Our best discovered networks reach 93.67% accuracy for CIFAR-10 and 81.01% for CIFAR-100, verified by training. These networks are performance competitive with other automatically discovered state-of-the-art networks; however, we only needed a small fraction of the time to solution and computational resources.

Lecture Notes: Temporal Point Processes and the Conditional Intensity Function

These short lecture notes contain a not too technical introduction to point processes on the time line. The focus lies on defining these processes using the conditional intensity function. Furthermore, likelihood inference, methods of simulation and residual analysis for temporal point processes specified by a conditional intensity function are considered.
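
For orientation, the two central objects mentioned here can be written out in their standard textbook form. The notation below ($\lambda^*$ for the conditional intensity, $\mathcal{H}_{t^-}$ for the history up to but not including $t$, $N$ for the counting measure, and an observation window $[0,T]$ with events $t_1,\dots,t_n$) is an assumption of this sketch and may differ from the notes' own.

```latex
\lambda^*(t)\,\mathrm{d}t = \mathbb{E}\bigl[N(\mathrm{d}t) \mid \mathcal{H}_{t^-}\bigr],
\qquad
\log L = \sum_{i=1}^{n} \log \lambda^*(t_i) - \int_0^T \lambda^*(t)\,\mathrm{d}t .
```

The first expression defines the conditional intensity as the expected number of events in an infinitesimal interval given the past; the second is the log-likelihood used for inference once $\lambda^*$ is specified.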

Being curious about the answers to questions: novelty search with learned attention

We investigate the use of attentional neural network layers in order to learn a ‘behavior characterization’ which can be used to drive novelty search and curiosity-based policies. The space is structured towards answering a particular distribution of questions, which are used in a supervised way to train the attentional neural network. We find that in a 2d exploration task, the structure of the space successfully encodes local sensory-motor contingencies such that even a greedy local ‘do the most novel action’ policy with no reinforcement learning or evolution can explore the space quickly. We also apply this to a high/low number guessing game task, and find that guessing according to the learned attention profile performs active inference and can discover the correct number more quickly than an exact but passive approach.

The Nonlinearity Coefficient – Predicting Overfitting in Deep Neural Networks

For a long time, designing neural architectures that exhibit high performance was considered a dark art that required expert hand-tuning. One of the few well-known guidelines for architecture design is the avoidance of exploding gradients, though even this guideline has remained relatively vague and circumstantial. We introduce the nonlinearity coefficient (NLC), a measurement of the complexity of the function computed by a neural network that is based on the magnitude of the gradient. Via an extensive empirical study, we show that the NLC is a powerful predictor of test error and that attaining a right-sized NLC is essential for optimal performance. The NLC exhibits a range of intriguing and important properties. It is closely tied to the amount of information gained from computing a single network gradient. It is tied to the error incurred when replacing the nonlinearity operations in the network with linear operations. It is not susceptible to the confounders of multiplicative scaling, additive bias and layer width. It is stable from layer to layer. Hence, we argue that the NLC is the first robust predictor of overfitting in deep networks.

Strategic Object Oriented Reinforcement Learning

Humans learn to play video games significantly faster than state-of-the-art reinforcement learning (RL) algorithms. Inspired by this, we introduce strategic object oriented reinforcement learning (SOORL) to learn a simple dynamics model through automatic model selection and perform efficient planning with strategic exploration. We compare different exploration strategies in a model-based setting in which exact planning is impossible. Additionally, we test our approach on Pitfall!, perhaps the hardest Atari game, and achieve significantly improved exploration and performance over prior methods.

Interpreting Deep Learning: The Machine Learning Rorschach Test

Theoretical understanding of deep learning is one of the most important tasks facing the statistics and machine learning communities. While deep neural networks (DNNs) originated as engineering methods and models of biological networks in neuroscience and psychology, they have quickly become a centerpiece of the machine learning toolbox. Unfortunately, DNN adoption, powered by recent successes combined with the open-source nature of the machine learning community, has outpaced our theoretical understanding. We cannot reliably identify when and why DNNs will make mistakes. While in some applications like text translation these mistakes may be comical and provide fun fodder for research talks, a single error can be very costly in tasks like medical imaging. As we utilize DNNs in increasingly sensitive applications, a better understanding of their properties is thus imperative. Recent advances in DNN theory are numerous and include many different sources of intuition, such as learning theory, sparse signal analysis, physics, chemistry, and psychology. An interesting pattern begins to emerge in the breadth of possible interpretations. The seemingly limitless approaches are mostly constrained by the lens with which the mathematical operations are viewed. Ultimately, the interpretation of DNNs appears to mimic a type of Rorschach test, a psychological test wherein subjects interpret a series of seemingly ambiguous ink-blots. Validation for DNN theory requires a convergence of the literature. We must distinguish between universal results that are invariant to the analysis perspective and those that are specific to a particular network configuration. Simultaneously, we must deal with the fact that many standard statistical tools for quantifying generalization or empirically assessing important network features are difficult to apply to DNNs.

PeerNets: Exploiting Peer Wisdom Against Adversarial Attacks

Deep learning systems have become ubiquitous in many aspects of our lives. Unfortunately, it has been shown that such systems are vulnerable to adversarial attacks, making them prone to potential unlawful uses. Designing deep neural networks that are robust to adversarial attacks is a fundamental step in making such systems safer and deployable in a broader variety of applications (e.g. autonomous driving), but more importantly is a necessary step to design novel and more advanced architectures built on new computational paradigms rather than marginally building on the existing ones. In this paper we introduce PeerNets, a novel family of convolutional networks alternating classical Euclidean convolutions with graph convolutions to harness information from a graph of peer samples. This results in a form of non-local forward propagation in the model, where latent features are conditioned on the global structure induced by the graph, that is up to 3 times more robust to a variety of white- and black-box adversarial attacks compared to conventional architectures with almost no drop in accuracy.

Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, identify potential bias/problems in the training data, and to ensure that the algorithms perform as expected. However, explanations produced by these systems are neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify existing literature. We discuss why current approaches to explanatory methods, especially for deep neural networks, are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence.

Defending Against Model Stealing Attacks Using Deceptive Perturbations

Machine learning models are vulnerable to simple model stealing attacks if the adversary can obtain output labels for chosen inputs. To protect against these attacks, it has been proposed to limit the information provided to the adversary by omitting probability scores, significantly impacting the utility of the provided service. In this work, we illustrate how a service provider can still provide useful, albeit misleading, class probability information, while significantly limiting the success of the attack. Our defense forces the adversary to discard the class probabilities, requiring significantly more queries before they can train a model with comparable performance. We evaluate several attack strategies, model architectures, and hyperparameters under varying adversarial models, and evaluate the efficacy of our defense against the strongest adversary. Finally, we quantify the amount of noise injected into the class probabilities to measure the loss in utility, e.g., adding 1.74 nats per query on CIFAR-10 and 3.27 on MNIST. Our extensive evaluation shows our defense can degrade the accuracy of the stolen model by at least 20%, or require 4x more queries, while keeping the accuracy of the protected model almost intact.

Interpretable Set Functions

We propose learning flexible but interpretable functions that aggregate a variable-length set of permutation-invariant feature vectors to predict a label. We use a deep lattice network model so we can architect the model structure to enhance interpretability, and add monotonicity constraints between inputs and outputs. We then use the proposed set function to automate the engineering of dense, interpretable features from sparse categorical features, which we call a semantic feature engine. Experiments on real-world data show the achieved accuracy is similar to deep sets or deep neural networks, and the model is easier to debug and understand.

Text Normalization using Memory Augmented Neural Networks

We propose a memory augmented neural network to perform text normalization, i.e., the transformation of words from the written to the spoken form. With the addition of a dynamic memory access and storage mechanism, we present an architecture that will serve as a language agnostic text normalization system while avoiding the kind of silly errors made by LSTM based recurrent neural architectures. By reducing the number of unacceptable mistakes, we show that such a novel architecture is indeed a better alternative. Our proposed system requires significantly less data, training time and compute resources. However, some errors still remain in certain semiotic classes. Nevertheless, we demonstrate that memory augmented networks with meta-learning capabilities can open many doors to a superior text normalization system.

Blip: JIT and Footloose On The Edge

Edge environments offer a number of advantages for software developers, including the ability to create services which can offer lower latency, better privacy, and reduced operational costs than traditional cloud hosted services. However, large technical challenges exist which prevent developers from utilising the Edge: complexities related to the heterogeneous nature of the Edge environment, issues with orchestration and application management, and lastly, the inherent issues in creating decentralised distributed applications which operate at a large geographic scale. In this conceptual and architectural paper we envision a solution, Blip, which offers an easy to use programming and operational environment which addresses these issues. It aims to remove the technical barriers which will inhibit the wider adoption of Edge application development. This paper validates the Blip concept by demonstrating how it will deliver on the advantages of the Edge for a familiar scenario.

Assessing Generative Models via Precision and Recall

Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as Fréchet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and the Variational Autoencoder. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.

Too Fast Causal Inference under Causal Insufficiency

Causally insufficient structures (models with latent or hidden variables, or with confounding etc.) of joint probability distributions have been the subject of intense study not only in statistics, but also in various AI systems. In AI, belief networks, being representations of joint probability distributions with an underlying directed acyclic graph structure, are paid special attention due to the fact that efficient reasoning (uncertainty propagation) methods have been developed for belief network structures. Algorithms have therefore been developed to acquire the belief network structure from data. As artifacts due to variable hiding negatively influence the performance of derived belief networks, models with latent variables have been studied and several algorithms for learning belief network structure under causal insufficiency have also been developed. Regrettably, some of them are already known to be erroneous (e.g. the IC algorithm of [Pearl:Verma:91]). This paper is devoted to another algorithm, the Fast Causal Inference (FCI) Algorithm of [Spirtes:93]. It is proven by a specially constructed example that this algorithm, as it stands in [Spirtes:93], is also erroneous. The fundamental reason for the failure of this algorithm is the temporary introduction of non-real links between nodes of the network with the intention of later removal. While for trivial dependency structures these non-real links may actually be removed, this may not be the case for complex ones, e.g. for the case described in this paper. A remedy for this failure is proposed.

Quantum Information Scrambling Through a High-Complexity Operator Mapping
On the Capacity of Secure Distributed Matrix Multiplication
Implicit Bias of Gradient Descent on Linear Convolutional Networks
Surgical Activity Recognition in Robot-Assisted Radical Prostatectomy using Deep Learning
Adversarial quantum circuit learning for pure state approximation
Improved Oracle Complexity for Stochastic Compositional Variance Reduced Gradient
Do CIFAR-10 Classifiers Generalize to CIFAR-10?
An Assmus-Mattson Theorem for Rank Metric Codes
Bayesian Logistic Regression for Small Areas with Numerous Households
Dual heuristics and new dual bounds to schedule the maintenances of nuclear power plants
Large-Margin Classification in Hyperbolic Space
Will pleural fluid affect surface wave speed measurements of the lung using lung ultrasound surface wave elastography: experimental and numerical studies on sponge phantom
Unfolding with Generative Adversarial Networks
Mobilizing the Trump Train: Understanding Collective Action in a Political Trolling Community
A Classification approach towards Unsupervised Learning of Visual Representations
Solving stochastic differential equations and Kolmogorov equations by means of deep learning
Whitening and Coloring transform for GANs
Automated discovery of characteristic features of phase transitions in many-body localization
The real tau-conjecture is true on average
Pattern Search MDS
A Numerical Study of the Relationship Between Erectile Pressure and Shear Wave Speed of Corpus Cavernosa in Ultrasound Vibro-elastography
Global linear convergence of Newton’s method without strong-convexity or Lipschitz gradients
Oversegmenting Graphs
New Gramians for Linear Switched Systems: Reachability, Observability, and Model Reduction
Inverting Supervised Representations with Autoregressive Neural Density Models
Overcoming device unreliability with continuous learning in a population coding based computing system
Radio Galaxy Morphology Generation Using DNN Autoencoder and Gaussian Mixture Models
Generalized couplings and ergodic rates for SPDEs and other Markov models
Opportunities in Machine Learning for Healthcare
Persistence paths and signature features in topological data analysis
Miniaturized Microwave Devices and Antennas for Wearable, Implantable and Wireless Applications
Evaluation of the Energy Efficiency in a Mixed Traffic with Automated Vehicles and Human Controlled Vehicles
$β$-Decay Spectrum, Response Function and Statistical Model for Neutrino Mass Measurements with the KATRIN Experiment
Accurate and Efficient Similarity Search for Large Scale Face Recognition
Domain Adaptation for MRI Organ Segmentation using Reverse Classification Accuracy
A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset
Differentiability in perturbation parameter of measure solutions to perturbed transport equation
Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers
A probabilistic verification theorem for the finite horizon two-player zero-sum optimal switching game in continuous time
Producing radiologist-quality reports for interpretable artificial intelligence
Structured Local Optima in Sparse Blind Deconvolution
A Reinforcement Learning Approach to Age of Information in Multi-User Networks
On the some parameters related to matching of graph powers
Binary PSOGSA for Load Balancing Task Scheduling in Cloud Environment
Partial correlation hypersurfaces in Gaussian graphical models
Learning convex bounds for linear quadratic control policy synthesis
A programmable clock generator for automatic Quality Assurance of LOCx2
Joint Size and Depth Optimization of Sorting Networks
Strong geodetic problem on complete multipartite graphs
Artificial Immune Systems Can Find Arbitrarily Good Approximations for the NP-Hard Partition Problem
Fast Artificial Immune Systems
Iterative hard-thresholding applied to optimal control problems with $L^0(Ω)$ control cost
Automatic Detection of Neurons in NeuN-stained Histological Images of Human Brain
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
Analysis of Peer Review Effectiveness for Academic Journals Based on Distributed Parallel System
Localization of phonons in mass disordered alloys – A typical medium dynamical cluster approach
The Chromatic Number of the $q$-Kneser Graph for $q \geq 5$
On the governing equations for Poisson and Skellam processes time-changed by inverse subordinators
Locally $D$-optimal Designs for Non-linear Models on the $k$-dimensional Ball
Musical Instrument Separation on Shift-Invariant Spectrograms via Stochastic Dictionary Learning
Learning Neural Random Fields with Inclusive Auxiliary Generators
Projections of spherical Brownian motion
Learn the new, keep the old: Extending pretrained models with new anatomy and images
APNet: Semantic Segmentation for Pelvic MR Image
The Proximal Alternating Minimization Algorithm for two-block separable convex optimization problems with linear constraints
New Semifields and new MRD Codes from Skew Polynomial Rings
A Reduction Principle for the Critical Values of Random Spherical Harmonics
Proportional Fairness in ALOHA Networks with RF Energy Harvesting
Private Streaming with Convolutional Codes
Accounting for model errors in iterative ensemble smoothers
Unsupervised Object Localization using Generative Adversarial Networks
Stein approximation for multidimensional Poisson random measures by third cumulant expansions
Shortcuts to Adiabatic Classical Spin Dynamics Mimicking Quantum Annealing
SaGe: Preemptive Query Execution for High Data Availability on the Web
Model-based clustering for populations of networks
Creativity in Science and the Link to Cited References: Is the Creative Potential of Papers Reflected in their Cited References?
Tangles and the Stone-Cech compactification of infinite graphs
Asymptotic Existence of Proportionally Fair Allocations
Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing
Adversity Index for Clinical Trials: An Inclusive Approach for Analysis of Safety Data
Online Learning with Inexact Proximal Online Gradient Descent Algorithms
Block Palindromes: A New Generalization of Palindromes
Learning a Latent Space of Multitrack Measures
Deep Imbalanced Learning for Face Recognition and Attribute Prediction
On non-randomized stationary optimal policies in constrained discounted Markov decision processes
Near-Optimal Budgeted Data Exchange for Distributed Loop Closure Detection
Scaling Neural Machine Translation
Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Hyperspectral Image Denoising Employing a Spatial-Spectral Deep Residual Convolutional Neural Network
IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks
Reparameterization Gradient for Non-differentiable Models
Training LSTM Networks with Resistive Cross-Point Devices
Balancedly splittable Hadamard matrices
Fast Channel Estimation and Beam Tracking for Millimeter Wave Vehicular Communications
Sparse Multiband Signal Acquisition Receiver with Co-prime Sampling
Neural Control Variates for Variance Reduction
k-Space Deep Learning for Reference-free EPI Ghost Correction
Distance Distribution to Received Words in Reed-Solomon Codes
Decentralized Connectivity-Preserving Deployment of Large-Scale Robot Swarms
q-Neurons: Neuron Activations based on Stochastic Jackson’s Derivative Operators
Tandem Blocks in Deep Convolutional Neural Networks
Sea surface temperature prediction and reconstruction using patch-level neural network representations
Modeling Preemptive Behaviors for Uncommon Hazardous Situations From Demonstrations
A Coupled Compressive Sensing Scheme for Unsourced Multiple Access
PID2018 Benchmark Challenge: Model-based Feedforward Compensator with A Conditional Integrator
Packing spanning partition-connected subgraphs with small degrees
Damping Effect on PageRank Distribution
On Curvature-aided Incremental Aggregated Gradient Methods
Technical Report: Inconsistency in Answer Set Programs and Extensions
Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits
Nonparametric Estimation in Fractional SDE
An Ideal Observer Model to Probe Human Visual Segmentation of Natural Images
Probabilistically Safe Robot Planning with Confidence-Based Human Predictions
Exact steady-state distributions of multispecies birth-death-immigration processes: effects of mutations and carrying capacity on diversity
MONET: Multiview Semi-supervised Keypoint via Epipolar Divergence
Respond-CAM: Analyzing Deep Models for 3D Imaging Data by Visualizations
Ratio Matching MMD Nets: Low dimensional projections for effective deep generative models
Imaging with SPADs and DMDs: Seeing through Diffraction-Photons
Robust Covariance Adaptation in Adaptive Importance Sampling
Applications of stochastic semigroups to cell cycle models
Resisting Adversarial Attacks using Gaussian Mixture Variational Autoencoders
Open and closed random walks with fixed edgelengths in $\mathbb{R}^d$
Resonate and Fire Neuron with Fixed Magnetic Skyrmions
Optimal modularity in complex contagion
On graph Laplacians eigenvectors with components in {1,-1,0}
Minimax Learning for Remote Prediction
Adaptive regularization with cubics on manifolds with a first-order analysis
Efficient Low-rank Multimodal Fusion with Modality-Specific Factors
Privacy Under Hard Distortion Constraints
On the probability that two random integers are coprime
On reachability of Markov chains: A Markov Decision Process approach
Millimeter Wave Communications with Reconfigurable Antennas
Basis Values Have Questionable Value
Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning
Efficient Algorithms and Lower Bounds for Robust Linear Regression
Near-Balanced Incomplete Block Designs with An Application to Poster Competitions
The number of the non-full-rank Steiner triple systems
Multi-Layered Gradient Boosting Decision Trees
Towards a new system for drowsiness detection based on eye blinking and head posture estimation
An Ontology to Support Collective Intelligence in Decentralised Multi-Robot Systems
A mixture model for aggregation of multiple pre-trained weak classifiers
The permanent functions of tensors
Enhancing noise-induced switching times in systems with distributed delays