Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions

Deep autoregressive models have shown state-of-the-art performance in density estimation for natural images on large-scale datasets such as ImageNet. However, such models require many thousands of gradient-based weight updates and unique image examples for training. Ideally, the models would rapidly learn visual concepts from only a handful of examples, similar to the manner in which humans learn across many vision tasks. In this paper, we show how 1) neural attention and 2) meta learning techniques can be used in combination with autoregressive models to enable effective few-shot density estimation. Our proposed modifications to PixelCNN result in state-of-the-art few-shot density estimation on the Omniglot dataset. Furthermore, we visualize the learned attention policy and find that it learns intuitive algorithms for simple tasks such as image mirroring on ImageNet and handwriting on Omniglot without supervision. Finally, we extend the model to natural images and demonstrate few-shot image generation on the Stanford Online Products dataset.

A Self-Training Method for Semi-Supervised GANs

Since the creation of Generative Adversarial Networks (GANs), much work has been done to improve their training stability, their generated image quality, and their range of application, but almost none of it has explored their self-training potential. Self-training was used before the advent of deep learning to allow training on limited labelled data and has shown impressive results in semi-supervised learning. In this work, we combine these two ideas and make GANs self-trainable for semi-supervised learning tasks by exploiting their infinite data generation potential. Results show that using even the simplest form of self-training yields an improvement. We also show results for a more complex self-training scheme that performs at least as well as the basic self-training scheme but with significantly less data augmentation.

Revisit Fuzzy Neural Network: Demystifying Batch Normalization and ReLU with Generalized Hamming Network

We revisit the fuzzy neural network with the cornerstone notion of generalized hamming distance, which provides a novel and theoretically justified framework to re-interpret many useful neural network techniques in terms of fuzzy logic. In particular, we conjecture and empirically illustrate that the celebrated batch normalization (BN) technique actually adapts the normalized bias such that it approximates the rightful bias induced by the generalized hamming distance. Once the due bias is enforced analytically, neither the optimization of bias terms nor the sophisticated batch normalization is needed. Also in the light of generalized hamming distance, the popular rectified linear units (ReLU) can be treated as setting a minimal hamming distance threshold between network inputs and weights. This thresholding scheme, on the one hand, can be improved by introducing double thresholding on both extremes of neuron outputs. On the other hand, ReLUs turn out to be non-essential and can be removed from networks trained for simple tasks like MNIST classification. The proposed generalized hamming network (GHN) as such not only lends itself to rigorous analysis and interpretation within fuzzy logic theory but also demonstrates fast learning speed, well-controlled behaviour and state-of-the-art performance on a variety of learning tasks.

Similarity-based Multi-label Learning

Multi-label classification is an important learning problem with many applications. In this work, we propose a principled similarity-based approach for multi-label learning called SML. We also introduce a similarity-based approach for predicting the label set size. The experimental results demonstrate the effectiveness of SML for multi-label classification, where it is shown to compare favorably with a wide variety of existing algorithms across a range of evaluation criteria.

Deep Generative Dual Memory Network for Continual Learning

Despite advances in deep learning, artificial neural networks do not learn the same way as humans do. Today, neural networks can learn multiple tasks when trained on them jointly, but cannot maintain performance on learnt tasks when tasks are presented one at a time — this phenomenon called catastrophic forgetting is a fundamental challenge to overcome before neural networks can learn continually from incoming data. In this work, we derive inspiration from human memory to develop an architecture capable of learning continuously from sequentially incoming tasks, while averting catastrophic forgetting. Specifically, our model consists of a dual memory architecture to emulate the complementary learning systems (hippocampus and the neocortex) in the human brain, and maintains a consolidated long-term memory via generative replay of past experiences. We (i) substantiate our claim that replay should be generative, (ii) show the benefits of generative replay and dual memory via experiments, and (iii) demonstrate improved performance retention even for small models with low capacity. Our architecture displays many important characteristics of the human memory and provides insights on the connection between sleep and learning in humans.

Fully Distributed Multi-sensor Change-point Detection

Change-point detection has been a fundamental problem in quality control, hazards monitoring and cybersecurity. With the ever-growing complexity of systems and the increasing number of sensors, multi-sensor detection algorithms are in real demand. However, most existing detection algorithms consider a centralized approach, meaning that one collects all data at one location, which sometimes may be infeasible for real-time detection due to the delay and cost of communication. To address this issue, we propose a fully distributed CUSUM-based detection procedure inspired by the idea of average consensus. The algorithm is fully distributed, i.e., it only requires sensors to communicate with their neighbors and no fusion center is needed. We provide theoretical analysis showing that the performance of the fully distributed scheme can match the centralized algorithms under some mild assumptions. Numerical experiments demonstrate the good performance of the algorithm.
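
The combination of a local CUSUM recursion with an average-consensus step can be sketched as follows. This is a minimal illustration and not the authors' exact procedure: the Gaussian mean-shift model, the mixing matrix `RING_WEIGHTS`, and the function names are all assumptions introduced here. Each sensor mixes its statistic with those of its neighbors only, adds its own log-likelihood ratio, and clips at zero.

```python
def local_llr(x, mu=1.0):
    """Log-likelihood ratio of N(mu, 1) versus N(0, 1) for one observation.

    The Gaussian mean-shift model is an assumption made for this sketch.
    """
    return mu * x - 0.5 * mu * mu


def consensus_cusum_step(stats, observations, weights, mu=1.0):
    """One round: average with neighbors, add the local LLR, clip at zero."""
    n = len(stats)
    new_stats = []
    for i in range(n):
        # weights[i][j] is nonzero only if sensor j is sensor i itself or a neighbor.
        mixed = sum(weights[i][j] * stats[j] for j in range(n))
        new_stats.append(max(0.0, mixed + local_llr(observations[i], mu)))
    return new_stats


def run_detector(streams, weights, threshold, mu=1.0):
    """Return the first time index at which any sensor crosses the threshold."""
    stats = [0.0] * len(weights)
    for t, observations in enumerate(streams):
        stats = consensus_cusum_step(stats, observations, weights, mu)
        if max(stats) >= threshold:
            return t
    return None


# Hypothetical ring of four sensors; rows and columns each sum to one.
RING_WEIGHTS = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
```

No sensor ever reads a statistic from a non-neighbor, which is the sense in which no fusion center is required in this sketch.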

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinarily large data sets due to computational limitations. A critical step in big data analysis is data reduction. Existing investigations in the context of linear regression focus on subsampling-based methods. However, not only is this approach prone to sampling errors, it also leads to a covariance matrix of the estimators that is typically bounded from below by a term that is of the order of the inverse of the subdata size. We propose a novel approach, termed information-based optimal subdata selection (IBOSS). Compared to leading existing subdata methods, the IBOSS approach has the following advantages: (i) it is significantly faster; (ii) it is suitable for distributed parallel computing; (iii) the variances of the slope parameter estimators converge to 0 as the full data size increases even if the subdata size is fixed, i.e., the convergence rate depends on the full data size; (iv) data analysis for IBOSS subdata is straightforward and the sampling distribution of an IBOSS estimator is easy to assess. Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude. The advantages of the new approach are also illustrated through analysis of real data.

Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks

We propose a method, called Label Embedding Network, which can learn label representation (label embedding) during the training process of deep networks. With the proposed method, the label embedding is adaptively and automatically learned through back propagation. The original one-hot represented loss function is converted into a new loss function with soft distributions, such that the originally unrelated labels have continuous interactions with each other during the training process. As a result, the trained model can achieve substantially higher accuracy and faster convergence. Experimental results based on competitive tasks demonstrate the effectiveness of the proposed method, and the learned label embedding is reasonable and interpretable. The proposed method achieves comparable or even better results than the state-of-the-art systems. The source code is available at \url{https://…/LabelEmb}.

Wasserstein Identity Testing

Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under L_1-distance. However, when the support is very large or even continuous, testing under L_1-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider the identity testing in Wasserstein distance (a.k.a. transportation distance and earthmover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called ‘Doubling Condition’, we provide nearly instance-optimal sample complexity.

Topic Based Sentiment Analysis Using Deep Learning

In this paper, we tackle Sentiment Analysis conditioned on a Topic in Twitter data using Deep Learning. We propose a 2-tier approach: in the first phase we create our own Word Embeddings and see that they do perform better than state-of-the-art embeddings when used with standard classifiers. We then perform inference on these embeddings to learn more about a word with respect to all the topics being considered, and also the top n-influencing words for each topic. In the second phase we use these embeddings to predict the sentiment of the tweet with respect to a given topic, and all other topics under discussion.

Interpretation of Neural Networks is Fragile

In order for machine learning to be deployed and trusted in many applications, it is crucial to be able to reliably explain why the machine learning algorithm makes certain predictions. For example, if an algorithm classifies a given pathology image to be a malignant tumor, then the doctor may need to know which parts of the image led the algorithm to this classification. How to interpret black-box predictors is thus an important and active area of research. A fundamental question is: how much can we trust the interpretation itself? In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations. We systematically characterize the fragility of several widely-used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that even small random perturbation can change the feature importance and new systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly fragile. Our analysis of the geometry of the Hessian matrix gives insight on why fragility could be a fundamental challenge to the current interpretation approaches.

A Hybrid Data Mining Approach for Product Complexity Analysis

This paper proposes a hybrid data mining approach to quantitatively analyze product complexity of prefabricated construction components from product nonconforming quality performance data. The proposed model is constructed in three steps, which (1) measure product complexity by introducing a Bayesian-based nonconforming quality performance indicator; (2) score each type of product complexity by developing a Hellinger distance-based distribution similarity measurement; and (3) cluster products into homogeneous complexity groups by using the agglomerative hierarchical clustering technique. An illustrative example is provided to demonstrate the proposed approach, and a case study of an industrial company in Edmonton, Canada, is conducted to validate the feasibility and applicability of the proposed model. This research inventively defines and investigates product complexity from the perspective of product quality performance. The research outcomes provide valuable insights for practitioners to better analyze and manage product complexity. In addition to this practical contribution, a novel hierarchical clustering technique is devised. This technique is capable of clustering uncertain data (i.e., probability distributions) and has the potential to be generalized to cluster all types of uncertain data.
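
A Hellinger distance-based similarity measurement of the kind mentioned in step (2) can be illustrated for discrete distributions. The sketch below is an assumption-laden example, not the paper's implementation: it uses the standard discrete Hellinger distance, and the helper `pairwise_hellinger` is a hypothetical name for building the distance matrix that an agglomerative hierarchical clustering routine could consume in step (3).

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


def pairwise_hellinger(distributions):
    """Symmetric distance matrix, e.g. as input to hierarchical clustering."""
    n = len(distributions)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hellinger(distributions[i], distributions[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

Because the distance is bounded between 0 (identical distributions) and 1 (disjoint supports), products with very different nonconformance profiles end up far apart in the matrix.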

Practical Bayesian Inference for Record Linkage

Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism for propagating uncertainty due to uncertain links between records (via the posterior distribution). However, computational considerations severely limit the practical applicability of existing Bayesian approaches. We propose a new computational approach, providing both a fast algorithm for deriving point estimates of the linkage structure that properly account for one-to-one matching and a restricted MCMC algorithm that samples from an approximate posterior distribution. Our advances make it possible to perform Bayesian PRL for larger problems, and to assess the sensitivity of results to varying prior specifications. We demonstrate the methods on simulated data and an application to a post-enumeration survey for coverage estimation in the Italian census.

A Bayesian Data Augmentation Approach for Learning Deep Models

Data augmentation is an essential part of the training process applied to deep learning models. The motivation is that a robust training process for deep learning models depends on large annotated datasets, which are expensive to acquire, store and process. Therefore a reasonable alternative is to be able to automatically generate new annotated training samples using a process known as data augmentation. The dominant data augmentation approach in the field assumes that new training samples can be obtained via random geometric or appearance transformations applied to annotated training samples, but this is a strong assumption because it is unclear if this is a reliable generative model for producing new training samples. In this paper, we provide a novel Bayesian formulation to data augmentation, where new annotated training points are treated as missing variables and generated based on the distribution learned from the training set. For learning, we introduce a theoretically sound algorithm, generalised Monte Carlo expectation maximisation, and demonstrate one possible implementation via an extension of the Generative Adversarial Network (GAN). Classification results on MNIST, CIFAR-10 and CIFAR-100 show the better performance of our proposed method compared to the current dominant data augmentation approach mentioned above; the results also show that our approach produces better classification results than similar GAN models.

Stochastic Training of Graph Convolutional Networks

Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, GCN computes nodes’ representation recursively from their neighbors, making the receptive field size grow exponentially with the number of layers. Previous attempts at reducing the receptive field size by subsampling neighbors do not have any convergence guarantee, and their receptive field size per node is still in the order of hundreds. In this paper, we develop a preprocessing strategy and two control variate based algorithms to further reduce the receptive field size. Our algorithms are guaranteed to converge to GCN’s local optimum regardless of the neighbor sampling size. Empirical results show that our algorithms have a convergence speed per epoch similar to that of the exact algorithm even when using only two neighbors per node. The time consumption of our algorithm on the Reddit dataset is only one fifth of that of previous neighbor sampling algorithms.
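
The control variate idea can be sketched for a single scalar neighborhood average. This is an illustrative assumption about the general shape of such an estimator, not the paper's exact algorithm: `history` stands for stored (possibly stale) activations, `current` for up-to-date ones, and the function name is hypothetical. The stale values are averaged over all neighbors cheaply, and only the difference between current and stale values is estimated from a small sample.

```python
import random


def cv_neighbor_mean(current, history, neighbors, sample_size, rng):
    """Estimate the mean of current[v] over neighbors using a control variate.

    baseline   : exact mean of the stored historical activations.
    correction : sampled mean of (current - history), which has low variance
                 when the history is close to the current activations.
    """
    sampled = rng.sample(neighbors, sample_size)
    correction = sum(current[v] - history[v] for v in sampled) / sample_size
    baseline = sum(history[v] for v in neighbors) / len(neighbors)
    return baseline + correction
```

When the history matches the current activations the correction term vanishes and the estimate is exact for any sample size, which gives some intuition for why very small neighbor samples can suffice once the activations stabilize.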

Weight Initialization of Deep Neural Networks (DNNs) using Data Statistics

Deep neural networks (DNNs) form the backbone of almost every state-of-the-art technique in fields such as computer vision, speech processing, and text analysis. The recent advances in computational technology have made the use of DNNs more practical. Despite the overwhelming performance of DNNs and the advances in computational technology, very few researchers try to train their models from scratch. Training DNNs remains a difficult and tedious job. The main challenges that researchers face during training of DNNs are the vanishing/exploding gradient problem and the highly non-convex nature of the objective function, which has up to millions of variables. The approaches suggested in He and Xavier solve the vanishing gradient problem by providing a sophisticated initialization technique. These approaches have been quite effective and have achieved good results on standard datasets, but they do not work very well on more practical datasets. We think the reason is that they do not make use of data statistics for initializing the network weights. Optimizing such a high dimensional loss function requires careful initialization of network weights. In this work, we propose a data dependent initialization and analyze its performance against standard initialization techniques such as He and Xavier. We performed our experiments on some practical datasets and the results show our algorithm’s superior classification accuracy.

Variational Continual Learning

This paper develops variational continual learning (VCL), a simple but general framework for continual learning that fuses online variational inference (VI) and recent advances in Monte Carlo VI for neural networks. The framework can successfully train both deep discriminative models and deep generative models in complex continual learning settings where existing tasks evolve over time and entirely new tasks emerge. Experimental results show that variational continual learning outperforms state-of-the-art continual learning methods on a variety of tasks, avoiding catastrophic forgetting in a fully automatic way.

Regularization for Deep Learning: A Taxonomy

Regularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization procedures. We do not provide all details about the listed methods; instead, we present an overview of how the methods can be sorted into meaningful categories and sub-categories. This helps reveal links and fundamental similarities between them. Finally, we include practical recommendations both for users and for developers of new regularization methods.

Kernel Graph Convolutional Neural Networks

Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets.

Multilinear Class-Specific Discriminant Analysis

There has been a great effort to transfer linear discriminant techniques that operate on vector data to high-order data, generally referred to as Multilinear Discriminant Analysis (MDA) techniques. Many existing works focus on maximizing the ratio of inter-class to intra-class variances defined on tensor data representations. However, there has not been any attempt to employ class-specific discrimination criteria for tensor data. In this paper, we propose a multilinear subspace learning technique suitable for applications requiring class-specific tensor models. The method maximizes the discrimination of each individual class in the feature space while retaining the spatial structure of the input. We evaluate the efficiency of the proposed method on two problems, i.e. facial image analysis and stock price prediction based on limit order book data.

Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data

Machine learning algorithms such as linear regression, SVMs and neural networks have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that joins these two desirable properties together using a hybrid architecture of neural network embedding and dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, the application of our method not only outperformed the state-of-the-art method in terms of accuracy, but also unveiled two previously undiscovered open chromatin related histone marks. Our method can fill the gap of accurate and interpretable nonlinear modeling in scientific data mining tasks.

Evolving Deep Convolutional Neural Networks for Image Classification

Evolutionary computation methods have been successfully applied to neural networks for two decades, but those methods do not scale well to modern deep neural networks because of their complicated architectures and large numbers of connection weights. In this paper, we propose a new method using genetic algorithms for evolving the architectures and connection weight initialization values of a deep convolutional neural network to address image classification problems. In the proposed algorithm, an efficient variable-length gene encoding strategy is designed to represent the different building blocks and the unpredictable optimal depth in convolutional neural networks. In addition, a new representation scheme is developed for effectively initializing connection weights of deep convolutional neural networks, which is expected to prevent networks from getting stuck in local minima, typically a major issue in backward gradient-based optimization. Furthermore, a novel fitness evaluation method is proposed to speed up the heuristic search with substantially fewer computational resources. The proposed algorithm is examined and compared with 22 existing algorithms, including the state-of-the-art methods, on nine widely used image classification tasks. The experimental results demonstrate the remarkable superiority of the proposed algorithm over the state-of-the-art algorithms in terms of classification error rate and the number of parameters (weights).

Tensorizing Generative Adversarial Nets

Generative Adversarial Network (GAN) and its variants demonstrate state-of-the-art performance in the class of generative models. To capture higher dimensional distributions, the common learning procedure requires high computational complexity and a large number of parameters. In this paper, we present a new generative adversarial framework by representing each layer as a tensor structure connected by multilinear operations, aiming to reduce the number of model parameters by a large factor while preserving the quality of generalized performance. To learn the model, we develop an efficient algorithm by alternating optimization of the mode connections. Experimental results demonstrate that our model can achieve a high compression rate for model parameters, up to 40 times compared to the existing GAN.

Transfer Learning to Learn with Multitask Neural Model Search

Deep learning models require extensive architecture design exploration and hyperparameter optimization to perform well on a given task. The exploration of the model design space is often made by a human expert, and optimized using a combination of grid search and search heuristics over a large space of possible choices. Neural Architecture Search (NAS) is a Reinforcement Learning approach that has been proposed to automate architecture design. NAS has been successfully applied to generate Neural Networks that rival the best human-designed architectures. However, NAS requires sampling, constructing, and training hundreds to thousands of models to achieve well-performing architectures. This procedure needs to be executed from scratch for each new task. The application of NAS to a wide set of tasks currently lacks a way to transfer generalizable knowledge across tasks. In this paper, we present the Multitask Neural Model Search (MNMS) controller. Our goal is to learn a generalizable framework that can condition model construction on successful model searches for previously seen tasks, thus significantly speeding up the search for new tasks. We demonstrate that MNMS can conduct an automated architecture search for multiple tasks simultaneously while still learning well-performing, specialized models for each task. We then show that pre-trained MNMS controllers can transfer learning to new tasks. By leveraging knowledge from previous searches, we find that pre-trained MNMS models start from a better location in the search space and reduce search time on unseen tasks, while still discovering models that outperform published human-designed models.

Understanding Hidden Memories of Recurrent Neural Networks

Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their effectiveness limits further improvements on their architectures. In this paper, we present a visual analytics method for understanding and comparing RNN models for NLP tasks. We propose a technique to explain the function of individual hidden state units based on their expected response to input texts. We then co-cluster hidden state units and words based on the expected response and visualize co-clustering results as memory chips and word clouds to provide more structured knowledge on RNNs’ hidden states. We also propose a glyph-based sequence visualization based on aggregate information to analyze the behavior of an RNN’s hidden state at the sentence-level. The usability and effectiveness of our method are demonstrated through case studies and reviews from domain experts.

How deep learning works – The geometry of deep learning

Why and how deep learning works well on different tasks remains a mystery from a theoretical perspective. In this paper we draw a geometric picture of the deep learning system by finding its analogies with two existing geometric structures, the geometry of quantum computations and the geometry of diffeomorphic template matching. In this framework, we give the geometric structures of different deep learning systems including convolutional neural networks, residual networks, recursive neural networks, recurrent neural networks and the equilibrium propagation framework. We can also analyze the relationship between the geometrical structures of different networks and their performance at an algorithmic level, so that the geometric framework may guide the design of the structures and algorithms of deep learning systems.

Understanding GANs: the LQG Setting

Generative Adversarial Networks (GANs) have become a popular method to learn a probability model from data. Many GAN architectures with different optimization metrics have been introduced recently. Instead of proposing yet another architecture, this paper aims to provide an understanding of some of the basic issues surrounding GANs. First, we propose a natural way of specifying the loss function for GANs by drawing a connection with supervised learning. Second, we shed light on the generalization performance of GANs through the analysis of a simple LQG setting: the generator is Linear, the loss function is Quadratic and the data is drawn from a Gaussian distribution. We show that in this setting: 1) the optimal GAN solution converges to population Principal Component Analysis (PCA) as the number of training samples increases; 2) the number of samples required scales exponentially with the dimension of the data; 3) the number of samples scales almost linearly if the discriminator is constrained to be quadratic. Thus, linear generators and quadratic discriminators provide a good balance for fast learning.

Weighted entropy: basic inequalities

This paper represents an extended version of an earlier note [10]. The concept of weighted entropy takes into account values of different outcomes, i.e., makes entropy context-dependent, through the weight function. We analyse analogs of the Fisher information inequality and entropy power inequality for the weighted entropy and discuss connections with weighted Lieb’s splitting inequality. The concepts of rates of the weighted entropy and information are also discussed.
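
For a discrete distribution, the weighted entropy with weight function w is commonly written as H^w(p) = -sum_i w(x_i) p_i log p_i. The sketch below illustrates only this discrete form, under the assumption that it is a fair stand-in for the quantities discussed in the abstract (which may be stated for densities); the function name is hypothetical.

```python
import math


def weighted_entropy(probs, weights):
    """Discrete weighted entropy: -sum_i w_i * p_i * log(p_i), in nats.

    With all weights equal to 1 this reduces to the Shannon entropy.
    Outcomes with zero probability contribute nothing to the sum.
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to one")
    return -sum(w * p * math.log(p) for p, w in zip(probs, weights) if p > 0)
```

Setting a weight to zero removes that outcome's contribution entirely, which is one simple way to see how the weight function makes the entropy context-dependent.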

Graph Attention Networks

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved state-of-the-art results across three established transductive and inductive graph benchmarks: the Cora and Citeseer citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs are entirely unseen during training).
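
A single masked self-attention head over a graph can be sketched in plain Python. This is a simplified illustration under stated assumptions: one head, no output nonlinearity, a LeakyReLU scoring function applied to the concatenated projected features, and neighbor lists that include the node itself. The names `gat_layer`, `W`, and `a` are chosen here for the example and are not an official API.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a, slope=0.2):
    """One attention head: each node attends only over its own neighbor list.

    features  : list of input feature vectors, one per node
    neighbors : neighbors[i] lists the nodes that node i may attend to (the mask)
    W         : shared linear projection
    a         : attention vector applied to the concatenation [W h_i, W h_j]
    """
    projected = [matvec(W, h) for h in features]
    outputs, attention = [], []
    for i, nbrs in enumerate(neighbors):
        scores = [
            leaky_relu(sum(c * v for c, v in zip(a, projected[i] + projected[j])), slope)
            for j in nbrs
        ]
        # Softmax restricted to the neighborhood, shifted for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out = [
            sum(alpha * projected[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(len(projected[i]))
        ]
        outputs.append(out)
        attention.append(alphas)
    return outputs, attention
```

Because each node needs only its own neighbor list, no global matrix operation is involved, and the same parameters can be applied to a graph that was not seen during training.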

A Comprehensive Survey on Fog Computing: State-of-the-art and Research Challenges

Cloud computing with its three key facets (i.e., IaaS, PaaS, and SaaS) and its inherent advantages (e.g., elasticity and scalability) still faces several challenges. The distance between the cloud and the end devices might be an issue for latency-sensitive applications such as disaster management and content delivery applications. Service Level Agreements (SLAs) may also impose processing at locations where the cloud provider does not have data centers. Fog computing is a novel paradigm to address such issues. It enables provisioning resources and services outside the cloud, at the edge of the network, closer to end devices or eventually, at locations stipulated by SLAs. Fog computing is not a substitute for cloud computing but a powerful complement. It enables processing at the edge while still offering the possibility to interact with the cloud. This article presents a comprehensive survey on fog computing. It critically reviews the state of the art in the light of a concise set of evaluation criteria. We cover both the architectures and the algorithms that make fog systems. Challenges and research directions are also introduced. In addition, the lessons learned are reviewed and the prospects are discussed in terms of the key role fog is likely to play in emerging technologies such as Tactile Internet.

Denoising random forests

This paper proposes a novel type of random forest, called a denoising random forest, that is robust against noise in test samples. Noise-corrupted samples seriously degrade the estimation performance of random forests, since unexpected child nodes are often selected and the leaf nodes that the input sample reaches are sometimes far from those reached by a clean sample. Our main idea for tackling this problem originates from a binary indicator vector that encodes the traversal path of a sample in the forest. Our proposed method employs this vector by introducing denoising autoencoders into random forests. A denoising autoencoder can be trained with indicator vectors produced from clean and noisy input samples, and non-leaf nodes where incorrect decisions are made can be identified by comparing the input and output of the trained denoising autoencoder. Multiple traversal paths with respect to the nodes with incorrect decisions caused by the noise can then be considered for the estimation.
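
The binary indicator vector mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the exact encoding (one bit per tree node, set to 1 when the sample visits that node) is assumed, and the `Node` class, the node numbering and the toy thresholds are hypothetical. The denoising autoencoder itself is not shown. The sketch only demonstrates how a small perturbation of the input changes the traversal path and therefore the vector the autoencoder would receive.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    index: int                      # position of this node in the indicator vector
    feature: Optional[int] = None   # feature tested at a split node, None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def path_indicator(root: Node, x: Sequence[float], n_nodes: int) -> List[int]:
    """Return a 0/1 vector with a 1 at every node the sample visits in one tree."""
    vec = [0] * n_nodes
    node = root
    while node is not None:
        vec[node.index] = 1
        if node.feature is None:    # reached a leaf
            break
        node = node.left if x[node.feature] <= node.threshold else node.right
    return vec


def forest_indicator(roots: Sequence[Node], sizes: Sequence[int],
                     x: Sequence[float]) -> List[int]:
    """Concatenate the per-tree indicator vectors for a whole forest."""
    vec: List[int] = []
    for root, n_nodes in zip(roots, sizes):
        vec.extend(path_indicator(root, x, n_nodes))
    return vec


def build_toy_tree() -> Node:
    # Hypothetical 5-node tree: node 0 splits on feature 0, node 1 on feature 1.
    inner = Node(1, feature=1, threshold=0.5, left=Node(3), right=Node(4))
    return Node(0, feature=0, threshold=0.5, left=inner, right=Node(2))


tree = build_toy_tree()
clean_vec = path_indicator(tree, [0.4, 0.2], 5)   # visits nodes 0, 1, 3
noisy_vec = path_indicator(tree, [0.6, 0.2], 5)   # noise flips the root decision
flipped_bits = sum(c != n for c, n in zip(clean_vec, noisy_vec))
```

In this toy case a shift of 0.2 in one feature sends the sample to a different leaf, and the two vectors differ exactly at the nodes below the flipped decision.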

When tails wag the decision: The role of distributional tails on climate impacts on decision-relevant time-scales
Wavelet Shrinkage and Thresholding based Robust Classification for Brain Computer Interface
One-shot and few-shot learning of word embeddings
Probability Series Expansion Classifier that is Interpretable by Design
Properties of the Fibonacci-sum graph
Identifying overlapping terrorist cells from the Noordin Top actor-event network
Spectral Graph Wavelets for Structural Role Similarity in Networks
On Maximally Recoverable Local Reconstruction Codes
Lower Bounds for Higher-Order Convex Optimization
Multi-modal Aggregation for Video Classification
Improved approximation of layout problems on random graphs
Insights on Variance Estimation for Blocked and Matched Pairs Designs
Hasse diagrams of non-isomorphic posets with $n$ elements, $2\leq n \leq 7,$ and the number of posets with $10$ elements, without the aid of any computer program
A Treatise on Sucker’s Bets
The Implicit Bias of Gradient Descent on Separable Data
Identifying Individual Disease Dynamics in a Stochastic Multi-pathogen Model From Aggregated Reports and Laboratory Data
Multi-level Residual Networks from Dynamical Systems View
Bayesian Spatial Binary Regression for Label Fusion in Structural Neuroimaging
Automated Design using Neural Networks and Gradient Descent
Brewster anomaly in random anisotropic media
Convolutional Neural Networks Via Node-Varying Graph Filters
Deep Residual Learning for Small-Footprint Keyword Spotting
Diff-DAC: Distributed Actor-Critic for Multitask Deep Reinforcement Learning
Consistency of Lipschitz learning with infinite unlabeled data and finite labeled data
Lower Bounds for Two-Sample Structural Change Detection in Ising and Gaussian Models
Topology adaptive graph convolutional networks
Combinatorial proof of an identity of Andrews–Yee
Exploring Asymmetric Encoder-Decoder Structure for Context-based Sentence Representation Learning
Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information
A Geometric Perspective on the Power of Principal Component Association Tests in Multiple Phenotype Studies
Left-Right Skip-DenseNets for Coarse-to-Fine Object Categorization
A Range-Doppler-Angle Estimation Method for Passive Bistatic Radar
Minimax Rates and Efficient Algorithms for Noisy Sorting
Blocking Probability and Spatial Throughput Characterization for Cellular-Enabled UAV Network with Directional Antenna
Local approximation of a metapopulation’s equilibrium
Geometric Decomposition-Based Formulation for Time Derivatives of Instantaneous Impact Point
A Study of All-Convolutional Encoders for Connectionist Temporal Classification
Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
Trainable back-propagated functional transfer matrices
Efficient Localized Inference for Large Graphical Models
Doppelgangers: the Ur-Operation and Posets of Bounded Height
Cox’s proportional hazards model with a high-dimensional and sparse regression parameter
Efficient Licence Plate Detection By Unique Edge Detection Algorithm and Smarter Interpretation Through IoT
Channel Coherence Classification with Frame-Shifting in Massive MIMO System
Uniform rank gradient, cost and local-global convergence
An Ontology to support automated negotiation
A Framework for Compressive Time-of-Flight 3D Sensing
Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms
Criteria for input-to-state practical stability
Inducing Regular Grammars Using Recurrent Neural Networks
All partitions have small parts – Gallai-Ramsey numbers of bipartite graphs
Toward predictive machine learning for active vision
Omnidirectional Precoding and Combining Based Synchronization for Millimeter Wave Massive MIMO Systems
Long-Distance Loop Closure Using General Object Landmarks
Generalized End-to-End Loss for Speaker Verification
Speaker Diarization with LSTM
Attention-Based Models for Text-Dependent Speaker Verification
SeeThrough: Finding Chairs in Heavily Occluded Indoor Scenes
A reference-searching-based algorithm for large-scale data envelopment analysis computation
On the $α$-index of graphs with pendent paths
ILAPF: Incremental Learning Assisted Particle Filtering
Analytical Estimation of Scalability of Iterative Numerical Algorithms on Distributed Memory Multiprocessors
Learning to diagnose from scratch by exploiting dependencies among labels
Customer sojourn time in GI/G/1 feedback queue in the presence of heavy tails
Phase Conductor on Multi-layered Attentions for Machine Comprehension
Online Approximate Optimal Station Keeping of a Marine Craft in the Presence of a Current
Crime incidents embedding using restricted Boltzmann machines
Optimal Battery Participation in Frequency Regulation Markets
Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization
A Dual Encoder Sequence to Sequence Model for Open-Domain Dialogue Modeling
Object Recognition by Using Multi-level Feature Point Extraction
Interlacement and Activities in Delta-Matroids
Optimal designs for regression with spherical data
Parking on transitive unimodular graphs
Interpretable Apprenticeship Learning with Temporal Logic Specifications
Heat kernel and ergodicity of SDEs with distributional drifts
Partial Knowledge In Embeddings
Interaction between cluster synchronization and epidemic spread in community networks
Hierarchical and Distributed Monitoring of Voltage Stability in Distribution Networks
A $o(d) \cdot \text{polylog}~n$ Monotonicity Tester for Boolean Functions over the Hypergrid $[n]^d$
Vehicle Routing Problem with Vector Profits (VRPVP) with Max-Min Criterion
Stochastic Zeroth-order Optimization in High Dimensions
A Novel Approach to Artistic Textual Visualization via GAN
Smooth Sensitivity Based Approach for Differentially Private Principal Component Analysis
Synthetic Iris Presentation Attack using iDCGAN
Certifiable Distributional Robustness with Principled Adversarial Training
Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts
Examining CNN representations with respect to Dataset Bias
Secrecy Rate Maximization with Outage Constraint in Multihop Relaying Networks
Path-Based Attention Neural Model for Fine-Grained Entity Typing
Evaluation of Automatic Video Captioning Using Direct Assessment
Intelligent Interference Exploitation for Heterogeneous Cellular Networks against Eavesdropping
Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach
Almost Optimal Stochastic Weighted Matching With Few Queries
Social Welfare Maximization Auction in Edge Computing Resource Allocation for Mobile Blockchain
Regularization approaches for support vector machines with applications to biomedical data
SDPNAL+: A Matlab software for semidefinite programming with bound constraints (version 1.0)
Regularity and Sensitivity for McKean-Vlasov SPDEs
Finding Dominant User Utterances And System Responses in Conversations
$k$-Foldability of Words
Half of an antipodal spherical design
Detecting Multiple Random Changepoints in Bayesian Piecewise Growth Mixture Models
Dimensionality reduction methods for molecular simulations
Recursive formulae in regularity structures
JESC: Japanese-English Subtitle Corpus
Percolation without FKG
On the Consistency of Quick Shift
Using the quantization error from Self-Organized Map (SOM) output for detecting critical variability in large bodies of image time series in less than a minute
Robust adaptive efficient estimation for a semi-Markov continuous time regression from discrete data
Delivery Time Minimization in Edge Caching: Synergistic Benefits of Subspace Alignment and Zero Forcing
If it ain’t broke, don’t fix it: Sparse metric repair
Multi-Armed Bandits with Non-Stationary Rewards
Improved Bounds for Testing Forbidden Order Patterns
A Study on Topological Descriptors for the Analysis of 3D Surface Texture
List-decodable zero-rate codes
Wideband Channel Estimation for Hybrid Beamforming Millimeter Wave Communication Systems with Low-Resolution ADCs
Narrowband Channel Estimation for Hybrid Beamforming Millimeter Wave Communication Systems with One-bit Quantization
Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided Diagnosis of Diabetic Retinopathy
Local limit theorems and mod-phi convergence
High-Precision Localization Using Ground Texture
Maximum Likelihood Estimations Based on Upper Record Values for Probability Density Function and Cumulative Distribution Function in Exponential Family and Investigating Some of Their Properties
Research on ruin probability of risk model based on AR(1) series
Robust Optimal Design of Quantum Electronic Devices
Training Probabilistic Spiking Neural Networks with First-to-spike Decoding
Distributional Consistency of Lasso by Perturbation Bootstrap
On Pre-Trained Image Features and Synthetic Images for Deep Learning
Bayesian Nonparametric Differential Analysis for Dependent Multigroup Data with Application to Colorectal Cancer DNA Methylation
A Saak Transform Approach to Efficient, Scalable and Robust Handwritten Digits Recognition
Optimal Coded Multicast in Cache Networks with Arbitrary Content Placement
Globally Optimal Symbolic Regression
Simple and Effective Multi-Paragraph Reading Comprehension
BAS: Beetle Antennae Search Algorithm for Optimization Problems
Breaking the Madry Defense Model with $L_1$-based Adversarial Examples
Can you find a face in a HEVC bitstream?
Linearly convergent stochastic heavy ball method for minimizing generalization error
Learning neural trans-dimensional random field language models with noise-contrastive estimation
Implicit Causal Models for Genome-wide Association Studies
Detection and Estimation of the Invisible Units Using Utility Data Based on Random Matrix Theory
Crack Is Controllable, a controllable crack propagation method by using artificial neural network assisted particle swarm optimization
Cascade Region Proposal and Global Context for Deep Object Detection
Computational Social Choice and Computational Complexity: BFFs?
Stationarity Region of Mm-Wave Channel Based on Outdoor Microcellular Measurements at 28 GHz
Modeling Attention in Panoramic Video: A Deep Reinforcement Learning Approach
Fair Termination for Parameterized Probabilistic Concurrent Systems (Technical Report)
On an extremal problem involving a pair of forbidden posets
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples
Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models
Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning
Frank-Wolfe methods for geodesically convex optimization with application to the matrix geometric mean
Sequence-to-Sequence ASR Optimization via Reinforcement Learning
Generative Adversarial Source Separation
Sparse Vector Coding for Ultra-Reliable and Low Latency Communications
Stochastic variance reduced multiplicative update for nonnegative matrix factorization
An introduction to random matrix theory
Performance Limits of Compressive Sensing Channel Estimation in Dense Cloud RAN
DART: Distribution Aware Retinal Transform for Event-based Cameras
Performance Analysis of Multi-Service Oriented Multiple Access Under General Channel Correlation
Reliable Communication under the Influence of a State-Constrained Jammer: A Novel Perspective on Receive Diversity
Hit Song Prediction for Pop Music by Siamese CNN with Ranking Loss
Monotonicity and robustness in Wiener disorder detection
Rough extreme learning machine: a new classification method based on uncertainty measure
2D Unitary ESPRIT Based Super-Resolution Channel Estimation for Millimeter-Wave Massive MIMO with Hybrid Precoding
Generalized gradient optimization over lossy networks for partition-based estimation
A Framework for Over-the-air Reciprocity Calibration for TDD Massive MIMO Systems
Gradient Estimates on Dirichlet Eigenfunctions
Verification of BSF Parallel Computational Model
An algorithmic approach to handle circular trading in commercial taxing system
An introduction to Wishart matrix moments
Asymptotic analysis of average case approximation complexity of additive random fields
Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming
Factorizations of $k$-Nonnegative Matrices
Sparse covariance matrix estimation in high-dimensional deconvolution
Shifts of the prime divisor function of Alladi and Erdős
Fast Linear Model for Knowledge Graph Embeddings
Weak Stability of $\ell_1$-minimization Methods in Sparse Data Reconstruction
Divisibility of binomial coefficients by powers of two
Models with varying structure
Calorimeter-less gamma-ray telescopes: Optimal measurement of charged particle momentum from multiple scattering by Bayesian analysis of Kalman filtering innovations
Rectilinear and $\mathcal{O}$-convex hull with minimum area
Open Set Logo Detection and Retrieval
Level algebras and $\mathbf{s}$-lecture hall polytopes
Learning to solve inverse problems using Wasserstein loss
A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices
Monochromatic Paths in the Complete Symmetric Infinite Digraph
An FPTAS of Minimizing Total Weighted Completion Time on Single Machine with Position Constraint
Abelian Schur groups of odd order
Device-centric Energy Optimization for Edge Cloud Offloading
Optimal Kernel-Based Dynamic Mode Decomposition
Solution of linear ill-posed problems by model selection and aggregation
The loss surface and expressivity of deep convolutional neural networks
Numerical approximation of general Lipschitz BSDEs with branching processes
Asymptotically efficient estimators for stochastic blockmodels: the naive MLE, the rank-constrained MLE, and the spectral
Evidence for thermal activation in the glassy dynamics of insulating granular aluminum conductance
A Supervised STDP-based Training Algorithm for Living Neural Networks
At the Roots of Dictionary Compression: String Attractors
A short proof of a lower bound for Turán numbers
Content-based Representations of audio using Siamese neural networks
Finding Connected Secluded Subgraphs
Statistical validation of financial time series via visibility graph
Convex duality in nonlinear optimal transport
Conceptual Text Summarizer: A new model in continuous vector space
A Derivative-Free Gauss-Newton Method
Kirszbraun-type Theorems For Graphs
Error Analysis for the Linear Feedback Particle Filter
An Artificial-Noise-Aided Secure Scheme for Hybrid Parallel PLC/Wireless OFDM Systems
Derivation of the stochastic Burgers equation with Dirichlet boundary conditions from the WASEP
Limiting empirical spectral distribution for the non-backtracking matrix of an Erdős-Rényi random graph
Rate-Splitting for Downlink Multi-User Multi-Antenna Systems: Bridging NOMA and Conventional Linear Precoding
A new class of bell-shaped functions
Named Entity Recognition in Twitter using Images and Text
Probabilistic Count Matrix Factorization for Single Cell Expression Data Analysis
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Descent polynomials
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics
Unsupervised Neural Machine Translation
Isolation and connectivity in random geometric graphs with self-similar intensity measures
Trends in European flood risk over the past 150 years
A Connection between Feed-Forward Neural Networks and Probabilistic Graphical Models
Semantic Code Repair using Neuro-Symbolic Transformation Networks
Improved quantum annealer performance from oscillating transverse fields
Techreport: Time-sensitive probabilistic inference for the edge
Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks
Asymptotic degree distributions in large homogeneous random networks: A little theory and a counterexample
On Fair Reinsurance Premiums; Capital Injections in a Perturbed Risk Model
Convergence Rates of Latent Topic Models Under Relaxed Identifiability Conditions
Summations of Linear Recurrent Sequences
Continuous Authentication Using One-class Classifiers and their Fusion
A mathematical bridge between discretized gauge theories in quantum physics and approximate reasoning in pairwise comparisons
An Integrated Approach to Crowd Video Analysis: From Tracking to Multi-level Activity Recognition
Eigenoption Discovery through the Deep Successor Representation
Infinite dimensional compressed sensing from anisotropic measurements
The Capacity of Private Computation