EgoCoder: Intelligent Program Synthesis with Hierarchical Sequential Neural Network Model

Programming has been an important skill for researchers and practitioners in computer science and related areas. To learn basic programming skills, beginners usually require long-term systematic training. According to a recent market report, the computer software market is expected to continue expanding at an accelerating speed, but the supply of qualified software developers can hardly meet such a huge demand. In recent years, the surge of research on text generation has provided an opportunity to address this dilemma through automatic program synthesis. In this paper, we approach the program synthesis problem from a data mining perspective. To address the problem, we introduce a novel generative model, namely EgoCoder. EgoCoder parses program code into abstract syntax trees (ASTs), where the tree nodes contain the program code/comment content and the tree structure captures the program logic flow. Based on a new unit model called Hsu, EgoCoder can effectively capture both the hierarchical and sequential patterns in the program ASTs. Extensive experiments compare EgoCoder with state-of-the-art text generation methods, and the experimental results demonstrate the effectiveness of EgoCoder in addressing the program synthesis problem.
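The AST representation that EgoCoder builds on can be illustrated with Python's standard `ast` module. This is a minimal sketch, not EgoCoder's parser: the abstract does not say which language or parser is used, and Python's `ast` discards comments, which EgoCoder keeps. It only shows how parsing exposes both the hierarchical structure (node depth) and the sequential structure (pre-order sibling order) that the Hsu unit is said to model. The helper name `ast_sequence` is hypothetical.

```python
import ast


def ast_sequence(source: str) -> list[tuple[int, str]]:
    """Return (depth, node type) pairs for the AST of `source`, in pre-order.

    The depth encodes the hierarchical position of each node; the pre-order
    position encodes the sequential order of sibling statements/expressions.
    """
    tree = ast.parse(source)
    out: list[tuple[int, str]] = []

    def visit(node: ast.AST, depth: int) -> None:
        out.append((depth, type(node).__name__))
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1)

    visit(tree, 0)
    return out


code = "def add(a, b):\n    return a + b\n"
for depth, name in ast_sequence(code):
    print("  " * depth + name)  # indented tree of node types
```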

CascadeCNN: Pushing the performance limits of quantisation

This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, to perform high-throughput inference by exploiting the computation time-accuracy trade-off. Without the need for retraining, a two-stage architecture tailored for any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed between them to identify misclassified cases at run time and forward them to the high-precision unit or terminate computation. Experiments demonstrate that CascadeCNN achieves a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy.
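The cascade idea can be sketched as follows. This is a minimal NumPy illustration under assumptions that are not from the source: a single linear layer stands in for the CNN, the low-precision unit uses uniform symmetric weight quantisation, and the confidence evaluation unit is modelled as a threshold on the margin between the top-1 and top-2 softmax probabilities. The names `quantise`, `cascade_predict`, `low_bits` and `threshold` are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def quantise(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantisation of weights to `bits` bits (illustrative)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale


def cascade_predict(x, w, low_bits=4, threshold=0.2):
    """Two-stage cascade: every input goes through the low-precision unit;
    inputs whose top-1/top-2 probability margin is below `threshold` are
    treated as likely misclassified and re-run at full precision."""
    p_low = softmax(x @ quantise(w, low_bits))
    top2 = np.sort(p_low, axis=-1)[:, -2:]        # two largest probabilities
    margin = top2[:, 1] - top2[:, 0]
    forwarded = margin < threshold
    preds = p_low.argmax(axis=-1)
    if forwarded.any():
        p_high = softmax(x[forwarded] @ w)        # high-precision unit
        preds[forwarded] = p_high.argmax(axis=-1)
    return preds, forwarded


rng = np.random.default_rng(0)
w = rng.normal(size=(16, 10))   # one linear layer stands in for the CNN
x = rng.normal(size=(100, 16))
preds, forwarded = cascade_predict(x, w)
print(f"{forwarded.mean():.0%} of inputs forwarded to the high-precision unit")
```

Raising `threshold` forwards more inputs to the high-precision unit (higher accuracy, lower throughput); lowering it terminates more inputs after the cheap pass, which is the time-accuracy trade-off the abstract describes.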

Adversarially Robust Training through Structured Gradient Regularization

We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles, leveraging the fundamental link between training with noise and regularization. It adds very little computational overhead during learning and is simple to implement generically in standard deep learning frameworks. Our experiments provide strong evidence that structured gradient regularization can act as an effective first line of defense against attacks based on low-level signal corruption.
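A regularizer of this kind penalizes the input gradient of the loss, weighted by a covariance matrix that encodes the structure of expected perturbations. The sketch below assumes a logistic-regression model with labels in {-1, +1} and a user-supplied covariance `sigma`; it computes a penalty of the form mean(g^T Σ g), where g is the gradient of the loss with respect to the input. It illustrates the general shape of a covariance-weighted gradient penalty and is not the paper's exact derivation; all function names and constants are hypothetical.

```python
import numpy as np


def logistic_loss_and_input_grad(w, b, x, y):
    """Per-example logistic loss log(1 + exp(-y (w.x + b))), labels y in {-1, +1},
    together with its gradient with respect to each input row of x."""
    margin = y * (x @ w + b)                   # shape (n,)
    loss = np.logaddexp(0.0, -margin)          # stable log(1 + e^{-margin})
    coef = -y / (1.0 + np.exp(margin))         # d loss / d(w.x) = -y * sigmoid(-margin)
    grad_x = coef[:, None] * w[None, :]        # shape (n, d)
    return loss, grad_x


def structured_grad_penalty(grad_x, sigma):
    """Mean over examples of g^T Sigma g for a perturbation covariance Sigma."""
    return np.mean(np.einsum("ni,ij,nj->n", grad_x, sigma, grad_x))


rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(32, d))
y = rng.choice([-1.0, 1.0], size=32)
w, b = rng.normal(size=d), 0.0
idx = np.arange(d)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical: correlated neighboring features
loss, g = logistic_loss_and_input_grad(w, b, x, y)
lam = 0.1                                            # hypothetical regularization strength
objective = loss.mean() + lam * structured_grad_penalty(g, sigma)
print(f"objective = {objective:.4f}")
```

With `sigma` equal to the identity this reduces to a plain squared input-gradient norm; off-diagonal entries make the penalty sensitive to correlated, structured perturbations rather than to each input coordinate independently.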

Adversarial Training of Word2Vec for Basket Completion

In recent years, the Word2Vec model trained with the Negative Sampling loss function has shown state-of-the-art results in a number of machine learning tasks, including language modeling tasks, such as word analogy and word similarity, and recommendation tasks, through Prod2Vec, an extension that models user shopping activity and user preferences. Several methods that aim to improve upon the standard Negative Sampling loss have been proposed. In this paper, we pursue a more sophisticated form of Negative Sampling by leveraging ideas from the field of Generative Adversarial Networks (GANs), and propose Adversarial Negative Sampling. We build upon recent progress in stabilizing the training objective of GANs in the discrete data setting and introduce a new GAN-Word2Vec model. We evaluate our model on the task of basket completion and show significant improvements in performance over Word2Vec trained using standard loss functions, including Noise Contrastive Estimation and Negative Sampling.
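For reference, the standard Negative Sampling objective that Adversarial Negative Sampling builds on can be written for one (center, context) pair as -log σ(u_o · v_c) - Σ_k log σ(-u_k · v_c), where the u_k are output embeddings of sampled negative items. The sketch below implements that loss together with the usual Word2Vec noise distribution (unigram counts raised to the 3/4 power). The adversarial variant from the abstract would replace `sample_negatives` with a learned generator, which is not shown; all names and data here are hypothetical.

```python
import numpy as np


def log_sigmoid(z):
    return -np.logaddexp(0.0, -z)              # log(1 / (1 + e^{-z})), numerically stable


def negative_sampling_loss(v_center, u_context, u_negatives):
    """Negative Sampling loss for one (center, context) pair.

    v_center: (d,) input embedding of the center item
    u_context: (d,) output embedding of the observed context item
    u_negatives: (k, d) output embeddings of k sampled negative items
    """
    positive = log_sigmoid(u_context @ v_center)
    negative = log_sigmoid(-(u_negatives @ v_center)).sum()
    return -(positive + negative)


def sample_negatives(counts, k, rng):
    """Draw k negative item ids from unigram counts raised to the 3/4 power."""
    p = np.asarray(counts, dtype=float) ** 0.75
    return rng.choice(len(p), size=k, p=p / p.sum())


rng = np.random.default_rng(0)
n_items, dim = 50, 16
V = rng.normal(scale=0.1, size=(n_items, dim))    # input embeddings
U = rng.normal(scale=0.1, size=(n_items, dim))    # output embeddings
counts = rng.integers(1, 100, size=n_items)       # hypothetical purchase counts
center, context = 3, 7                            # two items from the same basket
negatives = sample_negatives(counts, 5, rng)
print(negative_sampling_loss(V[center], U[context], U[negatives]))
```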

Parsimonious Bayesian deep networks

Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is a special infinitely wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other is a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding hidden layers. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, which provide state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries while maintaining low computational complexity for out-of-sample prediction.

Reducing Disparate Exposure in Ranking: A Learning To Rank Approach

In this paper, we consider a ranking problem in which we would like to order a set of items by utility or relevance while also considering the visibility of different groups of items. To solve this problem, we adopt a supervised learning to rank approach that learns a ranking function from a set of training examples, which consist of queries and ranked lists of documents for each query. We assume that the elements to be ranked are divided into two groups: protected and non-protected. Following long-standing empirical observations that users of information retrieval systems rarely look past the first few results, we assume that items in higher positions receive more exposure than others. Our objective is to produce a ranker that reproduces the ordering of the training set, which is the standard objective in learning to rank, but that additionally gives protected elements sufficient exposure compared to non-protected elements. We show how to describe this objective formally, how to achieve it effectively, and how to implement it, and we present an experimental study showing how large differences in exposure can be reduced without introducing large distortions in ranking utility.
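To make the notion of exposure concrete, the sketch below assumes a logarithmic position-bias model (exposure of rank r equal to 1 / log2(1 + r), the discount used in DCG) and computes the mean exposure of each group in a given ranking. This is only the measurement step; the paper's learning objective is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def position_exposure(n):
    """Exposure of ranks 1..n under a logarithmic position-bias model."""
    ranks = np.arange(1, n + 1)
    return 1.0 / np.log2(1.0 + ranks)


def group_exposure(ranking_groups):
    """Mean exposure per group for a ranking given as group labels, top first."""
    groups = np.asarray(ranking_groups)
    exposure = position_exposure(len(groups))
    return {str(g): float(exposure[groups == g].mean()) for g in np.unique(groups)}


ranking = ["non-protected", "non-protected", "protected", "non-protected", "protected"]
exp_by_group = group_exposure(ranking)
print(exp_by_group)
print("exposure gap:", exp_by_group["non-protected"] - exp_by_group["protected"])
```

A ranker that reduces this gap while still placing relevant items near the top is, informally, what the objective above asks for.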

Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance

Building tools for code-mixed data is rapidly gaining popularity in the NLP research community, as such data is growing exponentially on social media. Working with code-mixed data poses several challenges, especially grammatical inconsistencies and spelling variations, in addition to all the previously known challenges of social media text. In this article, we present a novel architecture focused on normalizing phonetic typing variations, which are commonly seen in code-mixed data. One of the main features of our architecture is that, in addition to normalization, it can also be used for back-transliteration and, in some cases, word identification. Our model achieved an accuracy of 90.27% on the test data.
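The Levenshtein-distance component can be illustrated as a lookup that maps a spelling variant to the closest canonical word. The sketch below is a standard dynamic-programming edit distance plus a nearest-word search over a small, hypothetical romanized vocabulary; the Seq2Seq part of the architecture is not shown, and how the two are combined in the paper is not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))               # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalize(word: str, vocabulary: list[str]) -> str:
    """Map a spelling variant to the closest canonical form in `vocabulary`."""
    return min(vocabulary, key=lambda v: levenshtein(word, v))


vocabulary = ["kyun", "kaun", "main"]            # hypothetical canonical forms
for variant in ["kyoon", "kiun", "mein"]:
    print(variant, "->", normalize(variant, vocabulary))
```

Ties are broken by vocabulary order here; a real system would need a better signal to choose among equally close candidates.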

Robust Conditional Generative Adversarial Networks

Conditional generative adversarial networks (cGAN) have led to large improvements in the task of conditional image generation, which lies at the heart of computer vision. The major focus so far has been on performance improvement, while there has been little effort in making cGAN more robust to noise or leveraging structure in the output space of the model. The end-to-end regression (of the generator) might lead to arbitrarily large errors in the output, which makes such networks unsuitable for real-world systems. In this work, we introduce a novel conditional GAN, called RoCGAN, which adds implicit constraints to address this issue. Our proposed model augments the generator with an unsupervised pathway, which encourages the outputs of the generator to span the target manifold even in the presence of large amounts of noise. We prove that RoCGAN shares similar theoretical properties with GAN and experimentally verify that the proposed model outperforms existing state-of-the-art cGAN architectures by a large margin in a variety of domains, including images of natural scenes and faces.

LMKL-Net: A Fast Localized Multiple Kernel Learning Solver via Deep Neural Networks

In this paper, we propose solving localized multiple kernel learning (LMKL) using LMKL-Net, a feedforward deep neural network. In contrast to previous works, as a learning principle we propose parameterizing both the gating function for learning kernel combination weights and the multiclass classifier in LMKL using an attentional network (AN) and a multilayer perceptron (MLP), respectively. In this way, we can learn the (nonlinear) decision function in LMKL (approximately) by sequential applications of AN and MLP. Empirically, on benchmark datasets, we demonstrate that LMKL-Net can not only outperform the state-of-the-art MKL solvers in terms of accuracy, but can also be trained about two orders of magnitude faster with a much smaller memory footprint for large-scale learning.
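The decision function being parameterized has the localized MKL form f(x) = Σ_m η_m(x) ⟨w_m, φ_m(x)⟩ + b, where the gating weights η_m(x) sum to one for each input. The NumPy sketch below shows that structure with two hypothetical explicit feature maps standing in for kernels, a softmax gate standing in for the attentional network, and a single linear layer per feature map standing in for the MLP. It is a simplification for illustration, not LMKL-Net itself.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# Hypothetical explicit feature maps standing in for M = 2 base kernels.
FEATURE_MAPS = [lambda x: x, lambda x: x ** 2]


def lmkl_scores(x, gate_v, gate_b, class_w, class_b):
    """Localized MKL scores f(x) = sum_m eta_m(x) * (phi_m(x) @ W_m) + b.

    gate_v: (d, M) and gate_b: (M,) give per-sample kernel weights eta(x)
    class_w: list of M arrays of shape (d_m, C); class_b: (C,)
    """
    eta = softmax(x @ gate_v + gate_b)                        # (n, M), rows sum to 1
    per_kernel = np.stack(
        [phi(x) @ w_m for phi, w_m in zip(FEATURE_MAPS, class_w)], axis=1
    )                                                         # (n, M, C)
    scores = (eta[:, :, None] * per_kernel).sum(axis=1) + class_b
    return scores, eta


rng = np.random.default_rng(0)
d, M, C = 3, len(FEATURE_MAPS), 4
x = rng.normal(size=(5, d))
scores, eta = lmkl_scores(
    x,
    rng.normal(size=(d, M)), np.zeros(M),
    [rng.normal(size=(d, C)) for _ in range(M)], np.zeros(C),
)
print(eta.round(3))                 # input-dependent kernel weights
print(scores.argmax(axis=1))        # predicted classes
```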

Image Retrieval using Heat Diffusion for Deep Feature Aggregation

Image-level feature descriptors obtained from convolutional neural networks have shown powerful representation capabilities for image retrieval. In this paper, we present an unsupervised method to aggregate deep convolutional features into compact yet discriminative image vectors by simulating the dynamics of heat diffusion. A distinctive problem in image retrieval is that repetitive or bursty features tend to dominate feature representations, leading to less than ideal matches. We show that by leveraging elegant properties of the heat equation, our method is able to select informative features while avoiding over-representation of bursty features. We additionally present a theoretical time complexity analysis showing the efficiency of our method, which is further demonstrated in our experimental evaluation. Finally, we extensively evaluate the proposed approach with pre-trained and fine-tuned deep networks on common public benchmarks, and show superior performance compared to previous work.

Best of many worlds: Robust model selection for online supervised learning

We introduce algorithms for online, full-information prediction that are competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings. We show that by incorporating a probabilistic framework of structural risk minimization into existing adaptive algorithms, we can robustly learn not only the presence of stochastic structure when it exists (leading to constant as opposed to O(√T) regret), but also the correct model order. We thus obtain regret bounds that are competitive with the regret of an optimal algorithm that possesses strong side information about both the complexity of the optimal contextual tree expert and whether the process generating the data is stochastic or adversarial. These are the first constructive guarantees on simultaneous adaptivity to the model and the presence of stochasticity.

An integer-valued time series model for multivariate surveillance

In recent years, different types of surveillance data have become available for public health purposes. In most cases, several variables are monitored and events of different types are reported. As the amount of surveillance data increases, statistical methods that can effectively address multivariate surveillance scenarios are needed. Although research activity in this field has increased rapidly in recent years, only a few approaches have simultaneously addressed the integer-valued nature of the data and its correlation structure (both serial correlation and cross-correlation). In this paper, we propose a multivariate integer-valued autoregressive model that allows for both serial and cross-correlation between the series and can easily accommodate overdispersion and covariate information. Moreover, its structure implies a natural decomposition into an endemic and an epidemic component, a common distinction in dynamic models for infectious disease counts. Disease outbreaks are detected by comparing surveillance data with one-step-ahead predictions obtained after fitting the proposed model to a set of historical data. The performance of the proposed model is illustrated on a trivariate series of syndromic surveillance data collected during the Athens 2004 Olympic Games.
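One common form of a multivariate INAR(1) model uses binomial thinning: X_t = A ∘ X_{t-1} + R_t, where each count of series j at time t-1 independently contributes to series i with probability a_ij (the epidemic part), and R_t are fresh counts (the endemic part). The sketch below assumes Poisson innovations and omits the overdispersion and covariate terms mentioned in the abstract; the one-step-ahead conditional mean A x_{t-1} + λ is what observed counts would be compared against for outbreak detection. Function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_minar1(A, lam, T, rng):
    """Simulate X_t = A o X_{t-1} + R_t with binomial thinning and Poisson innovations.

    A[i, j] in [0, 1]: probability that each count of series j at time t-1
    contributes a count to series i at time t (epidemic component).
    lam[i]: mean of the Poisson innovation of series i (endemic component).
    """
    A = np.asarray(A, dtype=float)
    lam = np.asarray(lam, dtype=float)
    X = np.zeros((T, len(lam)), dtype=np.int64)
    X[0] = rng.poisson(lam)
    for t in range(1, T):
        # Element [i, j] ~ Binomial(X[t-1, j], A[i, j]); summing over j thins every series into i.
        epidemic = rng.binomial(X[t - 1][None, :], A).sum(axis=1)
        X[t] = epidemic + rng.poisson(lam)
    return X


def one_step_mean(A, lam, x_prev):
    """One-step-ahead prediction E[X_t | X_{t-1} = x_prev] = A x_prev + lam."""
    return (np.asarray(A, dtype=float) @ np.asarray(x_prev, dtype=float)
            + np.asarray(lam, dtype=float))


rng = np.random.default_rng(0)
A = [[0.3, 0.1, 0.0],
     [0.1, 0.4, 0.1],
     [0.0, 0.2, 0.3]]          # hypothetical cross-series thinning probabilities
lam = [2.0, 1.0, 3.0]          # hypothetical endemic rates
X = simulate_minar1(A, lam, 200, rng)
pred = one_step_mean(A, lam, X[-2])
print("predicted:", pred.round(2), "observed:", X[-1])
```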

Rank Minimization on Tensor Ring: A New Paradigm in Scalable Tensor Decomposition and Completion

In low-rank tensor completion tasks, traditional methods suffer from high computational cost and high sensitivity to model complexity because they rely on multiple large-scale singular value decomposition (SVD) operations and must solve a rank selection problem. In this paper, taking advantage of the high compressibility of the recently proposed tensor ring (TR) decomposition, we propose a new model for the tensor completion problem. This is achieved by introducing convex surrogates of the tensor low-rank assumption on the latent tensor ring factors, which makes it possible to solve Schatten norm regularization based models at a much smaller scale. We propose two algorithms that apply different structured Schatten norms to the tensor ring factors. Using the alternating direction method of multipliers (ADMM) scheme, the tensor ring factors and the predicted tensor can be optimized simultaneously. Experiments on synthetic and real-world data demonstrate the high performance and efficiency of the proposed approach.
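In the tensor ring format, an N-th order tensor is represented by cores G_k of shape (r_k, n_k, r_{k+1}), with r_{N+1} = r_1, and each entry is the trace of a product of core slices: T(i_1, ..., i_N) = trace(G_1(i_1) G_2(i_2) ... G_N(i_N)). The cores hold Σ_k r_k n_k r_{k+1} numbers instead of Π_k n_k, which is the compressibility the abstract refers to. The sketch below only reconstructs a full tensor from its cores; the paper's Schatten-norm regularization and ADMM updates are not shown, and the sizes used are hypothetical.

```python
import numpy as np


def tr_to_full(cores):
    """Reconstruct a full tensor from tensor-ring cores.

    cores[k] has shape (r_k, n_k, r_{k+1}), the last rank wrapping around to r_1,
    and T[i_1, ..., i_N] = trace(G_1[:, i_1, :] @ ... @ G_N[:, i_N, :]).
    """
    full = cores[0]                                        # (r_1, n_1, r_2)
    for core in cores[1:]:
        # Contract the trailing rank index with the leading rank index of the next core.
        full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
    # Close the ring: trace over the first and last (rank) axes.
    return np.trace(full, axis1=0, axis2=full.ndim - 1)


rng = np.random.default_rng(0)
ranks, dims = [4, 4, 4], [20, 20, 20]
cores = [rng.normal(size=(ranks[k], dims[k], ranks[(k + 1) % 3])) for k in range(3)]
T = tr_to_full(cores)
print(T.shape, sum(c.size for c in cores), T.size)   # storage: cores vs full tensor
```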

State-Denoised Recurrent Neural Networks

Recurrent neural networks (RNNs) are difficult to train on sequence processing tasks, not only because input noise may be amplified through feedback, but also because any inaccuracy in the weights has consequences similar to input noise. We describe a method for denoising the hidden state during training to achieve more robust representations, thereby improving generalization performance. Attractor dynamics are incorporated into the hidden state to ‘clean up’ representations at each step of a sequence. The attractor dynamics are trained through an auxiliary denoising loss to recover previously experienced hidden states from noisy versions of those states. This state-denoised recurrent neural network (SDRNN) performs multiple steps of internal processing for each external sequence step. On a range of tasks, we show that the SDRNN outperforms a generic RNN as well as a variant of the SDRNN with attractor dynamics on the hidden state but without the auxiliary loss. We argue that attractor dynamics, and the corresponding connectivity constraints, are an essential component of the deep learning arsenal and should be invoked not only for recurrent networks but also for improving deep feedforward nets and intertask transfer.

Opening the black box of deep learning

The great success of deep learning shows that its technology contains profound truth, and understanding its internal mechanism not only has important implications for the development of the technology and its effective application in various fields, but also provides meaningful insights into the mechanisms of the human brain. At present, most theoretical research on deep learning is based on mathematics. This dissertation proposes that the neural network of deep learning is a physical system, examines deep learning from three different perspectives (microscopic, macroscopic, and physical world views), and answers multiple theoretical puzzles in deep learning using physics principles. For example, from the perspective of quantum mechanics and statistical physics, this dissertation presents calculation methods for convolution, pooling, normalization, and the Restricted Boltzmann Machine, as well as for the selection of cost functions; explains why deep learning must be deep, what characteristics are learned in deep learning, why Convolutional Neural Networks do not have to be trained layer by layer, and what the limitations of deep learning are; and proposes a theoretical direction and basis for the further development of deep learning now and in the future. The brilliance of physics flashes in deep learning; we try to establish deep learning technology on the scientific theory of physics.

Reducing Parameter Space for Neural Network Training

For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that training can be accomplished in a reduced parameter space. Specifically, the weights of each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that NNs in the reduced parameter space are mathematically equivalent to standard NNs with parameters in the whole space. The reduced parameter space should facilitate the optimization procedure for network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.
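The weight part of this equivalence rests on the positive homogeneity of ReLU: relu(c z) = c relu(z) for any c > 0. Dividing a neuron's weights and threshold by ‖w‖ therefore puts the weights on the unit sphere, and the factor ‖w‖ can be absorbed into the outgoing weights of the next layer (for a binary step activation the factor does not matter at all). The sketch below checks this identity numerically; it does not reproduce the paper's argument for bounding the threshold interval, and the helper name is hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def to_unit_sphere(w, b):
    """Rescale a neuron (w, b) to (w / ||w||, b / ||w||) and return the scale ||w||.

    Since relu(c * z) == c * relu(z) for c > 0, the original neuron's output equals
    ||w|| times the rescaled neuron's output; that factor can be folded into the
    outgoing weights of the next layer.
    """
    norm = np.linalg.norm(w)
    return w / norm, b / norm, norm


rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.7
x = rng.normal(size=(20, 5))
w_hat, b_hat, scale = to_unit_sphere(w, b)
print(np.linalg.norm(w_hat))                                             # unit norm
print(np.allclose(relu(x @ w + b), scale * relu(x @ w_hat + b_hat)))     # same outputs
```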

Maximum Causal Tsallis Entropy Imitation Learning

In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning, which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide a full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. This gives the proposed method an advantage over a softmax distribution: it can exclude unnecessary actions by assigning them zero probability. Second, we prove that an MCTE problem is equivalent to robust Bayes estimation in the sense of the Brier score. Third, we propose a maximum causal Tsallis entropy imitation learning (MCTEIL) algorithm with a sparse mixture density network (sparse MDN) that models mixture weights using a sparsemax distribution. In particular, we show that the causal Tsallis entropy of an MDN encourages exploration and efficient mixture utilization, while Boltzmann-Gibbs entropy is less effective. We validate the proposed method in two simulation studies, in which MCTEIL outperforms existing imitation learning methods in terms of average returns and learning multi-modal policies.
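The sparsemax distribution is the Euclidean projection of a score vector onto the probability simplex; equivalently, it maximizes the expected score plus one half of the q = 2 Tsallis entropy 1 - Σ p_i². The sketch below uses the standard sort-based algorithm and contrasts it with softmax on hypothetical action scores. It illustrates why low-scoring actions can receive exactly zero probability; it is not the MCTEIL training procedure.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                  # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    in_support = 1 + k * z_sorted > cumsum        # true exactly for the first k(z) entries
    k_z = k[in_support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z             # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


def softmax(z):
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()


action_scores = np.array([2.0, 1.5, 0.1, -1.0])   # hypothetical action scores
print("softmax:  ", softmax(action_scores).round(3))    # every action gets mass
print("sparsemax:", sparsemax(action_scores).round(3))  # low-scoring actions get exactly 0
```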

AgileNet: Lightweight Dictionary-based Few-shot Learning

The success of deep learning models is heavily tied to the use of massive amounts of labeled data and excessively long training times. With the emergence of intelligent edge applications that use these models, the critical challenge is to obtain the same inference capability on a resource-constrained device while providing adaptability to cope with dynamic changes in the data. We propose AgileNet, a novel lightweight dictionary-based few-shot learning methodology that provides a reduced-complexity deep neural network for efficient execution at the edge while enabling low-cost updates to capture the dynamics of new data. Evaluations on state-of-the-art few-shot learning benchmarks demonstrate the superior accuracy of AgileNet compared to prior art. Additionally, AgileNet is the first few-shot learning approach that prevents model updates from eliminating the knowledge obtained from the primary training. This property is ensured through the dictionaries learned by our novel end-to-end structured decomposition, which also reduces the memory footprint and computational complexity to match edge device constraints.

AxTrain: Hardware-Oriented Neural Network Training for Approximate Inference

The intrinsic error tolerance of neural networks (NNs) makes approximate computing a promising technique for improving the energy efficiency of NN inference. Conventional approximate computing focuses on balancing the efficiency-accuracy trade-off for existing pre-trained networks, which can lead to suboptimal solutions. In this paper, we propose AxTrain, a hardware-oriented training framework to facilitate approximate computing for NN inference. Specifically, AxTrain leverages the synergy between two orthogonal methods: one actively searches for a network parameter distribution with high error tolerance, and the other passively learns resilient weights by numerically incorporating the noise distributions of the approximate hardware in the forward pass during training. Experimental results on various datasets with near-threshold computing and approximate multiplication strategies demonstrate AxTrain’s ability to obtain resilient neural network parameters and improve system energy efficiency.
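The passive half of the approach, injecting hardware noise into the forward pass during training, can be sketched with a dense layer whose products are perturbed. The sketch assumes a simple noise model in which each multiplication has an independent relative Gaussian error; real approximate multipliers or near-threshold circuits have their own error distributions, which AxTrain models numerically. The active parameter-distribution search is not shown, and all names are hypothetical.

```python
import numpy as np


def approx_dense(x, w, rel_sigma, rng):
    """Dense layer x @ w in which every scalar product is perturbed.

    Each product x[n, i] * w[i, j] becomes x[n, i] * w[i, j] * (1 + e) with
    e ~ N(0, rel_sigma^2) drawn independently: a hypothetical stand-in for the
    error distribution of an approximate multiplier.
    """
    products = x[:, :, None] * w[None, :, :]             # (n, d_in, d_out)
    noise = rng.normal(0.0, rel_sigma, size=products.shape)
    return (products * (1.0 + noise)).sum(axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
w = rng.normal(size=(16, 4))
exact = x @ w
noisy = approx_dense(x, w, rel_sigma=0.05, rng=rng)
print("mean absolute deviation:", np.abs(noisy - exact).mean())
```

Using a layer like `approx_dense` in place of an exact layer during training exposes the optimizer to perturbations resembling those of the deployed hardware; the intent is that the learned weights produce outputs that remain stable under that noise.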

Multiple Causal Inference with Latent Confounding

Causal inference from observational data requires assumptions. These assumptions range from measuring confounders to identifying instruments. Traditionally, these assumptions have focused on estimation in a single causal problem. In this work, we develop techniques for causal estimation in causal problems with multiple treatments. We develop two assumptions based on shared confounding between treatments and independence of treatments given the confounder. Together, these assumptions lead to a confounder estimator regularized by mutual information, for which we develop a tractable lower bound. To fit the outcome model, we use the residual information in the treatments given the confounder. We validate our approach on simulations and on an example from clinical medicine.

A Convolutional Feature Map based Deep Network targeted towards Traffic Detection and Classification
Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication
Multi-model inference through projections in model space
Minimum-gain Pole Placement with Sparse Static Feedback
Complexity of Stability in Trading Networks
On the Connection Between Sequential Quadratic Programming and Riemannian Gradient Methods
Peer-to-Peer Energy-Aware Tree Network Formation
Fake News Detection with Deep Diffusive Network Model
A Tropical Approach to Neural Networks with Piecewise Linear Activations
On resilience of connectivity in the evolution of random graphs
A change of perspective in network centrality
A Utility-Based Channel Ranking for Cognitive Radio Systems
Efficient Stochastic Gradient Descent for Distributionally Robust Learning
Inferring Human Traits From Facebook Statuses
Automatic Adaptation of Person Association for Multiview Tracking in Group Activities
A Distributed Version of the Hungarian Method for Multi-Robot Assignment
Multiplex networks of musical artists: the effect of heterogeneous inter-layer links
Aesthetics Assessment of Images Containing Faces
A Parameter Estimation of Fractional Order Gray Model Based on Adaptive Dynamic Cat Swarm Algorithm
Convexity Shape Prior for Level Set based Image Segmentation Method
Packing A-Paths of Length Zero Modulo Four
Information Constraints on Auto-Encoding Variational Bayes
Adding One Neuron Can Eliminate All Bad Local Minima
Regression Analysis of Proportion Outcomes with Random Effects
Optimal Cheeger cuts and bisections of random geometric graphs
Structured Bayesian Gaussian process latent variable model
COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment
Universal discriminative quantum neural networks
Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning
Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks
Constructing Compact Brain Connectomes for Individual Fingerprinting
Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits
Cost-aware Cascading Bandits
Solvable Integration Problems and Optimal Sample Size Selection
Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit
On the Worst-Case Complexity of TimSort
Optimization, fast and slow: optimally switching between local and Bayesian optimization
DRAPS: Dynamic and Resource-Aware Placement Scheme for Docker Containers in a Heterogeneous Cluster
Neural Generative Models for Global Optimization with Gradients
Confounding-Robust Policy Improvement
Computable Variants of AIXI which are More Powerful than AIXItl
An Empirical Bayes Approach for Distributed Estimation of Spatial Fields
Central limit theorems for multivariate Bessel processes in the freezing regime
‘Why Should I Trust Interactive Learners?’ Explaining Interactive Queries of Classifiers to Users
PDQP/qpoly = ALL
Breaking the Activation Function Bottleneck through Adaptive Parameterization
On Coresets for Logistic Regression
Less is More: Surgical Phase Recognition with Less Annotations through Self-Supervised Pre-training of CNN-LSTM Networks
Global Navigation Using Predictable and Slow Feature Analysis in Multiroom Environments, Path Planning and Other Control Tasks
A 2D laser rangefinder scans dataset of standard EUR pallets
A Deformed Quon Algebra
Convergence theorems for barycentric maps
Weak Poincaré and Nash-type inequalities via density of states estimates
More Consequences of Falsifying SETH and the Orthogonal Vectors Conjecture
Robust Model Predictive Control for Autonomous Vehicles/Self Driving Cars
Learning over Multitask Graphs – Part II: Performance Analysis
A Recurrent Convolutional Neural Network Approach for Sensorless Force Estimation in Robotic Surgery
Fast Motion Deblurring for Feature Detection and Matching Using Inertial Measurements
Blockchain and Trusted Computing: Problems, Pitfalls, and a Solution for Hyperledger Fabric
Fully Understanding the Hashing Trick
Extremal Controls in the Sub-Riemannian Problem on the Group of Motions of Euclidean Space
Learning over Multitask Graphs – Part I: Stability Analysis
Sentiment Analysis of Arabic Tweets: Feature Engineering and A Hybrid Approach
Gossip of Statistical Observations using Orthogonal Polynomials
Existence and Besov regularity of the density for a class of SDEs with Volterra noise
Safe Element Screening for Submodular Function Minimization
Social-Network-Assisted Worker Recruitment in Mobile Crowd Sensing
Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search
Deep learning generalizes because the parameter-function map is biased towards simple functions
On Distributed Nonlinear Signal Analytics: Bandwidth and Approximation Error Tradeoffs
MonetDBLite: An Embedded Analytical Database
Functional Regression Models with Highly Irregular Designs
Interacting self-avoiding polygons
Non-parametric Structural Change Detection in Multivariate Systems
Part-Based Tracking by Sampling
The out-equilibrium 2D Ising spin glass: almost, but not quite, a free-field theory
OmniDetector: With Neural Networks to Bounding Boxes
Implicit Reparameterization Gradients
Blind Predicting Similar Quality Map for Image Quality Assessment
Knowledge-based Fully Convolutional Network and Its Application in Segmentation of Lung CT Images
Generative Code Modeling with Graphs
Pose-Based Two-Stream Relational Networks for Action Recognition in Videos
Decoupling multivariate functions using second-order information and tensors
Speed of propagation for Hamilton-Jacobi equations with multiplicative rough time dependence and convex Hamiltonians
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Low-Rank Tensor Decomposition via Multiple Reshaping and Reordering Operations
Variational Learning on Aggregate Outputs with Gaussian Processes
Meta-Learning with Hessian Free Approach in Deep Neural Nets Training
The restricted $h$-connectivity of balanced hypercubes
Distributed Partitioned Big-Data Optimization via Asynchronous Dual Decomposition
Resilient consensus for multi-agent systems subject to differential privacy requirements
QBF as an Alternative to Courcelle’s Theorem
Context-Aware Sequence-to-Sequence Models for Conversational Systems
On the existence and uniqueness of self-adjoint realizations of discrete (magnetic) Schrödinger operators
Scene Coordinate and Correspondence Learning for Image-Based Localization
Classification Uncertainty of Deep Neural Networks Based on Gradient Information
Paracompositionality, MWEs and Argument Substitution
RPC Considered Harmful: Fast Distributed Deep Learning on RDMA
Correctness and Fairness of Tendermint-core Blockchains
Efficient schemes for the quantum teleportation of a sub-class of tripartite entangled states
Bayesian Inference of Regular Expressions from Human-Generated Example Strings
Rainbow structures in locally bounded colourings of graphs
Fast and Accurate Binary Response Mixed Model Analysis via Expectation Propagation
Joint Detection and Localization of an Unknown Number of Sources Using Algebraic Structure of the Noise Subspace
Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition
Training Convolutional Networks with Web Images
Estimating the Rating of Reviewers Based on the Text
Stochastic nonlinear Schrödinger equation with almost space-time white noise
Autofocus Layer for Semantic Segmentation
Adaptive Boundary Control of Constant-Parameter Reaction-Diffusion PDEs Using Regulation-Triggered Finite-Time Identification
Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning
Deep Learning with Cinematic Rendering – Fine-Tuning Deep Neural Networks Using Photorealistic Medical Images
Learning to Optimize via Wasserstein Deep Inverse Optimal Control
Joint Image Captioning and Question Answering
M-convexity of the minimum-cost packings of arborescences
Natural gradient in Wasserstein statistical manifold
Counting partitions inside a rectangle
Modeling the Safety Effect of Access and Signal Density on Suburban Arterials: Using Macro Level Analysis Method
Speeding-up Age Estimation in Intelligent Demographics System via Network Optimization
Cutting plane methods can be extended into nonconvex optimization
Learning Markov Clustering Networks for Scene Text Detection
Improved Algorithms for Collaborative PAC Learning
Learning sentence embeddings using Recursive Networks
Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators
A Solvable High-Dimensional Model of GAN
How To Solve Moral Conundrums with Computability Theory
Success Probability of Grant-Free Random Access with Massive MIMO
Equivalent matrices up to permutations
Nearest neighbor density functional estimation based on inverse Laplace transform
Extinction time of the logistic process
Spectral lower bounds for the quantum chromatic number of a graph
Storage and Memory Characterization of Data Intensive Workloads for Bare Metal Cloud
Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport
Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents
Verifiable Reinforcement Learning via Policy Extraction
Robust Gradient Descent via Moment Encoding with LDPC Codes
Measurement-wise Occlusion in Multi-object Tracking
A Spatially Correlated Auto-regressive Model for Count Data
Teaching Multiple Concepts to Forgetful Learners
Adaptive Monte-Carlo Optimization
The Swarmathon: An Autonomous Swarm Robotics Competition
Self-Attention Generative Adversarial Networks
Learning long-range spatial dependencies with horizontal gated-recurrent units
Intriguing behavior when testing the impact of quotation marks usage in Google search results
Learning Safe Policies with Expert Guidance
geomstats: a Python Package for Riemannian Geometry in Machine Learning
Deep Energy Estimator Networks
Anchored Bayesian Gaussian Mixture Models
Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints
Lassoing Eigenvalues
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
Character-based Neural Networks for Sentence Pair Modeling
Data-Efficient Hierarchical Reinforcement Learning
Large Sample Covariance Matrices of Concentrated Vectors
Measuring and regularizing networks in function space
Transformations of High-Level Synthesis Codes for High-Performance Computing
Anti-regular graphs with loops and their spectrum
Communication with Crystal-Free Radios
Weighted batch means estimators in Markov chain Monte Carlo
Bernoulli shifts with bases of equal entropy are isomorphic
The limit shape of convex hull peeling
Multiple Treatments with Strategic Interaction
Halo: Learning Semantics-Aware Representations for Cross-Lingual Information Extraction
Colouring Square-Free Graphs without Long Induced Paths
Effective Dimension of Exp-concave Optimization
On the Selection of Initialization and Activation Function for Deep Neural Networks
Planning to Give Information in Partially Observed Domains with a Learned Weighted Entropy Model
Does a distinct quasi many-body localized phase exist? A numerical study of a translationally invariant system in the thermodynamic limit
Evolving Real-Time Heuristics Search Algorithms with Building Blocks
Algorithmic and algebraic aspects of unshuffling permutations
Sample Compression for Real-Valued Learners
Classifier-agnostic saliency map extraction
Stochastic modified equations for the asynchronous stochastic gradient descent
Sparse and Constrained Attention for Neural Machine Translation
The Roles of Supervised Machine Learning in Systems Neuroscience
Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings
Improving CNN classifiers by estimating test-time priors
Multilevel Models Allow Modular Specification of What and Where to Regularize, Especially in Small Area Estimation
A Simple Cache Model for Image Recognition
Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model
Joint Configuration of Transmission Direction and Altitude in UAV-based Two-Way Communication
Multi-robot Symmetric Rendezvous Search on the Line with an Unknown Initial Distance
The right way to teach the FFT
Towards Global Optimization in Display Advertising by Integrating Multimedia Metrics with Real-Time Bidding
Correlation Clustering Based Coalition Formation For Multi-Robot Task Allocation
Improving Adversarial Robustness by Data-Specific Discretization
Simulation of Large Scale Neural Networks for Evaluation Applications
Transitions, Losses, and Re-parameterizations: Elements of Prediction Games
Wavelet Convolutional Neural Networks
Algorithms and Theory for Multiple-Source Adaptation
Orthogonal Point Location and Rectangle Stabbing Queries in 3-d
A Novel Second-Order Nonlinear Differentiator With Application to Active Disturbance Rejection Control
Gradient descent in hyperbolic space
A syllogistic system for propositions with intermediate quantifiers
ScaffoldNet: Detecting and Classifying Biomedical Polymer-Based Scaffolds via a Convolutional Neural Network
Automatic Data Registration of Geostationary Payloads for Meteorological Applications at ISRO
On profitability of selfish mining
Semantic Cluster Unary Loss for Efficient Deep Hashing
Method of increasing the information capacity of associative memory of oscillator neural networks using high-order synchronization effect
Attaining human-level performance for anatomical landmark detection in 3D CT data
Replicating Active Appearance Model by Generator Network
An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems
Fast Symbolic 3D Registration Solution