Learning More Robust Features with Adversarial Training

In recent years, it has been found that neural networks can be easily fooled by adversarial examples, which is a potential hazard in safety-critical applications. Many researchers have proposed various methods to make neural networks more robust to white-box adversarial attacks, but an effective method has not been found so far. In this short paper, we focus on the robustness of the features learned by neural networks. We show that the features learned by neural networks are not robust, and find that the robustness of the learned features is closely related to the networks’ resistance to adversarial examples. We also find that adversarial training against the fast gradient sign method (FGSM) does not make the learned features very robust, even though it can make the trained networks very resistant to FGSM attacks. We then propose a method, which can be seen as an extension of adversarial training, to train neural networks to learn more robust features. We perform experiments on MNIST and CIFAR-10 to evaluate our method, and the experimental results show that it greatly improves the robustness of the learned features and the resistance to adversarial attacks.
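
For readers unfamiliar with FGSM, the sketch below shows a single FGSM step on a tiny logistic-regression model. It is only an illustration of the attack named in the abstract, not the paper's training method; the weights, input, and `eps` value are made-up assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: shift every input coordinate by eps in the loss-increasing direction."""
    _, grad_x = loss_and_input_grad(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]

# Hypothetical model parameters and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.5, 0.2, 0.3], 1

x_adv = fgsm(w, b, x, y, eps=0.1)
clean_loss, _ = loss_and_input_grad(w, b, x, y)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y)
print(clean_loss, adv_loss)  # the adversarial loss is the larger of the two
```

Adversarial training against FGSM, in its usual form, adds such perturbed inputs to the training batches; how the paper extends that idea is not specified in the abstract.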

PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent’s task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.

Right Answer for the Wrong Reason: Discovery and Mitigation

Exposing the weaknesses of neural models is crucial for improving their performance and robustness in real-world applications. One common approach is to examine how input perturbations affect the output. Our analysis takes this to an extreme on natural language processing tasks by removing as many words as possible from the input without changing the model prediction. For question answering and natural language inference, this often reduces the inputs to just one or two words, while model confidence remains largely unchanged. This is an undesirable behavior: the model gets the Right Answer for the Wrong Reason (RAWR). We introduce a simple training technique that mitigates this problem while maintaining performance on regular examples.
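
The word-removal idea can be sketched as a greedy loop that deletes one word at a time as long as the prediction stays the same. This is a simplified illustration under stated assumptions: `toy_predict` is a hypothetical keyword classifier standing in for a real model, and the greedy search order is an assumption rather than the paper's procedure.

```python
def reduce_input(words, predict):
    """Greedily drop words while the predicted label stays unchanged."""
    original = predict(words)
    current = list(words)
    changed = True
    while changed and len(current) > 1:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if predict(candidate) == original:
                current = candidate  # the word at position i was not needed
                changed = True
                break
    return current

def toy_predict(words):
    # Hypothetical stand-in model: it keys on a single word.
    return "positive" if "great" in words else "negative"

sentence = "the movie was great fun".split()
reduced = reduce_input(sentence, toy_predict)
print(reduced)
```

With this toy model the input shrinks to the single word the classifier relies on, which mirrors the failure mode the abstract describes.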

Value-aware Quantization for Training and Inference of Neural Networks

We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large data in high precision, which reduces total quantization errors under very low precision. We present new techniques to apply the proposed quantization to training and inference. The experiments show that our method with 3-bit activations (with 2% of large ones) can give the same training accuracy as the full-precision one while offering significant (41.6% and 53.7%) reductions in the memory cost of activations in ResNet-152 and Inception-v3 compared with the state-of-the-art method. Our experiments also show that deep networks such as Inception-v3, ResNet-101 and DenseNet-121 can be quantized for inference with 4-bit weights and activations (with 1% 16-bit data) within a 1% top-1 accuracy drop.
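
The core idea, keeping a small fraction of large-magnitude values in high precision and quantizing the rest coarsely, can be illustrated as follows. This is a minimal sketch, not the paper's algorithm: the symmetric uniform quantizer, the function names, and the synthetic data are all assumptions.

```python
def value_aware_quantize(values, bits=3, large_ratio=0.02):
    """Keep the largest-magnitude fraction exact; uniformly quantize the rest."""
    n_large = int(round(len(values) * large_ratio))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    large_idx = set(order[:n_large])
    small = [values[i] for i in range(len(values)) if i not in large_idx]
    max_abs = max((abs(v) for v in small), default=0.0)
    levels = 2 ** (bits - 1) - 1  # symmetric signed levels, e.g. 3 for 3 bits
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [v if i in large_idx else round(v / scale) * scale
            for i, v in enumerate(values)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic data: 98 small values plus two outliers (made up for illustration).
values = [0.01 * i for i in range(-49, 49)] + [10.0, -12.0]

va = value_aware_quantize(values, bits=3, large_ratio=0.02)
plain = value_aware_quantize(values, bits=3, large_ratio=0.0)  # no values kept exact
print(mse(values, va), mse(values, plain))
```

In the plain case the two outliers stretch the quantization range so far that every small value rounds to zero, which is why separating them reduces the total error.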

Understanding AI Data Repositories with Automatic Query Generation

We describe a set of techniques to generate queries automatically based on one or more ingested, input corpuses. These queries require no a priori domain knowledge, and hence no human domain experts. Thus, these auto-generated queries help address the epistemological question of how we know what we know, or more precisely in this case, how an AI system with ingested data knows what it knows. These auto-generated queries can also be used to identify and remedy problem areas in ingested material — areas for which the knowledge of the AI system is incomplete or even erroneous. Similarly, the proposed techniques facilitate tests of AI capability — both in terms of coverage and accuracy. By removing humans from the main learning loop, our approach also allows more effective scaling of AI and cognitive capabilities to provide (1) broader coverage in a single domain such as health or geology; and (2) more rapid deployment to new domains. The proposed techniques also allow ingested knowledge to be extended naturally. Our investigations are early, and this paper provides a description of the techniques. Assessment of their efficacy is our next step for future work.

Sequential Network Transfer: Adapting Sentence Embeddings to Human Activities and Beyond

We study the problem of adapting neural sentence embedding models to the domain of human activities to capture their relations in different dimensions. We introduce a novel approach, Sequential Network Transfer, and show that it largely improves the performance on all dimensions. We also extend this approach to other semantic similarity datasets, and show that the resulting embeddings outperform traditional transfer learning approaches in many cases, achieving state-of-the-art results on the Semantic Textual Similarity (STS) Benchmark. To account for the improvements, we provide some interpretation of what the networks have learned. Our results suggest that Sequential Network Transfer is highly effective for various sentence embedding models and tasks.

CactusNets: Layer Applicability as a Metric for Transfer Learning

Deep neural networks trained over large datasets learn features that are both generic to the whole dataset and specific to individual classes in the dataset. Learned features tend towards generic in the lower layers and specific in the higher layers of a network. Methods like fine-tuning are made possible by the ability of one filter to apply to multiple target classes. Much like in the human brain, this behavior can also be used to cluster and separate classes. However, to the best of our knowledge, there is no metric for how applicable learned features are to specific classes. In this paper we propose a definition and metric for measuring the applicability of learned features to individual classes, and use this applicability metric to estimate input applicability and produce a new method of unsupervised learning we call the CactusNet.

What’s Going On in Neural Constituency Parsers? An Analysis

A number of differences have emerged between modern and classic approaches to constituency parsing in recent years, with structural components like grammars and feature-rich lexicons becoming less central while recurrent neural network representations rise in popularity. The goal of this work is to analyze the extent to which information provided directly by the model structure in classical systems is still being captured by neural methods. To this end, we propose a high-performance neural model (92.08 F1 on PTB) that is representative of recent work and perform a series of investigative experiments. We find that our model implicitly learns to encode much of the same information that was explicitly provided by grammars and lexicons in the past, indicating that this scaffolding can largely be subsumed by powerful general-purpose neural machinery.

Probabilistic Analysis of Balancing Scores for Causal Inference

Propensity scores are often used to stratify treatment and control groups of subjects in observational data in order to remove confounding bias when estimating the causal effect of the treatment on an outcome in the so-called potential outcome causal modeling framework. In this article, we try to gain some insight into the basic behavior of propensity scores in a probabilistic sense. We carry out a simple analysis of their usage, confined to the case of discrete confounding covariates and outcomes. While clarifying the behavior of the propensity score, our analysis shows how the so-called prognostic score can be derived simultaneously. However, the prognostic score is derived only in a limited sense in the current literature, whereas our derivation is more general and shows all possibilities of having the score. We call it the outcome score. We argue that applying both the propensity score and the outcome score is the most efficient way to reduce the dimension of the confounding covariates, as opposed to the current belief that the propensity score alone is the most efficient way.

Is feature selection secure against training data poisoning?

Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.

A geometric view on Pearson’s correlation coefficient and a generalization of it to non-linear dependencies

Measuring the strength or degree of statistical dependence between two random variables is a common problem in many domains. Pearson’s correlation coefficient \rho is an accurate measure of linear dependence. We show that \rho is a normalized, Euclidean-type distance between the joint probability distribution of the two random variables and the joint distribution obtained when their independence is assumed while keeping their marginal distributions. The normalizing constant is the geometric mean of two maximal distances, each between the joint probability distribution when full linear dependence is assumed while preserving the respective marginal distribution and the joint distribution when independence is assumed. Its usage is restricted to linear dependence because it is based on Euclidean-type distances, which are generally not metrics, and because the full dependence considered is linear. Therefore, we argue that if a suitable distance metric is used while considering all possible maximal dependences, then it can measure any non-linear dependence. But then one must define all the full dependences. The Hellinger distance, which is a metric, can be used as the distance measure between probability distributions to obtain a generalization of \rho for the discrete case.
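
To make the two ingredients concrete, the sketch below computes Pearson's \rho from a discrete joint distribution and the Hellinger distance between that joint distribution and the product of its marginals. It does not reproduce the paper's normalization or its generalized coefficient; the example distribution (Y = X^2 with X uniform on {-1, 0, 1}) is an assumption chosen to show a dependence that \rho misses.

```python
import math

def marginals(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def pearson_rho(joint, xs, ys):
    """Pearson correlation of (X, Y) given joint[i][j] = P(X = xs[i], Y = ys[j])."""
    px, py = marginals(joint)
    ex = sum(p * x for p, x in zip(px, xs))
    ey = sum(p * y for p, y in zip(py, ys))
    vx = sum(p * (x - ex) ** 2 for p, x in zip(px, xs))
    vy = sum(p * (y - ey) ** 2 for p, y in zip(py, ys))
    cov = sum(joint[i][j] * (xs[i] - ex) * (ys[j] - ey)
              for i in range(len(xs)) for j in range(len(ys)))
    return cov / math.sqrt(vx * vy)

def hellinger_from_independence(joint):
    """Hellinger distance between the joint pmf and the product of its marginals."""
    px, py = marginals(joint)
    total = sum((math.sqrt(joint[i][j]) - math.sqrt(px[i] * py[j])) ** 2
                for i in range(len(px)) for j in range(len(py)))
    return math.sqrt(total / 2.0)

# Y = X^2 with X uniform on {-1, 0, 1}: fully dependent, but not linearly.
xs, ys = [-1, 0, 1], [0, 1]
third = 1.0 / 3.0
joint = [[0.0, third], [third, 0.0], [0.0, third]]

rho = pearson_rho(joint, xs, ys)
h = hellinger_from_independence(joint)
print(rho, h)
```

Here \rho is zero although Y is a deterministic function of X, while the Hellinger distance from independence is clearly positive.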

Viewing Simpson’s Paradox

The well-known Simpson’s paradox is puzzling and surprising for many, especially for empirical researchers and users of statistics. However, there is no surprise as far as the mathematical details are concerned. Much has been written about the paradox, but most of it is beyond the grasp of such users. This short article explains the phenomenon in an easy-to-grasp way using simple algebra and geometry. The mathematical conditions under which the paradox can occur are made explicit, and a simple geometrical illustration is used to describe it. We consider the reversal of the association between two binary variables, say X and Y, by a third binary variable, say Z. We show that it is always possible to define Z algebraically for non-extreme dependence between X and Y; therefore, the occurrence of the paradox depends on identifying Z with a practical meaning in a given context of interest, which is up to the subject domain expert. Finally, we discuss the paradox in predictive contexts, since the literature argues that the paradox is resolved using causal reasoning.
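
A small numeric example may help show the reversal described here. The counts below are hypothetical and chosen only so that the association between X and Y flips once the strata of Z are pooled; they do not come from the article.

```python
# counts[z][x] = (successes, trials) for binary Z and binary X; Y is success/failure.
counts = {
    0: {1: (9, 10), 0: (80, 100)},
    1: {1: (30, 100), 0: (2, 10)},
}

def rate(successes, trials):
    return successes / trials

def pooled_rate(x):
    """P(Y = 1 | X = x) after summing over both strata of Z."""
    successes = sum(counts[z][x][0] for z in counts)
    trials = sum(counts[z][x][1] for z in counts)
    return successes / trials

# Within each stratum of Z, X = 1 has the higher success rate.
within = {z: (rate(*counts[z][1]), rate(*counts[z][0])) for z in counts}

# After pooling over Z, the ordering flips.
overall = (pooled_rate(1), pooled_rate(0))

print(within)
print(overall)
```

The reversal arises because X = 1 is concentrated in the stratum with the low baseline success rate, so the pooled comparison mixes the effect of X with the effect of Z.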

Generative Stock Question Answering

We study the problem of stock-related question answering (StockQA): automatically generating answers to stock-related questions, just like professional stock analysts providing action recommendations on stocks upon users’ requests. StockQA is quite different from previous QA tasks since (1) the answers in StockQA are natural language sentences (rather than entities or values) and, due to the dynamic nature of StockQA, it is scarcely possible to get reasonable answers in an extractive way from the training data; and (2) StockQA requires properly analyzing the relationship between keywords in a QA pair and the numerical features of a stock. We propose to address the problem with a memory-augmented encoder-decoder architecture, and integrate different mechanisms of number understanding and generation, which is a critical component of StockQA. We build a large-scale Chinese dataset containing over 180K StockQA instances, based on which various technique combinations are extensively studied and compared. Experimental results show that a hybrid word-character model with separate character components for number processing achieves the best performance. (The data is publicly available at http://…/.)

Expert Finding in Community Question Answering: A Review

The recent rapid development of Community Question Answering (CQA) satisfies users’ quest for professional and personal knowledge about anything. In CQA, one central issue is to find users with the expertise and willingness to answer the given questions. Expert finding in CQA often exhibits very different challenges compared to traditional methods. Sparse data and new features violate fundamental assumptions of traditional recommendation systems. This paper focuses on reviewing and categorizing the current progress on expert finding in CQA. We classify all the existing solutions into four different categories: matrix factorization based models (MF-based models), gradient boosting tree based models (GBT-based models), deep learning based models (DL-based models) and ranking based models (R-based models). We find that MF-based models outperform other categories of models in the field of expert finding in CQA. Moreover, we use innovative diagrams to clarify several important concepts of ensemble learning, and find that ensemble models with several specific single models can further boost the performance. Further, we compare the performance of different models on different types of matching tasks, including text vs. text, graph vs. text, audio vs. text and video vs. text. The results can help with model selection for expert finding in practice. Finally, we explore some potential future issues in expert finding research in CQA.

Empirical Equilibrium

We introduce empirical equilibrium, the prediction in a game that selects the Nash equilibria that can be approximated by a sequence of payoff-monotone distributions, a well-documented proxy for empirically plausible behavior. Then, we reevaluate implementation theory based on this equilibrium concept. We show that in a partnership dissolution environment with complete information, two popular auctions that are essentially equivalent for the Nash equilibrium prediction can be expected to differ in fundamental ways when they are operated. Besides the direct policy implications, two general consequences follow. First, a mechanism designer may not be constrained by typical invariance properties. Second, a mechanism designer who does not account for the empirical plausibility of equilibria may inadvertently design implicitly biased mechanisms.

Generating Natural Language Adversarial Examples

Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the network to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a population-based optimization algorithm to generate semantically and syntactically similar adversarial examples. We demonstrate via a human study that 94.3% of the generated examples are classified to the original label by human evaluators, and that the examples are perceptibly quite similar. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.

Swarm Intelligence: Past, Present and Future

Many optimization problems in science and engineering are challenging to solve, and the current trend is to use swarm intelligence (SI) and SI-based algorithms to tackle such challenging problems. Some significant developments have been made in recent years, though there are still many open problems in this area. This paper provides a short but timely analysis about SI-based algorithms and their links with self-organization. Different characteristics and properties are analyzed here from both mathematical and qualitative perspectives. Future research directions are outlined and open questions are also highlighted.

Fine-grained Entity Typing through Increased Discourse Context and Adaptive Classification Thresholds

Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context — both document and sentence level information — than prior work. We find that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds. Experiments show that our approach without reliance on hand-crafted features achieves the state-of-the-art results on three benchmark datasets.

Differentially Private k-Means with Constant Multiplicative Error

We design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. In both models, our algorithms achieve significantly improved error rates over the previous state-of-the-art. In addition, in the local model, our algorithm significantly reduces the number of needed interactions. Although the problem has been widely studied in the context of differential privacy, all of the existing constructions achieve only super constant approximation factors. We present, for the first time, efficient private algorithms for the problem with constant multiplicative error.

Multi-modal space structure: a new kind of latent correlation for multi-modal entity resolution

Multi-modal data is becoming more common than before because of big data issues. Finding semantically equal or similar objects from different data sources (called entity resolution) is one of the core problems of multi-modal tasks. Current models for solving this problem usually need a large amount of paired data to find the latent correlation between multi-modal data, which is costly. A new kind of latent correlation is proposed in this article. With this correlation, multi-modal objects can be uniformly represented in a common shared space. A classification-based model is designed for the multi-modal entity resolution task. With the proposed method, the demand for training data can be greatly reduced.

A Channel-based Exact Inference Algorithm for Bayesian Networks

This paper describes a new algorithm for exact Bayesian inference that is based on a recently proposed compositional semantics of Bayesian networks in terms of channels. The paper concentrates on the ideas behind this algorithm, involving a linearisation (‘stretching’) of the Bayesian network, followed by a combination of forward state transformation and backward predicate transformation, while evidence is accumulated along the way. The performance of a prototype implementation of the algorithm in Python is briefly compared to a standard implementation (pgmpy): first results show competitive performance.

Learning from the experts: From expert systems to machine learned diagnosis models

Expert diagnostic support systems have been extensively studied. The practical application of these systems in real-world scenarios has been somewhat limited due to well-understood shortcomings such as limited extensibility. More recently, machine-learned models for medical diagnosis have gained momentum since they can learn and generalize patterns found in very large datasets like electronic health records. These models also have shortcomings. In particular, there is no easy way to incorporate prior knowledge from existing literature or experts. In this paper, we present a method to merge both approaches by using expert systems as generative models that create simulated data on which models can be learned. We demonstrate that such a learned model not only preserves the original properties of the expert systems but also addresses some of their limitations. Furthermore, we show how this approach can also be used as the starting point to combine expert knowledge with knowledge extracted from other data sources such as electronic health records.

Bridgeout: stochastic bridge regularization for deep neural networks

A major challenge in training deep neural networks is overfitting, i.e. inferior performance on unseen test examples compared to performance on training examples. To reduce overfitting, stochastic regularization methods have shown superior performance compared to deterministic weight penalties on a number of image recognition tasks. Stochastic methods such as Dropout and Shakeout, in expectation, are equivalent to imposing a ridge and elastic-net penalty on the model parameters, respectively. However, the choice of the norm of weight penalty is problem dependent and is not restricted to \{L_1,L_2\}. Therefore, in this paper we propose the Bridgeout stochastic regularization technique and prove that it is equivalent to an L_q penalty on the weights, where the norm q can be learned as a hyperparameter from data. Experimental results show that Bridgeout results in sparse model weights, improved gradients and superior classification performance compared to Dropout and Shakeout on synthetic and real datasets.
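
For reference, the deterministic L_q (bridge) penalty that the abstract says Bridgeout is equivalent to can be written in a few lines. This sketch shows only that penalty, not the stochastic Bridgeout procedure itself; the function name and example weights are assumptions.

```python
def bridge_penalty(weights, q):
    """L_q (bridge) penalty: the sum of |w|^q over all weights, for q > 0."""
    return sum(abs(w) ** q for w in weights)

weights = [3.0, -4.0]  # illustrative values only

lasso_like = bridge_penalty(weights, 1.0)   # q = 1 gives the L_1 (lasso) penalty
ridge_like = bridge_penalty(weights, 2.0)   # q = 2 gives the squared L_2 (ridge) penalty
in_between = bridge_penalty(weights, 1.5)   # any other q > 0 is also admissible
print(lasso_like, ridge_like, in_between)
```

Treating q as a tunable quantity is what lets the penalty move between the lasso and ridge cases rather than being fixed to one of them.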

Neural Sentence Location Prediction for Summarization

A competitive baseline in sentence-level extractive summarization of news articles is the Lead-3 heuristic, where only the first 3 sentences are extracted. The success of this method is due to the tendency for writers to implement progressive elaboration in their work by writing the most important content at the beginning. In this paper, we introduce the Lead-like Recognizer (LeadR) to show how the Lead heuristic can be extended to summarize multi-section documents where it would not usually work well. This is done by introducing a neural model which produces a probability distribution over positions for sentences, so that we can locate sentences with introduction-like qualities. To evaluate the performance of our model, we use the task of summarizing multi-section documents. LeadR outperforms several baselines on this task, including a simple extension of the Lead heuristic designed for the task. Our work suggests that predicted position is a strong feature to use when extracting summaries.
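
The Lead-3 baseline itself is simple enough to sketch directly. The regex-based sentence splitter below is a naive assumption used for illustration (real systems use a proper sentence tokenizer), and the sketch does not cover the LeadR model.

```python
import re

def lead_k(document, k=3):
    """Return the first k sentences of a document as an extractive summary."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    sentences = [s.strip() for s in parts if s.strip()]
    return sentences[:k]

article = ("The council approved the budget. Taxes will rise next year! "
           "Will services improve? Critics remain doubtful. A vote follows in May.")
summary = lead_k(article)
print(summary)
```

The heuristic works for news because the key content comes first; for multi-section documents the first three sentences of the whole text are a much weaker choice, which is the gap the abstract targets.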

Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation

The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog model for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST, that improve VAEs and can discover interpretable semantics via either auto-encoding or context prediction. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation.

Decoupled Networks

Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations. Inspired by the observation that CNN-learned features are naturally decoupled with the norm of features corresponding to the intra-class variation and the angle corresponding to the semantic difference, we propose a generic decoupled learning framework which models the intra-class variation and semantic difference independently. Specifically, we first reparametrize the inner product to a decoupled form and then generalize it to the decoupled convolution operator which serves as the building block of our decoupled networks. We present several effective instances of the decoupled convolution operator. Each decoupled operator is well motivated and has an intuitive geometric interpretation. Based on these decoupled operators, we further propose to directly learn the operator from data. Extensive experiments show that such decoupled reparameterization renders significant performance gain with easier convergence and stronger robustness.
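
The reparametrization mentioned here rests on the identity that an inner product equals the product of the two norms and the cosine of the angle between the vectors. The sketch below checks that identity and then swaps in a different magnitude term. The `bounded_decoupled` variant with a tanh magnitude is a hypothetical example of a decoupled operator, not one taken from the paper.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def angle(w, x):
    """Angle between w and x, with the cosine clamped against rounding error."""
    cos_theta = sum(a * b for a, b in zip(w, x)) / (norm(w) * norm(x))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def inner_product_decoupled(w, x):
    """<w, x> rewritten as (magnitude term) * (angular term)."""
    return norm(w) * norm(x) * math.cos(angle(w, x))

def bounded_decoupled(w, x, alpha=1.0):
    """Hypothetical variant: a saturating magnitude term paired with the same angular term."""
    return alpha * math.tanh(norm(x)) * math.cos(angle(w, x))

w, x = [1.0, 2.0, 2.0], [2.0, 0.0, 1.0]
plain = sum(a * b for a, b in zip(w, x))
print(plain, inner_product_decoupled(w, x), bounded_decoupled(w, x))
```

Because the magnitude and angular terms are separate factors, either one can be redesigned without touching the other, which is the flexibility the abstract refers to.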

On the stab number of rectangle intersection graphs
From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling
Mapping Images to Psychological Similarity Spaces Using Neural Networks
A Self-paced Regularization Framework for Partial-Label Learning
Sampling the Riemann-Theta Boltzmann Machine
The Statistical Model for Ticker, an Adaptive Single-Switch Text-Entry Method for Visually Impaired Users
Generalized Linear Model for Gamma Distributed Variables via Elastic Net Regularization
Generating Descriptions from Structured Data Using a Bifocal Attention Mechanism and Gated Orthogonalization
A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization
Robust Probabilistic Analysis of Transmission Power Systems based on Equivalent Circuit Formulation
Stochastic subgradient method converges on tame functions
Enumeration in Incremental FPT-Time
Inseparability and Conservative Extensions of Description Logic Ontologies: A Survey
Genus From Sandpile Torsor Algorithm
Spectral gap in random bipartite biregular graphs and its applications
Metrics that respect the support
Broadcast Domination of Triangular Matchstick Graphs and the Triangular Lattice
A Deep Representation Empowered Distant Supervision Paradigm for Clinical Information Extraction
Decidability of Timed Communicating Automata
Identification of Induction Motors with Smart Circuit Breakers
An Aggregated Multicolumn Dilated Convolution Network for Perspective-Free Counting
Autotune: A Derivative-free Optimization Framework for Hyperparameter Tuning
Spectrally Efficient OFDM System Design under Disguised Jamming
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
A Multi-Axis Annotation Scheme for Event Temporal Relations
A New Formulation of The Shortest Path Problem with On-Time Arrival Reliability
On mean-field \(GI/GI/1\) queueing model: existence, uniqueness, convergence
A Metropolis-Hastings algorithm for posterior measures with self-decomposable priors
HandyNet: A One-stop Solution to Detect, Segment, Localize & Analyze Driver Hands
ConnNet: A Long-Range Relation-Aware Pixel-Connectivity Network for Salient Segmentation
Online Improper Learning with an Approximation Oracle
Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks
Sherali-Adams Integrality Gaps Matching the Log-Density Threshold
Modulus of continuity for polymer fluctuations and weight profiles in Poissonian last passage percolation
Current large deviations for partially asymmetric particle systems on a ring
Joint entity recognition and relation extraction as a multi-head selection problem
Inter-Annotator Agreement Networks
DeepRec: A deep encoder-decoder network for directly solving the PET reconstruction inverse problem
Massive quality factors of disorder-induced cavity modes in photonic crystal waveguides through long-range correlations
Subgoal Discovery for Hierarchical Dialogue Policy Learning
A 0.086-mm$^2$ 9.8-pJ/SOP 64k-Synapse 256-Neuron Online-Learning Digital Spiking Neuromorphic Processor in 28nm CMOS
Comment on ‘Sum of squares of uniform random variables’ by I. Weissman
Propensity Score Methods for Merging Observational and Experimental Datasets
On the ground state of spiking network activity in mammalian cortex
Designing Practical PTASes for Minimum Feedback Vertex Set in Planar Graphs
Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size
Estimating 3D Human Pose on a Configurable Bed from a Single Pressure Image
Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
Stability analysis of event-triggered anytime control with multiple control laws
Massively Parallel Cross-Lingual Learning in Low-Resource Target Language Translation
Line arrangements and r-Stirling partitions
Event Extraction with Generative Adversarial Imitation Learning
Dynamic Ensemble Selection VS K-NN: why and when Dynamic Selection obtains higher classification performance?
Neural-inspired sensors enable sparse, efficient classification of spatiotemporal data
Social Bots for Online Public Health Interventions
A Cell-Division Search Technique for Inversion with Application to Picture-Discovery and Magnetotellurics
Stochastic Answer Networks for Natural Language Inference
Entity-aware Image Caption Generation
A Nutritional Label for Rankings
A Deep Learning Approach for Air Pollution Forecasting in South Korea Using Encoder-Decoder Networks & LSTM
Taylor’s law for Human Linguistic Sequences
Periodic solution of stochastic process in the distributional sense
Random weighted averages, partition structures and generalized arcsine laws
Unsupervised Natural Language Generation with Denoising Autoencoders
Chain, Generalization of Covering Code, and Deterministic Algorithm for k-SAT
Learning to Refine Human Pose Estimation
Multi-task Learning for Universal Sentence Representations: What Syntactic and Semantic Information is Captured?
Optimization of a plate with holes
A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Genealogical distance under selection
Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing
Coloring of cozero-divisor graphs of commutative von Neumann regular rings
Resolving the Lord’s Paradox
Multi-view registration of unordered range scans by fast correspondence propagation of multi-scale descriptors
DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension
Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate
Best subset selection in linear regression via bi-objective mixed integer linear programming
On Associative Confounder Bias
Variational Inference In Pachinko Allocation Machines
Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons
Formal Verification of Platoon Control Strategies
Automated essay scoring with string kernels and word embeddings
Faster Shift-Reduce Constituent Parsing with a Non-Binary, Bottom-Up Strategy
Eval all, trust a few, do wrong to none: Comparing sentence generation models
Efficient Beam Training and Channel Estimation for Millimeter Wave Communications Under Mobility
Finer Tight Bounds for Coloring on Clique-Width
Neural Davidsonian Semantic Proto-role Labeling
Conditional heteroskedasticity in crypto-asset returns
Parallel Implementations of Cellular Automata for Traffic Models
Context-Attentive Embeddings for Improved Sentence Representations
Capacity of Multiple One-Bit Transceivers in a Rayleigh Environment
Macdonald denominators for affine root systems, orthogonal theta functions, and elliptic determinantal point processes
Global Convergence Analysis of the Flower Pollination Algorithm: A Discrete-Time Markov Chain Approach
Stability of the Stochastic Gradient Method for an Approximated Large Scale Kernel Machine
Learning in Games with Cumulative Prospect Theoretic Preferences
Sufficient conditions for the global rigidity of periodic graphs
Integrating Stance Detection and Fact Checking in a Unified Corpus
A 2/3-Approximation Algorithm for Vertex-weighted Matching in Bipartite Graphs
Tracing Equilibrium in Dynamic Markets via Distributed Adaptation
ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking
Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients
Modeling and Experimental Verification of Adaptive 100% Stator Ground Fault Protection Schemes for Synchronous Generators
Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks
Ramanujan Graphs and Digraphs
New counts for the number of triangulations of cyclic polytopes
Cross-lingual Semantic Parsing
Learning Myelin Content in Multiple Sclerosis from Multimodal MRI through Adversarial Training
Predicting User Performance and Bitcoin Price Using Block Chain Transaction Network
First Impressions: A Survey on Computer Vision-Based Apparent Personality Trait Analysis
Semi-supervised User Geolocation via Graph Convolutional Networks
Multi-Head Decoder for End-to-End Speech Recognition
HeteroMed: Heterogeneous Information Network for Medical Diagnosis
Nonparametric Bayesian Instrumental Variable Analysis: Evaluating Heterogeneous Effects of Arterial Access Sites for Opening Blocked Blood Vessels
Query Focused Variable Centroid Vectors for Passage Re-ranking in Semantic Search
Adversarial Training for Community Question Answer Selection Based on Multi-scale Matching
Attenuate Locally, Win Globally: An Attenuation-based Framework for Online Stochastic Matching with Timeouts
A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding
Efficient Large-Scale Domain Classification with Personalized Attention
MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server
On a positivity preserving numerical scheme for jump-extended CIR process: the alpha-stable case
Spin torque oscillator for microwave assisted magnetization reversal
Inducing and Embedding Senses with Scaled Gumbel Softmax
A Spherical Probability Distribution Model of the User-Induced Mobile Phone Orientation
Anchor-based Nearest Class Mean Loss for Convolutional Neural Networks
Tunable glassiness on a two-dimensional atomic spin array
IIIDYT at SemEval-2018 Task 3: Irony detection in English tweets
Swarm robotics in wireless distributed protocol design for coordinating robots involved in cooperative tasks
A Primal-Dual Online Deterministic Algorithm for Matching with Delays
Rician $K$-Factor-Based Analysis of XLOS Service Probability in 5G Outdoor Ultra-Dense Networks
On the Mean Residence Time in Stochastic Lattice-Gas Models
Sampling in Uniqueness from the Potts and Random-Cluster Models on Random Regular Graphs
A constrained risk inequality for general losses
Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment
Matching Fingerphotos to Slap Fingerprint Images