Self-Attentive Sequential Recommendation

Sequential dynamics are a key feature of many modern recommender systems, which seek to capture the ‘context’ of users’ activities on the basis of actions they have performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) and Recurrent Neural Networks (RNNs). Markov Chains assume that a user’s next action can be predicted on the basis of just their last (or last few) actions, while RNNs in principle allow for longer-term semantics to be uncovered. Generally speaking, MC-based methods perform best in extremely sparse datasets, where model parsimony is critical, while RNNs perform better in denser datasets where higher model complexity is affordable. The goal of our work is to balance these two goals, by proposing a self-attention based sequential model (SASRec) that allows us to capture long-term semantics (like an RNN), but, using an attention mechanism, makes its predictions based on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items are ‘relevant’ from a user’s action history, and use them to predict the next item. Extensive empirical studies show that our method outperforms various state-of-the-art sequential models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. Visualizations of attention weights also show how our model adaptively handles datasets of varying density, and uncovers meaningful patterns in activity sequences.
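
The idea of weighting ‘relevant’ items in a user’s history can be illustrated with a minimal sketch of causal scaled dot-product attention. This is not the SASRec implementation: it assumes raw item embeddings are used directly as queries and keys (no learned projections, positional embeddings, or feed-forward layers), and the function name and toy embeddings are hypothetical.

```python
import math

def causal_attention_weights(history):
    """Return a lower-triangular matrix of attention weights.

    history: list of item embedding vectors (lists of floats), oldest first.
    Row t holds the weights that position t assigns to positions 0..t;
    later positions get weight 0 so a step never attends to future actions.
    """
    dim = len(history[0])
    scale = math.sqrt(dim)
    weights = []
    for t, query in enumerate(history):
        # Scaled dot-product scores against the current and earlier items only.
        scores = [sum(q * k for q, k in zip(query, history[j])) / scale
                  for j in range(t + 1)]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (len(history) - t - 1)
        weights.append(row)
    return weights

# Toy history of three items with hypothetical 2-d embeddings.
toy_history = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
toy_attention = causal_attention_weights(toy_history)
```

In the toy history, the third item has the same embedding as the first, so the last row puts more weight on position 0 than on position 1.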

An Adaptive Conversational Bot Framework

How can we enable users to heavily specify criteria for database queries in a user-friendly way? This paper describes a general framework for a conversational bot that extracts meaningful information from users’ sentences, asks subsequent questions to complete missing information, and adjusts its questions and information-extraction parameters for later conversations depending on users’ behavior. Additionally, we provide a comparison of existing tools and give novel techniques to implement such a framework. Finally, we exemplify the framework with a bot to query movies in a database, whose code is available for Microsoft employees.

KDSL: a Knowledge-Driven Supervised Learning Framework for Word Sense Disambiguation

We propose KDSL, a new word sense disambiguation (WSD) framework that utilizes knowledge to automatically generate sense-labeled data for supervised learning. First, from WordNet, we automatically construct a semantic knowledge base called DisDict, which provides refined feature words that highlight the differences among word senses, i.e., synsets. Second, we automatically generate new sense-labeled data by DisDict from unlabeled corpora. Third, these generated data, together with manually labeled data, are fed to a supervised learning neural network to model the semantic relations among synsets, feature words and their contexts. Jointly with the supervised learning process, we also implement unsupervised learning on unlabeled data as an auxiliary task. The experimental results show that KDSL outperforms several representative state-of-the-art methods on various major benchmarks. Interestingly, it performs relatively well even when manually labeled data is unavailable, thus providing a promising new backoff strategy for WSD.

Superhighway: Bypass Data Sparsity in Cross-Domain CF

Cross-domain collaborative filtering (CF) aims to alleviate data sparsity in single-domain CF by leveraging knowledge transferred from related domains. Many traditional methods focus on enriching compared neighborhood relations in CF directly to address the sparsity problem. In this paper, we propose superhighway construction, an alternative explicit relation-enrichment procedure, to improve recommendations by enhancing cross-domain connectivity. Specifically, assuming partially overlapped items (users), superhighway bypasses multi-hop inter-domain paths between cross-domain users (items, respectively) with direct paths to enrich the cross-domain connectivity. The experiments conducted on a real-world cross-region music dataset and a cross-platform movie dataset show that the proposed superhighway construction significantly improves recommendation performance in both target and source domains.
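
The bypass idea can be sketched as follows: a two-hop path (source user, shared item, target user) is replaced with a direct user-to-user link. This is only an illustration under stated assumptions, not the paper’s construction: the function name is hypothetical, and weighting each direct link by the number of shared items is an assumption made here for concreteness.

```python
from collections import defaultdict
from itertools import product

def build_superhighways(source_interactions, target_interactions):
    """Link users across domains directly when they share an overlapping item.

    Each argument maps a user id to the set of item ids that user interacted
    with. Returns a dict mapping (source_user, target_user) to the number of
    items the two users share (an assumed edge weight).
    """
    # Invert each domain: item -> users who interacted with it.
    source_by_item = defaultdict(set)
    for user, items in source_interactions.items():
        for item in items:
            source_by_item[item].add(user)
    target_by_item = defaultdict(set)
    for user, items in target_interactions.items():
        for item in items:
            target_by_item[item].add(user)

    # Every item present in both domains yields direct cross-domain links.
    highways = defaultdict(int)
    for item in source_by_item.keys() & target_by_item.keys():
        for s_user, t_user in product(source_by_item[item], target_by_item[item]):
            highways[(s_user, t_user)] += 1
    return dict(highways)

# Toy cross-region music data (hypothetical ids).
toy_source = {"s1": {"songA", "songB"}, "s2": {"songC"}}
toy_target = {"t1": {"songB", "songD"}, "t2": {"songA", "songB"}}
toy_highways = build_superhighways(toy_source, toy_target)
```

Here `s2` gets no direct link because `songC` does not appear in the target domain.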

Using Taste Groups for Collaborative Filtering

Implicit feedback is the simplest form of user feedback that can be used for item recommendation. It is easy to collect and domain independent. However, there is a lack of negative examples. Existing works circumvent this problem by making various assumptions regarding the unconsumed items, which fail to hold when the user did not consume an item because she was unaware of it. In this paper, we propose a novel method for addressing the lack of negative examples in implicit feedback. The motivation is that if there is a large group of users who share the same taste and none of them consumed an item, then it is highly likely that the item is irrelevant to this taste. We use Hierarchical Latent Tree Analysis (HLTA) to identify taste-based user groups and make recommendations for a user based on her memberships in the groups.
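
The motivating rule (a large taste group in which nobody consumed an item suggests the item is irrelevant to that taste) can be sketched in a few lines. The sketch assumes the taste groups are already given; it does not implement HLTA, and the function name and the `min_group_size` threshold are hypothetical.

```python
def infer_group_negatives(groups, consumed, all_items, min_group_size=3):
    """Return, per taste group, the items that no member has consumed.

    groups: dict mapping group name -> set of user ids (assumed precomputed).
    consumed: dict mapping user id -> set of consumed item ids.
    all_items: iterable of every item id in the catalogue.
    Groups smaller than min_group_size are skipped, since the absence of
    consumption in a small group is weak evidence.
    """
    negatives = {}
    for name, members in groups.items():
        if len(members) < min_group_size:
            continue
        seen = set()
        for user in members:
            seen |= consumed.get(user, set())
        negatives[name] = set(all_items) - seen
    return negatives

# Toy data with hypothetical users, groups, and items.
toy_groups = {"jazz": {"u1", "u2", "u3"}, "tiny": {"u4"}}
toy_consumed = {"u1": {"a"}, "u2": {"a", "b"}, "u3": {"b"}, "u4": {"c"}}
toy_negatives = infer_group_negatives(toy_groups, toy_consumed, {"a", "b", "c", "d"})
```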

Semantic Matching Against a Corpus: New Applications and Methods

We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection. We propose the task of semantically matching the idea, expressed as a natural language proposition, against a corpus. We create two preliminary tasks derived from existing datasets, and then introduce a more realistic one on disaster recovery designed for emergency managers, whom we engaged in a user study. On the latter, we find that a new model built from natural language entailment data produces higher-quality matches than simple word-vector averaging, both on expert-crafted queries and on ones produced by the subjects themselves. This work provides a proof-of-concept for such applications of semantic matching and illustrates key challenges.

Tree-Based Bayesian Treatment Effect Analysis

The inclusion of the propensity score as a covariate in Bayesian regression trees for causal inference can reduce the bias in treatment effect estimations, which occurs due to the regularization-induced confounding phenomenon. This study advocates for the use of the propensity score by evaluating it under a full-Bayesian variable selection setting, and for the use of Individual Conditional Expectation Plots, a graphical tool that can improve treatment effect analysis on tree-based Bayesian models and other ‘black box’ models. The former, even if poorly estimated, can lead to bias reduction in the estimated treatment effects, while the latter can be used to find groups of individuals who respond differently to the applied treatment, and to analyze the impact of each variable on the estimated treatment effect.
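
An Individual Conditional Expectation curve is obtained by varying one feature of a single individual over a grid while holding the other features fixed, and recording the model’s prediction at each grid value. The sketch below shows that computation only; the `toy_outcome_model` stands in for a fitted black-box model and is entirely hypothetical, as are the names and data.

```python
def ice_curves(predict, rows, feature, grid):
    """Compute one ICE curve per row.

    predict: callable taking a dict of features and returning a number.
    rows: list of feature dicts, one per individual.
    feature: name of the feature to vary.
    grid: values to substitute for that feature.
    """
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # vary only the chosen feature
            curve.append(predict(modified))
        curves.append(curve)
    return curves

def toy_outcome_model(row):
    # Hypothetical black box: the treatment helps only people older than 50.
    effect = 2.0 if row["age"] > 50 else 0.0
    return 1.0 + effect * row["treated"]

toy_people = [{"age": 30, "treated": 0}, {"age": 65, "treated": 0}]
toy_curves = ice_curves(toy_outcome_model, toy_people, "treated", [0, 1])
```

The first curve is flat and the second rises, which is the kind of heterogeneity in treatment response that such plots are meant to expose.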

Layer Trajectory LSTM

It is popular to stack LSTM layers to get better modeling power, especially when a large amount of training data is available. However, an LSTM-RNN with too many vanilla LSTM layers is very hard to train, and the gradient vanishing issue persists if the network goes too deep. This issue can be partially solved by adding skip connections between layers, such as residual LSTM. In this paper, we propose a layer trajectory LSTM (ltLSTM) which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM. This layer-LSTM scans the outputs from time-LSTMs, and uses the summarized layer trajectory information for final senone classification. The forward-propagation of time-LSTM and layer-LSTM can be handled in two separate threads in parallel so that the network computation time is the same as the standard time-LSTM. With a layer-LSTM running through layers, a gated path is provided from the output layer to the bottom layer, alleviating the gradient vanishing issue. Trained with 30 thousand hours of EN-US Microsoft internal data, the proposed ltLSTM performed significantly better than the standard multi-layer LSTM and residual LSTM, with up to 9.0% relative word error rate reduction across different tasks.

Hierarchical Quantized Representations for Script Generation

Scripts define knowledge about how everyday scenarios (such as going to a restaurant) are expected to unfold. One of the challenges to learning scripts is the hierarchical nature of the knowledge. For example, a suspect arrested might plead innocent or guilty, and a very different track of events is then expected to happen. To capture this type of information, we propose an autoencoder model with a latent space defined by a hierarchy of categorical variables. We utilize a recently proposed vector quantization based approach, which allows continuous embeddings to be associated with each latent variable value. This permits the decoder to softly decide what portions of the latent hierarchy to condition on by attending over the value embeddings for a given setting. Our model effectively encodes and generates scripts, outperforming a recent language modeling-based method on several standard tasks, and allowing the autoencoder model to achieve substantially lower perplexity scores compared to the previous language modeling-based method.

Towards Semi-Supervised Learning for Deep Semantic Role Labeling

Neural models have shown several state-of-the-art performances on Semantic Role Labeling (SRL). However, the neural models require an immense amount of semantic-role corpora and are thus not well suited for low-resource languages or domains. The paper proposes a semi-supervised semantic role labeling method that outperforms the state-of-the-art in limited SRL training corpora. The method is based on explicitly enforcing syntactic constraints by augmenting the training objective with a syntactic-inconsistency loss component and uses SRL-unlabeled instances to train a joint-objective LSTM. On the CoNLL-2012 English section, the proposed semi-supervised training with 1%, 10% SRL-labeled data and varying amounts of SRL-unlabeled data achieves +1.58, +0.78 F1, respectively, over the pre-trained models that were trained on SOTA architecture with ELMo on the same SRL-labeled data. Additionally, by using the syntactic-inconsistency loss at inference time, the proposed model achieves +3.67, +2.1 F1 over the pre-trained model on 1%, 10% SRL-labeled data, respectively.

Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.

Accelerated proximal boosting

Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics a gradient descent on a functional variable. This paper proposes to build upon the proximal point algorithm when the empirical risk to minimize is not differentiable. In addition, the novel boosting approach, called accelerated proximal boosting, benefits from Nesterov’s acceleration in the same way as gradient boosting [Biau et al., 2018]. Advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, we exhibit a favorable comparison over gradient boosting regarding convergence rate and prediction accuracy.
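
To make the proximal point algorithm concrete, the sketch below applies it to a scalar, non-differentiable objective f(x) = |x - target|. It is not the boosting procedure itself (which operates on a functional variable) and includes no Nesterov acceleration; the function names and toy values are hypothetical. It relies on the standard fact that the proximal operator of step * |x| is soft-thresholding.

```python
def soft_threshold(value, step):
    """Proximal operator of step * |x| evaluated at `value`."""
    if value > step:
        return value - step
    if value < -step:
        return value + step
    return 0.0

def proximal_point_abs(start, target, step, iterations):
    """Minimize f(x) = |x - target| with the proximal point iteration.

    Each update solves
        x_next = argmin_u |u - target| + (u - x)^2 / (2 * step),
    whose closed form is target + soft_threshold(x - target, step).
    Returns the list of iterates, starting with `start`.
    """
    x = start
    trajectory = [x]
    for _ in range(iterations):
        x = target + soft_threshold(x - target, step)
        trajectory.append(x)
    return trajectory

toy_path = proximal_point_abs(start=5.0, target=2.0, step=1.0, iterations=5)
```

With these toy values the iterates move one unit per step toward the minimizer and then stay there, even though f has no gradient at its minimum.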

What can we learn from Semantic Tagging?

We investigate the effects of multi-task learning using the recently introduced task of semantic tagging. We employ semantic tagging as an auxiliary task for three different NLP tasks: part-of-speech tagging, Universal Dependency parsing, and Natural Language Inference. We compare full neural network sharing, partial neural network sharing, and what we term the learning what to share setting where negative transfer between tasks is less likely. Our findings show considerable improvements for all tasks, particularly in the learning what to share setting, which shows consistent gains across all tasks.

Cross-Domain Collaborative Learning via Cluster Canonical Correlation Analysis and Random Walker for Hyperspectral Image Classification

This paper introduces a novel heterogeneous domain adaptation (HDA) method for hyperspectral image classification with a limited amount of labeled samples in both domains. The method takes the form of cross-domain collaborative learning (CDCL), which is addressed via cluster canonical correlation analysis (C-CCA) and random walker (RW) algorithms. Specifically, the proposed CDCL method is an iterative process with three main stages: two rounds of RW-based pseudolabeling and one round of cross-domain learning via C-CCA. First, given the initially labeled target samples as the training set ($\mathbf{TS}$), RW-based pseudolabeling is employed to update $\mathbf{TS}$ and extract target clusters ($\mathbf{TCs}$) by fusing the segmentation results obtained by the RW and extended RW (ERW) classifiers. Second, cross-domain learning via C-CCA is applied using labeled source samples and $\mathbf{TCs}$. The unlabeled target samples are then classified with the estimated probability maps using the model trained in the projected correlation subspace. Third, both $\mathbf{TS}$ and the estimated probability maps are used to update $\mathbf{TS}$ again via RW-based pseudolabeling. When the iterative process finishes, the result obtained by the ERW classifier using the final $\mathbf{TS}$ and estimated probability maps is regarded as the final classification map. Experimental results on four real hyperspectral images (HSIs) demonstrate that the proposed method achieves better performance than state-of-the-art HDA and ERW methods.

Rule induction for global explanation of trained models

Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network’s performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at https://…/interpret_with_rules.

Notes on Deep Learning for NLP

My notes on Deep Learning for NLP.

Correlated Time Series Forecasting using Deep Neural Networks: A Summary of Results

Cyber-physical systems often consist of entities that interact with each other over time. Meanwhile, as part of the continued digitization of industrial processes, various sensor technologies are deployed that enable us to record time-varying attributes (a.k.a., time series) of such entities, thus producing correlated time series. To enable accurate forecasting on such correlated time series, this paper proposes two models that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The first model employs a CNN on each individual time series, combines the convoluted features, and then applies an RNN on top of the convoluted features to enable forecasting. The second model adds auto-encoders to the individual CNNs, making the second model a multi-task learning model, which provides accurate and robust forecasting. Experiments on two real-world correlated time series data sets suggest that the two proposed models are effective and outperform baselines in most settings. This report extends the paper ‘Correlated Time Series Forecasting using Multi-Task Deep Neural Networks,’ to appear in ACM CIKM 2018, by providing additional experimental results.

MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams

A first-person (wearable) camera continually captures unscripted interactions of the camera user with objects, people, and scenes, reflecting the user’s personal and relational tendencies. One of the preferences of people is their interaction with food events. Regulating food intake and its duration is of great importance in protecting against diseases. Consequently, this work aims to develop a smart model that is able to determine how often a person visits food places during a day. This model is based on a deep end-to-end model for automatic food places recognition by analyzing egocentric photo-streams. In this paper, we apply multi-scale Atrous convolution networks to extract the key features related to food places from the input images. The proposed model is evaluated on an in-house private dataset called ‘EgoFoodPlaces’. Experimental results show promising performance for food places classification in egocentric photo-streams.

Searching Toward Pareto-Optimal Device-Aware Neural Architectures

Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding. However, most existing works only optimize for model accuracy and largely ignore other important factors imposed by the underlying hardware and devices, such as latency and energy, when making inference. In this paper, we first introduce the problem of NAS and provide a survey on recent works. Then we deep dive into two recent advancements on extending NAS into multiple-objective frameworks: MONAS and DPP-Net. Both MONAS and DPP-Net are capable of optimizing accuracy and other objectives imposed by devices, searching for neural architectures that can be best deployed on a wide spectrum of devices: from embedded systems and mobile devices to workstations. Experimental results are poised to show that architectures found by MONAS and DPP-Net achieve Pareto optimality w.r.t. the given objectives for various devices.

Review Helpfulness Prediction with Embedding-Gated CNN

Product reviews, predominantly in the form of text, significantly help consumers finalize their purchasing decisions. Thus, it is important for e-commerce companies to predict review helpfulness to present and recommend reviews in a more informative manner. In this work, we introduce a convolutional neural network model that is able to extract abstract features from multi-granularity representations. Inspired by the fact that different words contribute to the meaning of a sentence differently, we propose to learn word-level embedding-gates for all the representations. Furthermore, it is common that some product domains/categories have rich user reviews while other domains do not. To help domains with less sufficient data, we integrate our model into a cross-domain relationship learning framework for effectively transferring knowledge from other domains. Extensive experiments show that our model yields better performance than the existing methods.

Towards security defect prediction with AI

In this study, we investigate the limits of the current state of the art AI system for detecting buffer overflows and compare it with current static analysis tools. To do so, we developed a code generator, s-bAbI, capable of producing an arbitrarily large number of code samples of controlled complexity. We found that the static analysis engines we examined have good precision, but poor recall on this dataset, except for a sound static analyzer that has good precision and recall. We found that the state of the art AI system, a memory network modeled after Choi et al. [1], can achieve similar performance to the static analysis engines, but requires an exhaustive amount of training data in order to do so. Our work points towards future approaches that may solve these problems; namely, using representations of code that can capture appropriate scope information and using deep learning methods that are able to perform arithmetic operations.

Extreme Value Theory for Open Set Classification – GPD and GEV Classifiers

Classification tasks usually assume that all possible classes are present during the training phase. This is restrictive if the algorithm is used over a long time and possibly encounters samples from unknown classes. The recently introduced extreme value machine, a classifier motivated by extreme value theory, addresses this problem and achieves competitive performance in specific cases. We show that this algorithm can fail when the geometries of known and unknown classes differ. To overcome this problem, we propose two new algorithms relying on approximations from extreme value theory. We show the effectiveness of our classifiers in simulations and on the LETTER and MNIST data sets.

Question Answering by Reasoning Across Documents with Graph Convolutional Networks

Most research in reading comprehension has focused on answering questions based on individual documents or even single paragraphs. We introduce a method which integrates and reasons relying on information spread within documents and across multiple documents. We frame it as an inference problem on a graph. Mentions of entities are nodes of this graph where edges encode relations between different mentions (e.g., within- and cross-document co-references). Graph convolutional networks (GCNs) are applied to these graphs and trained to perform multi-step reasoning. Our Entity-GCN method is scalable and compact, and it achieves state-of-the-art results on the WikiHop dataset (Welbl et al. 2017).
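
The role of stacked graph convolutions in multi-step reasoning can be illustrated with a simplified layer: each node averages its own features with those of its neighbours, and applying the layer twice lets information travel two hops. This sketch is not Entity-GCN: it ignores edge types (the relations the abstract says edges encode) and uses an identity weight matrix; the node names and function name are hypothetical.

```python
def gcn_layer(features, edges, weight):
    """One simplified graph-convolution step.

    features: dict mapping node -> list of floats (length in_dim).
    edges: list of undirected (u, v) pairs.
    weight: in_dim x out_dim matrix as a list of rows.
    Each node takes the mean over itself and its neighbours, applies the
    linear map, then a ReLU.
    """
    neighbours = {node: {node} for node in features}  # include self-loops
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    in_dim, out_dim = len(weight), len(weight[0])
    updated = {}
    for node, hood in neighbours.items():
        mean = [sum(features[n][i] for n in hood) / len(hood)
                for i in range(in_dim)]
        projected = [sum(mean[i] * weight[i][j] for i in range(in_dim))
                     for j in range(out_dim)]
        updated[node] = [max(0.0, x) for x in projected]
    return updated

# Three entity mentions in a chain: m1 - m2 - m3 (hypothetical).
toy_features = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [0.0, 0.0]}
toy_edges = [("m1", "m2"), ("m2", "m3")]
identity = [[1.0, 0.0], [0.0, 1.0]]

one_hop = gcn_layer(toy_features, toy_edges, identity)
two_hop = gcn_layer(one_hop, toy_edges, identity)
```

After one layer, `m3` carries no signal from `m1`; after two layers it does, because the signal has passed through `m2`.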

Deep Reinforcement Learning in Portfolio Management

In this paper, we implement two state-of-the-art continuous reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO), in portfolio management. Both are widely used in game playing and robot control. Moreover, PPO has appealing theoretical properties that make it potentially promising for portfolio management. We present their performance under different settings, including different learning rates, objective functions, markets, and feature combinations, in order to provide insights for parameter tuning, feature selection and data preparation.

FPGA Implementation of Convolutional Neural Networks with Fixed-Point Calculations

Neural network-based methods for image processing are becoming widely used in practical applications. Modern neural networks are computationally expensive and require specialized hardware, such as graphics processing units. Since such hardware is not always available in real-life applications, there is a compelling need for the design of neural networks for mobile devices. Mobile neural networks typically have a reduced number of parameters and require a relatively small number of arithmetic operations. However, they are usually still executed at the software level and use floating-point calculations. The use of mobile networks without further optimization may not provide sufficient performance when high processing speed is required, for example, in real-time video processing (30 frames per second). In this study, we suggest optimizations to speed up computations in order to efficiently use already trained neural networks on a mobile device. Specifically, we propose an approach for speeding up neural networks by moving computation from software to hardware and by using fixed-point calculations instead of floating-point. We propose a number of methods for neural network architecture design to improve the performance with fixed-point calculations. We also show an example of how existing datasets can be modified and adapted for the recognition task at hand. Finally, we present the design and the implementation of a field-programmable gate array (FPGA)-based device to solve the practical problem of real-time handwritten digit classification from a mobile camera video feed.
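
Replacing floating-point with fixed-point arithmetic amounts to storing each value as an integer scaled by a power of two and doing the multiply-accumulate in integers. The sketch below shows one neuron’s dot product in that style. The choice of 8 fractional bits and all names are assumptions for illustration; the abstract does not state the bit widths or rounding scheme used in the actual design.

```python
FRACTION_BITS = 8           # assumed Q-format: 8 fractional bits
SCALE = 1 << FRACTION_BITS  # 256

def to_fixed(value):
    """Quantize a float to a fixed-point integer (round to nearest)."""
    return int(round(value * SCALE))

def to_float(value_fixed):
    """Convert a fixed-point integer back to a float."""
    return value_fixed / SCALE

def fixed_dot(weights_fixed, inputs_fixed):
    """Integer multiply-accumulate, rescaled to the original Q-format.

    Each product carries 2 * FRACTION_BITS fractional bits, so the
    accumulated sum is shifted right once at the end. In Python, `>>` on a
    negative int rounds toward negative infinity.
    """
    accumulator = 0
    for w, x in zip(weights_fixed, inputs_fixed):
        accumulator += w * x
    return accumulator >> FRACTION_BITS

layer_weights = [0.5, -0.25, 0.125]
layer_inputs = [1.0, 2.0, 4.0]

fixed_result = to_float(fixed_dot([to_fixed(w) for w in layer_weights],
                                  [to_fixed(x) for x in layer_inputs]))
float_result = sum(w * x for w, x in zip(layer_weights, layer_inputs))
```

The toy values are exactly representable with 8 fractional bits, so both results agree; values that are not exactly representable would show a small quantization error.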

Ab initio study of optical and bulk properties of cesium lead halide perovskite solid solutions
MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures
Zero-shot Transfer Learning for Semantic Parsing
A Particle Filter based Multi-Objective Optimization Algorithm: PFOPS
A Quantum Many-body Wave Function Inspired Language Modeling Approach
Iterative Deep Learning for Road Topology Extraction
Estimating the distribution and thinning parameters of a homogeneous multimode Poisson process
Replication of Wiener-transformable stochastic processes with application to financial markets with memory
Edge Caching for Cache Intensity under Probabilistic Delay Constraint
An Efficient Matheuristic for the Minimum-Weight Dominating Set Problem
Contextual Audio-Visual Switching For Speech Enhancement in Real-World Environments
Graphene: A Context-Preserving Open Information Extraction System
The University of Cambridge’s Machine Translation Systems for WMT18
Learning To Split and Rephrase From Wikipedia Edit History
Cluster-Wise Cooperative Eco-Approach and Departure Application for Connected and Automated Vehicles along Signalized Arterials
Treewidth and gonality of glued grid graphs
Residualized Factor Adaptation for Community Social Media Prediction Tasks
Expected Number of Vertices of a Hypercube Slice
Convergence of Krasulina Scheme
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Scientific Question Answering
Symmetries of polytopes with fixed edge lengths
Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
Concentrated Differentially Private Gradient Descent with Adaptive per-Iteration Privacy Budget
Spectrum-Adapted Polynomial Approximation for Matrix Functions
Extracting Epistatic Interactions in Type 2 Diabetes Genome-Wide Data Using Stacked Autoencoder
Riemann metric approach to optimal sampling of multidimensional free-energy landscapes
Bounds on the conditional and average treatment effect in the presence of unobserved confounders
Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
Enumerating Top-k Quasi-Cliques
Strong cliques in vertex-transitive graphs
Lipschitz regularized Deep Neural Networks converge and generalize
Cost-efficient Data Acquisition on Online Data Marketplaces for Correlation Analysis
Approximately counting bases of bicircular matroids
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
Variational Bayesian Approach and Gauss-Markov-Potts prior model
Power of Ensemble Diversity and Randomization for Energy Aggregation
Embedding Covert Information in Broadcast Communications
Temporal Saliency Adaptation in Egocentric Videos
On Learning 3D Face Morphable Model from In-the-wild Images
Impact of News Organizations’ Trustworthiness and Social Media Activity on Audience Engagement
Autonomous drone cinematographer: Using artistic principles to create smooth, safe, occlusion-free trajectories for aerial filming
Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation
Ensuring Privacy with Constrained Additive Noise by Minimizing Fisher Information
Privacy-preserving Decentralized Optimization via Decomposition
ARBEE: Towards Automated Recognition of Bodily Expression of Emotion In the Wild
Full Speed Ahead: 3D Spatial Database Acceleration with GPUs
Cycle-of-Learning for Autonomous Systems from Human Interaction
Probabilistic Sparse Subspace Clustering Using Delayed Association
Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
Complexity and mission computability of adaptive computing systems
Mapping Language to Code in Programmatic Context
The eternal dominating set problem for interval graphs
On self-avoiding polygons and walks: the snake method via pattern fluctuation
The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions
Nonlinear regression based on a hybrid quantum computer
On the cover time of the emerging giant
Stein’s method and Narayana numbers
Decoding binary Reed-Muller codes via Groebner bases
Elastic bands across the path: A new framework and methods to lower bound DTW
Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment
Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders
Decoupling Strategy and Generation in Negotiation Dialogues
Replay attack spoofing detection system using replay noise by multi-task learning
Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes
On Tree-Based Neural Sentence Modeling
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks
Unified Receiver Design in Wireless Relay Networks Using Mixed-Integer Programming Techniques
The matching number of tree and bipartite degree sequences
Estimating dynamic mechanical quantities and their associated uncertainties: application guidance
Neural Metaphor Detection in Context
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
Electric Vehicle Charging Station Placement Method for Urban Areas
Foam evaluation and Kronheimer–Mrowka theories
Wasserstein is all you need
Analytic Moments for GARCH Processes
Recent progress on scaling algorithms and applications
On the minimum number of facets of a 2-neighborly polytope
Joint Doppler and Channel Estimation with Nested Arrays for Millimeter Wave Communications
Tail indices for AX+B recursion with triangular matrices
Image-based Survival Analysis for Lung Cancer Patients using CNNs
On principal frequencies and inradius in convex sets
Switching Cost Models as Hypothesis Tests
An Operation Sequence Model for Explainable Neural Machine Translation
Improved deviation inequalities for convex functions
Fractional Multiscale Fusion-based De-hazing
Asymmetry of copulas arising from shock models
Large Deviations for dynamical fluctuations of Open Markov processes, with application to random cascades on trees
Camera-based Image Forgery Localization using Convolutional Neural Networks
Characterizing the Influence of Features on Reading Difficulty Estimation for Non-native Readers
Stochastic Collocation with Non-Gaussian Correlated Process Variations: Theory, Algorithms and Applications
Identifying the sentiment styles of YouTube’s vloggers
Effect of pinning on the yielding transition of amorphous solids
The first exit time of fractional Brownian motion from a parabolic domain
Development and Evaluation of a Personalized Computer-aided Question Generation for English Learners to Improve Proficiency and Correct Mistakes
Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
Bringing personalized learning into computer-aided question generation
On spike and slab empirical Bayes multiple testing
Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
A general weak and strong error analysis of the recursive quantization with an application to jump diffusions
Path-complete $p$-dominant switching linear systems
Consistency of estimators and variance estimators for two-stage sampling
Effect of fluctuation on mean-field density of state of amorphous solids
Total positivity of a class of combinatorial matrices
Tensor Alignment based Domain Adaptation for Hyperspectral Image Classification
Optimal Linear Broadcast Rates of the Two-Sender Unicast Index Coding Problem with Fully-Participated Interactions
Universal Scaling Limits for Generalized Gamma Polytopes
Interact as You Intend: Intention-Driven Human-Object Interaction Detection
PS-Sim: A Framework for Scalable Simulation of Participatory Sensing Data
Continuous-time Duality for Super-replication with Transient Price Impact
Approximate Exploration through State Abstraction
Estimation of income inequality from grouped data
Partitioning edge-coloured infinite complete bipartite graphs into monochromatic paths
Botnet Campaign Detection on Twitter
Analyzing the qualitative properties of white noise on a family of infectious disease models in a highly random environment
The Anti-Ramsey Problem for the Sidon equation
Modelling Langford’s Problem: A Viewpoint for Search
Generalizations of TASEP in discrete and continuous inhomogeneous space
Application of Machine Learning in Rock Facies Classification with Physics-Motivated Feature Augmentation
Signal to interference ratio percolation for Cox point processes
Strong disorder in nodal semimetals: Schwinger-Dyson–Ward approach
Neural Cross-Lingual Named Entity Recognition with Minimal Resources
Ramsey numbers of Berge-hypergraphs and related structures
Quasilinear rough partial differential equations with transport noise
PanoRoom: From the Sphere to the 3D Layout
Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos
Stochastic order on metric spaces and the ordered Kantorovich monad
Precoding for Dual Polarization Soliton Transmission
Dropout with Tabu Strategy for Regularizing Deep Neural Networks
Zero forcing and maximum nullity for hypergraphs
On exponential convergence rate of distribution for some non-regenerative reliability system
Autoencoders, Kernels, and Multilayer Perceptrons for Electron Micrograph Restoration and Compression
Limiting the Spread of Fake News on Social Media Platforms by Evaluating Users’ Trustworthiness
Kasteleyn cokernels and perfect matchings on planar bipartite graphs
A Neural Model of Adaptation in Reading
Level Planarity: Transitivity vs. Even Crossings
Certified Mapper: Repeated testing for acyclicity and obstructions to the nerve lemma
Maximum and minimum degree conditions for embedding trees
Attention-based Neural Text Segmentation
Neural Compositional Denotational Semantics for Question Answering
Revisiting Character-Based Neural Machine Translation with Capacity and Compression
Entropic repulsion for the Gaussian free field conditioned on disconnection by level-sets