• Ab initio study of optical and bulk properties of cesium lead halide perovskite solid solutions
• MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures
• Zero-shot Transfer Learning for Semantic Parsing
• A Particle Filter based Multi-Objective Optimization Algorithm: PFOPS
• A Quantum Many-body Wave Function Inspired Language Modeling Approach
• Iterative Deep Learning for Road Topology Extraction
• Estimating the distribution and thinning parameters of a homogeneous multimode Poisson process
• Replication of Wiener-transformable stochastic processes with application to financial markets with memory
• Edge Caching for Cache Intensity under Probabilistic Delay Constraint
• An Efficient Matheuristic for the Minimum-Weight Dominating Set Problem
• Contextual Audio-Visual Switching For Speech Enhancement in Real-World Environments
• Graphene: A Context-Preserving Open Information Extraction System
• The University of Cambridge’s Machine Translation Systems for WMT18
• Learning To Split and Rephrase From Wikipedia Edit History
• Cluster-Wise Cooperative Eco-Approach and Departure Application for Connected and Automated Vehicles along Signalized Arterials
• Treewidth and gonality of glued grid graphs
• Residualized Factor Adaptation for Community Social Media Prediction Tasks
• Expected Number of Vertices of a Hypercube Slice
• Convergence of Krasulina Scheme
• Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Scientific Question Answering
• Symmetries of polytopes with fixed edge lengths
• Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
• Concentrated Differentially Private Gradient Descent with Adaptive per-Iteration Privacy Budget
• Spectrum-Adapted Polynomial Approximation for Matrix Functions
• Extracting Epistatic Interactions in Type 2 Diabetes Genome-Wide Data Using Stacked Autoencoder
• Riemann metric approach to optimal sampling of multidimensional free-energy landscapes
• Bounds on the conditional and average treatment effect in the presence of unobserved confounders
• Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
• Enumerating Top-k Quasi-Cliques
• Strong cliques in vertex-transitive graphs
• Lipschitz regularized Deep Neural Networks converge and generalize
• Cost-efficient Data Acquisition on Online Data Marketplaces for Correlation Analysis
• Approximately counting bases of bicircular matroids
• Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
• Variational Bayesian Approach and Gauss-Markov-Potts prior model
• Power of Ensemble Diversity and Randomization for Energy Aggregation
• Embedding Covert Information in Broadcast Communications
• Temporal Saliency Adaptation in Egocentric Videos
• On Learning 3D Face Morphable Model from In-the-wild Images
• Impact of News Organizations’ Trustworthiness and Social Media Activity on Audience Engagement
• Autonomous drone cinematographer: Using artistic principles to create smooth, safe, occlusion-free trajectories for aerial filming
• Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation
• Ensuring Privacy with Constrained Additive Noise by Minimizing Fisher Information
• Privacy-preserving Decentralized Optimization via Decomposition
• ARBEE: Towards Automated Recognition of Bodily Expression of Emotion In the Wild
• Full Speed Ahead: 3D Spatial Database Acceleration with GPUs
• Cycle-of-Learning for Autonomous Systems from Human Interaction
• Probabilistic Sparse Subspace Clustering Using Delayed Association
• Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation
• Complexity and mission computability of adaptive computing systems
• Mapping Language to Code in Programmatic Context
• The eternal dominating set problem for interval graphs
• On self-avoiding polygons and walks: the snake method via pattern fluctuation
• The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions
• Nonlinear regression based on a hybrid quantum computer
• On the cover time of the emerging giant
• Stein’s method and Narayana numbers
• Decoding binary Reed-Muller codes via Groebner bases
• Elastic bands across the path: A new framework and methods to lower bound DTW
• Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment
• Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders
• Decoupling Strategy and Generation in Negotiation Dialogues
• Replay attack spoofing detection system using replay noise by multi-task learning
• Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes
• On Tree-Based Neural Sentence Modeling
• Diffusion Approximations for Online Principal Component Estimation and Global Convergence
• From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks
• Unified Receiver Design in Wireless Relay Networks Using Mixed-Integer Programming Techniques
• The matching number of tree and bipartite degree sequences
• Estimating dynamic mechanical quantities and their associated uncertainties: application guidance
• Neural Metaphor Detection in Context
• APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
• Electric Vehicle Charging Station Placement Method for Urban Areas
• Foam evaluation and Kronheimer–Mrowka theories
• Wasserstein is all you need
• Analytic Moments for GARCH Processes
• Recent progress on scaling algorithms and applications
• On the minimum number of facets of a 2-neighborly polytope
• Joint Doppler and Channel Estimation with Nested Arrays for Millimeter Wave Communications
• Tail indices for AX+B recursion with triangular matrices
• Image-based Survival Analysis for Lung Cancer Patients using CNNs
• On principal frequencies and inradius in convex sets
• Switching Cost Models as Hypothesis Tests
• An Operation Sequence Model for Explainable Neural Machine Translation
• Improved deviation inequalities for convex functions
• Fractional Multiscale Fusion-based De-hazing
• Asymmetry of copulas arising from shock models
• Large Deviations for dynamical fluctuations of Open Markov processes, with application to random cascades on trees
• Camera-based Image Forgery Localization using Convolutional Neural Networks
• Characterizing the Influence of Features on Reading Difficulty Estimation for Non-native Readers
• Stochastic Collocation with Non-Gaussian Correlated Process Variations: Theory, Algorithms and Applications
• Identifying the sentiment styles of YouTube’s vloggers
• Effect of pinning on the yielding transition of amorphous solids
• The first exit time of fractional Brownian motion from a parabolic domain
• Development and Evaluation of a Personalized Computer-aided Question Generation for English Learners to Improve Proficiency and Correct Mistakes
• Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
• Bringing personalized learning into computer-aided question generation
• On spike and slab empirical Bayes multiple testing
• Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
• A general weak and strong error analysis of the recursive quantization with an application to jump diffusions
• Path-complete $p$-dominant switching linear systems
• Consistency of estimators and variance estimators for two-stage sampling
• Effect of fluctuation on mean-field density of state of amorphous solids
• Total positivity of a class of combinatorial matrices
• Tensor Alignment based Domain Adaptation for Hyperspectral Image Classification
• Optimal Linear Broadcast Rates of the Two-Sender Unicast Index Coding Problem with Fully-Participated Interactions
• Universal Scaling Limits for Generalized Gamma Polytopes
• Interact as You Intend: Intention-Driven Human-Object Interaction Detection
• PS-Sim: A Framework for Scalable Simulation of Participatory Sensing Data
• Continuous-time Duality for Super-replication with Transient Price Impact
• Approximate Exploration through State Abstraction
• Estimation of income inequality from grouped data
• Partitioning edge-coloured infinite complete bipartite graphs into monochromatic paths
• Botnet Campaign Detection on Twitter
• Analyzing the qualitative properties of white noise on a family of infectious disease models in a highly random environment
• The Anti-Ramsey Problem for the Sidon equation
• Modelling Langford’s Problem: A Viewpoint for Search
• Generalizations of TASEP in discrete and continuous inhomogeneous space
• Application of Machine Learning in Rock Facies Classification with Physics-Motivated Feature Augmentation
• Signal to interference ratio percolation for Cox point processes
• Strong disorder in nodal semimetals: Schwinger-Dyson–Ward approach
• Neural Cross-Lingual Named Entity Recognition with Minimal Resources
• Ramsey numbers of Berge-hypergraphs and related structures
• Quasilinear rough partial differential equations with transport noise
• PanoRoom: From the Sphere to the 3D Layout
• Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos
• Stochastic order on metric spaces and the ordered Kantorovich monad
• Precoding for Dual Polarization Soliton Transmission
• Dropout with Tabu Strategy for Regularizing Deep Neural Networks
• Zero forcing and maximum nullity for hypergraphs
• On exponential convergence rate of distribution for some non-regenerative reliability system
• Autoencoders, Kernels, and Multilayer Perceptrons for Electron Micrograph Restoration and Compression
• Limiting the Spread of Fake News on Social Media Platforms by Evaluating Users’ Trustworthiness
• Kasteleyn cokernels and perfect matchings on planar bipartite graphs
• A Neural Model of Adaptation in Reading
• Level Planarity: Transitivity vs. Even Crossings
• Certified Mapper: Repeated testing for acyclicity and obstructions to the nerve lemma
• Maximum and minimum degree conditions for embedding trees
• Attention-based Neural Text Segmentation
• Neural Compositional Denotational Semantics for Question Answering
• Revisiting Character-Based Neural Machine Translation with Capacity and Compression
• Entropic repulsion for the Gaussian free field conditioned on disconnection by level-sets
Sequential dynamics are a key feature of many modern recommender systems, which seek to capture the ‘context’ of users’ activities on the basis of actions they have performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) and Recurrent Neural Networks (RNNs). Markov Chains assume that a user’s next action can be predicted on the basis of just their last (or last few) actions, while RNNs in principle allow longer-term semantics to be uncovered. Generally speaking, MC-based methods perform best on extremely sparse datasets, where model parsimony is critical, while RNNs perform better on denser datasets, where higher model complexity is affordable. Our work seeks to balance these two goals by proposing a self-attention based sequential model (SASRec) that captures long-term semantics (like an RNN) but, using an attention mechanism, makes its predictions based on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items in a user’s action history are ‘relevant’, and uses them to predict the next item. Extensive empirical studies show that our method outperforms various state-of-the-art sequential models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. Visualizations of attention weights also show how our model adaptively handles datasets of varying density and uncovers meaningful patterns in activity sequences.
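The mechanism behind "attend to relevant earlier items" can be sketched as single-head scaled dot-product self-attention with a causal mask, so step t only sees items 0..t. This is a minimal pure-Python sketch assuming identity query/key/value projections and toy item embeddings; it is not the authors' SASRec implementation, which adds learned projections, positional embeddings, and feed-forward layers.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(seq: List[Vector]) -> List[List[float]]:
    """Attention weights; row t covers positions 0..t only (causal mask)."""
    d = len(seq[0])
    weights = []
    for t, query in enumerate(seq):
        # Only items at or before step t are visible when predicting the next item.
        scores = [dot(query, seq[j]) / math.sqrt(d) for j in range(t + 1)]
        weights.append(softmax(scores))
    return weights

def attend(seq: List[Vector]) -> List[Vector]:
    """For every step, a weighted sum of the visible item embeddings."""
    outputs = []
    for row in causal_self_attention(seq):
        out = [0.0] * len(seq[0])
        for w, item in zip(row, seq):  # zip stops at len(row), i.e. t + 1 items
            for k, value in enumerate(item):
                out[k] += w * value
        outputs.append(out)
    return outputs
```

With a history such as `[[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]`, the last step puts most of its weight on the first and third items, whose embeddings are most similar to its own.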
How can we enable users to specify detailed criteria for database queries in a user-friendly way? This paper describes a general framework for a conversational bot that extracts meaningful information from users’ sentences, asks follow-up questions to complete missing information, and adjusts its questions and information-extraction parameters in later conversations depending on users’ behavior. Additionally, we compare existing tools and present novel techniques for implementing such a framework. Finally, we illustrate the framework with a bot that queries movies in a database, whose code is available to Microsoft employees.
We propose KDSL, a new word sense disambiguation (WSD) framework that utilizes knowledge to automatically generate sense-labeled data for supervised learning. First, from WordNet, we automatically construct a semantic knowledge base called DisDict, which provides refined feature words that highlight the differences among word senses, i.e., synsets. Second, we use DisDict to automatically generate new sense-labeled data from unlabeled corpora. Third, these generated data, together with manually labeled data, are fed to a supervised neural network that models the semantic relations among synsets, feature words and their contexts. Jointly with the supervised learning process, we also perform unsupervised learning on unlabeled data as an auxiliary task. The experimental results show that KDSL outperforms several representative state-of-the-art methods on various major benchmarks. Interestingly, it performs relatively well even when manually labeled data is unavailable, thus providing a promising new backoff strategy for WSD.
Cross-domain collaborative filtering (CF) aims to alleviate data sparsity in single-domain CF by leveraging knowledge transferred from related domains. Many traditional methods address the sparsity problem by directly enriching the compared neighborhood relations in CF. In this paper, we propose superhighway construction, an alternative explicit relation-enrichment procedure, to improve recommendations by enhancing cross-domain connectivity. Specifically, assuming partially overlapping items (users), superhighway construction bypasses multi-hop inter-domain paths between cross-domain users (items, respectively) with direct paths to enrich the cross-domain connectivity. Experiments on a real-world cross-region music dataset and a cross-platform movie dataset show that the proposed superhighway construction significantly improves recommendation performance in both target and source domains.
Implicit feedback is the simplest form of user feedback that can be used for item recommendation. It is easy to collect and domain independent. However, it lacks negative examples. Existing works circumvent this problem by making various assumptions regarding the unconsumed items, which fail to hold when the user did not consume an item because she was unaware of it. In this paper, we propose a novel method for addressing the lack of negative examples in implicit feedback. The motivation is that if a large group of users share the same taste and none of them consumed an item, then it is highly likely that the item is irrelevant to this taste. We use Hierarchical Latent Tree Analysis (HLTA) to identify taste-based user groups and make recommendations for a user based on her memberships in the groups.
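The motivating heuristic (an item that no member of a large same-taste group consumed is probably irrelevant to that taste) can be sketched as follows. The user groups are assumed to be given, standing in for the groups HLTA would produce; the function name `infer_negatives` and the `min_group_size` threshold are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set

def infer_negatives(
    groups: List[Set[str]],
    consumed: Dict[str, Set[str]],
    all_items: Set[str],
    min_group_size: int = 3,  # hypothetical threshold for a "large" group
) -> Dict[int, Set[str]]:
    """Map each sufficiently large group (by index) to items none of its members consumed."""
    negatives: Dict[int, Set[str]] = {}
    for idx, members in enumerate(groups):
        if len(members) < min_group_size:
            continue  # small groups give weak evidence, so they are skipped
        seen: Set[str] = set()
        for user in members:
            seen |= consumed.get(user, set())
        negatives[idx] = all_items - seen  # unconsumed by the whole group
    return negatives
```

Items returned for a group could then serve as likely negatives when training a recommender for that group's members.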
We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection. We propose the task of semantically matching the idea, expressed as a natural language proposition, against a corpus. We create two preliminary tasks derived from existing datasets, and then introduce a more realistic one on disaster recovery designed for emergency managers, whom we engaged in a user study. On the latter, we find that a new model built from natural language entailment data produces higher-quality matches than simple word-vector averaging, both on expert-crafted queries and on ones produced by the subjects themselves. This work provides a proof-of-concept for such applications of semantic matching and illustrates key challenges.
The inclusion of the propensity score as a covariate in Bayesian regression trees for causal inference can reduce the bias in treatment effect estimates that arises from the regularization-induced confounding phenomenon. This study advocates the use of the propensity score, evaluating it in a fully Bayesian variable selection setting, and the use of Individual Conditional Expectation (ICE) plots, a graphical tool that can improve treatment effect analysis for tree-based Bayesian models and other ‘black box’ models. The former, even if poorly estimated, can reduce bias in the estimated treatment effects, while the latter can be used to find groups of individuals who respond differently to the applied treatment and to analyze the impact of each variable on the estimated treatment effect.
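An ICE curve is built by sweeping one feature over a grid for each individual while holding that individual's other features fixed, and recording the model's prediction at each grid value. A minimal sketch, assuming any callable `model` that maps a feature list to a number; the toy model in the usage note is hypothetical and stands in for a fitted Bayesian regression tree.

```python
from typing import Callable, List, Sequence

def ice_curves(
    model: Callable[[List[float]], float],
    rows: Sequence[List[float]],
    feature: int,
    grid: Sequence[float],
) -> List[List[float]]:
    """One curve per individual: predictions as `feature` sweeps `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            x = list(row)       # copy so the original row is untouched
            x[feature] = value  # intervene on a single feature only
            curve.append(model(x))
        curves.append(curve)
    return curves
```

For example, with a toy model `lambda x: 2.0 * x[0] * x[1] + x[1]`, where feature 0 is a binary treatment, sweeping the treatment over `[0.0, 1.0]` for rows `[0.0, 1.0]` and `[0.0, 3.0]` gives curves `[1.0, 3.0]` and `[3.0, 9.0]`: the differing slopes expose individuals who respond differently to treatment.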
It is popular to stack LSTM layers to gain modeling power, especially when a large amount of training data is available. However, an LSTM-RNN with too many vanilla LSTM layers is very hard to train, and the vanishing gradient problem persists if the network goes too deep. This issue can be partially solved by adding skip connections between layers, as in residual LSTM. In this paper, we propose a layer trajectory LSTM (ltLSTM), which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM. This layer-LSTM scans the outputs of the time-LSTMs and uses the summarized layer trajectory information for final senone classification. The forward propagation of the time-LSTM and the layer-LSTM can be handled in two separate threads in parallel, so that the network computation time is the same as that of the standard time-LSTM. With a layer-LSTM running through the layers, a gated path is provided from the output layer to the bottom layer, alleviating the vanishing gradient problem. Trained with 30 thousand hours of EN-US Microsoft internal data, the proposed ltLSTM performed significantly better than the standard multi-layer LSTM and residual LSTM, with up to 9.0% relative word error rate reduction across different tasks.
Scripts define knowledge about how everyday scenarios (such as going to a restaurant) are expected to unfold. One of the challenges to learning scripts is the hierarchical nature of the knowledge. For example, a suspect who is arrested might plead innocent or guilty, and a very different track of events is then expected to happen. To capture this type of information, we propose an autoencoder model with a latent space defined by a hierarchy of categorical variables. We utilize a recently proposed vector-quantization-based approach, which allows continuous embeddings to be associated with each latent variable value. This permits the decoder to softly decide which portions of the latent hierarchy to condition on by attending over the value embeddings for a given setting. Our model effectively encodes and generates scripts, outperforming a recent language modeling-based method on several standard tasks and achieving substantially lower perplexity scores than that method.
Neural models have achieved state-of-the-art performance on Semantic Role Labeling (SRL). However, they require an immense amount of semantic-role corpora and are thus not well suited to low-resource languages or domains. This paper proposes a semi-supervised semantic role labeling method that outperforms the state of the art when SRL training corpora are limited. The method explicitly enforces syntactic constraints by augmenting the training objective with a syntactic-inconsistency loss component, and uses SRL-unlabeled instances to train a joint-objective LSTM. On the CoNLL-2012 English section, the proposed semi-supervised training with 1% and 10% SRL-labeled data and varying amounts of SRL-unlabeled data achieves +1.58 and +0.78 F1, respectively, over pre-trained models that were trained with a state-of-the-art architecture with ELMo on the same SRL-labeled data. Additionally, by using the syntactic-inconsistency loss at inference time, the proposed model achieves +3.67 and +2.1 F1 over the pre-trained model on 1% and 10% SRL-labeled data, respectively.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics gradient descent on a functional variable. This paper proposes building on the proximal point algorithm when the empirical risk to be minimized is not differentiable. In addition, the novel boosting approach, called accelerated proximal boosting, benefits from Nesterov’s acceleration in the same way as gradient boosting [Biau et al., 2018]. The advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, we exhibit a favorable comparison with gradient boosting in terms of convergence rate and prediction accuracy.
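To make the proximal idea concrete for a non-differentiable loss: for the absolute loss lam * |y - F|, the proximal step moves each current prediction F toward y by at most lam, so prox(F) - F equals y - F clipped to [-lam, lam]. The sketch below fits least-squares regression stumps to these proximal residuals on 1-D data. It is a plain, non-accelerated illustration under these assumptions, not the paper's accelerated proximal boosting; `lam`, `nu`, and the stump learner are illustrative choices.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)

def clip(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))

def fit_stump(xs: List[float], r: List[float]) -> Stump:
    """Least-squares regression stump over all candidate thresholds."""
    best = None
    for t in sorted(set(xs)):
        left = [ri for xi, ri in zip(xs, r) if xi <= t]   # never empty: t is in xs
        right = [ri for xi, ri in zip(xs, r) if xi > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        err = sum((ri - (lv if xi <= t else rv)) ** 2 for xi, ri in zip(xs, r))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv

def proximal_boost_l1(
    xs: List[float], ys: List[float], rounds: int = 50, lam: float = 1.0, nu: float = 0.5
) -> Tuple[List[Stump], List[float]]:
    preds = [0.0] * len(xs)
    stumps: List[Stump] = []
    for _ in range(rounds):
        # Proximal step of lam*|y - F|: prox(F) - F = clip(y - F, -lam, lam).
        r = [clip(y - f, -lam, lam) for y, f in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, r)
        stumps.append((t, nu * lv, nu * rv))  # shrunken weak learner
        preds = [f + nu * (lv if x <= t else rv) for x, f in zip(xs, preds)]
    return stumps, preds

def l1_loss(ys: List[float], preds: List[float]) -> float:
    return sum(abs(y - f) for y, f in zip(ys, preds)) / len(ys)
```

Because the clipped residual is bounded by `lam`, a single large error cannot dominate a round, which is the usual motivation for robust losses such as L1.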
We investigate the effects of multi-task learning using the recently introduced task of semantic tagging. We employ semantic tagging as an auxiliary task for three different NLP tasks: part-of-speech tagging, Universal Dependency parsing, and Natural Language Inference. We compare full neural network sharing, partial neural network sharing, and what we term the learning what to share setting where negative transfer between tasks is less likely. Our findings show considerable improvements for all tasks, particularly in the learning what to share setting, which shows consistent gains across all tasks.
This paper introduces a novel heterogeneous domain adaptation (HDA) method for hyperspectral image classification with a limited amount of labeled samples in both domains. The method is based on cross-domain collaborative learning (CDCL), which is implemented with cluster canonical correlation analysis (C-CCA) and random walker (RW) algorithms. Specifically, the proposed CDCL method is an iterative process of three main stages: two rounds of RW-based pseudolabeling with cross-domain learning via C-CCA in between. First, given the initially labeled target samples as the training set, RW-based pseudolabeling is employed to update the training set and extract target clusters by fusing the segmentation results obtained by RW and extended RW (ERW) classifiers. Second, cross-domain learning via C-CCA is applied using the labeled source samples and the target clusters. The unlabeled target samples are then classified with estimated probability maps using the model trained in the projected correlation subspace. Third, both the target clusters and the estimated probability maps are used to update the training set again via RW-based pseudolabeling. When the iterative process finishes, the result obtained by the ERW classifier using the final training set and estimated probability maps is taken as the final classification map. Experimental results on four real HSIs demonstrate that the proposed method achieves better performance than state-of-the-art HDA and ERW methods.
Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network’s performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at https://…/interpret_with_rules.
My notes on Deep Learning for NLP.
Cyber-physical systems often consist of entities that interact with each other over time. Meanwhile, as part of the continued digitization of industrial processes, various sensor technologies are deployed that enable us to record time-varying attributes (a.k.a. time series) of such entities, thus producing correlated time series. To enable accurate forecasting on such correlated time series, this paper proposes two models that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The first model applies a CNN to each individual time series, combines the convolved features, and then applies an RNN on top of them to produce forecasts. The second model adds auto-encoders to the individual CNNs, making it a multi-task learning model that provides accurate and robust forecasting. Experiments on two real-world correlated time series datasets suggest that the two proposed models are effective and outperform baselines in most settings. This report extends the paper ‘Correlated Time Series Forecasting using Multi-Task Deep Neural Networks,’ to appear in ACM CIKM 2018, by providing additional experimental results.
A first-person (wearable) camera continually captures unscripted interactions of the camera user with objects, people, and scenes, reflecting the user’s personal and relational tendencies. One such tendency concerns people’s interactions with food-related events. Regulating food intake and its duration is important for protecting against disease. Consequently, this work aims to develop a smart model that can determine how often a person visits food places during a day. The model is based on a deep end-to-end model for automatic food-place recognition that analyzes egocentric photo-streams. In this paper, we apply multi-scale Atrous convolution networks to extract the key features related to food places from the input images. The proposed model is evaluated on an in-house private dataset called ‘EgoFoodPlaces’. Experimental results are promising for food-place classification in egocentric photo-streams.
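Atrous (dilated) convolution enlarges the receptive field by spacing the kernel taps `dilation` samples apart, without adding parameters; running several dilation rates in parallel gives the multi-scale view. A minimal 1-D sketch in pure Python with "valid" padding; the paper applies multi-scale 2-D atrous convolutions to images, so this only illustrates the core operation.

```python
from typing import List

def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers) with dilated taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    out = []
    for start in range(len(signal) - span + 1):
        # Tap k reads the input `k * dilation` samples after the window start.
        out.append(sum(w * signal[start + k * dilation] for k, w in enumerate(kernel)))
    return out
```

With a 3-tap kernel of ones on `[1.0, ..., 7.0]`, dilation 1 sums adjacent triples (span 3), while dilation 2 sums every other sample (span 5), so each output sees a wider context for the same three weights.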
Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance in many tasks, such as image classification and language understanding. However, most existing works optimize only for model accuracy and largely ignore other important factors imposed by the underlying hardware and devices, such as latency and energy at inference time. In this paper, we first introduce the problem of NAS and survey recent works. We then take a deep dive into two recent advances that extend NAS to multi-objective frameworks: MONAS and DPP-Net. Both MONAS and DPP-Net can optimize accuracy together with other objectives imposed by devices, searching for neural architectures that can be best deployed on a wide spectrum of devices, from embedded systems and mobile devices to workstations. Experimental results show that the architectures found by MONAS and DPP-Net achieve Pareto optimality w.r.t. the given objectives for various devices.
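Pareto optimality over objectives such as error and latency means keeping only the architectures that no other candidate beats on every objective at once. A minimal sketch, assuming all objectives are to be minimized; the candidate names and numbers in the usage note are made up for illustration and are not results from MONAS or DPP-Net.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, ...]

def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, objs in candidates.items()
        if not any(dominates(other, objs) for o, other in candidates.items() if o != name)
    )
```

For hypothetical (error, latency in ms) pairs `A=(0.08, 30.0)`, `B=(0.10, 12.0)`, `C=(0.09, 35.0)`, `D=(0.12, 12.0)`, the front is `["A", "B"]`: A dominates C, B dominates D, and neither A nor B beats the other on both objectives.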
Product reviews, predominantly in the form of text, significantly help consumers finalize their purchasing decisions. Thus, it is important for e-commerce companies to predict review helpfulness in order to present and recommend reviews in a more informative manner. In this work, we introduce a convolutional neural network model that is able to extract abstract features from multi-granularity representations. Inspired by the fact that different words contribute differently to the meaning of a sentence, we learn word-level embedding-gates for all the representations. Furthermore, it is common for some product domains/categories to have abundant user reviews while others do not. To help domains with insufficient data, we integrate our model into a cross-domain relationship learning framework that effectively transfers knowledge from other domains. Extensive experiments show that our model yields better performance than existing methods.
In this study, we investigate the limits of the current state-of-the-art AI system for detecting buffer overflows and compare it with current static analysis tools. To do so, we developed a code generator, s-bAbI, capable of producing an arbitrarily large number of code samples of controlled complexity. We found that the static analysis engines we examined have good precision but poor recall on this dataset, except for a sound static analyzer that has good precision and recall. We found that the state-of-the-art AI system, a memory network modeled after Choi et al., can achieve performance similar to the static analysis engines, but requires an extensive amount of training data to do so. Our work points toward future approaches that may solve these problems, namely, using representations of code that can capture appropriate scope information and using deep learning methods that are able to perform arithmetic operations.
Classification tasks usually assume that all possible classes are present during the training phase. This is restrictive if the algorithm is used over a long time and possibly encounters samples from unknown classes. The recently introduced extreme value machine, a classifier motivated by extreme value theory, addresses this problem and achieves competitive performance in specific cases. We show that this algorithm can fail when the geometries of known and unknown classes differ. To overcome this problem, we propose two new algorithms relying on approximations from extreme value theory. We show the effectiveness of our classifiers in simulations and on the LETTER and MNIST data sets.
Most research in reading comprehension has focused on answering questions based on individual documents or even single paragraphs. We introduce a method that integrates and reasons over information spread within documents and across multiple documents. We frame it as an inference problem on a graph. Mentions of entities are the nodes of this graph, and edges encode relations between different mentions (e.g., within- and cross-document co-references). Graph convolutional networks (GCNs) are applied to these graphs and trained to perform multi-step reasoning. Our Entity-GCN method is scalable and compact, and it achieves state-of-the-art results on the WikiHop dataset (Welbl et al. 2017).
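A graph convolution updates each mention node from its neighbors, so stacking k layers lets information travel k hops across coreference links. This sketch shows one mean-aggregation layer with a self-loop and ReLU, assuming a single shared weight matrix `w`; Entity-GCN itself uses relation-specific weights and gating, which are omitted here.

```python
from typing import Dict, List

Matrix = List[List[float]]

def matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def gcn_layer(
    h: Dict[str, List[float]], edges: Dict[str, List[str]], w: Matrix
) -> Dict[str, List[float]]:
    """h'_i = ReLU(W * mean of h_j over i itself and its neighbors j)."""
    new_h = {}
    for node, vec in h.items():
        group = [vec] + [h[n] for n in edges.get(node, [])]  # self-loop plus neighbors
        mean = [sum(col) / len(group) for col in zip(*group)]
        new_h[node] = [max(0.0, x) for x in matvec(w, mean)]  # ReLU
    return new_h
```

Applying `gcn_layer` twice to a chain m1-m2-m3 lets m1's representation absorb information from m3, which is the kind of multi-hop combination the method relies on.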
In this paper, we apply two state-of-the-art continuous reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO), to portfolio management. Both are widely used in game playing and robot control. Moreover, PPO has appealing theoretical properties that may make it promising for portfolio management. We present their performance under different settings, including different learning rates, objective functions, markets, and feature combinations, in order to provide insights for parameter tuning, feature selection and data preparation.
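A central ingredient of PPO is its clipped surrogate objective, which limits how far a single update can move the policy: the objective is the mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the ratio of new to old action probabilities and A is the advantage estimate. A minimal sketch of evaluating it for a batch; eps = 0.2 is a common default, not a setting taken from this paper.

```python
from typing import Sequence

def ppo_clip_objective(
    ratios: Sequence[float], advantages: Sequence[float], eps: float = 0.2
) -> float:
    """Mean of min(r*A, clip(r, 1-eps, 1+eps)*A); PPO maximizes this quantity."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(1.0 + eps, r))
        # Taking the minimum makes the objective a pessimistic bound on the gain.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

With a positive advantage, a ratio of 1.5 is credited only as 1.2, so the policy gains nothing by moving further; with a negative advantage the unclipped, more pessimistic term is kept.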
Neural network-based methods for image processing are becoming widely used in practical applications. Modern neural networks are computationally expensive and require specialized hardware, such as graphics processing units. Since such hardware is not always available in real-life applications, there is a compelling need to design neural networks for mobile devices. Mobile neural networks typically have a reduced number of parameters and require a relatively small number of arithmetic operations. However, they are still usually executed in software and use floating-point calculations. Using mobile networks without further optimization may not provide sufficient performance when high processing speed is required, for example, in real-time video processing (30 frames per second). In this study, we suggest optimizations that speed up computation so that already trained neural networks can be used efficiently on a mobile device. Specifically, we propose an approach for speeding up neural networks by moving computation from software to hardware and by using fixed-point calculations instead of floating-point. We propose a number of neural network architecture design methods that improve performance with fixed-point calculations. We also show an example of how existing datasets can be modified and adapted for the recognition task at hand. Finally, we present the design and implementation of a field-programmable gate array (FPGA)-based device that solves the practical problem of real-time handwritten digit classification from a mobile camera video feed.
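Replacing floating point with fixed point means storing each value as an integer scaled by 2^-f, so a multiplication becomes an integer multiply followed by a right shift, operations that map cheaply onto hardware. A minimal Python sketch of a signed fixed-point format with `FRAC_BITS = 8` (an assumed bit width, not the paper's), including a dot product accumulated in full precision and shifted once, as a hardware multiply-accumulate unit might do:

```python
from typing import List

FRAC_BITS = 8            # assumed number of fractional bits
SCALE = 1 << FRAC_BITS   # 256: one unit in the last place is 1/256

def to_fixed(x: float) -> int:
    """Quantize a real number to the nearest representable fixed-point integer."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    return q / SCALE

def fixed_mul(a: int, b: int) -> int:
    """Integer multiply, then drop the extra FRAC_BITS (arithmetic shift rounds toward -inf)."""
    return (a * b) >> FRAC_BITS

def fixed_dot(ws: List[int], xs: List[int]) -> int:
    """Accumulate full-precision products, then shift once at the end."""
    acc = sum(w * x for w, x in zip(ws, xs))
    return acc >> FRAC_BITS
```

For example, `to_float(fixed_mul(to_fixed(1.5), to_fixed(0.25)))` gives 0.375 exactly, while `to_fixed(0.1)` stores 26/256 = 0.1015625, showing the quantization error (at most half a unit, 1/512) that fixed-point-friendly architecture design has to tolerate.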