Incidental Supervision from Question-Answering Signals

Human annotations are costly for many natural language processing (NLP) tasks, especially for those requiring NLP expertise. One promising solution is to use natural language to annotate natural language. However, how to obtain supervision signals or learn representations from natural language annotations remains an open problem. This paper studies the case where the annotations take the form of question-answering (QA) pairs and proposes an effective way to learn from them representations that are useful for other tasks. We also find that the representation retrieved from question-answer meaning representation (QAMR) data can almost universally improve performance on a wide range of tasks, suggesting that such natural language annotations indeed provide unique information on top of modern language models.

Interdependency between the Stock Market and Financial News

Stock prices are driven by various factors. In particular, many individual investors with relatively little financial knowledge rely heavily on news stories when making investment decisions in the stock market. However, these stories may not reflect future stock prices because of the subjectivity of the news; stock prices may instead affect news content. This study aims to determine whether news has a greater impact on stock prices or stock prices have a greater impact on news. To this end, we analyze the relationship between news sentiment and stock prices through time series analysis using five different classification models. Our experimental results show that stock prices have a bigger impact on news content than news does on stock prices.
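The question of which series drives the other can be illustrated with a lagged cross-correlation check in both directions. This is a minimal sketch, not the study's method (which uses five classification models); the data are hypothetical and synthetic, built so that sentiment reacts to the previous day's return while returns ignore past sentiment. The function name `lagged_corr` is illustrative.

```python
import numpy as np


def lagged_corr(x: np.ndarray, y: np.ndarray, lag: int = 1) -> float:
    """Pearson correlation between x[t - lag] and y[t]: how well past x tracks current y."""
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])


# Hypothetical synthetic data: sentiment follows yesterday's return plus noise,
# while returns are independent of past sentiment.
rng = np.random.default_rng(0)
returns = rng.normal(size=500)
sentiment = np.empty_like(returns)
sentiment[0] = 0.0
sentiment[1:] = returns[:-1] + 0.5 * rng.normal(size=499)

price_to_news = lagged_corr(returns, sentiment)  # past returns -> today's sentiment
news_to_price = lagged_corr(sentiment, returns)  # past sentiment -> today's return
print(f"price->news: {price_to_news:.2f}, news->price: {news_to_price:.2f}")
```

A strong correlation in one direction and a weak one in the other is the kind of asymmetry the study reports; a real analysis would also control for autocorrelation (for example with a Granger-style test).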

Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

This paper takes a step towards characterizing a new family of \textit{model-free} Deep Reinforcement Learning (DRL) algorithms. These algorithms aim to jointly learn an approximation of the state-value function (V) alongside an approximation of the state-action value function (Q). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm that has been shown to outperform popular techniques such as Deep Q-Learning (DQN) and Double Deep Q-Learning (DDQN) \cite{sabatelli2018deep}. To investigate why DQV’s learning dynamics allow it to perform so well, we formulate a set of research questions that help us characterize this new family of DRL algorithms. Among our results, we present specific cases in which DQV’s performance can be harmed and introduce a novel \textit{off-policy} DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the V and Q functions learned by DQV and DQV-Max and show that both algorithms may perform well on several DRL test-beds because they are less prone to the overestimation bias of the Q function.
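The difference between the two algorithms lies in their bootstrap targets. The sketch below is a tabular analogue, assuming our reading of the targets: DQV regresses both V(s) and Q(s, a) on r + γV(s′), while DQV-Max regresses V(s) on r + γ max Q(s′, ·) and Q(s, a) on r + γV(s′). The deep versions add function approximation, target networks, and experience replay, all omitted here; the two-step chain environment and the names `dqv_update` and `train` are hypothetical.

```python
from collections import defaultdict

# Hypothetical chain: state 0 -> state 1 -> terminal, reward 1 on the final step.
TRANSITIONS = [(0, 0, 0.0, 1, False), (1, 0, 1.0, None, True)]


def dqv_update(V, Q, s, a, r, s_next, done, actions, alpha=0.5, gamma=0.9, variant="dqv"):
    # Both targets are computed from the current tables before either is updated.
    v_next = 0.0 if done else V[s_next]
    q_target = r + gamma * v_next  # Q always bootstraps from V(s')
    if variant == "dqv":
        v_target = q_target  # DQV: V shares the same target
    elif variant == "dqv-max":
        max_q_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        v_target = r + gamma * max_q_next  # DQV-Max: V bootstraps from max Q(s', .)
    else:
        raise ValueError(f"unknown variant: {variant}")
    V[s] += alpha * (v_target - V[s])
    Q[(s, a)] += alpha * (q_target - Q[(s, a)])


def train(variant, sweeps=200):
    V, Q = defaultdict(float), defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in TRANSITIONS:
            dqv_update(V, Q, s, a, r, s_next, done, actions=[0], variant=variant)
    return V, Q


for variant in ("dqv", "dqv-max"):
    V, Q = train(variant)
    print(variant, round(V[0], 3), round(Q[(0, 0)], 3))
```

On this deterministic chain both variants converge to V(0) = Q(0, 0) = γ = 0.9; the variants differ only when several actions compete, which is where the max in DQV-Max matters.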

Employ Multimodal Machine Learning for Content quality analysis

Identifying high-quality content is increasingly important, as it can improve overall reading time and click-through rate (CTR). Generic quality analysis has focused on a single modality, such as image or text, but on today’s mainstream media sites much information is presented in graphic form, combining images and text. In this paper we propose a multimodal approach to predicting a quality score. First, we use two feature extractors, one for images and another for text. We then use a Siamese network with a ranking loss as the optimization objective. Compared with other approaches, our approach achieves more accurate results.
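A Siamese ranking setup scores two items with the same shared weights and penalizes the pair unless the better item outscores the worse one by a margin. The following is a minimal sketch under stated assumptions: a linear scorer over concatenated image and text features stands in for the network, the feature sizes `IMG_DIM` and `TXT_DIM` are hypothetical, and a hinge (margin) ranking loss is assumed as the rank loss.

```python
import numpy as np

IMG_DIM, TXT_DIM = 4, 3  # hypothetical output sizes of the two feature extractors


def score(w, img_feat, txt_feat):
    """Shared (Siamese) scorer: the same weights score every item."""
    return float(w @ np.concatenate([img_feat, txt_feat]))


def margin_rank_loss(w, better, worse, margin=1.0):
    """Hinge ranking loss: zero once the better item outscores the worse one by `margin`."""
    return max(0.0, margin - (score(w, *better) - score(w, *worse)))


def sgd_step(w, better, worse, lr=0.1, margin=1.0):
    if margin_rank_loss(w, better, worse, margin) > 0.0:
        # d/dw [margin - w.(x_better - x_worse)] = -(x_better - x_worse); step against it.
        diff = np.concatenate(better) - np.concatenate(worse)
        w = w + lr * diff
    return w


w = np.zeros(IMG_DIM + TXT_DIM)
better = (np.ones(IMG_DIM), np.ones(TXT_DIM))   # hypothetical high-quality item features
worse = (np.zeros(IMG_DIM), np.zeros(TXT_DIM))  # hypothetical low-quality item features
for _ in range(5):
    w = sgd_step(w, better, worse)
print(margin_rank_loss(w, better, worse))
```

Because both items pass through one scorer, the model learns a single quality score that can rank unseen items, which is the usual motivation for a Siamese design over a pairwise classifier.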

Self-Attention with Structural Position Representations

Although self-attention networks (SANs) have advanced the state of the art on various NLP tasks, one criticism of SANs is their limited ability to encode the positions of input words (Shaw et al., 2018). In this work, we propose to augment SANs with structural position representations that model the latent structure of the input sentence and complement the standard sequential position representations. Specifically, we use a dependency tree to represent the grammatical structure of a sentence and propose two strategies to encode the positional relationships among words in the dependency tree. Experimental results on the NIST Chinese-to-English and WMT14 English-to-German translation tasks show that the proposed approach consistently improves performance over both absolute and relative sequential position representations.
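One plausible way to derive structural positions from a dependency tree is to use each word's depth (distance from the root) as an absolute position and the tree path length between two words as a relative position. This is a sketch under that assumption; the paper's exact definitions (for example, sign conventions for relative positions) may differ. The input format, a `heads` list where `heads[i]` is the index of word i's head and -1 marks the root, is a hypothetical choice.

```python
from collections import deque


def tree_depths(heads):
    """Absolute structural position: number of arcs from each word up to the root."""
    depths = []
    for i in range(len(heads)):
        d, node = 0, i
        while heads[node] != -1:
            node = heads[node]
            d += 1
        depths.append(d)
    return depths


def tree_distances(heads):
    """Relative structural position: path length between every pair of words (BFS per word)."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head != -1:
            adj[child].append(head)
            adj[head].append(child)
    dist = []
    for src in range(n):
        row = [-1] * n
        row[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if row[v] == -1:
                    row[v] = row[u] + 1
                    queue.append(v)
        dist.append(row)
    return dist


# "she ate red apples": she -> ate, ate = root, red -> apples, apples -> ate
heads = [1, -1, 3, 1]
print(tree_depths(heads))
print(tree_distances(heads))
```

In this example "she" and "red" are three positions apart in the tree, whereas sequentially they are only two words apart, which is the kind of information sequential positions miss.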

DeepHealth: Deep Learning for Health Informatics

Machine learning and deep learning have opened up a whole new era of research. As more data and greater computational power become available, they have been applied in a wide range of fields. The demand for artificial intelligence in health informatics is also increasing, and we can expect to see the potential benefits of artificial intelligence applications in healthcare. Deep learning can help clinicians diagnose disease, identify cancer sites, identify drug effects for each patient, understand the relationship between genotypes and phenotypes, explore new phenotypes, and predict infectious disease outbreaks with high accuracy. In contrast to traditional models, it does not require domain-specific data preprocessing, and it is expected to change human life substantially in the future. Despite its notable advantages, practical use faces challenges regarding data (high dimensionality, heterogeneity, time dependency, sparsity, irregularity, lack of labels) and models (reliability, interpretability, feasibility, security, scalability). This article presents a comprehensive review of research applying deep learning in health informatics, focusing on the last five years, in the fields of medical imaging, electronic health records, genomics, sensing, and online health communication, as well as challenges and promising directions for future research. We highlight popular ongoing research approaches and identify several challenges in building deep learning models.

You Shall Know a User by the Company It Keeps: Dynamic Representations for Social Media Users in NLP

Information about individuals can help to better understand what they say, particularly in social media where texts are short. Current approaches to modelling social media users pay attention to their social connections, but exploit this information in a static way, treating all connections uniformly. This ignores the fact, well known in sociolinguistics, that an individual may be part of several communities which are not equally relevant in all communicative situations. We present a model based on Graph Attention Networks that captures this observation. It dynamically explores the social graph of a user, computes a user representation given the most relevant connections for a target task, and combines it with linguistic information to make a prediction. We apply our model to three different tasks, evaluate it against alternative models, and analyse the results extensively, showing that it significantly outperforms other current methods.
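The core of a Graph Attention Network layer is a learned, input-dependent weighting of a node's neighbors, which is what lets the model emphasize the connections most relevant to a task. Below is a minimal single-head sketch of the standard GAT attention step for one node, not the authors' full model; the random features, the toy graph, and the parameter shapes are hypothetical, and the nonlinearity GAT usually applies after aggregation is omitted.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_aggregate(h, neighbors, i, W, a):
    """Single-head graph attention for node i over its neighbors plus itself.

    h: (num_nodes, in_dim) features; W: (out_dim, in_dim); a: (2 * out_dim,).
    Returns the attention weights and the aggregated representation of node i.
    """
    z = h @ W.T  # project every node: (num_nodes, out_dim)
    nbrs = [i] + [j for j in neighbors[i] if j != i]
    # Unnormalized score e_ij = LeakyReLU(a . [z_i || z_j]).
    e = leaky_relu(np.array([a @ np.concatenate([z[i], z[j]]) for j in nbrs]))
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()  # softmax over the neighborhood
    return dict(zip(nbrs, alpha)), alpha @ z[nbrs]


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))  # hypothetical user features
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}  # hypothetical social graph
W = rng.normal(size=(2, 3))
a = rng.normal(size=4)
weights, h0 = gat_aggregate(h, neighbors, 0, W, a)
print(weights, h0)
```

Because the weights depend on the features involved, the same user can attend to different communities when W and a are trained for different target tasks.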

Assortment Auctions: A Myersonian Characterization for Markov Chain based Choice Models

We introduce the assortment auction optimization problem, defined as follows. A seller has a set of substitute products with exogenously-given prices. Each buyer has a ranked list from which she would like to purchase at most one product. The buyers report their lists to the seller, who then allocates products to the buyers using a truthful mechanism, subject to constraints on how many products can be allocated. The seller collects revenues equal to the prices of the products allocated, and would like to design an auction to maximize total revenue, when the buyers’ lists are drawn independently from known distributions. If there is a single buyer, then our problem reduces to the assortment optimization problem, which is solved for Markov Chain choice models. We extend this result and compute the optimal assortment auction when each buyer’s list distribution arises from its own Markov chain. Moreover, we show that the optimal auction is structurally “Myersonian”, in that each buyer is assigned a virtual valuation based on her list and Markov chain, and then the mechanism maximizes virtual surplus. Since Markov Chain choice models capture valuation distributions, our optimal assortment auction generalizes the classical Myerson auction. Markov chains also capture the commonly used MNL choice model. We show that without the Markov chain assumption, the optimal assortment auction may be structurally non-Myersonian. Finally, we apply the concept of an assortment auction in online assortment problems. We show that any personalized assortment policy is a special case of a truthful assortment auction, and that moreover, the optimal auction provides a tighter relaxation for online policies than the commonly-used “deterministic LP”. Using this fact, we improve many online assortment policies, and derive the first approximation guarantees that strictly exceed 1-1/e.
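The "Myersonian" structure refers to the classical single-item case that this work generalizes: each bid v is mapped to a virtual valuation φ(v) = v − (1 − F(v))/f(v), and the item goes to the bidder with the highest nonnegative virtual value. The sketch below illustrates only that classical allocation rule, assuming every bidder's value is drawn from the same Uniform[low, high] distribution; it is not the paper's Markov chain construction, and the payment rule is omitted.

```python
def virtual_value_uniform(v, low=0.0, high=1.0):
    """Myerson virtual valuation phi(v) = v - (1 - F(v)) / f(v) for v ~ Uniform[low, high]."""
    F = (v - low) / (high - low)  # CDF at v
    f = 1.0 / (high - low)        # density
    return v - (1.0 - F) / f


def myerson_winner(bids, low=0.0, high=1.0):
    """Allocate to the highest virtual value if it is nonnegative; otherwise keep the item."""
    phis = [virtual_value_uniform(b, low, high) for b in bids]
    best = max(range(len(bids)), key=lambda i: phis[i])
    return best if phis[best] >= 0 else None


print(virtual_value_uniform(0.75))    # Uniform[0, 1] gives phi(v) = 2v - 1
print(myerson_winner([0.3, 0.7]))
print(myerson_winner([0.2, 0.4]))     # both below the implied reserve of 0.5
```

Maximizing virtual surplus in this way produces the familiar reserve price (0.5 for Uniform[0, 1]); the paper's result is that an analogous virtual valuation exists when a buyer's "type" is a ranked list generated by a Markov chain.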

Can A User Guess What Her Followers Want?

Whenever a social media user decides to share a story, she is typically pleased to receive likes, comments, shares, or, more generally, feedback from her followers. As a result, she may feel compelled to use the feedback she receives to (re-)estimate her followers’ preferences and decide which stories to share next to receive more (positive) feedback. Under which conditions can she succeed? In this work, we first investigate this problem from a theoretical perspective and then provide a set of practical algorithms to identify and characterize such behavior in social media. More specifically, we address the above problem from the perspective of sequential decision making and utility maximization. For a wide family of utility functions, we first show that, to succeed, a user needs to actively trade off exploitation, i.e., sharing stories that lead to more positive feedback, against exploration, i.e., sharing stories to learn about her followers’ preferences. However, exploration is not necessary if a user utilizes the feedback her followers provide to other users in addition to the feedback she receives. Then, we develop a utility estimation framework for observational data, which relies on statistical hypothesis testing to determine whether a user utilizes the feedback she receives from each of her followers to decide what to post next. Experiments on synthetic data illustrate our theoretical findings and show that our estimation framework is able to accurately recover users’ underlying utility functions. Experiments on several real datasets gathered from Twitter and Reddit reveal that up to 82% (43%) of the Twitter (Reddit) users in our datasets do use the feedback they receive to decide what to post next.
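The exploitation/exploration trade-off can be made concrete with a standard multi-armed bandit. The sketch below uses the generic UCB1 rule, not the paper's algorithm: each arm stands in for a hypothetical story topic, a Bernoulli reward stands in for positive feedback, and the exploration bonus sqrt(2 ln t / n) shrinks as a topic is tried more often. The success probabilities are made up for illustration.

```python
import math
import random


def ucb1(true_probs, rounds=2000, seed=0):
    """UCB1: play the arm maximizing empirical mean + sqrt(2 ln t / n_arm).

    Returns how many times each arm (hypothetical story topic) was played.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # try every topic once before using the index
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        # Hypothetical feedback: 1 with the topic's (unknown) success probability.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.8])
print(counts)
```

The empirical-mean term drives exploitation and the bonus term drives exploration; the paper's point that exploration becomes unnecessary when a user also observes feedback given to others corresponds to receiving extra, free observations of each arm.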

Topics to Avoid: Demoting Latent Confounds in Text Classification

Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation on the task of native language identification. We find that standard text classifiers that perform well on the test set end up learning topical features that are confounds of the prediction task (e.g., if the input text mentions Sweden, the classifier predicts that the author’s native language is Swedish). We propose a method that represents the latent topical confounds, and a model that ‘unlearns’ confounding features by predicting both the label of the input text and the confound. The two predictors are trained adversarially in an alternating fashion to learn a text representation that predicts the correct label but is less prone to using information about the confound. We show that this model generalizes better and learns features that are indicative of writing style rather than content.

Modeling and simulation of large-scale Systems: a systematic comparison of modeling paradigms

A trend across most areas where simulation-driven development is used is the ever-increasing size and complexity of the systems under consideration, pushing established methods of modeling and simulation to their limits. This paper complements existing surveys on large-scale modeling and simulation of physical systems by conducting expert surveys. We conducted a two-stage empirical survey to investigate research needs, current challenges, and promising modeling and simulation paradigms. Furthermore, we applied the analytic hierarchy process method to prioritize the strengths and weaknesses of different modeling paradigms. The results of this study show that experts consider acausal modeling techniques suitable for modeling large-scale systems, while causal techniques are considered less suitable.
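In the analytic hierarchy process, experts fill a reciprocal pairwise comparison matrix A, where A[i][j] says how much more important item i is than item j; priority weights are the normalized principal eigenvector, and a consistency ratio checks whether the judgments contradict each other (values below about 0.1 are conventionally acceptable). This is a minimal sketch of the textbook procedure using Saaty's random consistency indices; the example judgments below are hypothetical and are not the survey's data.

```python
import numpy as np

# Saaty's random consistency index for matrix sizes 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_priorities(A):
    """Return (priority weights, consistency ratio) for a reciprocal pairwise matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))    # principal (Perron) eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                        # normalize weights to sum to 1
    lambda_max = eigvals[k].real
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n - 1]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical judgments comparing three criteria.
A = [[1, 3, 5],
     [1 / 3, 1, 2],
     [1 / 5, 1 / 2, 1]]
weights, cr = ahp_priorities(A)
print(weights.round(3), round(cr, 3))
```

For a perfectly consistent matrix (A[i][j] = w_i / w_j) the principal eigenvalue equals n, so the consistency ratio is zero and the weights are recovered exactly.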

How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings

Replacing static word embeddings with contextualized word representations has yielded significant improvements on many NLP tasks. However, just how contextual are the contextualized representations produced by models such as ELMo and BERT? Are there infinitely many context-specific representations for each word, or are words essentially assigned one of a finite number of word-sense representations? For one, we find that the contextualized representations of all words are not isotropic in any layer of the contextualizing model. While representations of the same word in different contexts still have a greater cosine similarity than those of two different words, this self-similarity is much lower in upper layers. This suggests that upper layers of contextualizing models produce more context-specific representations, much like how upper layers of LSTMs produce more task-specific representations. In all layers of ELMo, BERT, and GPT-2, on average, less than 5% of the variance in a word’s contextualized representations can be explained by a static embedding for that word, providing some justification for the success of contextualized representations.
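Two of the quantities mentioned can be computed directly from a matrix whose rows are one word's representations in different contexts. Self-similarity is the mean pairwise cosine similarity between those rows, and the share of variance a static embedding can explain is taken here as σ₁² / Σσᵢ², where σᵢ are the singular values of the matrix (the best rank-1 approximation). This is a sketch assuming those definitions; the paper additionally adjusts for anisotropy using a baseline from randomly sampled words, which is omitted here.

```python
import numpy as np


def self_similarity(reps):
    """Mean pairwise cosine similarity of one word's representations (needs >= 2 contexts)."""
    X = np.asarray(reps, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    n = len(X)
    return float((sims.sum() - n) / (n * (n - 1)))  # drop the n diagonal ones


def max_explainable_variance(reps):
    """Fraction of variance captured by the best rank-1 (static) approximation."""
    s = np.linalg.svd(np.asarray(reps, dtype=float), compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


rng = np.random.default_rng(0)
contextual = rng.normal(size=(10, 8))  # hypothetical: 10 contexts, 8-dim vectors
print(self_similarity(contextual), max_explainable_variance(contextual))
```

A word whose representation never changes across contexts has self-similarity and explained variance of 1; the paper's finding of under 5% explained variance means contextual representations are far from that static extreme.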

Bayesian Neural Tree Models for Nonparametric Regression

Frequentist and Bayesian methods differ in many aspects, but share some basic optimal properties. In real-life classification and regression problems, situations exist in which a model based on one of the methods is preferable based on some subjective criterion. Nonparametric classification and regression techniques, such as decision trees and neural networks, have frequentist (classification and regression trees (CART) and artificial neural networks) as well as Bayesian (Bayesian CART and Bayesian neural networks) approaches to learning from data. In this work, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. Both models exploit the architecture of decision trees and have fewer parameters to tune than advanced neural networks. Such models can simultaneously perform feature selection and prediction, are highly flexible, and generalize well in settings with a limited number of training observations. We study the consistency of the proposed models, and derive the optimal value of an important model parameter. We also provide illustrative examples using a wide variety of real-life regression data sets.

Edge Intelligence: the Confluence of Edge Computing and Artificial Intelligence

Along with the rapid development of communication technologies and the surge in mobile devices, a brand-new computation paradigm, Edge Computing, is rapidly gaining popularity. Meanwhile, Artificial Intelligence (AI) applications are thriving with the breakthroughs in deep learning and improvements in hardware architectures. Billions of bytes of data generated at the network edge place great demands on data processing and structural optimization. There is therefore a strong demand to integrate Edge Computing and AI, which gives birth to Edge Intelligence. In this article, we divide Edge Intelligence into AI for edge (Intelligence-enabled Edge Computing) and AI on edge (Artificial Intelligence on Edge). The former focuses on providing better solutions to key concerns in Edge Computing with the help of popular and effective AI technologies, while the latter studies how to carry out the entire process of building AI models, i.e., model training and inference, on the edge. This article aims to give insights into this new interdisciplinary field from a broader perspective. It discusses the core concepts and the research road-map, which should provide the necessary background for potential future research programs in Edge Intelligence.

Recurrent Neural Networks for Time Series Forecasting: Current Status and Future Directions

Recurrent Neural Networks (RNNs) have become competitive forecasting methods, as most notably shown by the winning method of the recent M4 competition. However, established statistical models such as ETS and ARIMA owe their popularity not only to their high accuracy but also to their suitability for non-expert users, as they are robust, efficient, and automatic. In these areas, RNNs still have a long way to go. We present an extensive empirical study and an open-source software framework of existing RNN architectures for forecasting, which allow us to develop guidelines and best practices for their use. For example, we conclude that RNNs are capable of modelling seasonality directly if the series in the dataset possess homogeneous seasonal patterns; otherwise, we recommend a deseasonalization step. Comparisons against ETS and ARIMA demonstrate that the implemented (semi-)automatic RNN models are no silver bullets, but they are competitive alternatives in many situations.
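A deseasonalization step removes a repeating seasonal profile before the series is fed to the model, and the profile is added back to the forecasts afterwards. The sketch below uses the simplest additive version, estimating the profile as per-position means over the cycle, and assumes the series has no strong trend; it is an illustration, not the framework's preprocessing, and a decomposition such as STL would be more robust for trending data. The names `deseasonalize_additive` and `reseasonalize` are hypothetical.

```python
import numpy as np


def deseasonalize_additive(y, period):
    """Subtract a per-phase mean profile (assumes an additive season and no trend)."""
    y = np.asarray(y, dtype=float)
    phase = np.arange(len(y)) % period
    profile = np.array([y[phase == p].mean() for p in range(period)])
    profile -= profile.mean()  # seasonal effects sum to zero over one cycle
    return y - profile[phase], profile


def reseasonalize(forecast, profile, start_index):
    """Add the seasonal profile back to forecasts starting at time index `start_index`."""
    phase = (start_index + np.arange(len(forecast))) % len(profile)
    return np.asarray(forecast, dtype=float) + profile[phase]


# Hypothetical quarterly series: level 10 plus a fixed seasonal pattern.
y = 10 + np.tile([1.0, -1.0, 2.0, -2.0], 6)
adjusted, profile = deseasonalize_additive(y, period=4)
print(adjusted[:4], profile)
print(reseasonalize([10.0, 10.0], profile, start_index=len(y)))
```

The model is then trained on `adjusted`, and its forecasts are passed through `reseasonalize` with the index of the first forecast step.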

DeepDB: Learn from Data, not from Queries!

The typical approach for learned DBMS components is to capture the behavior by running a representative set of queries and using the observations to train a machine learning model. This workload-driven approach, however, has two major downsides. First, collecting the training data can be very expensive, since all queries need to be executed on potentially large databases. Second, training data has to be recollected when the workload or the data changes. To overcome these limitations, we take a different route: we propose to learn a pure data-driven model that can be used for different tasks such as query answering or cardinality estimation. This data-driven model also supports ad-hoc queries and updates of the data without the need for full retraining when the workload or data changes. One might expect this to come at the price of lower accuracy, since workload-driven models can make use of more information. However, this is not the case. The results of our empirical evaluation demonstrate that our data-driven approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.
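The idea of learning from data rather than queries can be illustrated with a tiny sum-product style model built from table rows alone and then queried with arbitrary predicates. This toy is an assumption-laden sketch, not DeepDB's Relational Sum-Product Networks (which learn their structure, handle joins, and support updates): a hand-chosen split on one column plays the role of a sum node, and within each cluster columns are assumed independent (a product node of per-column frequency tables). The rows and function names are hypothetical.

```python
from collections import Counter


def build_toy_spn(rows, split_col, threshold):
    """Sum node over two row clusters; each cluster is a product of per-column frequency tables."""
    clusters = [[r for r in rows if r[split_col] < threshold],
                [r for r in rows if r[split_col] >= threshold]]
    model = []
    for cluster in clusters:
        if not cluster:
            continue
        ncols = len(cluster[0])
        tables = [Counter(r[c] for r in cluster) for c in range(ncols)]
        model.append((len(cluster), tables))
    return model


def estimate_cardinality(model, predicate):
    """predicate: dict column -> set of allowed values (an ad-hoc conjunctive query)."""
    total = 0.0
    for size, tables in model:
        prob = 1.0
        for col, allowed in predicate.items():
            prob *= sum(tables[col][v] for v in allowed) / size  # independence within cluster
        total += size * prob
    return total


# Hypothetical table: y depends on x globally but not within each cluster.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 5), (2, 6), (3, 5), (3, 6)]
model = build_toy_spn(rows, split_col=0, threshold=2)
print(estimate_cardinality(model, {0: {0}, 1: {0}}))
print(estimate_cardinality(model, {0: {0}, 1: {5}}))
```

No query was executed to build the model, yet it answers unseen predicates; the sum node is what lets it capture the correlation between columns that a single global independence assumption would miss.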

Enriching Medical Terminology Knowledge Bases via Pre-trained Language Model and Graph Convolutional Network

Enriching existing medical terminology knowledge bases (KBs) is an important and never-ending task for clinical research, because new terminology aliases may be continually added and standard terminologies may be renamed. In this paper, we propose a novel automatic approach to enriching KBs with a set of new terminologies. Specifically, terminology and entity characters are first fed into a pre-trained language model to obtain semantic embeddings. The pre-trained model is used again to initialize the terminology and entity representations, which are then further embedded through a graph convolutional network to obtain structure embeddings. Afterwards, the semantic and structure embeddings are combined to measure the relevancy between a terminology and an entity. Finally, the optimal alignment is determined by ranking the relevancy between the terminology and all entities in the KB. Experimental results on a clinical indicator terminology KB, collected from 38 top-class hospitals of the Shanghai Hospital Development Center, show that our proposed approach outperforms baseline methods and can effectively enrich the KB.
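The final alignment step ranks every KB entity by its relevancy to a new terminology and picks the top one. The sketch below assumes one simple fusion choice, concatenating the L2-normalized semantic and structure embeddings, and uses cosine similarity as the relevancy score; the paper's actual combination and scoring functions may differ, and the tiny two-dimensional embeddings are hypothetical stand-ins for language-model and GCN outputs.

```python
import numpy as np


def combined_embedding(semantic, structural):
    """Concatenate L2-normalized semantic and structure embeddings (one hypothetical fusion)."""
    s = np.asarray(semantic, dtype=float)
    g = np.asarray(structural, dtype=float)
    return np.concatenate([s / np.linalg.norm(s), g / np.linalg.norm(g)])


def rank_entities(term_vec, entity_vecs):
    """Return entity indices sorted by cosine relevancy to the terminology (best first)."""
    E = np.asarray(entity_vecs, dtype=float)
    sims = E @ term_vec / (np.linalg.norm(E, axis=1) * np.linalg.norm(term_vec))
    return [int(i) for i in np.argsort(-sims)], sims


term = combined_embedding([1, 0], [0, 1])
entities = [combined_embedding([1, 0], [0, 1]),  # agrees on both views
            combined_embedding([0, 1], [1, 0]),  # agrees on neither
            combined_embedding([1, 0], [1, 0])]  # agrees on semantics only
order, sims = rank_entities(term, entities)
print(order, sims.round(2))
```

Normalizing each view before concatenation keeps one embedding from dominating the score simply because it has a larger norm.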