With the current surge in the use of social media platforms, the use of short text (microtext) in place of standard words has risen significantly. Microtext poses a considerable performance problem for concept-level sentiment analysis, since models are trained on standard words. This paper discusses the impact of coupling sub-symbolic (phonetics) with symbolic (machine learning) Artificial Intelligence to transform out-of-vocabulary concepts into their standard in-vocabulary form. The phonetic distance is calculated using the Sorensen similarity algorithm. The phonetically similar in-vocabulary concepts thus obtained are then used to compute the correct polarity value, which was previously miscalculated because of the presence of microtext. Our proposed framework increases the accuracy of polarity detection by 6% compared with the earlier model. This also confirms that microtext normalization is a necessary prerequisite for the sentiment analysis task.
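As a rough illustration of the Sorensen similarity mentioned above, the following sketch computes the Sorensen-Dice coefficient over character bigrams and picks the closest in-vocabulary candidate. It is not the authors' implementation: the bigram formulation, the function names, and the assumption that the input strings are already phonetic encodings are all illustrative choices.

```python
from collections import Counter


def bigrams(s):
    """Multiset of character bigrams of a lower-cased string."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def sorensen_dice(a, b):
    """Sorensen-Dice coefficient in [0, 1]: 2 * |A & B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both strings are too short to have bigrams; fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2.0 * overlap / total


def normalize(token, vocabulary):
    """Return the vocabulary entry most similar to `token` (hypothetical helper).

    Assumption: `token` and the vocabulary entries are strings that have
    already been mapped to a phonetic form by some earlier step.
    """
    return max(vocabulary, key=lambda w: sorensen_dice(token, w))
```

For example, `normalize("tomoro", ["tomorrow", "today", "tonight"])` selects `"tomorrow"`, since it shares the most bigrams with the microtext token.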
With the rapid development of knowledge bases (KBs), question answering (QA) based on KBs has become a hot research issue. In this paper, we propose two frameworks (i.e., a pipeline framework and an end-to-end framework) for answering single-relation factoid questions. In both frameworks, we study the effect of context information, such as the entity’s notable type and out-degree, on the quality of QA. In the end-to-end framework, we combine char-level encoding and self-attention mechanisms, using weight sharing and multi-task strategies to enhance the accuracy of QA. Experimental results show that context information improves simple QA results in both the pipeline framework and the end-to-end framework. In addition, we find that the end-to-end framework achieves accuracy competitive with state-of-the-art approaches while taking much less time.
In the field of sequential recommendation, deep learning (DL) methods have received a lot of attention in the past few years and have surpassed traditional models such as Markov chain-based and factorization-based ones. However, DL-based methods also have some critical drawbacks, such as insufficient modeling of user representation and failure to distinguish the different types of interactions (i.e., user behavior) between users and items. In view of this, this survey focuses on DL-based sequential recommender systems and takes the aforementioned issues into consideration. Specifically, we illustrate the concept of sequential recommendation, propose a categorization of existing algorithms in terms of three types of behavioral sequence, summarize the key factors affecting the performance of DL-based models, and conduct corresponding evaluations to demonstrate the effects of these factors. We conclude this survey by systematically outlining future directions and challenges in this field.
Lifelong machine learning is a novel machine learning paradigm that continually learns tasks and accumulates knowledge for reuse. The ability to extract and reuse knowledge enables lifelong machine learning to understand the knowledge needed for solving a task and to solve related problems. In sentiment classification, traditional approaches such as Naive Bayes focus on the probability of each word carrying positive or negative sentiment. In contrast, the lifelong machine learning approach in this paper investigates the problem from a different angle and attempts to discover which words determine the sentiment of a review. We focus on obtaining knowledge during learning for use in future learning, rather than on just solving the current task.
Recommending appropriate content based on the quality of experience is the most important and challenging issue in recommender systems. As collaborative filtering (CF) is one of the most prominent and popular techniques used for recommender systems, we propose a new clustering-based CF (CBCF) method using an incentivized/penalized user (IPU) model that relies only on ratings given by users and is thus easy to implement. We aim to design such a simple clustering-based approach with no further prior information while improving the recommendation accuracy. To be precise, the purpose of CBCF with the IPU model is to improve recommendation performance such as precision, recall, and $F_1$ score by carefully exploiting different preferences among users. Specifically, we formulate a constrained optimization problem in which we aim to maximize the recall (or equivalently the $F_1$ score) for a given precision. To this end, users are divided into several clusters based on the actual rating data and the Pearson correlation coefficient. Afterwards, we give each item an incentive/penalty according to the preference tendency of users within the same cluster. Our experimental results show a significant performance improvement over the baseline CF scheme without clustering in terms of recall or $F_1$ score for a given precision.
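The clustering step above relies on the Pearson correlation coefficient between users. The sketch below shows one plausible way to compute it from ratings alone, restricted to co-rated items. It covers only this similarity ingredient, not the CBCF or IPU method itself; the dictionary-based data layout and the function name are assumptions made for illustration.

```python
import math


def pearson(ratings_u, ratings_v):
    """Pearson correlation of two users over the items both have rated.

    Each argument maps item id -> rating (hypothetical data layout).
    Returns 0.0 when the correlation is undefined.
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    n = len(common)
    if n < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / n
    mean_v = sum(ratings_v[i] for i in common) / n
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    denom = math.sqrt(var_u * var_v)
    # A user with constant ratings on the common items has zero variance.
    return cov / denom if denom > 0 else 0.0
```

Users whose pairwise correlation is high could then be grouped by any standard clustering routine; the choice of routine is left open here.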
In the last two decades, the landscape of text generation has undergone tremendous changes and is being reshaped by the success of deep learning. New technologies for text generation have emerged, ranging from template-based methods to neural network-based methods. Meanwhile, the research objectives have also changed from generating smooth and coherent sentences to infusing personalized traits that enrich the diversity of newly generated content. Given the rapid development of text generation solutions, a comprehensive survey is urgently needed to summarize the achievements and track the state of the art. In this survey paper, we present a general systematic framework, illustrate the widely used models, and summarize the classic applications of text generation.
The article deals with the problems that led to Big Data. Big Data information technology is the set of methods and means for processing large, dynamic volumes of structured and unstructured data of different types for analysis and use in decision support. The features and categories of NoSQL databases are described. The developed Big Data model ‘Entity and Features’ makes it possible to determine the distance between data sources based on the availability of information about a particular entity. The information structure of Big Data has been devised. It serves as a basis for further research and for concentrating on the problem of developing diverse data without their preliminary integration.
Meta-learning is a tool that allows us to build sample-efficient learning systems. Here we show that, once meta-trained, LSTM Meta-Learners aren’t just faster learners than their sample-inefficient deep learning (DL) and reinforcement learning (RL) brethren, but that they actually pursue fundamentally different learning trajectories. We study their learning dynamics on three sets of structured tasks for which the corresponding learning dynamics of DL and RL systems have been previously described: linear regression (Saxe et al., 2013), nonlinear regression (Rahaman et al., 2018; Xu et al., 2018), and contextual bandits (Schaul et al., 2019). In each case, while sample-inefficient DL and RL Learners uncover the task structure in a staggered manner, meta-trained LSTM Meta-Learners uncover almost all task structure concurrently, congruent with the patterns expected from Bayes-optimal inference algorithms. This has implications for research areas wherever the learning behaviour itself is of interest, such as safety, curriculum design, and human-in-the-loop machine learning.
TensorNetwork is an open source library for implementing tensor network algorithms. Tensor networks are sparse data structures originally designed for simulating quantum many-body physics, but are currently also applied in a number of other research areas, including machine learning. We demonstrate the use of the API with applications in both physics and machine learning, with details appearing in companion papers.
TensorNetwork is an open source library for implementing tensor network algorithms in TensorFlow. We describe a tree tensor network (TTN) algorithm for approximating the ground state of either a periodic quantum spin chain (1D) or a lattice model on a thin torus (2D), and implement the algorithm using TensorNetwork. We use a standard energy minimization procedure over a TTN ansatz with bond dimension $\chi$, with a computational cost that scales as $O(\chi^4)$. Using bond dimension $\chi \in [32,256]$ we compare the use of CPUs with GPUs and observe significant computational speed-ups, up to a factor of $100$, using a GPU and the TensorNetwork library.
Self-Normalizing Neural Networks (SNN), proposed for Feed Forward Neural Networks (FNN), outperform regular FNN architectures in various machine learning tasks. Particularly in the domain of Computer Vision, the Scaled Exponential Linear Unit (SELU) activation function proposed for SNNs performs better than other nonlinear activations such as ReLU. The goal of an SNN is to produce a normalized output for a normalized input. Established neural network architectures such as feed forward networks and Convolutional Neural Networks (CNN) lack this intrinsic output normalization and hence require additional layers such as Batch Normalization. Despite the success of SNNs, their characteristic features have not been explored on other network architectures such as CNNs, especially in the domain of Natural Language Processing. In this paper, we aim to show the effectiveness of the proposed Self-Normalizing Convolutional Neural Networks (SCNN) on text classification. We compare their performance with that of the standard CNN architecture on several text classification datasets. Our experiments demonstrate that SCNN achieves results comparable to the standard CNN model with significantly fewer parameters. Furthermore, it outperforms a CNN with an equal number of parameters.
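For reference, the SELU activation named above can be written in a few lines. This is a minimal scalar sketch using the constants published with the original SELU definition; it illustrates the activation only and says nothing about how the SCNN architecture itself is built.

```python
import math

# Constants from the original SELU definition (rounded to double precision).
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled Exponential Linear Unit for a single float.

    Positive inputs are scaled linearly; negative inputs follow a scaled
    exponential that saturates at -SCALE * ALPHA.
    """
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * (math.exp(x) - 1.0)
```

Unlike ReLU, the negative branch is non-zero, which is what allows activations to have both positive and negative values and keeps their mean near zero under the conditions described for SNNs.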
This report describes a technical methodology for making the Apache Spark execution engine adaptive. It presents engineering solutions that specifically target the adaptive reordering of predicates in data streams with evolving statistics. The system extension developed is available as an open-source prototype. Indicative experimental results show its overhead and its sensitivity to tuning parameters.
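To make the idea of adaptive predicate reordering concrete, here is a small engine-independent sketch. It does not use any Spark API and is not the prototype described in the report: the class name, the fixed-size statistics window, and the rule of sorting by observed pass rate (assuming equal predicate cost) are all assumptions chosen for brevity.

```python
class AdaptiveFilter:
    """Conjunctive filter that periodically reorders its predicates."""

    def __init__(self, predicates, reorder_every=100):
        # predicates: list of (name, function) pairs; names are hypothetical.
        self.order = list(predicates)
        self.reorder_every = reorder_every
        self.seen = 0
        self._reset_stats()

    def _reset_stats(self):
        # Per predicate: [times evaluated, times passed] in the current window.
        self.stats = {name: [0, 0] for name, _ in self.order}

    def _pass_rate(self, name):
        evaluated, passed = self.stats[name]
        # A predicate not evaluated in this window is treated as unselective.
        return passed / evaluated if evaluated else 1.0

    def accept(self, row):
        self.seen += 1
        result = True
        for name, fn in self.order:
            counts = self.stats[name]
            counts[0] += 1
            if fn(row):
                counts[1] += 1
            else:
                result = False
                break  # short-circuit: later predicates are skipped
        if self.seen % self.reorder_every == 0:
            # Put the most selective predicate (lowest pass rate) first,
            # then start a fresh window so the order can follow drift.
            self.order.sort(key=lambda p: self._pass_rate(p[0]))
            self._reset_stats()
        return result
```

Because of short-circuit evaluation, the pass rates of later predicates are conditional on earlier ones passing; a production system would also need to weigh predicate cost, which this sketch ignores.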
Synthetic datasets have long been thought of as second-rate, to be used only when ‘real’ data collected directly from the real world is unavailable. But this perspective assumes that raw data is clean, unbiased, and trustworthy, which it rarely is. Moreover, the benefits of synthetic data for privacy and for bias correction are becoming increasingly important in any domain that works with people. Curated synthetic datasets – synthetic data derived from minimal perturbations of real data – enable early stage product development and collaboration, protect privacy, afford reproducibility, increase dataset diversity in research, and protect disadvantaged groups from problematic inferences on the original data that reflects systematic discrimination. Rather than representing a departure from the true state of the world, in this paper we argue that properly generated synthetic data is a step towards responsible and equitable research and development of machine learning systems.
The growing interest in both the automation of machine learning and deep learning has inevitably led to the development of automated methods for neural architecture optimization. The choice of the network architecture has proven to be critical, and many advances in deep learning spring from its immediate improvements. However, deep learning techniques are computationally intensive and their application requires a high level of domain knowledge. Therefore, even partial automation of this process would help make deep learning more accessible to both researchers and practitioners. With this survey, we provide a formalism which unifies and categorizes the landscape of existing methods along with a detailed analysis that compares and contrasts the different approaches. We achieve this via a discussion of common architecture search spaces and architecture optimization algorithms based on principles of reinforcement learning and evolutionary algorithms along with approaches that incorporate surrogate and one-shot models. Additionally, we address the new research directions which include constrained and multi-objective architecture search as well as automated data augmentation, optimizer and activation function search.
Numerical evaluations with comparisons to baselines play a central role when judging research in recommender systems. In this paper, we show that running baselines properly is difficult. We demonstrate this issue on two extensively studied datasets. First, we show that results for baselines that have been used in numerous publications over the past five years for the Movielens 10M benchmark are suboptimal. With a careful setup of a vanilla matrix factorization baseline, we are not only able to improve upon the reported results for this baseline but even outperform the reported results of any newly proposed method. Secondly, we recap the tremendous effort that was required by the community to obtain high quality results for simple methods on the Netflix Prize. Our results indicate that empirical findings in research papers are questionable unless they were obtained on standardized benchmarks where baselines have been tuned extensively by the research community.
We introduce NAMSG, an adaptive first-order algorithm for training neural networks. The method is efficient in computation and memory, and straightforward to implement. It computes the gradients at configurable remote observation points in order to expedite convergence in the stochastic setting by adjusting the step size for directions with different curvatures. It also scales the update vector elementwise by a nonincreasing preconditioner to retain the advantages of AMSGRAD. We analyze the convergence properties for both convex and nonconvex problems by modeling the training process as a dynamic system, and provide a guideline for selecting the observation distance without grid search. We also propose a data-dependent regret bound, which guarantees convergence in the convex setting. Experiments demonstrate that NAMSG works well in practice and compares favorably to popular adaptive methods such as ADAM, NADAM, and AMSGRAD.
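The nonincreasing preconditioner borrowed from AMSGRAD can be illustrated with a plain AMSGRAD step. The sketch below is the standard AMSGRAD update without bias correction, not NAMSG: it omits the remote observation points entirely, and the function names and list-based state are illustrative assumptions.

```python
import math


def init_state(n):
    """Zero-initialised optimizer state for n parameters (hypothetical helper)."""
    return {"m": [0.0] * n, "v": [0.0] * n, "v_hat": [0.0] * n}


def amsgrad_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGRAD update; returns the new parameter list and mutates `state`."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Running maximum: v_hat never decreases, so the elementwise
        # scaling 1 / sqrt(v_hat) never increases.
        state["v_hat"][i] = max(state["v_hat"][i], state["v"][i])
        step = lr * state["m"][i] / (math.sqrt(state["v_hat"][i]) + eps)
        new_params.append(p - step)
    return new_params
```

The running maximum is the only difference from an ADAM-style second-moment estimate, and it is what makes the preconditioner nonincreasing.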
We study the problem of discovering functional dependencies (FD) from a noisy dataset. We focus on FDs that correspond to statistical dependencies in a dataset and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy dataset is equivalent to learning the structure of a graphical model over binary random variables, where each random variable corresponds to a functional of the dataset attributes. We build upon this observation to introduce AutoFD, a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that our methods can recover true functional dependencies across a diverse array of real-world and synthetic datasets, even in the presence of noisy or missing data. We find that AutoFD scales to large data instances with millions of tuples and hundreds of attributes while yielding a twofold average F1 improvement over state-of-the-art FD discovery methods.
In this paper, we propose a novel edge-labeling graph neural network (EGNN), which adapts a deep neural network on the edge-labeling graph, for few-shot learning. Previous graph neural network (GNN) approaches in few-shot learning have been based on the node-labeling framework, which implicitly models the intra-cluster similarity and the inter-cluster dissimilarity. In contrast, the proposed EGNN learns to predict the edge-labels rather than the node-labels on the graph, which enables the evolution of an explicit clustering by iteratively updating the edge-labels with direct exploitation of both the intra-cluster similarity and the inter-cluster dissimilarity. It is also well suited to handling various numbers of classes without retraining, and can be easily extended to perform transductive inference. The parameters of the EGNN are learned by episodic training with an edge-labeling loss to obtain a model that generalizes well to unseen low-data problems. On both supervised and semi-supervised few-shot image classification tasks with two benchmark datasets, the proposed EGNN significantly improves performance over existing GNNs.
As a promising clustering method, graph-based clustering converts the input data to a graph and treats clustering as a graph partition problem. However, traditional graph clustering methods usually suffer from two main limitations: (i) graph clustering is a feed-forward process and cannot make use of the information from the clustering result, which is more discriminative than the original graph; and (ii) once the graph is constructed, the clustering process is no longer related to the input data, which may neglect rich information in the raw features. To address these limitations, we propose to learn the similarity graph adaptively, incorporating information from the raw features, the initial graph, and the clustering result. The proposed model is thus naturally cast as a joint model that learns the graph and generates the clustering result simultaneously, and it is solved efficiently with theoretically guaranteed convergence. The advantage of the proposed model is demonstrated by comparison with 19 state-of-the-art clustering methods on 10 datasets with 4 clustering metrics.
Providing unexpected recommendations is an important task for recommender systems. To do this, we need to start from the expectations of users and deviate from these expectations when recommending items. Previously proposed approaches model user expectations in the feature space, which limits them to items that the user has visited or that are expected through the deduction of association rules, and excludes items that the user could also expect from the latent, complex, and heterogeneous interactions between users, items, and entities. In this paper, we define unexpectedness in the latent space rather than in the feature space and develop a novel Latent Convex Hull (LCH) method to provide unexpected recommendations. Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed model, which significantly outperforms alternative state-of-the-art unexpected recommendation methods in terms of unexpectedness measures while achieving the same level of accuracy.
Distantly-labeled data can be used to scale up training of statistical models, but it is typically noisy and that noise can vary with the distant labeling technique. In this work, we propose a two-stage procedure for handling this type of data: denoise it with a learned model, then train our final model on clean and denoised distant data with standard supervised training. Our denoising approach consists of two parts. First, a filtering function discards examples from the distantly labeled data that are wholly unusable. Second, a relabeling function repairs noisy labels for the retained examples. Each of these components is a model trained on synthetically-noised examples generated from a small manually-labeled set. We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). Our baseline model is an extension of their model with pre-trained ELMo representations, which already achieves state-of-the-art performance. Adding distant data that has been denoised with our learned models gives further performance gains over this base model, outperforming models trained on raw distant data or heuristically-denoised distant data.
We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action spaces are endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after $K$ episodes is $O(HL(KH)^{\frac{d-1}{d}})$, where $L$ is a smoothness parameter and $d$ is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret.