Deep neural networks (DNNs) have been shown to be inherently vulnerable to adversarial examples, which attackers maliciously craft from real examples with the aim of making target DNNs misbehave. The threat of adversarial examples is widespread in image, voice, speech, and text recognition and classification. Inspired by previous work, research on adversarial attacks and defenses in the text domain has developed rapidly. To the best of our knowledge, this article presents a comprehensive review of adversarial examples in text. We analyze the advantages and shortcomings of recent adversarial example generation methods and detail the efficiency and limitations of countermeasures. Finally, we discuss the challenges in adversarial texts and suggest research directions in this area.
Modelling relationships between entities in real-world systems with a simple graph is a standard approach. However, reality is better embraced as several interdependent subsystems (or layers). Recently, the concept of a multilayer network model has emerged from the field of complex systems. This model can be applied to a wide range of real-world datasets. Examples of multilayer networks can be found in the domains of life sciences, sociology, digital humanities, and more. Within the domain of graph visualization, many systems visualize datasets that share many characteristics of multilayer graphs. This report provides a state-of-the-art review and a structured analysis of contemporary multilayer network visualization, not only for researchers in visualization, but also for those who aim to visualize multilayer networks in the domain of complex systems, as well as those developing systems across application domains. We have explored the visualization literature to survey visualization techniques suitable for multilayer graph visualization, as well as tools, tasks, and analytic techniques from within application domains. This report also identifies the outstanding challenges for multilayer graph visualization and suggests future research directions for addressing them.
With the increasingly varied applications of deep learning, transfer learning has emerged as a critically important technique. However, the central question of how much feature reuse in transfer is the source of benefit remains unanswered. In this paper, we present an in-depth analysis of the effects of transfer, focusing on medical imaging, which is a particularly intriguing setting. Here, transfer learning is extremely popular, but data differences between pretraining and fine-tuning are considerable, reiterating the question of what is transferred. With experiments on two large-scale medical imaging datasets and CIFAR-10, we find that transfer has almost negligible effects on performance but significantly speeds up convergence. However, in all of these settings, convergence without transfer can be sped up dramatically by using only the mean and variance statistics of the pretrained weights. Visualizing the lower-layer filters shows that models trained from random initialization do not learn Gabor filters on medical images. We use CCA (canonical correlation analysis) to study the learned representations of the different models, finding that pretrained models are surprisingly similar to random initialization at higher layers. This similarity is evidenced both through model learning dynamics and through a transfusion experiment, which explores the convergence speed using a subset of pretrained weights.
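The idea of reusing only weight statistics can be illustrated with a short sketch. It assumes, for illustration only, that each layer's pretrained weights are flattened into a list and that new weights are drawn i.i.d. from a Gaussian with that layer's mean and standard deviation; the function name `mean_var_init` is hypothetical, and this is not the authors' implementation.

```python
import random
import statistics


def mean_var_init(pretrained_layers, seed=0):
    """Hypothetical helper: draw fresh weights for each layer i.i.d. from
    N(mu, sigma^2), where mu and sigma are the mean and standard deviation of
    that layer's pretrained weights. Only these two statistics are reused;
    the individual pretrained values are discarded."""
    rng = random.Random(seed)
    new_layers = []
    for weights in pretrained_layers:  # each layer is a flat list of floats
        mu = statistics.fmean(weights)
        sigma = statistics.pstdev(weights)
        new_layers.append([rng.gauss(mu, sigma) for _ in weights])
    return new_layers


# Synthetic "pretrained" layers with made-up statistics, for illustration.
src = random.Random(1)
pretrained = [
    [src.gauss(0.1, 0.02) for _ in range(5000)],
    [src.gauss(-0.3, 0.5) for _ in range(5000)],
]
fresh = mean_var_init(pretrained)
```

Because only two numbers per layer survive, comparing such an initialization with the full pretrained weights helps separate the effect of weight scaling from that of feature reuse.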
Deep Neural Networks (DNNs) are increasingly deployed in safety-critical applications including autonomous vehicles and medical diagnostics. To reduce the residual risk for unexpected DNN behaviour and provide evidence for their trustworthy operation, DNNs should be thoroughly tested. The DeepFault whitebox DNN testing approach presented in our paper addresses this challenge by employing suspiciousness measures inspired by fault localization to establish the hit spectrum of neurons and identify suspicious neurons whose weights have not been calibrated correctly and thus are considered responsible for inadequate DNN performance. DeepFault also uses a suspiciousness-guided algorithm to synthesize new inputs, from correctly classified inputs, that increase the activation values of suspicious neurons. Our empirical evaluation on several DNN instances trained on MNIST and CIFAR-10 datasets shows that DeepFault is effective in identifying suspicious neurons. Also, the inputs synthesized by DeepFault closely resemble the original inputs, exercise the identified suspicious neurons and are highly adversarial.
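The suspiciousness measures DeepFault borrows come from spectrum-based fault localization. The sketch below assumes a neuron counts as active on an input when its activation exceeds a threshold (0 here), builds each neuron's hit spectrum (active or inactive, on correctly or incorrectly classified inputs), and scores it with the Tarantula and Ochiai formulas from the fault-localization literature. The function names and toy activations are illustrative, not DeepFault's code.

```python
import math


def hit_spectrum(activations, correct, threshold=0.0):
    """activations[t][j] is neuron j's activation on input t; correct[t] says
    whether input t was classified correctly. Returns, per neuron, the counts
    (a_s, a_f, n_s, n_f): active/not active on successful/failed inputs."""
    counts = [[0, 0, 0, 0] for _ in activations[0]]
    for acts, ok in zip(activations, correct):
        for j, value in enumerate(acts):
            if value > threshold:
                counts[j][0 if ok else 1] += 1
            else:
                counts[j][2 if ok else 3] += 1
    return [tuple(c) for c in counts]


def tarantula(a_s, a_f, n_s, n_f):
    fail = a_f / (a_f + n_f) if a_f + n_f else 0.0
    succ = a_s / (a_s + n_s) if a_s + n_s else 0.0
    return fail / (fail + succ) if fail + succ else 0.0


def ochiai(a_s, a_f, n_s, n_f):
    denom = math.sqrt((a_f + n_f) * (a_f + a_s))
    return a_f / denom if denom else 0.0


def most_suspicious(activations, correct, measure, k):
    """Indices of the k neurons with the highest suspiciousness score."""
    scores = [measure(*c) for c in hit_spectrum(activations, correct)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: 4 inputs x 3 neurons; inputs 2 and 3 are misclassified.
acts = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
correct = [True, True, False, False]
```

Here neuron 1 fires exactly on the two misclassified inputs, so both measures rank it first.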
In many machine learning applications, one needs to interactively select a sequence of items (e.g., recommending movies based on a user’s feedback) or make sequential decisions in certain orders (e.g., guiding an agent through a series of states). Not only do sequences already pose a dauntingly large search space, but we must take into account past observations, as well as the uncertainty of future outcomes. Without further structure, finding an optimal sequence is notoriously challenging, if not completely intractable. In this paper, we introduce adaptive sequence submodularity, a rich framework that generalizes the notion of submodularity to adaptive policies that explicitly consider sequential dependencies between items. We show that once such dependencies are encoded by a directed graph, an adaptive greedy policy achieves a constant-factor approximation guarantee, where the constant naturally depends on the structural properties of the underlying graph. Additionally, to demonstrate the practical utility of our results, we run experiments on Amazon product recommendation and Wikipedia link prediction tasks.
Neural networks are becoming increasingly prevalent in software, and it is therefore important to be able to verify their behavior. Because verifying the correctness of neural networks is extremely challenging, it is common to focus on the verification of other properties of these systems. One important property, in particular, is robustness. Most existing definitions of robustness, however, focus on the worst-case scenario where the inputs are adversarial. Such notions of robustness are too strong and are unlikely to be satisfied by, and verifiable for, practical neural networks. Observing that real-world inputs to neural networks are drawn from non-adversarial probability distributions, we propose a novel notion of robustness: probabilistic robustness, which requires the neural network to be robust with at least $(1 - \epsilon)$ probability with respect to the input distribution. This probabilistic approach is practical and provides a principled way of estimating the robustness of a neural network. We also present an algorithm, based on abstract interpretation and importance sampling, for checking whether a neural network is probabilistically robust. Our algorithm uses abstract interpretation to approximate the behavior of a neural network and compute an overapproximation of the input regions that violate robustness. It then uses importance sampling to counter the effect of such overapproximation and compute an accurate estimate of the probability that the neural network violates the robustness property.
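The importance-sampling step can be illustrated on a one-dimensional toy problem. The sketch assumes a standard normal input distribution, a hypothetical `violates` predicate standing in for the network's robustness check (violated exactly when x > 3), and a hand-picked Gaussian proposal centered on that region, whereas the authors derive the region from abstract interpretation. Weighting each proposal sample by p(x)/q(x) keeps the estimate unbiased.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def violates(x):
    # Hypothetical stand-in for "the network is not robust at input x".
    return x > 3.0


def estimate_violation_prob(n=20000, proposal_mu=3.0, seed=0):
    """Importance-sampling estimate of P_{x ~ N(0, 1)}[violates(x)] using the
    proposal q = N(proposal_mu, 1); each hit is weighted by p(x) / q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(proposal_mu, 1.0)
        if violates(x):
            total += normal_pdf(x) / normal_pdf(x, mu=proposal_mu)
    return total / n


def is_probabilistically_robust(eps, **kwargs):
    """Robust with probability at least 1 - eps iff the violation probability is at most eps."""
    return estimate_violation_prob(**kwargs) <= eps
```

Plain Monte Carlo from N(0, 1) would see a violation only about once per 740 samples, while this proposal places roughly half of its samples in the violating region.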
The Bayesian Optimisation Algorithm (BOA) is an Estimation of Distribution Algorithm (EDA) that uses a Bayesian network as its probabilistic graphical model (PGM). Determining the optimal Bayesian network structure given a solution sample is an NP-hard problem. This step must be completed at each iteration of BOA, making the process very time-consuming. For this reason, most implementations use greedy estimation algorithms such as K2. However, we show in this paper that significant changes in PGM structure occur infrequently and can be particularly sparse at the end of evolution. A statistical study of BOA is thus presented to characterise a pattern of PGM adjustments that can be used as a guide to reduce the frequency of PGM updates during the evolutionary process. This is accomplished by proposing a new BOA-based optimisation approach (FBOA) whose PGM is not updated at each iteration. This new approach avoids the computational burden usually found in the standard BOA. The results compare the performance of both algorithms on an NK-landscape optimisation problem using the correlation between ruggedness and expected runtime over enumerated instances. The experiments show that FBOA achieves competitive results while significantly reducing computational time.
Semantic parsing using hierarchical representations has recently been proposed for task-oriented dialog with promising results [Gupta et al. 2018]. In this paper, we present three different improvements to the model: contextualized embeddings, ensembling, and pairwise re-ranking based on a language model. We taxonomize the errors possible for the hierarchical representation, such as wrong top intent, missing spans, or split spans, and show that the three approaches correct different kinds of errors. The best model combines the three techniques and gives 6.4% better exact match accuracy than the previous state of the art, with an error reduction of 33%, resulting in a new state-of-the-art result on the Task Oriented Parsing (TOP) dataset.
The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: achievability bounds for optimal decoding methods, efficient algorithms with practical storage and computation requirements, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the rate and capacity of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms.
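As a concrete instance of an efficient decoding algorithm, the sketch below implements COMP, one of the simplest decoders studied in the group testing literature, on a Bernoulli random test design. In the noiseless setting COMP never misses a defective item, although it may declare some non-defective items defective. The population size, number of tests, and inclusion probability p = 1/k are arbitrary illustrative choices.

```python
import random


def bernoulli_design(n_items, n_tests, p, rng):
    """Each item joins each test pool independently with probability p."""
    return [{i for i in range(n_items) if rng.random() < p} for _ in range(n_tests)]


def run_tests(pools, defectives):
    """Noiseless outcomes: a test is positive iff its pool contains a defective."""
    return [bool(pool & defectives) for pool in pools]


def comp_decode(pools, outcomes, n_items):
    """COMP: any item that appears in a negative test is declared non-defective;
    every remaining item is declared defective."""
    candidates = set(range(n_items))
    for pool, positive in zip(pools, outcomes):
        if not positive:
            candidates -= pool
    return candidates


rng = random.Random(0)
n_items, k = 200, 3
defectives = set(rng.sample(range(n_items), k))
pools = bernoulli_design(n_items, n_tests=80, p=1 / k, rng=rng)
estimate = comp_decode(pools, run_tests(pools, defectives), n_items)
```

For instance, with pools {0, 1}, {1, 2}, {2, 3} and outcomes positive, positive, negative, COMP clears items 2 and 3 and returns {0, 1}, which is a false positive on item 0 if only item 1 is defective.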
Deep reinforcement learning has seen great success across a breadth of tasks such as game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily available human domain experts’ knowledge that could help “warm start” the learning process. Further, learning-from-demonstration techniques either cannot yet infer this knowledge through sampling-based mechanisms in large state and action spaces or require immense amounts of data. We present a new reinforcement learning architecture that can encode expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions that are amenable to gradient descent. We show that our novel architecture is able to outperform reinforcement and imitation learning techniques across an array of canonical challenge problems for artificial intelligence.
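Encoding a rule such as "if feature 1 exceeds 0, prefer action 1" as a differentiable fuzzy proposition might look like the following sketch. It assumes a sigmoid-based truth value with a sharpness parameter and a single decision node that blends two leaf action distributions; the rule, names, and numbers are hypothetical, and the paper's actual node parameterization may differ.

```python
import math


def soft_condition(x, weights, threshold, sharpness=5.0):
    """Fuzzy truth value in (0, 1) of the proposition  weights . x > threshold."""
    score = sum(w * xi for w, xi in zip(weights, x)) - threshold
    return 1.0 / (1.0 + math.exp(-sharpness * score))


def one_node_policy(x, weights, threshold, leaf_if_true, leaf_if_false):
    """Blend two leaf action distributions by the node's fuzzy truth value."""
    t = soft_condition(x, weights, threshold)
    return [t * a + (1 - t) * b for a, b in zip(leaf_if_true, leaf_if_false)]


# Expert rule (hypothetical): if x[1] > 0, mostly take action 1; else action 0.
probs = one_node_policy([0.0, 2.0], [0.0, 1.0], 0.0, [0.1, 0.9], [0.9, 0.1])
```

Because `weights`, `threshold`, and the leaf distributions enter only through smooth operations, they can be initialized from the expert rule and then refined by gradient descent.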
In this work, we address optimization problems where the objective function is a nonlinear function of an expected value, i.e., compositional stochastic strongly convex programs. We consider the case where the decision variable is not vector-valued but instead belongs to a reproducing kernel Hilbert space (RKHS), motivated by risk-aware formulations of supervised learning and Markov Decision Processes defined over continuous spaces. We develop the first memory-efficient stochastic algorithm for this setting, which we call Compositional Online Learning with Kernels (COLK). COLK, at its core a two-time-scale stochastic approximation method, addresses two issues: (i) compositions of expected value problems cannot be addressed by classical stochastic gradient methods due to the presence of the inner expectation; and (ii) the complexity of the RKHS-induced parameterization grows in proportion to the iteration index, which COLK mitigates through greedily constructed subspace projections. We establish almost sure convergence of COLK with attenuating step-sizes and linear convergence in mean to a neighborhood with constant step-sizes, and we show that its complexity is at worst finite. Experiments with robust formulations of supervised learning demonstrate that COLK reliably converges, attains consistent performance across training runs, and thus overcomes overfitting.
Forthcoming communication systems are advancing towards improved flexibility in various aspects. Improved flexibility is crucial to cater to diverse service requirements. This letter proposes a novel waveform design scheme that exploits axis warping to enable peaceful coexistence of different pulse shapes. A warping transform manipulates the lattice samples non-uniformly and provides flexibility to handle the time-frequency occupancy of a signal. The proposed approach enables the utilization of flexible pulse shapes in a quasi-orthogonal manner and increases the spectral efficiency. In addition, the rectangular resource block structure, which assists an efficient resource allocation, is preserved with the warped waveform design as well.
Neural networks underpin many of the best-performing AI systems. Their success is largely due to their strong approximation properties, superior predictive performance, and scalability. However, a major caveat is explainability: neural networks are often perceived as black boxes that permit little insight into how predictions are being made. We tackle this issue by developing a pivotal test to assess the statistical significance of the feature variables of a neural network. We propose a gradient-based test statistic and study its asymptotics using nonparametric techniques. The limiting distribution is given by a mixture of chi-square distributions. The tests enable one to discern the impact of individual variables on the prediction of a neural network. The test statistic can be used to rank variables according to their influence. Simulation results illustrate the computational efficiency and the performance of the test. An empirical application to house price valuation highlights the behavior of the test using actual data.
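The abstract does not spell out the statistic, so the sketch below assumes it takes the form of the average squared partial derivative of the fitted network with respect to each input, estimated with central finite differences on a hypothetical fitted model. It only ranks variables; the chi-square-mixture limiting distribution needed for p-values is not computed.

```python
import random


def gradient_statistic(f, samples, h=1e-5):
    """Average squared partial derivative of f with respect to each input,
    estimated over the given samples with central finite differences."""
    d = len(samples[0])
    totals = [0.0] * d
    for x in samples:
        for j in range(d):
            up, down = list(x), list(x)
            up[j] += h
            down[j] -= h
            totals[j] += ((f(up) - f(down)) / (2 * h)) ** 2
    return [t / len(samples) for t in totals]


def rank_features(stats):
    """Feature indices ordered from most to least influential."""
    return sorted(range(len(stats)), key=lambda j: stats[j], reverse=True)


def model(x):
    # Hypothetical fitted network: depends on x[0] and x[1], ignores x[2].
    return 2 * x[0] + x[0] * x[1]


rng = random.Random(0)
samples = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2000)]
stats = gradient_statistic(model, samples)
```

For inputs uniform on [-1, 1], the population values are 13/3, 1/3, and 0, so the irrelevant third variable receives a statistic of exactly zero here.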
Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. In particular, as social media channels (e.g., social networks or forums) have become significant sources for brands to observe user opinions about their products, this task has become increasingly crucial. However, when working with real data obtained from social media, we notice a high volume of short and informal messages posted by users on those channels. This kind of data poses many difficulties for existing works, especially those using deep learning approaches. In this paper, we propose an approach to handle this problem. This work extends our previous work, in which we proposed combining the typical deep learning technique of Convolutional Neural Networks with domain knowledge. The combination is used for acquiring additional training data augmentation and a more reasonable loss function. In this work, we further improve our architecture with various substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, the combination of word-level and character-level embeddings, and a multitask learning technique for attaching domain knowledge rules to the learning process. These enhancements, specifically aimed at handling short and informal messages, yield significant performance improvements in experiments on real datasets.
Adversarial examples in machine learning for images are widely publicized and explored. Illustrations of misclassifications caused by slightly perturbed inputs are abundant and commonly known (e.g., a picture of a panda imperceptibly perturbed to fool the classifier into incorrectly labeling it as a gibbon). Similar attacks on deep learning (DL) for radio frequency (RF) signals and their mitigation strategies are scarcely addressed in the published work. Yet RF adversarial examples (AdExs) with minimal waveform perturbations can cause drastic, targeted misclassification results, particularly against spectrum sensing/survey applications (e.g., BPSK is mistaken for 8-PSK). Our research on deep learning AdExs and the proposed defense mechanisms is RF-centric and incorporates physical-world, over-the-air (OTA) effects. We herein present defense mechanisms based on pre-training the target classifier using an autoencoder. Our results validate this approach as a viable mitigation method that subverts adversarial attacks against deep learning-based communications and radar sensing systems.
One of the ways to train deep neural networks effectively is to use residual connections. Residual connections can be classified as being either identity connections or bridge-connections with a reshaping convolution. Empirical observations on CIFAR-10 and CIFAR-100 datasets using a baseline Resnet model, with bridge-connections removed, have shown a significant reduction in accuracy. This reduction is due to the lack of contribution, in the form of feature maps, from the bridge-connections. Hence, bridge-connections are vital for Resnet. However, all feature maps in the bridge-connections are considered equally important. In this work, an upgraded architecture ‘Res-SE-Net’ is proposed to further strengthen the contribution from the bridge-connections by quantifying the importance of each feature map and weighting them accordingly using a Squeeze-and-Excitation (SE) block. It is demonstrated that Res-SE-Net generalizes much better than Resnet and SE-Resnet on the benchmark CIFAR-10 and CIFAR-100 datasets.
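The channel reweighting that Res-SE-Net adds to the bridge-connections uses the standard SE block: squeeze each feature map to a scalar by global average pooling, pass the resulting vector through a small bottleneck of two fully connected layers (ReLU, then sigmoid), and rescale each map by its output. The pure-Python sketch below shows only the block itself, with hand-picked weights, no biases, and an assumed reduction ratio r = 2; how it is wired into the bridge-connections follows the paper and is not reproduced.

```python
import math


def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation over C channels (biases omitted for brevity).

    feature_maps: C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weights of the bottleneck layer; w2: C x (C // r) weights.
    Returns the rescaled feature maps and the per-channel weights s."""
    # Squeeze: global average pooling of each channel to one number.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value in channel c by its weight s[c].
    scaled = [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(feature_maps, s)]
    return scaled, s


# Four 2x2 channels with constant values 1, 2, 3, 4; reduction ratio r = 2.
maps = [[[c + 1.0] * 2 for _ in range(2)] for c in range(4)]
w1 = [[1, 0, 0, 0], [0, 1, 0, 0]]
w2 = [[1, 0], [0, 1], [0, 0], [-1, -1]]
scaled, s = se_block(maps, w1, w2)
```

Each channel's weight lies strictly between 0 and 1, so the block can only attenuate maps relative to one another, which is how it expresses unequal importance among feature maps.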
Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress in image super-resolution using deep learning techniques. In this survey, we aim to systematically review recent advances in image super-resolution techniques using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues that should be further addressed by the community.
In recent years, several novel models have been developed to process natural language, and accurate language translation systems have helped us overcome geographical barriers and communicate ideas effectively. These models are developed mostly for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic, and semantic similarity with several other languages, and knowing this can help us leverage existing models to build more specific and accurate models for other languages. Here, I explore the idea of representing several well-known languages in a lower-dimensional space such that their similarities can be visualized using simple two-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language.
Timeline-based planning is an approach originally developed in the context of space mission planning and scheduling, where problem domains are modelled as systems made of a number of independent but interacting components, whose behaviour over time, the timelines, is governed by a set of temporal constraints. This approach is different from the action-based perspective of common PDDL-like planning languages. Timeline-based systems have been successfully deployed in a number of space missions and other domains. However, despite this practical success, a thorough theoretical understanding of the paradigm was missing. This thesis fills this gap, providing the first detailed account of the formal and computational properties of the timeline-based approach to planning. In particular, we show that a particularly restricted variant of the formalism is already expressive enough to compactly capture action-based temporal planning problems. We then prove that finding a solution plan for a timeline-based planning problem is EXPSPACE-complete. Next, we study the problem of timeline-based planning with uncertainty, which includes external components whose behaviour is not under the control of the planned system. We identify a few issues in the state-of-the-art approach based on flexible plans and propose timeline-based games, a more general game-theoretic formulation of the problem that addresses those issues. We show that winning strategies for such games can be found in doubly-exponential time. Finally, we study the expressiveness of the formalism from a logical point of view, showing that most timeline-based planning problems can be captured by Bounded TPTL with Past, a fragment of TPTL+P that, unlike the latter, has an EXPSPACE satisfiability problem. The logic is introduced and its satisfiability problem is solved by extending a recent one-pass tree-shaped tableau method for LTL.
A common technique to improve the speed and robustness of learning in deep reinforcement learning (DRL) and many other machine learning algorithms is to run multiple learning agents in parallel. A neglected component in the development of these algorithms has been how best to arrange the learning agents involved to better facilitate distributed search. Here we draw upon results from the networked optimization and collective intelligence literatures suggesting that arranging learning agents in less than fully connected topologies (rather than the fully connected arrangement commonly used implicitly) can improve learning. We explore the relative performance of four popular families of graphs and observe that one such family (Erdos-Renyi random graphs) empirically outperforms the standard fully-connected communication topology across several DRL benchmark tasks. We observe that 1000 learning agents arranged in an Erdos-Renyi graph can perform as well as 3000 agents arranged in the standard fully-connected topology, showing the large learning improvement possible when carefully designing the topology over which agents communicate. We complement these empirical results with a preliminary theoretical investigation of why less than fully connected topologies can perform better. Overall, our work suggests that distributed machine learning algorithms could be made more efficient if the communication topology between learning agents were optimized.
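The topologies being compared can be made concrete with a minimal sketch: an Erdos-Renyi graph G(n, p) over agents, plus a communication round in which each agent averages a scalar parameter with its neighbors. The averaging rule is a hypothetical stand-in for whatever parameter sharing the learning algorithm actually performs, and the function names are illustrative.

```python
import random


def erdos_renyi(n, p, seed=0):
    """Undirected G(n, p): each possible edge is present independently with probability p."""
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def gossip_step(params, neighbors):
    """Hypothetical communication round: each agent replaces its parameter
    with the average over itself and its neighbors."""
    return [
        (params[i] + sum(params[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(params))
    ]


sparse = erdos_renyi(n=50, p=0.2)
fully_connected = erdos_renyi(n=5, p=1.0)  # p = 1 recovers the standard topology
```

Setting p = 1 recovers the fully connected topology, so the same code path can be used to compare both arrangements.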
The proliferation of smart mobile devices has spurred an explosive growth of mobile crowd-learning services, where service providers rely on the user community to voluntarily collect, report, and share real-time information for a collection of scattered points of interest (PoIs). A critical factor affecting the future large-scale adoption of such mobile crowd-learning applications is the freshness of the crowd-learned information, which can be measured by a metric termed “age-of-information” (AoI). However, we show that the AoI of mobile crowd-learning could be arbitrarily bad under selfish user behavior if the system is poorly designed. This motivates us to design efficient reward mechanisms to incentivize mobile users to report information in time, with the goal of keeping the AoI and congestion level of each PoI low. Toward this end, we consider a simple linear AoI-based reward mechanism and analyze its AoI and congestion performance in terms of the price of anarchy (PoA), which characterizes the degradation of system efficiency due to the selfish behavior of users. Remarkably, we show that the proposed mechanism achieves the optimal AoI performance asymptotically in a deterministic scenario. Further, we prove that the proposed mechanism achieves a bounded PoA in general stochastic cases, and the bound depends only on system parameters. In particular, when the service rates of PoIs are symmetric in stochastic cases, the achieved PoA is upper-bounded by $1/2$ asymptotically. Collectively, this work advances our understanding of information freshness in mobile crowd-learning systems.
Sum-Product Networks (SPNs) are hierarchical probabilistic graphical models capable of fast and exact inference. Applications of SPNs to real-world data such as large image datasets have been fairly limited in previous literature. We introduce Convolutional Sum-Product Networks (ConvSPNs), which exploit the inherent structure of images in a way similar to deep convolutional neural networks, optionally with weight sharing. ConvSPNs encode spatial relationships through local products and local sum operations. ConvSPNs obtain state-of-the-art results compared to other SPN-based approaches on several visual datasets, including color images, for both generative and discriminative tasks. ConvSPNs are the first pure-SPN models applied to color images that do not depend on additional techniques for feature extraction. In addition, we introduce two novel methods for regularizing SPNs trained with hard EM. Both regularization methods are motivated by the observation of an exponentially decreasing variance of log probabilities with respect to the depth of randomly structured SPNs. We show that our regularization provides substantial further improvements in generative visual tasks.
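To make "fast and exact inference" concrete, the sketch below evaluates a tiny hand-built SPN over two binary variables in one bottom-up pass. Product nodes have children with disjoint scopes and sum nodes mix children over the same scope, so setting an unobserved leaf to 1 marginalizes its variable out. The structure and weights are made up, and the convolutional structure and hard-EM training of ConvSPNs are not shown.

```python
from itertools import product


class LeafNode:
    """Bernoulli leaf over one binary variable."""

    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true

    def value(self, evidence):
        v = evidence.get(self.var)
        if v is None:  # variable not observed: marginalize it out
            return 1.0
        return self.p_true if v else 1.0 - self.p_true


class ProductNode:
    """Children must have disjoint scopes (decomposability)."""

    def __init__(self, children):
        self.children = children

    def value(self, evidence):
        result = 1.0
        for child in self.children:
            result *= child.value(evidence)
        return result


class SumNode:
    """Weighted mixture of children over the same scope; weights sum to 1."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children

    def value(self, evidence):
        return sum(w * child.value(evidence) for w, child in self.weighted_children)


spn = SumNode([
    (0.3, ProductNode([LeafNode("X1", 0.9), LeafNode("X2", 0.2)])),
    (0.7, ProductNode([LeafNode("X1", 0.1), LeafNode("X2", 0.6)])),
])
total = sum(spn.value({"X1": a, "X2": b}) for a, b in product([0, 1], repeat=2))
p_x1 = spn.value({"X1": 1})  # marginal P(X1 = 1) from the same single pass
```

Both the full joint and the marginal cost one pass over the nodes, which is what makes inference linear in the network size.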
Distributed learning is central to large-scale training of deep-learning models. However, it is exposed to a security threat in which Byzantine participants can interrupt or control the learning process. Previous attack models and their corresponding defenses assume that the rogue participants (a) are omniscient (know the data of all other participants) and (b) introduce large changes to the parameters. We show that small but well-crafted changes are sufficient, leading to a novel non-omniscient attack on distributed learning that goes undetected by all existing defenses. We demonstrate that our attack method works not only for preventing convergence but also for repurposing the model's behavior (backdooring). We show that corrupting 20% of the workers is sufficient to degrade a CIFAR10 model's accuracy by 50%, as well as to introduce backdoors into MNIST and CIFAR10 models without hurting their accuracy.
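One way that small but well-crafted changes can evade defenses is sketched here: corrupted workers send, for every coordinate, the mean of the gradients minus z standard deviations, which stays inside the spread of honest values. The sketch treats z as a free parameter and computes the statistics from a given list of gradients, an assumption standing in for whatever the attackers can estimate without omniscience. The function names and numbers are illustrative, and coordinate-wise median aggregation serves as one example defense.

```python
import statistics


def little_is_enough(gradients, z):
    """Coordinate-wise malicious update: mean - z * (population) std."""
    columns = list(zip(*gradients))
    return [statistics.fmean(c) - z * statistics.pstdev(c) for c in columns]


def coordinate_median(gradients):
    """Coordinate-wise median, a common Byzantine-robust aggregation rule."""
    return [statistics.median(c) for c in zip(*gradients)]


benign = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0], [5.0, 4.0]]
malicious = little_is_enough(benign, z=1.0)
honest_agg = coordinate_median(benign)
attacked_agg = coordinate_median(benign + [malicious] * 3)
```

Each malicious coordinate lies within the range of honest values, so outlier-based filtering has little to flag, yet the median aggregate moves from 3.0 to about 1.79 in the first coordinate.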
Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures used for self-supervised learning are summarized. Next, the main components and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used image and video datasets and the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. The paper concludes with a set of promising future directions for self-supervised visual feature learning.
Conversational agents are systems with a conversational interface that afford interaction in spoken language. These systems are becoming prevalent and are preferred in various contexts and by many users. Despite their increasing success, the automated testing infrastructure to support the effective and efficient development of such systems is still limited compared to that for traditional software systems. An automated testing framework for conversational systems can improve the quality of these systems by assisting developers to write, execute, and maintain test cases. In this paper, we introduce our work-in-progress automated testing framework and its realization in the Python programming language. We discuss some research problems in the development of such an automated testing framework for conversational agents. In particular, we point out the problems of specifying the expected behavior, known as test oracles, and of semantically comparing utterances.
Emission control technologies installed on power plants are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution, and human health, many of these relationships have never been empirically verified. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new methods for multiple intermediate variables that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all combinations of mediators. Both approaches are anchored to the same observed-data models, which we specify with Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required for causal mediation analysis. The two analyses, interpreted in tandem, provide the first empirical investigation of the presumed causal pathways that motivate important air quality regulatory policies.
We consider a context-based dynamic pricing problem for online products that have low sales. Sales data from Alibaba, a major global online retailer, illustrate the prevalence of low-sale products. For these products, existing single-product dynamic pricing algorithms do not work well due to insufficient data samples. To address this challenge, we propose pricing policies that concurrently perform clustering over products and set individual pricing decisions on the fly. By clustering data and identifying products that have similar demand patterns, we utilize sales data from products within the same cluster to improve demand estimation and allow for better pricing decisions. We evaluate the algorithms in terms of regret, and the results show that when product demand functions come from multiple clusters, our algorithms significantly outperform traditional single-product pricing policies. Numerical experiments using a real dataset from Alibaba demonstrate that the proposed policies, compared with several benchmark policies, increase the revenue. The results show that online clustering is an effective approach to tackling dynamic pricing problems associated with low-sale products. Our algorithms were further implemented in a field study at Alibaba with 40 products for 30 consecutive days and compared with products using Alibaba's business-as-usual pricing policy. The results from the field experiment show that the overall revenue increased by 10.14%.
With the increasing importance of online communities, discussion forums, and customer reviews, Internet ‘trolls’ have proliferated, making it difficult for information seekers to find relevant and correct information. In this paper, we consider the problem of detecting and identifying Internet trolls, almost all of whom are human agents. Identifying a human agent within a human population is significantly more challenging than detecting automated spam or computerized robots. To learn a troll’s behavior, we use contextual anomaly detection to profile each chat user. Using clustering and distance-based methods together with contextual data such as the group’s current goal, the current time, and the username, we decide whether each data point is an anomaly and develop heuristics for identifying trolls: a user whose features differ significantly from the norm is classified as a troll. We collected 38 million data points from the viral Internet fad Twitch Plays Pokemon. Using MapReduce techniques for preprocessing and user profiling, we classify trolls based on 10 features extracted from each user’s lifetime history.
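Contextual, distance-based anomaly detection can be sketched minimally as follows. The sketch assumes each user has already been summarized as a numeric feature vector within a context. Both the context labels and the features are hypothetical, for example the share of a user's commands that oppose the group's current goal. A user is flagged when the distance to its k-th nearest neighbour within the same context exceeds a threshold. This k-NN rule is one common distance-based choice and is not necessarily the heuristic used in the paper. The names `flag_trolls`, `k`, and `threshold` are hypothetical.

```python
import math
from collections import defaultdict


def kth_neighbor_distance(point, others, k):
    """Distance from `point` to its k-th nearest neighbour among `others`."""
    distances = sorted(math.dist(point, other) for other in others)
    return distances[min(k, len(distances)) - 1]


def flag_trolls(profiles, k=2, threshold=3.0):
    """profiles: iterable of (user, context, feature_vector) tuples.

    Users are compared only with others observed in the same context, so
    behaviour that is normal in one context can be anomalous in another.
    """
    by_context = defaultdict(list)
    for user, context, features in profiles:
        by_context[context].append((user, features))

    flagged = set()
    for members in by_context.values():
        for i, (user, features) in enumerate(members):
            others = [f for j, (_, f) in enumerate(members) if j != i]
            if others and kth_neighbor_distance(features, others, k) > threshold:
                flagged.add(user)
    return flagged
```

Because the comparison is restricted to a context, the same feature vector can be flagged in one context and pass unnoticed in another.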
Image colorization achieves increasingly realistic results as deep learning techniques gain computational power, and fake colorized images are therefore becoming harder for human eyes to identify. In this work, we propose a novel forensic method based on a convolutional neural network (CNN) to distinguish natural images (NIs) from colorized images (CIs). Our method achieves high classification accuracy and copes with the challenging scenario of blind detection, i.e., the case where no training samples are available from an ‘unknown’ colorization algorithm that may be encountered during testing. Performance under blind detection can therefore be regarded as generalization performance. First, we design and implement a base network that, compared with state-of-the-art methods, attains better classification accuracy and (in most cases) better generalization. Next, we design a new branch that analyzes smaller regions of the extracted features and insert it into the base network. As a result, the network not only improves classification accuracy but also enhances generalization in the vast majority of cases. To further improve blind detection, we propose to automatically construct negative samples through linear interpolation of paired natural and colorized images. We then progressively add these negative samples to the original training dataset and continue training the network. Experimental results demonstrate that our method achieves stable and high generalization performance when tested against different state-of-the-art colorization algorithms.
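The negative-sample construction can be sketched as a pixel-wise convex combination of a natural image and its colorized counterpart, x = alpha * x_NI + (1 - alpha) * x_CI. This minimal sketch assumes both images are already flattened into equal-length sequences of pixel values. The alpha grid and the function names are hypothetical, and the abstract does not specify how the paper chooses alpha.

```python
def interpolate_pair(natural, colorized, alpha):
    """Pixel-wise convex combination alpha * natural + (1 - alpha) * colorized."""
    if len(natural) != len(colorized):
        raise ValueError("paired images must have the same number of pixels")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * n + (1.0 - alpha) * c for n, c in zip(natural, colorized)]


def interpolate_pairs(pairs, alphas=(0.25, 0.5, 0.75)):
    """Build one interpolated sample per (natural, colorized) pair and alpha."""
    return [interpolate_pair(n, c, a) for n, c in pairs for a in alphas]
```

The endpoints alpha = 0 and alpha = 1 would reproduce the colorized and natural images exactly. For that reason, the hypothetical default grid keeps alpha strictly inside the interval.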
Due to globalization, geographic boundaries no longer serve as effective shields against the spread of infectious diseases. To aid bio-surveillance analysts in disease tracking, recent research has been devoted to developing information retrieval and analysis methods that draw on the vast corpora of publicly available documents on the internet. In this work, we present methods for the automated retrieval and classification of documents related to active public health events. We demonstrate classification performance on an auto-generated corpus using recurrent neural network (RNN), TF-IDF, and Naive Bayes log-count ratio document representations. By jointly modeling the title and description of a document, our best-performing bio-surveillance event classification model, logistic regression on the combined output of a pair of bidirectional RNNs, achieves 97% recall and 93.3% accuracy.
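The Naive Bayes log-count ratio representation can be sketched minimally. The sketch assumes the common NBSVM-style definition r = log((p / ||p||_1) / (q / ||q||_1)), where p and q are the smoothed sums of binarized term counts over positive (event-related) and negative documents. The tokenization, the smoothing value `alpha`, and the helper names are assumptions and are not taken from the abstract.

```python
import math
from collections import Counter


def nb_log_count_ratio(docs, labels, alpha=1.0):
    """Return {term: r} for tokenized docs with binary labels (1 = event-related)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        counts = pos if label == 1 else neg
        counts.update(set(doc))  # binarized: each term counted once per document
    # Smoothed L1 norms of the positive and negative count vectors.
    pos_total = sum(pos[t] + alpha for t in vocab)
    neg_total = sum(neg[t] + alpha for t in vocab)
    return {
        t: math.log(((pos[t] + alpha) / pos_total) / ((neg[t] + alpha) / neg_total))
        for t in vocab
    }


def nb_features(doc, ratios):
    """Scale the binary bag-of-words vector of `doc` by the log-count ratios."""
    present = set(doc)
    return [ratios[t] if t in present else 0.0 for t in sorted(ratios)]
```

Terms that occur mostly in event-related documents receive positive weights, and terms that occur mostly elsewhere receive negative ones. A linear classifier such as logistic regression can then be trained on the scaled vectors.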
Incorporating a knowledge graph (KG) into a recommender system is a promising way to improve recommendation accuracy and explainability. However, existing methods largely assume that the KG is complete and simply transfer the ‘knowledge’ in the KG at the shallow level of raw entity data or embeddings. This may lead to suboptimal performance, since a practical KG is rarely complete: missing facts, relations, and entities are common. We therefore argue that it is crucial to account for the incompleteness of the KG when incorporating it into a recommender system. In this paper, we jointly learn a recommendation model and a knowledge graph completion model. Unlike previous KG-based recommendation methods, we transfer the relation information in the KG to understand why a user likes an item. For example, if a user has watched several movies directed by (relation) the same person (entity), we can infer that the director relation plays a critical role in the user’s decisions, which helps us understand the user’s preferences at a finer granularity. Technically, we contribute a new translation-based recommendation model that specifically accounts for different preferences when translating a user to an item, and we train it jointly with a KG completion model by combining several transfer schemes. Extensive experiments on two benchmark datasets show that our method outperforms state-of-the-art KG-based recommendation methods. Further analysis verifies the positive effect of joint training on both recommendation and KG completion, as well as our model’s advantage in understanding user preferences. We publish our project at https://…/joint-kg-recommender.
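Translation-based scoring can be sketched minimally in the spirit described above. TransE-style KG embeddings model a fact as head + relation ≈ tail. By analogy, a user embedding translated by a preference vector should land near an item the user likes, giving score(u, i) = -min over preferences p of ||u + p - i||. Taking a hard minimum over a small fixed set of preference vectors is a simplifying assumption. The paper's model, its joint training with KG completion, and its transfer schemes are not reproduced, and all embeddings and names here are hypothetical.

```python
import math


def translate(vec, offset):
    """Element-wise vector addition: the 'translation' of vec by offset."""
    return [v + o for v, o in zip(vec, offset)]


def score(user, item, preferences):
    """Higher is better: negative distance after the best-fitting translation."""
    return max(-math.dist(translate(user, p), item) for p in preferences)


def rank_items(user, items, preferences):
    """Return item ids sorted from most to least preferred."""
    return sorted(items, key=lambda iid: score(user, items[iid], preferences), reverse=True)
```

The preference vector that achieves the maximum indicates which kind of preference, for example one aligned with a "directed by" relation, best explains the recommendation.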
Compared with many other languages, Arabic is morphologically complex, which makes Arabic sentiment analysis a challenging task. Moreover, the presence of dialects in Arabic texts makes sentiment analysis even more challenging, because no specific rules govern how these dialects are written or spoken. More generally, one of the problems in sentiment analysis is the high dimensionality of the feature vector. Many feature selection methods have been proposed to address this problem, but they have been investigated widely for English and far less for dialectal Arabic. This work investigates the effect of feature selection methods and their combinations on dialectal Arabic sentiment classification. The feature selection methods are Information Gain (IG), Correlation, Support Vector Machine (SVM), Gini Index (GI), and Chi-Square. A number of experiments were carried out on dialectal Jordanian reviews using an SVM classifier. Furthermore, the effects of different term weighting schemes, stemmers, stop-word removal, and feature models on performance were investigated. The experimental results showed that the SVM classifier performed best when the SVM and correlation feature selection methods were combined with the unigram model.
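Of the selection methods listed, Chi-Square is the easiest to show in a few lines. This minimal sketch assumes binary sentiment labels and documents represented as sets of unigram tokens. It scores each term with the 2x2 contingency statistic chi2 = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)) and keeps the top k terms. Here A and B count positive and negative documents containing the term, and C and D count those that lack it. The function names are hypothetical, and the paper's combinations of selection methods are not shown.

```python
def chi_square(term, docs, labels):
    """Chi-square statistic for a term vs. a binary sentiment label (1 = positive)."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        if has_term and label == 1:
            a += 1  # term present, positive review
        elif has_term:
            b += 1  # term present, negative review
        elif label == 1:
            c += 1  # term absent, positive review
        else:
            d += 1  # term absent, negative review
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_top_k(docs, labels, k):
    """Keep the k terms with the highest chi-square scores (ties broken alphabetically)."""
    vocab = {t for doc in docs for t in doc}
    scores = {t: chi_square(t, docs, labels) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]
```

Consider hypothetical transliterated tokens in documents {"helo", "kteer"}, {"helo"}, {"zeft"}, {"zeft", "kteer"} labelled 1, 1, 0, 0. Here `select_top_k(docs, labels, 2)` returns ["helo", "zeft"]. The token "kteer" occurs equally often in both classes, so it scores 0.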