This paper argues the need for research to realize uncertainty-aware artificial intelligence and machine learning (AI\&ML) systems for decision support by describing a number of motivating scenarios. Furthermore, the paper defines uncertainty-awareness, lays out the challenges, and surveys some promising research directions. A theoretical demonstration illustrates how two emerging uncertainty-aware ML and AI technologies could be integrated and be of value for a route planning operation.
Mediation analysis aims at disentangling the effects of a treatment on an outcome through alternative causal mechanisms and has become a popular practice in biomedical and social science applications. The causal framework based on counterfactuals is currently the standard approach to mediation, with important methodological advances introduced in the literature in the last decade, especially for simple mediation, that is, with one mediator at a time. Among a variety of alternative approaches, K. Imai et al. showed theoretical results and developed an R package to deal with simple mediation as well as with multiple mediation involving multiple mediators conditionally independent given the treatment and baseline covariates. This approach does not cover the frequently encountered situation in which an unobserved common cause induces a spurious correlation between the mediators. In this context, which we refer to as mediation with uncausally related mediators, we show that, under appropriate hypotheses, the natural direct and indirect effects are non-parametrically identifiable. These results are readily translated into unbiased estimators using the same quasi-Bayesian algorithm developed by Imai et al. We validate our method with an original simulation study. As an illustration, we apply our method to a real data set from a large cohort to assess the effect of hormone replacement treatment on breast cancer risk through three mediators, namely dense mammographic area, nondense area and body mass index.
Artificial Intelligence (AI) approaches to problem-solving and decision-making are becoming more and more complex, leading to a decrease in the understandability of solutions. The European Union’s new General Data Protection Regulation tries to tackle this problem by stipulating a ‘right to explanation’ for decisions made by AI systems. One of the AI paradigms that may be affected by this new regulation is Answer Set Programming (ASP). Thanks to the emergence of efficient solvers, ASP has recently been used for problem-solving in a variety of domains, including medicine, cryptography, and biology. To ensure the successful application of ASP as a problem-solving paradigm in the future, explanations of ASP solutions are crucial. In this survey, we give an overview of approaches that provide an answer to the question of why an answer set is a solution to a given problem, notably off-line justifications, causal graphs, argumentative explanations and why-not provenance, and highlight their similarities and differences. Moreover, we review methods explaining why a set of literals is not an answer set or why no solution exists at all.
Simulation plays an essential role in comprehending a target system in many fields of social and industrial sciences. A major task in simulation is the estimation of parameters, and the optimal parameters for expressing the observed data need to directly elucidate the properties of the target system, as the design of the simulator is based on the expert’s domain knowledge. However, even skilled human experts struggle to find the desired parameters. Data assimilation therefore becomes an unavoidable task in simulator design to reduce the cost of simulator optimization. Another necessary task is extrapolation; in many practical cases, predictions based on simulation results will often lie outside the dominant range of the given data, and this is referred to as the covariate shift. This paper focuses on the regression problem with the covariate shift. While parameter estimation under the covariate shift has been studied thoroughly in parametric and nonparametric settings, conventional statistical methods of parameter searching are not applicable to the data assimilation of the simulation because the likelihood function is intractable or nondifferentiable. To address these problems, we propose a novel framework of Bayesian inference based on kernel mean embedding that comprises an extended kernel approximate Bayesian computation (ABC) of the importance weighted regression, kernel herding, and the kernel sum rule. This framework makes prediction available in covariate shift situations, and its effectiveness is evaluated in both synthetic numerical experiments and a widely used production simulator.
Determining the programming language of a source code file has been considered in the research community; it has been shown that Machine Learning (ML) and Natural Language Processing (NLP) algorithms can be effective in identifying the programming language of source code files. However, determining the programming language of a code snippet or a few lines of source code is still a challenging task. Online forums such as Stack Overflow and code repositories such as GitHub contain a large number of code snippets. In this paper, we describe Source Code Classification (SCC), a classifier that can identify the programming language of code snippets written in 21 different programming languages. A Multinomial Naive Bayes (MNB) classifier is employed, which is trained using Stack Overflow posts. It is shown to achieve an accuracy of 75%, which is higher than that of Programming Languages Identification (PLI, a proprietary online classifier of snippets), whose accuracy is only 55.5%. The average scores for precision, recall and F1 with the proposed tool are 0.76, 0.75 and 0.75, respectively. In addition, it can distinguish between code snippets from a family of programming languages such as C, C++ and C#, and can also identify the programming language version, such as C# 3.0, C# 4.0 and C# 5.0.
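To make the Multinomial Naive Bayes idea concrete, here is a minimal sketch in plain Python. It is not the SCC implementation: the tokenizer, the class name `TinyMultinomialNB`, the smoothing value, and the six toy training snippets are all assumptions made for illustration, and a real system would be trained on a large corpus such as the Stack Overflow posts mentioned above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(snippet):
    # Assumed tokenizer: runs of letters/underscores become one token,
    # every other non-space, non-digit character is its own token.
    return re.findall(r"[A-Za-z_]+|[^\sA-Za-z_0-9]", snippet)


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (hypothetical default)

    def fit(self, snippets, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.class_counts = Counter(labels)       # label -> number of snippets
        self.vocab = set()
        for snippet, label in zip(snippets, labels):
            tokens = tokenize(snippet)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, snippet):
        n_snippets = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(snippet) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(n / n_snippets)  # log prior
            for tok in tokens:
                count = self.token_counts[label][tok]
                score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch.
train = [
    ("def add(a, b):\n    return a + b", "python"),
    ("for x in range(10):\n    print(x)", "python"),
    ("public static void main(String[] args) { System.out.println(x); }", "java"),
    ("int x = 0; for (int i = 0; i < n; i++) { x += i; }", "java"),
    ("SELECT name FROM users WHERE id = 1;", "sql"),
    ("INSERT INTO users (name) VALUES ('a');", "sql"),
]
model = TinyMultinomialNB().fit([s for s, _ in train], [l for _, l in train])
print(model.predict("SELECT id FROM orders WHERE total > 5;"))
```

With so little training data the predictions are only indicative; the point is the scoring rule, which sums a log prior and smoothed per-token log likelihoods for each language.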
Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, the questions posted in Stack Overflow usually contain a code snippet. Stack Overflow relies on users to properly tag the programming language of a question and it simply assumes that the programming language of the snippets inside a question is the same as the tag of the question itself. In this paper, we propose a classifier to predict the programming language of questions posted in Stack Overflow using Natural Language Processing (NLP) and Machine Learning (ML). The classifier achieves an accuracy of 91.1% in predicting the 24 most popular programming languages by combining features from the title, body and the code snippets of the question. We also propose a classifier that only uses the title and body of the question and has an accuracy of 81.1%. Finally, we propose a classifier of code snippets only that achieves an accuracy of 77.7%. These results show that deploying Machine Learning techniques on the combination of text and the code snippets of a question provides the best performance. They also demonstrate that it is possible to identify the programming language of a snippet of only a few lines of source code. We visualize the feature space of two programming languages, Java and SQL, in order to identify some special properties of the information inside the Stack Overflow questions corresponding to these languages.
Most existing recommender systems leverage the data of one type of user behavior only, such as the purchase behavior in E-commerce that is directly related to the business KPI (Key Performance Indicator) of conversion rate. Besides the key behavioral data, we argue that other forms of user behavior also provide valuable signals on a user’s preference, such as views, clicks, adding a product to a shopping cart and so on. They should be taken into account properly to provide quality recommendations for users. In this work, we contribute a novel solution named NMTR (short for Neural Multi-Task Recommendation) for learning recommender systems from multiple types of user behaviors. We develop a neural network model to capture the complicated and multi-type interactions between users and items. In particular, our model accounts for the cascading relationship among behaviors (e.g., a user must click on a product before purchasing it). To fully exploit the signal in the data of multiple types of behaviors, we perform a joint optimization based on the multi-task learning framework, where the optimization on a behavior is treated as a task. Extensive experiments on two real-world datasets demonstrate that NMTR significantly outperforms state-of-the-art recommender systems that are designed to learn from both single-behavior data and multi-behavior data. Further analysis shows that modeling multiple behaviors is particularly useful for providing recommendations for sparse users that have very few interactions.
Recently, along with the rapid development of mobile communication technology, edge computing theory and techniques have been attracting more and more attention from researchers and engineers worldwide. Edge computing can significantly bridge the capacity of the cloud and the requirements of devices at the network edge, and can thus accelerate content delivery and improve the quality of mobile services. To bring more intelligence to edge systems than traditional optimization methodology allows, and driven by current deep learning techniques, we propose to integrate Deep Reinforcement Learning techniques and the Federated Learning framework with mobile edge systems in order to optimize mobile edge computing, caching and communication. We therefore design the ‘In-Edge AI’ framework to intelligently utilize the collaboration among devices and edge nodes to exchange learning parameters for better training and inference of the models, and thus to carry out dynamic system-level optimization and application-level enhancement while reducing unnecessary system communication load. ‘In-Edge AI’ is evaluated and shown to achieve near-optimal performance with relatively low learning overhead, while the system is cognitive and adaptive to mobile communication systems. Finally, we discuss several related challenges and opportunities that point to a promising future for ‘In-Edge AI’.
Neural networks are increasingly deployed in real-world safety-critical domains such as autonomous driving, aircraft collision avoidance, and malware detection. However, these networks have been shown to often mispredict on inputs with minor adversarial or even accidental perturbations. Consequences of such errors can be disastrous and even potentially fatal as shown by the recent Tesla autopilot crash. Thus, there is an urgent need for formal analysis systems that can rigorously check neural networks for violations of different safety properties such as robustness against adversarial perturbations within a certain $L$-norm of a given image. An effective safety analysis system for a neural network must be able to either ensure that a safety property is satisfied by the network or find a counterexample, i.e., an input for which the network will violate the property. Unfortunately, most existing techniques for performing such analysis struggle to scale beyond very small networks and the ones that can scale to larger networks suffer from high false positives and cannot produce concrete counterexamples in case of a property violation. In this paper, we present a new efficient approach for rigorously checking different safety properties of neural networks that significantly outperforms existing approaches by multiple orders of magnitude. Our approach can check different safety properties and find concrete counterexamples for networks that are 10$\times$ larger than the ones supported by existing analysis techniques. We believe that our approach to estimating tight output bounds of a network for a given input range can also help improve the explainability of neural networks and guide the training process of more robust neural networks.
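As a rough illustration of what "output bounds of a network for a given input range" means, the sketch below propagates an input box through a small ReLU network using plain interval arithmetic. This is a swapped-in textbook technique, not the approach of the paper, which the abstract does not detail; interval bounds are sound but usually looser than what a dedicated analysis would produce. The function names and the two-layer toy network are assumptions for illustration.

```python
def affine_bounds(lower, upper, weights, bias):
    """Bound y = W x + b when each x[i] lies in [lower[i], upper[i]]."""
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight flips which end of the interval matters.
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to both ends of the interval."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def network_bounds(lower, upper, layers):
    """Propagate a box through affine layers with ReLU between them."""
    for index, (weights, bias) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, weights, bias)
        if index < len(layers) - 1:  # no activation after the output layer
            lower, upper = relu_bounds(lower, upper)
    return lower, upper


# Hypothetical 2-2-1 network and input range [0, 1] x [0, 1].
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[1.0, -2.0]], [0.5]),
]
print(network_bounds([0.0, 0.0], [1.0, 1.0], layers))
```

If a safety property only requires the output to stay inside some range, bounds like these can certify it whenever the computed interval falls inside that range; when it does not, the result is inconclusive rather than a counterexample.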
We propose SoaAlloc, a dynamic object allocator for Single-Method Multiple-Objects applications in CUDA. SoaAlloc is the first allocator for GPUs that (a) arranges allocations in a SIMD-friendly Structure of Arrays (SOA) data layout, (b) provides a do-all operation for maximizing the benefit of SOA, and (c) is on par with state-of-the-art memory allocators for raw (de)allocation time. Our benchmarks show that the SOA layout leads to significantly better memory bandwidth utilization, resulting in a 2x speedup of application code.
We present an analysis into the inner workings of Convolutional Neural Networks (CNNs) for processing text. CNNs used for computer vision can be interpreted by projecting filters into image space, but for discrete sequence inputs CNNs remain a mystery. We aim to understand the method by which the networks process and classify text. We examine a common hypothesis about this problem: that filters, accompanied by global max-pooling, serve as ngram detectors. We show that filters may capture several different semantic classes of ngrams by using different activation patterns, and that global max-pooling induces behavior which separates important ngrams from the rest. Finally, we show practical use cases derived from our findings in the form of model interpretability (explaining a trained model by deriving a concrete identity for each filter, bridging the gap between visualization tools in vision tasks and NLP) and prediction interpretability (explaining predictions).
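The "filter plus global max-pooling as ngram detector" hypothesis can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the 2-dimensional toy embeddings, the hand-set bigram filter, and the function names are invented, and a trained CNN would learn its filters rather than have them written by hand.

```python
def ngram_scores(tokens, embeddings, conv_filter):
    """Slide a filter of width n over the sentence and score every ngram.

    conv_filter holds one weight vector per position in the window; the score
    is the sum of dot products between filter rows and word embeddings.
    """
    n = len(conv_filter)
    scores = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        score = sum(
            sum(w * e for w, e in zip(filter_row, embeddings[token]))
            for filter_row, token in zip(conv_filter, window)
        )
        scores.append((score, " ".join(window)))
    return scores


def global_max_pool(scores):
    """Keep only the highest-scoring ngram, as global max-pooling does."""
    return max(scores, key=lambda pair: pair[0])


# Hypothetical embeddings: dimension 0 ~ "negation", dimension 1 ~ "sentiment".
embeddings = {
    "the": [0.0, 0.0], "movie": [0.0, 0.0], "was": [0.0, 0.0],
    "not": [1.0, 0.0], "good": [0.0, 1.0], "bad": [0.0, -1.0],
}
# A bigram filter that fires on negation followed by positive sentiment.
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

tokens = "the movie was not good".split()
print(global_max_pool(ngram_scores(tokens, embeddings, bigram_filter)))
```

After pooling, only the winning ngram influences the classifier, which is why inspecting the top-scoring ngram per filter gives a handle on what that filter detects.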
This paper describes how to carry out a feasibility study for a potential knowledge based system application. It discusses factors to be considered under three headings: the business case, the technical feasibility, and stakeholder issues. It concludes with a case study of a feasibility study for a KBS to guide surgeons in diagnosis and treatment of thyroid conditions.
While the machine learning literature dedicated to fully automated reasoning algorithms is abundant, methods enabling the inference process on the basis of previously defined knowledge structures are scarcer. Fuzzy Cognitive Maps (FCMs) are neural networks that can be exploited towards this goal because of their flexibility to handle external knowledge. However, FCMs suffer from a number of issues that range from the limited prediction horizon to the absence of theoretically sound learning algorithms able to produce accurate predictions. In this paper, we propose a neural network system named Short-term Cognitive Networks that tackles some of these limitations. In our model, weights are not constrained and may or may not have a causal nature. As a second contribution, we present a nonsynaptic learning algorithm to improve the network performance without modifying the previously defined weights. Moreover, we derive a stop condition to prevent the learning algorithm from iterating without decreasing the simulation error.
In open set learning, a model must be able to generalize to novel classes when it encounters a sample that does not belong to any of the classes it has seen before. Open set learning poses a realistic learning scenario that is receiving growing attention. Existing studies on open set learning mainly focused on detecting novel classes, but few studies tried to model them for differentiating novel classes. We recognize that novel classes should be different from each other, and propose distribution networks for open set learning that can learn and model different novel classes. We hypothesize that, through a certain mapping, samples from different classes with the same classification criterion should follow different probability distributions from the same distribution family. We estimate the probability distribution for each known class and a novel class is detected when a sample is not likely to belong to any of the known distributions. Due to the large feature dimension in the original feature space, the probability distributions in the original feature space are difficult to estimate. Distribution networks map the samples in the original feature space to a latent space where the distributions of known classes can be jointly learned with the network. In the latent space, we also propose a distribution parameter transfer strategy for novel class detection and modeling. By novel class modeling, the detected novel classes can serve as known classes to the subsequent classification. Our experimental results on image datasets MNIST and CIFAR10 and text dataset Ohsumed show that the distribution networks can detect novel classes accurately and model them well for the subsequent classification tasks.
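The detection rule described above (estimate one distribution per known class, flag a sample as novel when none of them explains it) can be sketched as follows. This is a simplification under stated assumptions: it fits diagonal Gaussians directly to given 2-D points, whereas the distribution networks learn the latent mapping jointly with the distributions; the function names, the toy data, and the threshold of -10 are all hypothetical.

```python
import math


def fit_gaussian(points):
    """Fit a diagonal Gaussian (per-dimension mean and variance)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive.
    var = [max(sum((p[d] - mean[d]) ** 2 for p in points) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def classify_open_set(x, class_models, threshold):
    """Return the best known class, or "novel" if even the best fit is poor."""
    label, score = max(
        ((name, log_likelihood(x, *model)) for name, model in class_models.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= threshold else "novel"


# Toy latent-space samples for two known classes (invented for this sketch).
known = {
    "a": [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [0.1, 0.2]],
    "b": [[5.0, 5.0], [5.2, 4.9], [4.9, 5.1], [5.1, 5.2]],
}
class_models = {name: fit_gaussian(points) for name, points in known.items()}

print(classify_open_set([0.05, 0.05], class_models, threshold=-10.0))
print(classify_open_set([10.0, -10.0], class_models, threshold=-10.0))
```

In this simplified setting, samples flagged as novel could be collected and fitted with their own Gaussian, which is one way to read the idea that detected novel classes later serve as known classes.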
The prosperity of smart mobile devices has made mobile crowdsensing (MCS) a promising paradigm for completing complex sensing and computation tasks. In the past, great efforts have been made on the design of incentive mechanisms and task allocation strategies from the MCS platform’s perspective to motivate mobile users’ participation. However, in practice, MCS participants face many uncertainties coming from their sensing environment as well as other participants’ strategies, and how they interact with each other and make sensing decisions is not well understood. In this paper, we take the MCS participants’ perspective to derive an online sensing policy that maximizes their payoffs via MCS participation. Specifically, we model the interactions of mobile users and sensing environments as a multi-agent Markov decision process. Each participant cannot observe others’ decisions, but needs to decide her effort level in sensing tasks based only on local information, e.g., her own record of sensed signals’ quality. To cope with the stochastic sensing environment, we develop an intelligent crowdsensing algorithm, IntelligentCrowd, by leveraging the power of multi-agent reinforcement learning (MARL). Our algorithm leads to the optimal sensing policy for each user to maximize the expected payoff against stochastic sensing environments, and can be implemented at the individual participant’s level in a distributed fashion. Numerical simulations demonstrate that IntelligentCrowd significantly improves users’ payoffs in sequential MCS tasks under various sensing dynamics.
As Artificial Intelligence (AI) technologies proliferate, concern has centered around the long-term dangers of job loss or threats of machines causing harm to humans. All of this concern, however, detracts from the more pertinent and already existing threats posed by AI today: its ability to amplify bias found in training datasets, and swiftly impact marginalized populations at scale. Government and public sector institutions have a responsibility to citizens to establish a dialogue with technology developers and release thoughtful policy around data standards to ensure diverse representation in datasets to prevent bias amplification and ensure that AI systems are built with inclusion in mind.
In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking. LaSOT consists of 1,400 sequences with more than 3.5M frames in total. Each frame in these sequences is carefully and manually annotated with a bounding box, making LaSOT the largest, to the best of our knowledge, densely annotated tracking benchmark. The average sequence length of LaSOT is more than 2,500 frames, and each sequence comprises various challenges deriving from the wild where target objects may disappear and re-appear in the view. By releasing LaSOT, we expect to provide the community with a large-scale, dedicated, high-quality benchmark for both the training of deep trackers and the veritable evaluation of tracking algorithms. Moreover, considering the close connections of visual appearance and natural language, we enrich LaSOT by providing additional language specification, aiming at encouraging the exploration of natural linguistic features for tracking. A thorough experimental evaluation of 35 tracking algorithms on LaSOT is presented with detailed analysis, and the results demonstrate that there is still significant room for improvement. The benchmark and evaluation results are made publicly available at https://…/.
Bayesian model-based clustering is a widely applied procedure for discovering groups of related observations in a dataset. These approaches use Bayesian mixture models, estimated with MCMC, which provide posterior samples of the model parameters and clustering partition. While inference on model parameters is well established, inference on the clustering partition is less developed. A new method is developed for estimating the optimal partition from the pairwise posterior similarity matrix generated by a Bayesian cluster model. This approach uses non-negative matrix factorization (NMF) to provide a low-rank approximation to the similarity matrix. The factorization permits hard or soft partitions and is shown to perform better than several popular alternatives under a variety of penalty functions.
This document introduces a strategy to solve linear optimization problems. The strategy is based on the bounding condition each constraint produces on each of the problem’s dimensions. The solution of a linear optimization problem is located at the intersection of the constraints defining the extreme vertex. By identifying the constraints that limit the growth of the objective function value, we formulate a system of linear equations leading to the optimization problem’s solution. The most complex operation of the algorithm is the inversion of a matrix sized by the number of dimensions of the problem. Therefore, the algorithm’s complexity is comparable to that of the classical Simplex method and of more recently developed Linear Programming algorithms. However, the algorithm offers the advantage of being non-iterative.
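The final step described above, solving the linear system formed by the constraints that meet at the optimal vertex, can be sketched as follows. The sketch assumes the limiting constraints are already known; how the strategy identifies them is not specified in the abstract and is not reproduced here. The toy problem, the `active` index list, and the helper name `solve_linear_system` are assumptions for illustration.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve a square system exactly by Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular (no usable pivot).
    """
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Toy problem: maximize 3x + 2y subject to
#   x + y <= 4,   x + 3y <= 9,   x <= 3,   x, y >= 0.
constraints = [([1, 1], 4), ([1, 3], 9), ([1, 0], 3)]
objective = [3, 2]

# Assumed known: constraints 0 and 2 are the ones limiting the objective.
active = [0, 2]
vertex = solve_linear_system(
    [constraints[i][0] for i in active],
    [constraints[i][1] for i in active],
)
value = sum(c * v for c, v in zip(objective, vertex))
print(vertex, value)
```

For this toy problem the other vertices of the feasible region give smaller objective values, so the intersection of the two assumed constraints is indeed the optimum.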
This paper presents a novel approach for automatic rule learning applicable to an autonomous driving system using real driving data.
Background: Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Recently, deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and showed promising results. However, as deep learning approaches need an abundant amount of training data, a lack of data can hinder performance. BioNER datasets are scarce resources and each dataset covers only a small subset of entity types. Furthermore, many bio entities are polysemous, which is one of the major obstacles in named entity recognition. Results: To address the lack of data and the entity type misclassification problem, we propose CollaboNet, which utilizes a combination of multiple NER models. In CollaboNet, models trained on different datasets are connected to each other so that a target model obtains information from other collaborator models to reduce false positives. Every model is an expert on its target entity type and takes turns serving as a target and a collaborator model during training time. The experimental results show that CollaboNet can be used to greatly reduce the number of false positives and misclassified entities including polysemous words. CollaboNet achieved state-of-the-art performance in terms of precision, recall and F1 score. Conclusions: We demonstrated the benefits of combining multiple models for BioNER. Our model has successfully reduced the number of misclassified entities and improved the performance by leveraging multiple datasets annotated for different entity types. Given the state-of-the-art performance of our model, we believe that CollaboNet can improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction.
Machine learning methods such as convolutional neural networks (CNNs) are becoming an integral part of scientific research in many disciplines, but spatial vector data often cannot be analyzed using these powerful learning methods because of their irregularities. With the aid of the graph Fourier transform and the convolution theorem, it is possible to express the convolution as a point-wise product in the Fourier domain and construct a CNN learning architecture on graphs for the analysis of irregular spatial data. In this study, we used the classification of building patterns as a case study to test this method. Experiments showed that the method achieves outstanding results in identifying regular and irregular patterns and improves significantly on other methods.
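The convolution-as-point-wise-product idea can be written out with the standard spectral graph definitions. The symbols below are assumptions introduced for illustration and do not come from the abstract: $A$ is the adjacency matrix of the graph, $D$ its diagonal degree matrix, $x$ a signal on the nodes, $g$ a filter, and $\odot$ the element-wise product.

```latex
% Normalized graph Laplacian and its eigendecomposition
L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}

% Graph Fourier transform and its inverse
\hat{x} = U^{\top} x, \qquad x = U \hat{x}

% Convolution theorem on the graph: point-wise product in the Fourier domain
x *_{G} g = U\big( (U^{\top} x) \odot (U^{\top} g) \big)
          = U \,\mathrm{diag}(\hat{g})\, U^{\top} x
```

In a learning architecture of this kind, the entries of $\hat{g}$ (or a parameterization of them) typically play the role of the trainable filter weights.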
Existing dialog datasets contain a sequence of utterances and responses without any explicit background knowledge associated with them. This has resulted in the development of models which treat conversation as a sequence-to-sequence generation task (i.e., given a sequence of utterances, generate the response sequence). This is not only an overly simplistic view of conversation but it is also emphatically different from the way humans converse by heavily relying on their background knowledge about the topic (as opposed to simply relying on the previous sequence of utterances). For example, it is common for humans to (involuntarily) produce utterances which are copied or suitably modified from background articles they have read about the topic. To facilitate the development of such natural conversation models which mimic the human process of conversing, we create a new dataset containing movie chats wherein each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments and reviews about the movie. We establish baseline results on this dataset (90K utterances from 9K conversations) using three different models: (i) pure generation based models which ignore the background knowledge, (ii) generation based models which learn to copy information from the background knowledge when required, and (iii) span prediction based models which predict the appropriate response span in the background knowledge.