Improving Natural Language Inference with a Pretrained Parser

We introduce a novel approach to incorporate syntax into natural language inference (NLI) models. Our method uses contextual token-level vector representations from a pretrained dependency parser. Like other contextual embedders, our method is broadly applicable to any neural model. We experiment with four strong NLI models (decomposable attention model, ESIM, BERT, and MT-DNN), and show consistent benefit to accuracy across three NLI benchmarks.

Performance of Recommender Systems: Based on Content Navigator and Collaborative Filtering

In the world of big data, many people find it difficult to access the information they need quickly and accurately. To address this, research on systems that recommend information accurately to users is ongoing. Collaborative filtering is one of the best-known and most widely used algorithms in industry. However, collaborative filtering is difficult to use in online systems because its recommendation quality is highly volatile and it requires computation on large matrices. To overcome this problem, this paper proposes a method similar to database queries and a clustering method (Contents Navigator) originating from a complex network.

Epistemic Logic Programs: A Different World View

Epistemic Logic Programs (ELPs), an extension of Answer Set Programming (ASP) with epistemic operators, have received renewed attention from the research community in recent years. Classically, evaluating an ELP yields a set of world views, with each being a set of answer sets. In this paper, we propose an alternative definition of world views that represents them as three-valued assignments, where each atom can be either always true, always false, or neither. Based on several examples, we show that this definition is natural and intuitive. We also investigate relevant computational properties of these new semantics, and explore how other notions, like strong equivalence, are affected.
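
As a rough illustration of the three-valued reading described above, the following Python sketch summarizes a classical world view (a set of answer sets) atom by atom. The function name `three_valued_view` and the string labels are hypothetical, and the sketch assumes a non-empty world view; the abstract does not reproduce the paper's formal definition, which may differ.

```python
from typing import Dict, FrozenSet, Iterable, Set


def three_valued_view(
    answer_sets: Iterable[FrozenSet[str]], atoms: Set[str]
) -> Dict[str, str]:
    """Summarize a set of answer sets as a three-valued assignment.

    Assumption: the world view is non-empty. Labels are illustrative only.
    """
    sets = list(answer_sets)
    if not sets:
        raise ValueError("a world view must contain at least one answer set")
    summary: Dict[str, str] = {}
    for atom in sorted(atoms):
        hits = sum(atom in s for s in sets)  # answer sets containing the atom
        if hits == len(sets):
            summary[atom] = "always true"
        elif hits == 0:
            summary[atom] = "always false"
        else:
            summary[atom] = "neither"
    return summary


# Hypothetical world view with two answer sets over atoms a, b, c, d.
world_view = [frozenset({"a", "b"}), frozenset({"a", "c"})]
assignment = three_valued_view(world_view, {"a", "b", "c", "d"})
```

Here `a` holds in every answer set, `d` in none, and `b` and `c` in only one each, so they fall into the third category.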

Value of Information in Probabilistic Logic Programs

In medical decision making, we have to choose among several expensive diagnostic tests such that the certainty about a patient’s health is maximized while remaining within the bounds of resources like time and money. The expected increase in certainty about the patient’s condition due to performing a test is called the value of information (VoI) for that test. In general, VoI relates to acquiring additional information to improve decision-making based on probabilistic reasoning in an uncertain system. This paper presents a framework for acquiring information based on VoI in uncertain systems modeled as Probabilistic Logic Programs (PLPs). Optimal decision-making in uncertain systems modeled as PLPs has already been studied. However, acquiring additional information to further improve the results of making the optimal decision has remained open in this context. We model decision-making in an uncertain system with a PLP and a set of top-level queries, with a set of utility measures over the distributions of these queries. The PLP is annotated with a set of atoms labeled as ‘observable’; in the medical diagnosis example, the observable atoms are the results of diagnostic tests. Each observable atom has an associated cost. This setting of optimally selecting observations based on VoI is more general than that considered by any prior work. Given a limited budget, optimally choosing observable atoms based on VoI is intractable in general. We give a greedy algorithm for constructing a ‘conditional plan’ of observations: a schedule where the selection of which atom to observe next depends on earlier observations. We show that preempting the algorithm at any time before completion provides a usable result, that the result improves over time, and that, in the absence of a well-defined budget, it converges to the optimal solution.
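
To make the idea of VoI concrete, here is a minimal Python sketch for a single binary condition and binary tests. It is not the paper's algorithm: it measures certainty by Shannon entropy (an assumption, since the abstract works with utility measures over query distributions), and the names `value_of_information` and `greedy_next_test`, the sensitivity/specificity parameters, and the example tests are all hypothetical. The greedy step picks the affordable test with the highest VoI per unit cost.

```python
import math
from typing import Dict, Optional


def entropy(p: float) -> float:
    """Shannon entropy (in bits) of a binary variable with P(true) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def posterior(prior: float, sens: float, spec: float, positive: bool) -> float:
    """Bayes' rule for P(condition | test result)."""
    if positive:
        num = sens * prior
        den = sens * prior + (1 - spec) * (1 - prior)
    else:
        num = (1 - sens) * prior
        den = (1 - sens) * prior + spec * (1 - prior)
    return num / den  # assumes the observed result has non-zero probability


def value_of_information(prior: float, sens: float, spec: float) -> float:
    """Expected reduction in entropy from observing the test result."""
    p_pos = sens * prior + (1 - spec) * (1 - prior)
    after = 0.0
    if p_pos > 0:
        after += p_pos * entropy(posterior(prior, sens, spec, True))
    if p_pos < 1:
        after += (1 - p_pos) * entropy(posterior(prior, sens, spec, False))
    return entropy(prior) - after


def greedy_next_test(
    prior: float, tests: Dict[str, Dict[str, float]], budget: float
) -> Optional[str]:
    """Pick the affordable test with the best VoI per unit cost (cost > 0)."""
    affordable = {n: t for n, t in tests.items() if t["cost"] <= budget}
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda n: value_of_information(
            prior, affordable[n]["sens"], affordable[n]["spec"]
        ) / affordable[n]["cost"],
    )


# Hypothetical tests: a cheap, noisy one and an expensive, accurate one.
tests = {
    "blood": {"sens": 0.9, "spec": 0.9, "cost": 10.0},
    "scan": {"sens": 0.99, "spec": 0.99, "cost": 200.0},
}
choice = greedy_next_test(0.5, tests, budget=50.0)
```

A conditional plan in the sense of the abstract would repeat this step after each observation, replacing the prior with the posterior and reducing the remaining budget.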

Information Extraction Tool Text2ALM: From Narratives to Action Language System Descriptions

In this work we design a narrative understanding tool, Text2ALM. This tool uses the action language ALM to perform inferences on complex interactions of events described in narratives. The methodology used to implement the Text2ALM system was originally outlined by Lierler, Inclezan, and Gelfond (2017) via a manual process of converting a narrative to an ALM model. It relies on a conglomeration of resources and techniques from two distinct fields of artificial intelligence, namely, natural language processing and knowledge representation and reasoning. The effectiveness of the Text2ALM system is measured by its ability to correctly answer questions from the bAbI tasks published by Facebook Research in 2015. This tool matched or exceeded the performance of state-of-the-art machine learning methods in six of the seven tested tasks. We also illustrate that the Text2ALM approach generalizes to a broader spectrum of narratives.

On the Strong Equivalences of LPMLN Programs

By incorporating the methods of Answer Set Programming (ASP) and Markov Logic Networks (MLN), LPMLN becomes a powerful tool for non-monotonic, inconsistent and uncertain knowledge representation and reasoning. To facilitate the application and extend the understanding of LPMLN, we investigate in this paper strong equivalence between LPMLN programs, which is regarded as an important property in the field of logic programming. In ASP, two programs P and Q are strongly equivalent iff, for any ASP program R, the programs P and Q extended by R have the same stable models. In other words, an ASP program can be replaced by one of its strong equivalents without considering its context, which helps to simplify logic programs, enhance inference engines, construct human-friendly knowledge bases, etc. Since LPMLN is a combination of ASP and MLN, the notion of strong equivalence in LPMLN is quite different from that in ASP. Firstly, we present the notions of p-strong and w-strong equivalence between LPMLN programs. Secondly, we present a characterization of these notions by generalizing the SE-model approach in ASP. Finally, we show the use of strong equivalence in simplifying LPMLN programs, and present a necessary and sufficient syntactic condition that guarantees the strong equivalence between a single LPMLN rule and the empty program.
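
The ASP definition of strong equivalence can be illustrated with a small brute-force sketch in Python. It covers plain ASP only (normal rules, no weights), not LPMLN, and it checks a single context R rather than all of them, so it can refute strong equivalence but never prove it. The rule encoding and the function names are hypothetical choices for this example.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

# A normal rule: (head, positive body atoms, negated body atoms).
Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]


def least_model(rules: List[Tuple[str, FrozenSet[str]]]) -> Set[str]:
    """Least model of a negation-free program, by fixpoint iteration."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def stable_models(program: List[Rule], atoms: Iterable[str]) -> Set[FrozenSet[str]]:
    """Enumerate stable models by testing every candidate interpretation."""
    universe = sorted(atoms)
    found: Set[FrozenSet[str]] = set()
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            cand = set(candidate)
            # Gelfond-Lifschitz reduct: drop rules blocked by the candidate,
            # then drop the negative bodies of the remaining rules.
            reduct = [(h, pos) for h, pos, neg in program if not (neg & cand)]
            if least_model(reduct) == cand:
                found.add(frozenset(cand))
    return found


none: FrozenSet[str] = frozenset()
P: List[Rule] = [("a", none, frozenset({"b"}))]  # a :- not b.
Q: List[Rule] = [("a", none, none)]              # a.
R: List[Rule] = [("b", none, none)]              # b.
atoms = {"a", "b"}

same_alone = stable_models(P, atoms) == stable_models(Q, atoms)
same_in_context = stable_models(P + R, atoms) == stable_models(Q + R, atoms)
```

P and Q agree on their stable models in isolation but disagree once R is added, so they are equivalent without being strongly equivalent.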

Strong Equivalence for LPMLN Programs

LPMLN is a probabilistic extension of answer set programs with the weight scheme adapted from Markov Logic. We study the concept of strong equivalence in LPMLN, which is a useful mathematical tool for simplifying a part of an LPMLN program without looking at the rest of it. We show that the verification of strong equivalence in LPMLN can be reduced to equivalence checking in classical logic via a reduct and choice rules as well as to equivalence checking under the ‘soft’ logic of here-and-there. The result allows us to leverage an answer set solver for LPMLN strong equivalence checking. The study also suggests a few reformulations of the LPMLN semantics using choice rules, the logic of here-and-there, and classical logic.

BigData Applications from Graph Analytics to Machine Learning by Aggregates in Recursion

In the past, the semantic issues raised by the non-monotonic nature of aggregates often prevented their use in the recursive statements of logic programs and deductive databases. However, the recently introduced notion of Pre-mappability (PreM) has shown that, in key applications of interest, aggregates can be used in recursion to optimize the perfect-model semantics of aggregate-stratified programs. Therefore we can preserve the declarative formal semantics of such programs while achieving a highly efficient operational semantics that is conducive to scalable implementations on parallel and distributed platforms. In this paper, we show that with PreM, a wide spectrum of classical algorithms of practical interest, ranging from graph analytics and dynamic programming based optimization problems to data mining and machine learning applications can be concisely expressed in declarative languages by using aggregates in recursion. Our examples are also used to show that PreM can be checked using simple techniques and templatized verification strategies. A wide range of advanced BigData applications can now be expressed declaratively in logic-based languages, including Datalog, Prolog, and even SQL, while enabling their execution with superior performance and scalability.
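
One of the graph-analytics cases mentioned above, single-source shortest paths, gives a feel for what using an aggregate in recursion means operationally. The Python sketch below is an analogy, not a Datalog engine and not a check of the PreM condition: it applies `min` inside a semi-naive fixpoint loop instead of computing all path lengths first and minimizing afterwards. It assumes non-negative edge weights, and the function name is hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (from, to, weight)


def shortest_paths(edges: List[Edge], source: Hashable) -> Dict[Hashable, float]:
    """Semi-naive fixpoint with the min aggregate applied at every step.

    Assumption: edge weights are non-negative, so the loop terminates.
    """
    best: Dict[Hashable, float] = {source: 0.0}
    delta: Dict[Hashable, float] = {source: 0.0}  # facts improved last round
    while delta:
        new_delta: Dict[Hashable, float] = {}
        for u, v, w in edges:
            if u in delta:
                cand = delta[u] + w
                # Keep a derived distance only if it beats every known one:
                # this is the min aggregate pushed into the recursion.
                if cand < best.get(v, float("inf")) and cand < new_delta.get(
                    v, float("inf")
                ):
                    new_delta[v] = cand
        best.update(new_delta)
        delta = new_delta
    return best


edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("c", "a", 1.0)]
dist = shortest_paths(edges, "a")
```

Without the early `min`, the cycle through `c` and `a` would generate ever longer paths; with it, dominated facts are discarded as soon as they are derived.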

A Temporal Module for Logical Frameworks

In artificial intelligence, multi-agent systems constitute an interesting typology of society modeling, and in this regard have vast fields of application, which extend to the human sciences. Logic is often used to model such systems, as it is easier to verify than other approaches and provides explainability and potential validation. In this paper we define a time module suitable for adding time to many logic representations of agents.

Induction of Non-monotonic Logic Programs To Explain Statistical Learning Models

We present a fast and scalable algorithm to induce non-monotonic logic programs from statistical learning models. We reduce the problem of searching for the best clauses to instances of the High-Utility Itemset Mining (HUIM) problem. In the HUIM problem, feature values and their importance are treated as transactions and utilities, respectively. We make use of TreeExplainer, a fast and scalable implementation of the Explainable AI tool SHAP, to extract locally important features and their weights from ensemble tree models. Our experiments with UCI standard benchmarks suggest a significant improvement in terms of classification evaluation metrics and running time of the training algorithm compared to ALEPH, a state-of-the-art Inductive Logic Programming (ILP) system.

Conversational AI: Open Domain Question Answering and Commonsense Reasoning

Our research is focused on making a human-like question answering system which can answer rationally. The distinguishing characteristic of our approach is that it will use automated common sense reasoning to truly ‘understand’ dialogues, allowing it to converse like a human. Humans often make many assumptions during conversations. We infer facts not told explicitly by using our common sense. Incorporating commonsense knowledge in a question answering system will simply make it more robust.

Distributed Machine Learning on Mobile Devices: A Survey

In recent years, mobile devices have developed rapidly, gaining stronger computation capability and larger storage. Some computation-intensive machine learning and deep learning tasks can now be run on mobile devices. To take advantage of the resources available on mobile devices and preserve users’ privacy, the idea of mobile distributed machine learning has been proposed. It uses local hardware resources and local data to solve machine learning sub-problems on mobile devices, and only uploads computation results instead of original data to contribute to the optimization of the global model. This architecture can not only relieve the computation and storage burden on servers, but also protect users’ sensitive information. Another benefit is bandwidth reduction, as various kinds of local data can now participate in the training process without being uploaded to the server. In this paper, we provide a comprehensive survey of recent studies of mobile distributed machine learning. We survey a number of widely used mobile distributed machine learning methods. We also present an in-depth discussion of the challenges and future directions in this area. We believe that this survey provides a clear overview of mobile distributed machine learning and offers guidelines for applying mobile distributed machine learning to real applications.
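
The pattern of solving sub-problems locally and uploading only the results can be sketched in a few lines of Python. The abstract does not name a specific method; the sketch below follows the general shape of federated averaging, which is an assumption on our part, applied to a one-parameter linear model. All names and data values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_update(
    weight: float, data: List[Sample], lr: float = 0.1, epochs: int = 5
) -> Tuple[float, int]:
    """One device: a few gradient steps on y ~ weight * x using local data only."""
    w = weight
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    # Only the updated weight and the sample count leave the device.
    return w, len(data)


def server_round(global_weight: float, devices: List[List[Sample]]) -> float:
    """Server: average the device results, weighted by local sample counts."""
    updates = [local_update(global_weight, data) for data in devices]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Hypothetical local datasets, all drawn from y = 2x and never uploaded.
devices = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0)],
]
w = 0.0
for _ in range(20):
    w = server_round(w, devices)
```

Each round transmits one number per device in each direction, which is where the bandwidth and privacy benefits described above come from.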

An Adaptive Parareal Algorithm

In this paper, we consider the problem of accelerating the numerical simulation of time-dependent problems by time domain decomposition. The available algorithms enabling such decompositions present severe efficiency limitations and are an obstacle for the solution of large-scale and high-dimensional problems. Our main contribution is the improvement of the parallel efficiency of the parareal-in-time method. The parareal method is based on combining predictions made by a numerically inexpensive solver (with coarse physics and/or coarse resolution) with corrections coming from an expensive solver (with high-fidelity physics and high resolution). At convergence, the parareal algorithm provides a solution that has the fine solver’s high-fidelity physics and high resolution. In the classical version of parareal, the fine solver has a fixed high accuracy, which is the major obstacle to achieving a competitive parallel efficiency. In this paper, we develop an adaptive variant of the algorithm that overcomes this obstacle. As a result, the only remaining factor impacting performance is the cost of the coarse solver. We show both theoretically and in a numerical example that the parallel efficiency becomes very competitive when the cost of the coarse solver is small.
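
For readers unfamiliar with the method, the following Python sketch shows the classical parareal iteration with a fixed-accuracy fine solver, not the adaptive variant proposed in the paper. The test problem u' = -u, the explicit Euler propagators, and all function names are assumptions chosen for illustration. The fine solves inside each iteration are independent of one another and are the part that would run in parallel; here they run sequentially.

```python
from typing import Callable, List

Propagator = Callable[[float, float, float], float]  # (u, t0, t1) -> u at t1


def euler(u: float, t0: float, t1: float, steps: int,
          rhs: Callable[[float, float], float]) -> float:
    """Explicit Euler from t0 to t1 with the given number of steps."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        u = u + dt * rhs(t, u)
        t += dt
    return u


def parareal(u0: float, t_end: float, n_slices: int, n_iters: int,
             coarse: Propagator, fine: Propagator) -> List[float]:
    """Classical parareal: coarse prediction plus fine-minus-coarse correction."""
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]
    # Iteration 0: a purely sequential coarse sweep.
    U = [u0]
    for n in range(n_slices):
        U.append(coarse(U[n], ts[n], ts[n + 1]))
    for _ in range(n_iters):
        # These solves depend only on the previous iterate (parallelizable).
        F = [fine(U[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        G_old = [coarse(U[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        new = [u0]
        for n in range(n_slices):
            new.append(coarse(new[n], ts[n], ts[n + 1]) + F[n] - G_old[n])
        U = new
    return U


def decay(t: float, u: float) -> float:
    return -u


def coarse(u: float, t0: float, t1: float) -> float:
    return euler(u, t0, t1, 1, decay)    # cheap: one step per slice


def fine(u: float, t0: float, t1: float) -> float:
    return euler(u, t0, t1, 100, decay)  # expensive: 100 steps per slice


def serial_fine(u0: float, t_end: float, n_slices: int) -> float:
    """Reference: the fine solver applied slice after slice."""
    u = u0
    for n in range(n_slices):
        u = fine(u, t_end * n / n_slices, t_end * (n + 1) / n_slices)
    return u


reference = serial_fine(1.0, 1.0, 10)
after_one = parareal(1.0, 1.0, 10, 1, coarse, fine)[-1]
```

With as many iterations as time slices, the result coincides with the serial fine solution up to rounding, which is why the method only pays off when it converges in far fewer iterations.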

Distance Geometry and Data Science

Data are often represented as graphs. Many common tasks in data science are based on distances between entities. While some data science methodologies natively take graphs as their input, there are many more that take their input in vectorial form. In this survey we discuss the fundamental problem of mapping graphs to vectors, and its relation with mathematical programming. We discuss applications, solution methods, dimensional reduction techniques and some of their limits. We then present an application of some of these ideas to neural networks, showing that distance geometry techniques can give competitive performance with respect to more traditional graph-to-vector mappings.
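
The link between mapping graphs to vectors and mathematical programming can be hinted at with a small Python sketch. It evaluates one common least-squares style objective for a candidate realization of a weighted graph: the sum over edges of the squared mismatch between squared point distances and squared edge weights. Treating this particular objective as the formulation of interest is an assumption, since the abstract does not state one, and the function name and example data are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, target distance)


def realization_error(
    positions: Dict[Hashable, Sequence[float]], edges: List[Edge]
) -> float:
    """Sum over edges of (||x_u - x_v||^2 - d_uv^2)^2; zero means an exact fit."""
    total = 0.0
    for u, v, d in edges:
        sq_dist = sum((a - b) ** 2 for a, b in zip(positions[u], positions[v]))
        total += (sq_dist - d ** 2) ** 2
    return total


# A triangle with side lengths 3, 4, 5 and two candidate embeddings in the plane.
edges = [("a", "b", 3.0), ("a", "c", 4.0), ("b", "c", 5.0)]
exact = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 4.0)}
collapsed = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 0.0)}

exact_error = realization_error(exact, edges)
collapsed_error = realization_error(collapsed, edges)
```

A solver would minimize this quantity over the positions; the sketch only scores given candidates.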

Knowledge representation and diagnostic inference using Bayesian networks in the medical discourse

Bayesian networks are investigated for diagnostic inference under uncertainty. The method is based on an adequate uniform representation of the necessary knowledge. This includes both generic and experience-based specific knowledge, which is stored in a knowledge base. For knowledge processing, a combination of the problem-solving methods of concept-based and case-based reasoning is used. Concept-based reasoning is used for diagnosis, therapy and medication recommendation, and evaluation of generic knowledge. Exceptions in the form of specific patient cases are processed by case-based reasoning. In addition, the use of Bayesian networks makes it possible to deal with uncertainty, fuzziness and incompleteness. Thus, the valid general concepts can be issued according to their probability. To this end, various inference mechanisms are introduced and subsequently evaluated within the context of a developed prototype. Tests are employed to assess the classification of diagnoses by the network.

To Aid Statistical Inference, Emphasize Unconditional Descriptions of Statistics

We have elsewhere reviewed proposals to reform terminology and improve interpretations of conventional statistics by emphasizing logical and information concepts over probability concepts. We here give detailed reasons and methods for reinterpreting statistics (including but not limited to P-values and interval estimates) in unconditional terms, which describe compatibility of observations with an entire set of analysis assumptions, rather than just a narrow target hypothesis. Such reinterpretations help avoid overconfident inferences whenever there is uncertainty about the assumptions used to derive and compute the statistical results. Examples of such assumptions include not only standard statistical modeling assumptions, but also assumptions about the absence of systematic errors, protocol violations, and data corruption. Unconditional descriptions introduce uncertainty about such assumptions directly into statistical presentations of results, rather than leaving that only to the informal discussion that ensues. We thus view unconditional description as a vital component of good statistical training and presentation.