Answering Fuzzy Conjunctive Queries over Finitely Valued Fuzzy Ontologies

Fuzzy Description Logics (DLs) provide a means for representing vague knowledge about an application domain. In this paper, we study fuzzy extensions of conjunctive queries (CQs) over the DL $\mathcal{SROIQ}$ based on finite chains of degrees of truth. To answer such queries, we extend a well-known technique that reduces the fuzzy ontology to a classical one, and use classical DL reasoners as a black box. We improve the complexity of previous reduction techniques for finitely valued fuzzy DLs, which allows us to prove tight complexity results for answering certain kinds of fuzzy CQs. We conclude with an experimental evaluation of a prototype implementation, showing the feasibility of our approach.

Artificial Prediction Markets for Online Prediction of Continuous Variables: A Preliminary Report

We propose the Artificial Continuous Prediction Market (ACPM) as a means to predict a continuous real value, by integrating a range of data sources and aggregating the results of different machine learning (ML) algorithms. ACPM adapts the concept of the (physical) prediction market to address the prediction of real values instead of discrete events. Each ACPM participant has a data source, an ML algorithm and a local decision-making procedure that determines what to bid on what value. The contributions of ACPM are: (i) adaptation to changes in data quality by the use of learning in: (a) the market, which weights each market participant to adjust the influence of each on the market prediction and (b) the participants, which use a Q-learning based trading strategy to incorporate the market prediction into their subsequent predictions, and (ii) resilience to a changing population of low- and high-performing participants. We demonstrate the effectiveness of ACPM by application to an influenza-like illness data set, showing that ACPM outperforms a range of well-known regression models and is resilient to variation in data source quality.
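
The market-side weighting described above can be illustrated with a minimal sketch. The abstract does not give ACPM's actual update rule, so everything here is an assumption made for illustration: the market prediction is taken to be a weighted average, weights are updated with a generic exponential (multiplicative) rule on absolute error, and the names `market_prediction`, `update_weights` and `eta` are hypothetical. The participants' Q-learning trading strategy is not modeled.

```python
import math

def market_prediction(predictions, weights):
    """Weighted average of the participants' predictions."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total

def update_weights(predictions, weights, outcome, eta=0.5):
    """Shrink each weight exponentially in that participant's absolute error.

    This is a generic multiplicative-weights rule, assumed for illustration;
    it is not taken from the ACPM paper. Weights are renormalized to sum to 1.
    """
    raw = [w * math.exp(-eta * abs(p - outcome)) for w, p in zip(weights, predictions)]
    total = sum(raw)
    return [r / total for r in raw]

# Toy rounds (hypothetical data): participant predictions and the revealed outcome.
rounds = [([10.0, 14.0], 10.5), ([20.0, 26.0], 20.5)]
weights = [0.5, 0.5]
history = []
for predictions, outcome in rounds:
    history.append(market_prediction(predictions, weights))
    weights = update_weights(predictions, weights, outcome)
```

In this toy run the first participant is closer to the outcome in both rounds, so its weight grows and the second market prediction moves toward its estimate.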

FactorBase: SQL for Learning A Multi-Relational Graphical Model

We describe FactorBase, a new SQL-based framework that leverages a relational database management system to support multi-relational model discovery. A multi-relational statistical model provides an integrated analysis of the heterogeneous and interdependent data resources in the database. We adopt the BayesStore design philosophy: statistical models are stored and managed as first-class citizens inside a database. Whereas previous systems like BayesStore support multi-relational inference, FactorBase supports multi-relational learning. A case study on six benchmark databases evaluates how our system supports a challenging machine learning application, namely learning a first-order Bayesian network model for an entire database. Model learning in this setting has to examine a large number of potential statistical associations across data tables. Our implementation shows how the SQL constructs in FactorBase facilitate the fast, modular, and reliable development of highly scalable model learning systems.

Machine Wald

The past century has seen a steady increase in the need for estimating and predicting complex systems and making (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed by humans because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to think as humans can when faced with uncertainty is challenging in several major ways: (1) Finding optimal statistical models remains to be formulated as a well-posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations and a limited description of the distribution of input random variables. (2) The space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite. To this end, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification and Information Based Complexity.

Measuring Word Significance using Distributed Representations of Words

Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et al., 2013a; Mikolov et al., 2013b), was shown to encode semantic information in the direction of the word vectors. In this brief report, it is proposed to use the length of the vectors, together with the term frequency, as a measure of word significance in a corpus. Experimental evidence using a domain-specific corpus of abstracts is presented to support this proposal. A useful visualization technique for text corpora emerges, where words are mapped onto a two-dimensional plane and automatically ranked by significance.
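
As a rough illustration of the proposal, the sketch below pairs each word's term frequency with the Euclidean length of its vector, giving the two coordinates one could plot on a plane. The vectors and tokens are toy values invented for this example, not word2vec output, and the helper names `vector_length` and `significance_table` are hypothetical. How the report combines the two quantities into a ranking is not stated in the abstract, so no ranking rule is implemented here.

```python
import math
from collections import Counter

def vector_length(vec):
    """Euclidean (L2) norm of a word vector."""
    return math.sqrt(sum(x * x for x in vec))

def significance_table(tokens, vectors):
    """Map each word to (term frequency, vector length).

    `tokens` is a list of corpus tokens; `vectors` maps word -> list of floats.
    Words that have no vector are skipped.
    """
    counts = Counter(tokens)
    return {
        word: (counts[word], vector_length(vectors[word]))
        for word in counts
        if word in vectors
    }

# Toy data (hypothetical): a frequent word with a short vector,
# and a rare word with a long vector.
tokens = ["the", "the", "the", "quark", "the", "model", "model"]
vectors = {
    "the": [0.0, 1.0],
    "quark": [3.0, 4.0],
    "model": [0.0, 2.0],
}
table = significance_table(tokens, vectors)
```

Each entry of `table` can then be used as a point (frequency, length) in a two-dimensional scatter plot of the vocabulary.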

Normalized Hierarchical SVM

We present improved methods of using structured SVMs in a large-scale hierarchical classification problem, that is, when labels are leaves, or sets of leaves, in a tree or a DAG. We examine the need to normalize both the regularization and the margin and show how doing so significantly improves performance, including achieving state-of-the-art results where unnormalized structured SVMs do not perform better than flat models. We also describe a further extension of hierarchical SVMs that highlights the connection between hierarchical SVMs and matrix factorization models.

Particle Gibbs Split-Merge Sampling for Bayesian Inference in Mixture Models

This paper presents a new Markov chain Monte Carlo method to sample from the posterior distribution of conjugate mixture models. This algorithm relies on a flexible split-merge procedure built using the particle Gibbs sampler. Contrary to available split-merge procedures, the resulting so-called Particle Gibbs Split-Merge sampler does not require the computation of a complex acceptance ratio, is simple to implement using existing sequential Monte Carlo libraries and can be parallelized. We investigate its performance experimentally on synthetic problems as well as on geolocation and cancer genomics data. In all these examples, the particle Gibbs split-merge sampler outperforms state-of-the-art split-merge methods by up to an order of magnitude for a fixed computational complexity.

Projection predictive variable selection using Stan+R

This document is additional material to our previous study comparing several strategies for variable subset selection. Our recommended approach was to fit the full model with all the candidate variables and best possible prior information, and perform the variable selection using the projection predictive framework. Here we give an example of performing such an analysis, using Stan for fitting the model, and R for the variable selection.

Type-Constrained Representation Learning in Knowledge Graphs

Large knowledge graphs increasingly add value to various applications that require machines to recognize and understand queries and their semantics, as in search or question answering systems. Latent variable models have increasingly gained attention for the statistical modeling of knowledge graphs, showing promising results in tasks related to knowledge graph completion and cleaning. Besides storing facts about the world, schema-based knowledge graphs are backed by rich semantic descriptions of entities and relation-types that allow machines to understand the notion of things and their semantic relationships. In this work, we study how type-constraints can generally support the statistical modeling with latent variable models. More precisely, we integrated prior knowledge in the form of type-constraints in various state-of-the-art latent variable approaches. Our experimental results show that prior knowledge on relation-types significantly improves these models, by up to 77% in link-prediction tasks. The achieved improvements are especially prominent when a low model complexity is enforced, a crucial requirement when these models are applied to very large datasets. Unfortunately, type-constraints are neither always available nor always complete; e.g., they can become fuzzy when entities lack proper typing. We also show that in these cases, it can be beneficial to apply a local closed-world assumption that approximates the semantics of relation-types based on observations made in the data.

Web Search Result Clustering based on Heuristic Search and k-means

Giving users simple and well-organized web search results has been a topic of active Information Retrieval (IR) research. Irrespective of how short or ambiguous a query is, a user always wants the desired result on the first display of an IR system. Clustering the results of an IR system can help fulfill the actual information need of a user. In this paper, an approach to clustering IR system results is presented. The approach combines heuristics with the k-means technique using cosine similarity. Our heuristic approach detects the initial value of k for creating initial centroids. This eliminates the need to specify the value of k externally, which may lead to unwanted results if wrongly specified. The centroids created in this way are more specific and meaningful in the context of web search results. Another advantage of the proposed method is the removal of the objective means function of k-means, which makes cluster sizes the same. The end result of the proposed approach consists of clusters of documents with different sizes.
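
The k-means step with cosine similarity can be sketched as follows. This is a plain spherical k-means loop written for illustration, not the paper's algorithm: the heuristic that detects k and builds the initial centroids is not described in the abstract, so the initial centroids are simply passed in as an assumed input, and the function names and toy document vectors are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))

def cosine_kmeans(docs, initial_centroids, iterations=10):
    """Assign documents to the most similar centroid, then re-estimate centroids.

    `initial_centroids` stands in for the output of the paper's heuristic,
    which is assumed to be available and is not reproduced here.
    """
    docs = [normalize(d) for d in docs]
    centroids = [normalize(c) for c in initial_centroids]
    labels = [0] * len(docs)
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda j: cosine(d, centroids[j]))
            for d in docs
        ]
        for j in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels

# Toy term-weight vectors (hypothetical): two documents about one topic, three about another.
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 1, 0.1]]
labels = cosine_kmeans(docs, initial_centroids=[[1, 0, 0], [0, 1, 0]])
```

On this toy input the two resulting clusters have different sizes (two and three documents), since nothing in the loop forces clusters to be balanced.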

10-tough chordal graphs are hamiltonian

A Combinatorial Model of Interference in Frequency Hopping Schemes

A method for determining the mod-$p^k$ behaviour of recursive sequences

A note on convex characters and Fibonacci numbers

A note on independence complexes of chordal graphs and dismantling

A Practical Guide to CNNs and Fisher Vectors for Image Instance Retrieval

A revised Moore bound for partially directed graphs

Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application

Additive bases in groups

Algebraic structures on cohomology of configuration spaces of manifolds with flows

Anisotropic spin relaxation in $n$-GaAs from strong inhomogeneous hyperfine fields produced by the dynamical polarization of nuclei

Bounded solutions, $L^p (p>1)$ solutions and $L^1$ solutions for one-dimensional BSDEs under general assumptions

Dynamics of an Adaptive Randomly Reinforced Urn

Efficient counting with optimal resilience

Energy Structure of Optimal Positional Strategies in Mean Payoff Games

Generic Criticality in Ecological and Neuronal Networks

Interacting particle systems with sticky boundary

Majority Bootstrap Percolation on $G(n,p)$

Non-commutative lattice problems

On diagonal lower bound of Markov kernel from $L^2$ analyticity

On Maximal Layers Of Random Orders

On some Gaussian Bernstein processes in RN and the periodic Ornstein-Uhlenbeck process

On the center of the Hurwitz graph

On the range of the Campanino and Pétritis random walk

On the safe set of Cartesian product of two complete graphs

Order Selection of Autoregressive Processes using Bridge Criterion

Pairs of dot products in finite fields and rings

Primal-Dual Active-Set Methods for Isotonic Regression and Trend Filtering

Probabilistic Power Flow Computation via Low-Rank and Sparse Tensor Recovery

Realistic Tunneling States for the Magnetic Effects in Non-Metallic Real Glasses

Relative Compressed Suffix Trees

Sequential Monte Carlo with Parameter Learning for non-Markovian State-Space Models

Simulating Brain Reaction to Methamphetamine Regarding Consumer Personality

Stochastic Eternal Inflation in a Bianchi Type I Universe

Testing the Sphericity of a covariance matrix when the dimension is much larger than the sample size

The $k$-resultant modulus set problem on algebraic varieties over finite fields

The Thue choice number versus the Thue chromatic number of graphs

The Unit Bar Visibility Number of a Graph

Time Versus Cost Tradeoffs for Deterministic Rendezvous in Networks

Topology Control of wireless sensor network using Quantum Inspired Genetic algorithm

Two Particle-in-Grid Realisations on Spacetrees

Two-Tier Prediction of Solar Power Generation with Limited Sensing Resource

Unified Acceleration Method for Packing and Covering Problems via Diameter Reduction

Zero-one laws for connectivity in inhomogeneous random key graphs