A Light Sliding-Window Part-of-Speech Tagger for the Apertium Free/Open-Source Machine Translation Platform

This paper describes a free/open-source implementation of the light sliding-window (LSW) part-of-speech tagger for the Apertium free/open-source machine translation platform. Firstly, the mechanism and training process of the tagger are reviewed, and a new method for incorporating linguistic rules is proposed. Secondly, experiments are conducted to compare the performances of the tagger under different window settings, with or without Apertium-style ‘forbid’ rules, with or without Constraint Grammar, and also with respect to the traditional HMM tagger in Apertium.

A Study Investigating Typical Concepts and Guidelines for Ontology Building

In semantic technologies, the shared common understanding of the structure of information among artifacts (people or software agents) can be realized by building an ontology. To do this, it is imperative for an ontology builder to answer several questions: a) what are the main components of an ontology? b) How an ontology look likes and how it works? c) Verify if it is required to consider reusing existing ontologies or not? c) What is the complexity of the ontology to be developed? d) What are the principles of ontology design and development? e) How to evaluate an ontology? This paper answers all the key questions above. The aim of this paper is to present a set of guiding principles to help ontology developers and also inexperienced users to answer such questions.

Accelerating Optimization via Adaptive Prediction

We present a powerful general framework for designing data-dependent optimization algorithms, building upon and unifying recent techniques in adaptive regularization, optimistic gradient predictions, and problem-dependent randomization. We first present a series of new regret guarantees that hold at any time and under very minimal assumptions, and then show how different relaxations recover existing algorithms, both basic as well as more recent sophisticated ones. Finally, we show how combining adaptivity, optimism, and problem-dependent randomization can guide the design of algorithms that benefit from more favorable guarantees than recent state-of-the-art methods.

Algorithmic statistics, prediction and machine learning

Algorithmic statistics considers the following problem: given a binary string x (e.g., some experimental data), find a ‘good’ explanation of this data. It uses algorithmic information theory to define formally what is a good explanation. In this paper we extend this framework in two directions. First, the explanations are not only interesting in themselves but also used for prediction: we want to know what kind of data we may reasonably expect in similar situations (repeating the same experiment). We show that some kind of hierarchy can be constructed both in terms of algorithmic statistics and using the notion of a priori probability, and these two approaches turn out to be equivalent. Second, a more realistic approach that goes back to machine learning theory, assumes that we have not a single data string x but some set of ‘positive examples’ x_1,\ldots,x_l that all belong to some unknown set A, a property that we want to learn. We want this set A to contain all positive examples and to be as small and simple as possible. We show how algorithmic statistic can be extended to cover this situation.

Bootstrap Nonlinear Regression Application in a Design of an Experiment Data for Fewer Sample Size

This paper reports on application of bootstrap nonlinear regression method to a design of an experiment dataset with fewer experimental runs. Design with desired properties was augmented and verified using graphical techniques. The augmented design with the desired properties benefited the accuracy of the approximated function used. The computation power of R-language and SAS for computing nonlinear function and bootstrap was also compared.

Class Association Rules Mining based Rough Set Method

This paper investigates the mining of class association rules with rough set approach. In data mining, an association occurs between two set of elements when one element set happen together with another. A class association rule set (CARs) is a subset of association rules with classes specified as their consequences. We present an efficient algorithm for mining the finest class rule set inspired form Apriori algorithm, where the support and confidence are computed based on the elementary set of lower approximation included in the property of rough set theory. Our proposed approach has been shown very effective, where the rough set approach for class association discovery is much simpler than the classic association method.

Exploring Query Categorisation for Query Expansion: A Study

The vocabulary mismatch problem is one of the important challenges facing traditional keyword-based Information Retrieval Systems. The aim of query expansion (QE) is to reduce this query-document mismatch by adding related or synonymous words or phrases to the query. Several existing query expansion algorithms have proved their merit, but they are not uniformly beneficial for all kinds of queries. Our long-term goal is to formulate methods for applying QE techniques tailored to individual queries, rather than applying the same general QE method to all queries. As an initial step, we have proposed a taxonomy of query classes (from a QE perspective) in this report. We have discussed the properties of each query class with examples. We have also discussed some QE strategies that might be effective for each query category. In future work, we intend to test the proposed techniques using standard datasets, and to explore automatic query categorisation methods.

Fast and Simple PCA via Convex Optimization

The problem of principle component analysis (PCA) is traditionally solved by spectral or algebraic methods. We show how PCA could be formulated as a sequence of {\it convex} optimization problems. This gives rise to a new efficient method for computing the PCA based on recent advances in stochastic methods for convex optimization. In particular, we present running times that improve over the current state-of-the-art.

Learning to Hash for Indexing Big Data – A Survey

The explosive growth in big data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, Approximate Nearest Neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros and cons of the emerging techniques. We provide a comprehensive survey of the learning to hash framework and representative techniques of various types, including unsupervised, semi-supervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area.

‘Oddball SGD’: Novelty Driven Stochastic Gradient Descent for Training Deep Neural Networks

Stochastic Gradient Descent (SGD) is arguably the most popular of the machine learning methods applied to training deep neural networks (DNN) today. It has recently been demonstrated that SGD can be statistically biased so that certain elements of the training set are learned more rapidly than others. In this article, we place SGD into a feedback loop whereby the probability of selection is proportional to error magnitude. This provides a novelty-driven oddball SGD process that learns more rapidly than traditional SGD by prioritising those elements of the training set with the largest novelty (error). In our DNN example, oddball SGD trains some 50x faster than regular SGD.

Optimal method in multiple regression with structural changes

In this paper, we consider an estimation problem of the regression coefficients in multiple regression models with several unknown change-points. Under some realistic assumptions, we propose a class of estimators which includes as a special cases shrinkage estimators (SEs) as well as the unrestricted estimator (UE) and the restricted estimator (RE). We also derive a more general condition for the SEs to dominate the UE. To this end, we generalize some identities for the evaluation of the bias and risk functions of shrinkage-type estimators. As illustrative example, our method is applied to the ‘gross domestic product’ data set of 10 countries whose USA, Canada, UK, France and Germany. The simulation results corroborate our theoretical findings.

Sparse Fisher’s Linear Discriminant Analysis for Partially Labeled Data

Classification is an important tool with many useful applications. Among the many classification methods, Fisher’s Linear Discriminant Analysis (LDA) is a traditional model-based approach which makes use of the covariance information. However, in the high-dimensional, low-sample size setting, LDA cannot be directly deployed because the sample covariance is not invertible. While there are modern methods designed to deal with high-dimensional data, they may not fully use the covariance information as LDA does. Hence in some situations, it is still desirable to use a model-based method such as LDA for classification. This article exploits the potential of LDA in more complicated data settings. In many real applications, it is costly to manually place labels on observations; hence it is often that only a small portion of labeled data is available while a large number of observations are left without a label. It is a great challenge to obtain good classification performance through the labeled data alone, especially when the dimension is greater than the size of the labeled data. In order to overcome this issue, we propose a semi-supervised sparse LDA classifier to take advantage of the seemingly useless unlabeled data. They provide additional information which helps to boost the classification performance in some situations. A direct estimation method is used to reconstruct LDA and achieve the sparsity; meanwhile we employ the difference-convex algorithm to handle the non-convex loss function associated with the unlabeled data. Theoretical properties of the proposed classifier are studied. Our simulated examples help to understand when and how the information extracted from the unlabeled data can be useful. A real data example further illustrates the usefulness of the proposed method.

Streaming Verification in Data Analysis

Streaming interactive proofs (SIPs) are a framework to reason about outsourced computation, where a data owner (the verifier) outsources a computation to the cloud (the prover), but wishes to verify the correctness of the solution provided by the cloud service. In this paper we present streaming interactive proofs for problems in data analysis. We present protocols for clustering and shape fitting problems, as well as an improved protocol for rectangular matrix multiplication. The latter can in turn be used to verify k eigenvectors of a (streamed) n \times n matrix. In general our solutions use polylogarithmic rounds of communication and polylogarithmic total communication and verifier space. For special cases (when optimality certificates can be verified easily), we present constant round protocols with similar costs. For rectangular matrix multiplication and eigenvector verification, our protocols work in the more restricted annotated data streaming model, and use sublinear (but not polylogarithmic) communication.

TransA: An Adaptive Approach for Knowledge Graph Embedding

Knowledge representation is a major topic in AI, and many studies attempt to represent entities and relations of knowledge base in a continuous vector space. Among these attempts, translation-based methods build entity and relation vectors by minimizing the translation loss from a head entity to a tail one. In spite of the success of these methods, translation-based methods also suffer from the oversimplified loss metric, and are not competitive enough to model various and complex entities/relations in knowledge bases. To address this issue, we propose \textbf{TransA}, an adaptive metric approach for embedding, utilizing the metric learning ideas to provide a more flexible embedding method. Experiments are conducted on the benchmark datasets and our proposed method makes significant and consistent improvements over the state-of-the-art baselines.

TransG : A Generative Mixture Model for Knowledge Graph Embedding

Recently, knowledge graph embedding, which projects symbolic entities and relations into continuous vector space, has become a new, hot topic in artificial intelligence. This paper addresses a new issue of \textbf{multiple relation semantics} that a relation may have multiple meanings revealed by the entity pairs associated with the corresponding triples, and proposes a novel Gaussian mixture model for embedding, \textbf{TransG}. The new model can discover latent semantics for a relation and leverage a mixture of relation component vectors for embedding a fact triple. To the best of our knowledge, this is the first generative model for knowledge graph embedding, which is able to deal with multiple relation semantics. Extensive experiments show that the proposed model achieves substantial improvements against the state-of-the-art baselines.

A High-Order Radial Basis Function (RBF) Leray Projection Method for the Solution of the Incompressible Unsteady Stokes Equations

A Nonlinear Analogue of May-Wigner Instability Transition

A Parameterized Algorithm for Mixed Cut

A Sanov-type theorem for empirical measures associated with the surface and cone measures on $\ell^{p}$ spheres

Approximate performance analysis of generalized join the shortest queue routing

Backdoors into Heterogeneous Classes of SAT and CSP

Building a Pilot Software Quality-in-Use Benchmark Dataset

Challenges and Considerations for Utilizing Burst Buffers in High-Performance Computing

Colored graphs without colorful cycles

Computational evolution of decision-making strategies

Cycle structure of autotopisms of quasigroups and Latin squares

Distributed Estimation and Inference with Statistical Guarantees

Ear-decompositions and the complexity of the matching polytope

Energy saving in smart homes based on consumer behaviour: A case study

Evaluation of Protein-protein Interaction Predictors with Noisy Partially Labeled Data Sets

Finding Two Edge-Disjoint Paths with Length Constraints

FPTAS for Hardcore and Ising Models on Hypergraphs

How permutations displace points and stretch intervals

Improvements on the density of maximal 1-planar graphs

Large Cross-free sets in Steiner triple systems

Learning from Synthetic Data Using a Stacked Multichannel Autoencoder

Martin kernels for Markov processes with jumps

Maximum likelihood estimators uniformly minimize distribution variance among distribution unbiased estimators in exponential families

On Extension of Regular Graphs

On Reconstructability of Quadratic Utility Functions from the Iterations in Gradient Methods

On the asymptotic distribution of parameters in random weighted staircase tableaux

Overlapping latin subsquares and full products

Partitioning 2-edge-colored graphs by monochromatic paths and cycles

Periods in missing lengths of rainbow cycles

Permuting longitudinal data despite all the dependencies

Phantom distribution functions for some stationary sequences

Phase diagram of the 3D Anderson model for uncorrelated speckle potentials

Pointwise adaptive estimation of a multivariate density under independence hypothesis

Proceedings Thirteenth International Workshop on the ACL2 Theorem Prover and Its Applications

Ramsey number of a connected triangle matching

Randomised enumeration of small witnesses using a decision oracle

Searching for a superlinear lower bounds for the Maximum Consecutive Subsums Problem and the (min,+)-convolution

Spatio-temporal visualization of light transport in complex photonic structures

Specification-based Synthesis of Distributed Self-Stabilizing Protocols

Statistical Inference for Matrix-variate Gaussian Graphical Models and False Discovery Rate Control

Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses

Subsampling for General Statistics under Long Range Dependence

The maximum size of a non-trivial intersecting uniform family that is not a subfamily of the Hilton–Milner family

User-Curated Image Collections: Modeling and Recommendation

Vertex covers by monochromatic pieces – A survey of results and problems

Visual Generalized Coordinates