The Resilient Propagation (Rprop) algorithm has been very popular for backpropagation training of multilayer feed-forward neural networks in various applications. The standard Rprop however encounters difficulties in the context of deep neural networks as typically happens with gradient-based learning algorithms. In this paper, we propose a modification of the Rprop that combines standard Rprop steps with a special drop out technique. We apply the method for training Deep Neural Networks as standalone components and in ensemble formulations. Results on the MNIST dataset show that the proposed modification alleviates standard Rprop’s problems demonstrating improved learning speed and accuracy.
This paper addresses the problem of scalable optimization for L1-regularized conditional Gaussian graphical models. Conditional Gaussian graphical models generalize the well-known Gaussian graphical models to conditional distributions to model the output network influenced by conditioning input variables. While highly scalable optimization methods exist for sparse Gaussian graphical model estimation, state-of-the-art methods for conditional Gaussian graphical models are not efficient enough and more importantly, fail due to memory constraints for very large problems. In this paper, we propose a new optimization procedure based on a Newton method that efficiently iterates over two sub-problems, leading to drastic improvement in computation time compared to the previous methods. We then extend our method to scale to large problems under memory constraints, using block coordinate descent to limit memory usage while achieving fast convergence. Using synthetic and genomic data, we show that our methods can solve one million dimensional problems to high accuracy in a little over a day on a single machine.
We propose Macau, a powerful and flexible Bayesian factorization method for heterogeneous data. Our model can factorize any set of entities and relations that can be represented by a relational model, including tensors and also multiple relations for each entity. Macau can also incorporate side information, specifically entity and relation features, which are crucial for predicting sparsely observed relations. Macau scales to millions of entity instances, hundred millions of observations, and sparse entity features with millions of dimensions. To achieve the scale up, we specially designed sampling procedure for entity and relation features that relies primarily on noise injection in linear regressions. We show performance and advanced features of Macau in a set of experiments, including challenging drug-protein activity prediction task.
Traditional Kalman filter (KF) is derived under the well-known minimum mean square error (MMSE) criterion, which is optimal under Gaussian assumption. However, when the signals are non-Gaussian, especially when the system is disturbed by some heavy-tailed impulsive noises, the performance of KF will deteriorate seriously. To improve the robustness of KF against impulsive noises, we propose in this work a new Kalman filter, called the maximum correntropy Kalman filter (MCKF), which adopts the robust maximum correntropy criterion (MCC) as the optimality criterion, instead of using the MMSE. Similar to the traditional KF, the state mean and covariance matrix propagation equations are used to give prior estimations of the state and covariance matrix in MCKF. A novel fixed-point algorithm is then used to update the posterior estimations. A sufficient condition that guarantees the convergence of the fixed-point algorithm is given. Illustration examples are presented to demonstrate the effectiveness and robustness of the new algorithm.
The Singleton Property (SP) approach has been proposed for representing and querying metadata about RDF triples such as provenance, time, location, and evidence. In this approach, one singleton property is created to uniquely represent a relationship in a particular context, and in general, generates a large property hierarchy in the schema. It has become the subject of important questions from Semantic Web practitioners. Can an existing reasoner recognize the singleton property triples? And how? If the singleton property triples describe a data triple, then how can a reasoner infer this data triple from the singleton property triples? Or would the large property hierarchy affect the reasoners in some way? We address these questions in this paper and present our study about the reasoning aspects of the singleton properties. We propose a simple mechanism to enable existing reasoners to recognize the singleton property triples, as well as to infer the data triples described by the singleton property triples. We evaluate the effect of the singleton property triples in the reasoning processes by comparing the performance on RDF datasets with and without singleton properties. Our evaluation uses as benchmark the LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal information added through singleton properties.
The advances of the Linked Open Data (LOD) initiative are giving rise to a more structured Web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new Web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that can benefit from a combination of the Web of documents and the Web of data. To ease the emergence of these new applications, we propose a query-biased algorithm (LDRANK) for the ranking of web of data resources with associated textual data. Our algorithm combines link analysis with dimensionality reduction. We use crowdsourcing for building a publicly available and reusable dataset for the evaluation of query-biased ranking of Web of data resources detected in Web pages. We show that, on this dataset, LDRANK outperforms the state of the art. Finally, we use this algorithm for the construction of semantic snippets of which we evaluate the usefulness with a crowdsourcing-based approach.
This article proposes a convenient tool for decoding the output of neural networks trained by connectionist temporal classification (CTC) for handwritten text recognition. We use regular expressions to describe the complex structures we are expecting in the writing. We analyze theoretically which calculations are relevant and which can be avoided. A great speed-up results from an approximation which is also analyzed theoretically and confirmed to work correctly by experiments. The variety of applications reaches from keyword spotting to full text recognition. We refer to applications where we successfully integrated the proposed decoder.
Compounding is a highly productive word-formation process in some languages that is often problematic for natural language processing applications. In this paper, we investigate whether distributional semantics in the form of word embeddings can enable a deeper, i.e., more knowledge-rich, processing of compounds than the standard string-based methods. We present an unsupervised approach that exploits regularities in the semantic vector space (based on analogies such as ‘bookshop is to shop as bookshelf is to shelf’) to produce compound analyses of high quality. A subsequent compound splitting algorithm based on these analyses is highly effective, particularly for ambiguous compounds. German to English machine translation experiments show that this semantic analogy-based compound splitter leads to better translations than a commonly used frequency-based method.
In this work, we study distance metric learning (DML) for high dimensional data. A typical approach for DML with high dimensional data is to perform the dimensionality reduction first before learning the distance metric. The main shortcoming of this approach is that it may result in a suboptimal solution due to the subspace removed by the dimensionality reduction method. In this work, we present a dual random projection frame for DML with high dimensional data that explicitly addresses the limitation of dimensionality reduction for DML. The key idea is to first project all the data points into a low dimensional space by random projection, and compute the dual variables using the projected vectors. It then reconstructs the distance metric in the original space using the estimated dual variables. The proposed method, on one hand, enjoys the light computation of random projection, and on the other hand, alleviates the limitation of most dimensionality reduction methods. We verify both empirically and theoretically the effectiveness of the proposed algorithm for high dimensional DML.