Adapting Resilient Propagation for Deep Learning

The Resilient Propagation (Rprop) algorithm has been very popular for backpropagation training of multilayer feed-forward neural networks in various applications. The standard Rprop however encounters difficulties in the context of deep neural networks as typically happens with gradient-based learning algorithms. In this paper, we propose a modification of the Rprop that combines standard Rprop steps with a special drop out technique. We apply the method for training Deep Neural Networks as standalone components and in ensemble formulations. Results on the MNIST dataset show that the proposed modification alleviates standard Rprop’s problems demonstrating improved learning speed and accuracy.

Large-Scale Optimization Algorithms for Sparse Conditional Gaussian Graphical Models

This paper addresses the problem of scalable optimization for L1-regularized conditional Gaussian graphical models. Conditional Gaussian graphical models generalize the well-known Gaussian graphical models to conditional distributions to model the output network influenced by conditioning input variables. While highly scalable optimization methods exist for sparse Gaussian graphical model estimation, state-of-the-art methods for conditional Gaussian graphical models are not efficient enough and more importantly, fail due to memory constraints for very large problems. In this paper, we propose a new optimization procedure based on a Newton method that efficiently iterates over two sub-problems, leading to drastic improvement in computation time compared to the previous methods. We then extend our method to scale to large problems under memory constraints, using block coordinate descent to limit memory usage while achieving fast convergence. Using synthetic and genomic data, we show that our methods can solve one million dimensional problems to high accuracy in a little over a day on a single machine.

Macau: Scalable Bayesian Multi-relational Factorization with Side Information using MCMC

We propose Macau, a powerful and flexible Bayesian factorization method for heterogeneous data. Our model can factorize any set of entities and relations that can be represented by a relational model, including tensors and also multiple relations for each entity. Macau can also incorporate side information, specifically entity and relation features, which are crucial for predicting sparsely observed relations. Macau scales to millions of entity instances, hundred millions of observations, and sparse entity features with millions of dimensions. To achieve the scale up, we specially designed sampling procedure for entity and relation features that relies primarily on noise injection in linear regressions. We show performance and advanced features of Macau in a set of experiments, including challenging drug-protein activity prediction task.

Maximum Correntropy Kalman Filter

Traditional Kalman filter (KF) is derived under the well-known minimum mean square error (MMSE) criterion, which is optimal under Gaussian assumption. However, when the signals are non-Gaussian, especially when the system is disturbed by some heavy-tailed impulsive noises, the performance of KF will deteriorate seriously. To improve the robustness of KF against impulsive noises, we propose in this work a new Kalman filter, called the maximum correntropy Kalman filter (MCKF), which adopts the robust maximum correntropy criterion (MCC) as the optimality criterion, instead of using the MMSE. Similar to the traditional KF, the state mean and covariance matrix propagation equations are used to give prior estimations of the state and covariance matrix in MCKF. A novel fixed-point algorithm is then used to update the posterior estimations. A sufficient condition that guarantees the convergence of the fixed-point algorithm is given. Illustration examples are presented to demonstrate the effectiveness and robustness of the new algorithm.

On Reasoning with RDF Statements about Statements using Singleton Property Triples

The Singleton Property (SP) approach has been proposed for representing and querying metadata about RDF triples such as provenance, time, location, and evidence. In this approach, one singleton property is created to uniquely represent a relationship in a particular context, and in general, generates a large property hierarchy in the schema. It has become the subject of important questions from Semantic Web practitioners. Can an existing reasoner recognize the singleton property triples? And how? If the singleton property triples describe a data triple, then how can a reasoner infer this data triple from the singleton property triples? Or would the large property hierarchy affect the reasoners in some way? We address these questions in this paper and present our study about the reasoning aspects of the singleton properties. We propose a simple mechanism to enable existing reasoners to recognize the singleton property triples, as well as to infer the data triples described by the singleton property triples. We evaluate the effect of the singleton property triples in the reasoning processes by comparing the performance on RDF datasets with and without singleton properties. Our evaluation uses as benchmark the LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal information added through singleton properties.

Ranking Entities in the Age of Two Webs, an Application to Semantic Snippets

The advances of the Linked Open Data (LOD) initiative are giving rise to a more structured Web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new Web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that can benefit from a combination of the Web of documents and the Web of data. To ease the emergence of these new applications, we propose a query-biased algorithm (LDRANK) for the ranking of web of data resources with associated textual data. Our algorithm combines link analysis with dimensionality reduction. We use crowdsourcing for building a publicly available and reusable dataset for the evaluation of query-biased ranking of Web of data resources detected in Web pages. We show that, on this dataset, LDRANK outperforms the state of the art. Finally, we use this algorithm for the construction of semantic snippets of which we evaluate the usefulness with a crowdsourcing-based approach.

Regular expressions for decoding of neural network outputs

This article proposes a convenient tool for decoding the output of neural networks trained by connectionist temporal classification (CTC) for handwritten text recognition. We use regular expressions to describe the complex structures we are expecting in the writing. We analyze theoretically which calculations are relevant and which can be avoided. A great speed-up results from an approximation which is also analyzed theoretically and confirmed to work correctly by experiments. The variety of applications reaches from keyword spotting to full text recognition. We refer to applications where we successfully integrated the proposed decoder.

Splitting Compounds by Semantic Analogy

Compounding is a highly productive word-formation process in some languages that is often problematic for natural language processing applications. In this paper, we investigate whether distributional semantics in the form of word embeddings can enable a deeper, i.e., more knowledge-rich, processing of compounds than the standard string-based methods. We present an unsupervised approach that exploits regularities in the semantic vector space (based on analogies such as ‘bookshop is to shop as bookshelf is to shelf’) to produce compound analyses of high quality. A subsequent compound splitting algorithm based on these analyses is highly effective, particularly for ambiguous compounds. German to English machine translation experiments show that this semantic analogy-based compound splitter leads to better translations than a commonly used frequency-based method.

Towards Making High Dimensional Distance Metric Learning Practical

In this work, we study distance metric learning (DML) for high dimensional data. A typical approach for DML with high dimensional data is to perform the dimensionality reduction first before learning the distance metric. The main shortcoming of this approach is that it may result in a suboptimal solution due to the subspace removed by the dimensionality reduction method. In this work, we present a dual random projection frame for DML with high dimensional data that explicitly addresses the limitation of dimensionality reduction for DML. The key idea is to first project all the data points into a low dimensional space by random projection, and compute the dual variables using the projected vectors. It then reconstructs the distance metric in the original space using the estimated dual variables. The proposed method, on one hand, enjoys the light computation of random projection, and on the other hand, alleviates the limitation of most dimensionality reduction methods. We verify both empirically and theoretically the effectiveness of the proposed algorithm for high dimensional DML.

$1$-cohomology of simplicial amalgams of groups

A continuous model for systems of complexity 2 on simple abelian groups

A local lemma via entropy compression

A Multivariate Cure Model for Left- and Right-Censored Data with Application to Colorectal Cancer Screening Patterns

A tetrahedral space-filling curve for non-conforming adaptive meshes

Adaptive Geostatistical Design and Analysis for Sequential Prevalence Surveys

Adaptive Lookup for Unstructured Peer-to-Peer Overlays

Bayesian Structured Sparsity Priors for EEG Source Localization Technical Report

Bounding the Clique-Width of $H$-free Split Graphs

Caratheodory’s Theorem in Depth

Circular coloring of signed graphs

Classification of vertex-transitive cubic partial cubes

Decentralized gradient algorithm for solution of a linear equation

Dependency length minimization: Puzzles and Promises

Dynamic Poisson Factorization

Efficiency of Z-estimators indexed by the objective functions

Efficient Kernel Fusion Techniques for Massive Video Data Analysis on GPGPUs

Exponential Family Matrix Completion under Structural Constraints

Finite size scaling bounds on many-body localized phase transitions

Fixed points and cycle structure of random permutations

Flexible results for quadratic forms with applications to variance components estimation

Functional central limit theorem for the interface of the multitype contact process

Kernelized Deep Convolutional Neural Network for Describing Complex Images

Large Communities in a scale-free network

Large-scale analysis of Zipf’s law in English texts

Learning without Recall by Random Walks on Directed Graphs

Linear Probing with 5-Independent Hashing

Marginal integration $M-$estimators for additive models

Maximally Persistent Cycles in Random Geometric Complexes

Modeling and interpolation of the ambient magnetic field by Gaussian processes

Modularity and the Spread of Perturbations in Complex Dynamical Systems

Nonparametric density estimation by histogram trend filtering

Nonparametric estimation of a mixing distribution for a family of linear stochastic dynamical systems

Nonsymmetric Askey-Wilson polynomials and $Q$-polynomial distance-regular graphs

On enumeration of a class of maps on Klein bottle

On the cop number of generalized Petersen graphs

On Varieties of Doubly Robust Estimators Under Missing Not at Random With an Ancillary Variable

Open Access and Discovery Tools: How do Primo Libraries Manage Green Open Access Collections?

Precise Phase Transition of Total Variation Minimization

Qualitative and Quantitative Stability Analysis of Penta-rhythmic Circuits

Quantum gate learning in engineered qubit networks: Toffoli gate with always-on interactions

Random matrix techniques in quantum information theory

Regional and temporal characteristics of bovine tuberculosis of cattle in Great Britain

Sequential Selection of a Monotone Subsequence from a Random Permutation

Sparse Multinomial Logistic Regression via Approximate Message Passing

Stable Nash Equilibria in the Gale-Shapley Matching Game

Subdivisions in the Robber Locating Game

The fitness of the strongest individual on the subcritical GMS model

The List Distinguishing Number Equals the Distinguishing Number for Interval Graphs

The Shape of Data and Probability Measures

Toward the Combinatorial Limit Theory of Free Words

Voted Kernel Regularization

When are Kalman-filter restless bandits indexable?