• Efficient Thresholded Correlation using Truncated Singular Value Decomposition
• Two coloring problems on matrix graphs
• Asymptotic properties of the derivative of self-intersection local time of fractional Brownian motion
• Invariance of Qubit-Qutrit Separability Probabilities over Bloch Radii of Qubit and Qutrit Subsystems
• Revealing the Mechanism of the Viscous-to-Elastic Crossover in Liquids
• Perfect Matchings in Hypergraphs and the Erdős matching conjecture
• Combinatorial solutions to integrable hierarchies
• Disordered double Weyl node
• Heuristic algorithms for finding distribution reducts in probabilistic rough set model
• Computing the $L_1$ Geodesic Diameter and Center of a Polygonal Domain
• Refined Error Bounds for Several Learning Algorithms
• SR-Clustering: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation
• Weighted geometric distribution with a new characterisation of geometric distribution
• Combinatorial and Probabilistic Formulae for Divided Symmetrization
• Thick Points of High-Dimensional Gaussian Free Fields
• Shell polynomials and dual birth-death processes
• Hedging of covered options with linear market impact and gamma constraint
• Linear Eigenvalue Statistics: An Indicator Ensemble Design for Situation Awareness of Power Systems
• On tensor products of CSS Codes
• Estimation and clustering in a semiparametric Poisson process stochastic block model for longitudinal networks
• Move from Perturbed scheme to exponential weighting average
• Stochastic simulators based optimization by Gaussian process metamodels - Application to maintenance investments planning issues (short title: Metamodel-based optimization of stochastic simulators)
• Improved hypothesis testing in a general multivariate elliptical model
• Estimating the conditional density by histogram type estimators and model selection
• Implementation of deep learning algorithm for automatic detection of brain tumors using intraoperative IR-thermal mapping data
• Coherence-resonance chimeras in a network of excitable elements
• Ramifications of Hurwitz theory, KP integrability and quantum curves
• Determinants Containing Powers of Generalized Fibonacci Numbers
• The box dimension of random box-like self-affine sets
• The Bi-Objective Workflow Satisfiability Problem and Workflow Resiliency
• Convex Hulls of Lévy Processes
• A Stochastically Evolving Non-local Search and Solutions to Inverse Problems with Sparse Data
• FAASTA: A fast solver for total-variation regularization of ill-conditioned problems with application to brain imaging
• On the Differential Privacy of Bayesian Inference
• On the Impact of Identifiers on Local Decision
• The free energy in a class of quantum spin systems and interchange processes
• Two-faced processes and random number generators
• The flag upper bound theorem for 3- and 5-manifolds
• Proceedings 14th International Workshop on Foundations of Coordination Languages and Self-Adaptive Systems
• Restricted Predicates for Hypothetical Datalog
• On the Bandwidth of the Kneser Graph
• Facility Deployment Decisions through Warp Optimization of Regressed Gaussian Processes
• Stochastic C-stability and B-consistency of explicit and implicit Milstein-type schemes
• The C-finite Ansatz Meets the Holonomic Ansatz
• Existence, uniqueness, and regularity for stochastic evolution equations with irregular initial values
• Stochastic Dual Ascent for Solving Linear Systems
• On Distributed Cooperative Decision-Making in Multiarmed Bandits
• Finite-size effects and switching times for Moran dynamics with mutation
• A dynamic Bayesian Markov model for health economic evaluations of interventions against infectious diseases
• Addressing Complex and Subjective Product-Related Queries with Customer Reviews
Over the last decade, the proliferation of online platforms and their adoption by billions of users have greatly heightened users' privacy risk. In fact, security researchers have shown that sparse microdata containing information about a user's online activities, although anonymous, can still be used to disclose the identity of the user by cross-referencing the data with other data sources. To preserve user privacy, existing works propose several methods (k-anonymity, l-diversity, differential privacy) that ensure that a dataset meant to be shared or published carries a small identity disclosure risk. However, the majority of these methods modify the data in isolation, without considering its utility in subsequent knowledge discovery tasks, which makes these datasets less informative. In this work, we consider labeled data that are generally used for classification, and propose two methods for feature selection with two goals: first, on the reduced feature set the data has a small disclosure risk, and second, the utility of the data is preserved for performing a classification task. Experimental results on various real-world datasets show that the proposed methods are effective and useful in practice.
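As a rough illustration of the first goal, the sketch below measures disclosure risk on a candidate feature subset using k-anonymity, one of the notions named in the abstract. The abstract does not state which risk measure the two proposed methods actually use, so the helper names `min_group_size` and `is_k_anonymous` and the toy records are assumptions made purely for illustration.

```python
from collections import Counter

def min_group_size(records, features):
    """Size of the smallest group of records that agree on all `features`.

    A dataset restricted to `features` is k-anonymous exactly when this
    value is at least k. Returns 0 for an empty dataset.
    """
    groups = Counter(tuple(r[f] for f in features) for r in records)
    return min(groups.values(), default=0)

def is_k_anonymous(records, features, k):
    """True if every record is indistinguishable from at least k-1 others."""
    return min_group_size(records, features) >= k

# Hypothetical toy data: four records, three candidate features.
records = [
    {"age": "30s", "zip": "120", "sex": "F"},
    {"age": "30s", "zip": "120", "sex": "M"},
    {"age": "40s", "zip": "121", "sex": "F"},
    {"age": "40s", "zip": "121", "sex": "M"},
]

# Dropping "sex" merges records into groups of two, so the reduced
# feature set is 2-anonymous while the full feature set is not.
print(is_k_anonymous(records, ["age", "zip"], 2))
print(is_k_anonymous(records, ["age", "zip", "sex"], 2))
```

A feature selection procedure with the two goals described above would additionally score each candidate subset by how well a classifier performs on it; that part is omitted here.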
In the last few years, deep learning has led to very good performance on a variety of problems, such as object recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Due to the lack of training data and computing power in the early days, it was hard to train a large high-capacity convolutional neural network without overfitting. Recently, with the rapid growth of data size and the increasing power of graphics processing units, many researchers have improved convolutional neural networks and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We also introduce some applications of convolutional neural networks in computer vision.
We study how to obtain concise descriptions of discrete multivariate sequential data in terms of rich multivariate sequential patterns that can capture potentially highly interesting (cor)relations between sequences. To this end we allow our pattern language to span over the alphabets (domains) of all sequences, allow patterns to overlap temporally, and allow for gaps in their occurrences. We formalise our goal by the Minimum Description Length principle, by which our objective is to discover the set of patterns that provides the most succinct description of the data. To discover good pattern sets, we introduce Ditto, an efficient algorithm to approximate the ideal result. We support our claim with a set of experiments on both synthetic and real data.
Our world is filled with both beautiful and brainy people, but how often does a Nobel Prize winner also win a beauty pageant? Let us assume that someone who is both very beautiful and very smart is rarer than we would expect from the combination of the number of beautiful and brainy people. Of course there will still always be some individuals that defy this stereotype; these beautiful brainy people are exactly the class of anomaly we focus on in this paper. They do not possess rare qualities, but it is the unexpected combination of factors that makes them stand out. In this paper we define the above-described class of anomaly and propose a method to quickly identify them in transaction data. Further, as we take a pattern set based approach, our method readily explains why a transaction is anomalous. The effectiveness of our method is thoroughly verified with a wide range of experiments on both real-world and synthetic data.
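The notion of an "unexpected combination" can be made concrete with a small sketch. The code below is not the pattern set based method of the paper; it substitutes a much simpler pairwise independence check (the ratio of observed to expected co-occurrence, often called lift), and the function name `pair_lift` and the toy transactions are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

def pair_lift(transactions):
    """Observed / expected co-occurrence count for every co-occurring item pair.

    The expected count assumes the two items occur independently:
    n * (count_a / n) * (count_b / n) = count_a * count_b / n.
    A lift well below 1 marks a combination that is rarer than its
    individually common parts would suggest.
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))  # deduplicate and fix the pair order
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    return {
        (a, b): c / (item_count[a] * item_count[b] / n)
        for (a, b), c in pair_count.items()
    }

# Hypothetical toy data: both items are common (5 of 10 transactions each),
# but they appear together only once.
transactions = (
    [["beautiful"]] * 4
    + [["brainy"]] * 4
    + [["beautiful", "brainy"]]
    + [["other"]]
)

lift = pair_lift(transactions)
# Transactions containing a low-lift pair are the candidate anomalies.
anomalies = [
    t for t in transactions
    if any(lift[p] < 0.5 for p in combinations(sorted(set(t)), 2))
]
print(lift[("beautiful", "brainy")], anomalies)
```

Here the pair itself serves as the explanation of why a transaction is flagged, which mirrors, in a very reduced form, the explanatory property the abstract claims for the pattern set based approach.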
In today’s world, we follow news which is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking events in a large multilingual stream. Within the recently developed system Event Registry, we examine two aspects of this problem: how to compare articles in different languages and how to link collections of articles in different languages which refer to the same event. Taking a multilingual stream and clusters of articles from each language, we compare different cross-lingual document similarity measures based on Wikipedia. This allows us to compute the similarity of any two articles regardless of language. Building on previous work, we show there are methods which scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages which represent the same event. We provide an extensive evaluation of the system as a whole, as well as an evaluation of the quality and robustness of the similarity measure and the linking algorithm.
RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Yet, the ever-increasing size of RDF data collections makes it more and more infeasible to store and process them on a single machine, raising the need for distributed approaches. Instead of building a standalone but closed distributed RDF store, we endorse the usage of existing infrastructures for Big Data processing, e.g. Hadoop. However, SPARQL query performance is a major challenge as these platforms are not designed for RDF processing from the ground up. Thus, existing Hadoop-based approaches often favor certain query pattern shapes while performance drops significantly for other shapes. In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. Our prototype system S2RDF is built on top of Spark and uses its relational interface to execute SPARQL queries over ExtVP. We demonstrate its superior performance in comparison to state-of-the-art SPARQL-on-Hadoop approaches using the recent WatDiv test suite. S2RDF achieves sub-second runtimes for the majority of queries on a billion-triple RDF graph.
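The idea of semi-join based preprocessing can be sketched in plain Python. This is only an illustration of a generic semi-join reduction between per-predicate tables, not the S2RDF implementation (which runs on Spark); the function names `vertical_partition` and `semi_join_object_subject` and the toy triples are assumptions.

```python
from collections import defaultdict

def vertical_partition(triples):
    """Split (subject, predicate, object) triples into one table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)

def semi_join_object_subject(left, right):
    """Keep the rows of `left` whose object occurs as a subject in `right`.

    Only these rows can contribute to a join pattern of the form
    ?x left ?y . ?y right ?z, so precomputing the reduced table shrinks
    the input a query with that pattern has to read.
    """
    right_subjects = {s for s, _ in right}
    return [(s, o) for s, o in left if o in right_subjects]

# Hypothetical toy graph.
triples = [
    ("A", "follows", "B"),
    ("B", "follows", "C"),
    ("B", "follows", "D"),
    ("C", "follows", "D"),
    ("A", "likes", "I1"),
    ("A", "likes", "I2"),
    ("C", "likes", "I2"),
]

vp = vertical_partition(triples)
reduced = semi_join_object_subject(vp["follows"], vp["likes"])
print(len(vp["follows"]), len(reduced))
```

In this toy graph the `follows` table has four rows, but only the row whose object (`C`) also likes something survives the reduction, so a query joining `follows` with `likes` on that variable would read one row instead of four.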
This is the documentation of the Multimodal Deep Learning Library, MDL, which is written in C++. It explains the principles and implementation details of the Restricted Boltzmann Machine, Deep Neural Network, Deep Belief Network, Denoising Autoencoder, Deep Boltzmann Machine, Deep Canonical Correlation Analysis, and modal prediction model. MDL uses OpenCV 3.0.0, which is the only dependency of this library. Most of its implementation has been tested on Mac OS. It also provides interfaces for reading various data sets such as MNIST, CIFAR, XRMB, and AVLetters. To read .mat files, Matlab must be installed because MDL uses the Matlab/C++ interface provided by Matlab. Multiple model options are provided: different gradient descent methods, loss functions, annealing methods, and activation functions. These options are easy to extend given the structure of MDL, so MDL could be used as a framework for testing in deep learning.
Embedding learning, a.k.a. representation learning, has been shown to be able to model large-scale semantic knowledge graphs. A key concept is a mapping of the knowledge graph to a tensor representation whose entries are predicted by models using latent representations of generalized entities. Knowledge graphs are typically treated as static: a knowledge graph grows more links when more facts become available, but the ground truth values associated with links are considered time invariant. In this paper we address the issue of knowledge graphs where triple states depend on time. We assume that changes in the knowledge graph always arrive in the form of events, in the sense that the events are the gateway to the knowledge graph. We train an event prediction model which uses both knowledge graph background information and information on recent events. By predicting future events, we also predict likely changes in the knowledge graph and thus obtain a model for the evolution of the knowledge graph as well. Our experiments demonstrate that our approach performs well in a clinical application, a recommendation engine and a sensor network application.