A Compositional Framework for Markov Processes

We define the concept of an ‘open’ Markov process, or more precisely, continuous-time Markov chain, which is one where probability can flow in or out of certain states called ‘inputs’ and ‘outputs’. One can build up a Markov process from smaller open pieces. This process is formalized by making open Markov processes into the morphisms of a dagger compact category. We show that the behavior of a detailed balanced open Markov process is determined by a principle of minimum dissipation, closely related to Prigogine’s principle of minimum entropy production. Using this fact, we set up a functor mapping open detailed balanced Markov processes to open circuits made of linear resistors. We also describe how to ‘black box’ an open Markov process, obtaining the linear relation between input and output data that holds in any steady state, including nonequilibrium steady states with a nonzero flow of probability through the system. We prove that black boxing gives a symmetric monoidal dagger functor sending open detailed balanced Markov processes to Lagrangian relations between symplectic vector spaces. This allows us to compute the steady state behavior of an open detailed balanced Markov process from the behaviors of smaller pieces from which it is built. We relate this black box functor to a previously constructed black box functor for circuits.

Character-Aware Neural Language Models

We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word-level. Our model employs a convolutional neural network (CNN) over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Czech, German, French, Spanish, Russian), the model consistently outperforms a Kneser-Ney baseline (by 30-35%) and a word-level LSTM baseline (by 15-25%), again with far fewer parameters. Our results suggest that on many languages, character inputs are sufficient for language modeling.

Full-text and Keyword Indexes for String Searching

In this work, we present a literature review for full-text and keyword indexes as well as our contributions (which are mostly practice-oriented). The first contribution is the FM-bloated index, which is a modification of the well-known FM-index (a compressed, full-text index) that trades space for speed. In our approach, the count table and the occurrence lists store information about selected q-grams in addition to the individual characters. Two variants are described, namely one using O(n \log^2 n) bits of space with O(m + \log m \log \log n) average query time, and one with linear space and O(m \log \log n) average query time, where n is the input text length and m is the pattern length. We experimentally show that a significant speedup can be achieved by operating on q-grams (albeit at the cost of very high space requirements, hence the name ‘bloated’). In the category of keyword indexes we present the so-called split index, which can efficiently solve the k-mismatches problem, especially for 1 error. Our implementation in the C++ language is focused mostly on data compaction, which is beneficial for the search speed (by being cache friendly). We compare our solution with other algorithms and we show that it is faster when the Hamming distance is used. Query times in the order of 1 microsecond were reported for one mismatch for a few-megabyte natural language dictionary on a medium-end PC. A minor contribution includes string sketches which aim to speed up approximate string comparison at the cost of additional space (O(1) per string). They can be used in the context of keyword indexes in order to deduce that two strings differ by at least k mismatches with the use of fast bitwise operations rather than an explicit verification.

Towards universal neural nets: Gibbs machines and ACE

We study a class of neural nets – Gibbs machines – which are a type of variational auto-encoders, designed for gradual learning. They offer an universal platform for incrementally adding newly learned features, including physical symmetries in space/time. Combining them with classifiers gives rise to a brand of universal generative neural nets – stochastic auto-classifier-encoders (ACE). ACE preserve the non-Gaussian and clustering nature of real-life data and have state-of-the-art performance, both for classification and density estimation for the MNIST data set.

A Boosted Tweedie Compound Poisson Model for Insurance Premium

A fully data-driven method to identify (correlated) changes in diachronic corpora

A Neural Algorithm of Artistic Style

A note on Constantin and Iyer’s representation formula for the Navier–Stokes equations

A Parallel Algorithm to Test Chordality of Graphs

A review of homomorphic encryption and software tools for encrypted statistical machine learning

A simple proof of Bourgain’s theorem on the singularity of the spectrum of Ornstein’s maps

A statistical analysis of a deformation model with Wasserstein barycenters : estimation procedure and goodness of fit test

A white noise approach to insider trading

Adaptive Multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

Alignment-based compositional semantics for instruction following

Asymptotics for randomly reinforced urns with random barriers

Capture-recapture abundance estimation using a semi-complete data likelihood approach

Combinatorial analysis of polynomial flatness problem, Mahler measure and ergodic theory

Compressed Sensing and Reconstruction of Unstructured Mesh Datasets

Conditional quantile sequential estimation for stochastic code

Crossings as a side effect of dependency lengths

Deep Convolutional Neural Networks for Smile Recognition

Deterministic Broadcasting and Gossiping with Beeps

Discriminating quantum states: the multiple Chernoff distance

Edge scaling limit of the spectral radius for random normal matrix ensembles at hard edge

Estimating HIV Epidemics for Sub-National Areas

Financial Market Modeling with Quantum Neural Networks

Forecast Combination Under Heavy-Tailed Errors

Gaussian Mixture Models with Component Means Constrained in Pre-selected Subspaces

Generating correlated random vector by polynomial normal transformation

Global Synchronization and Consensus Using Beeps in a Fault-Prone MAC

Greedy methods, randomization approaches and multi-arm bandit algorithms for efficient sparsity-constrained optimization

Improved Analyses of the Randomized Power Method and Block Lanczos Method

Nested Hierarchical Dirichlet Processes for Multi-Level Non-Parametric Admixture Modeling

Ninomiya-Victoir scheme: strong convergence, antithetic version and application to multilevel estimators

On dimension reduction in Gaussian filters

On sets not belonging to algebras and rainbow matchings in graphs

On tame, pet, domestic, and miserable impartial games

One-sample Distribution-free Interval Estimators for Ratios of Percentiles

Outermost boundaries for star-connected components in percolation

Population Synthesis via k-Nearest Neighbor Crossover Kernel

Prepartition: Paradigm for the Load Balance of Virtual Machine Allocation in Data Centers

Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP 2015)

Random walk on sparse random digraphs

Representation and poly-time approximation for pressure of $\mathbb{Z}^2$ lattice models in the non-uniqueness region

Restricted Indian Buffet Processes

Robustness of movement models: can models bridge the gap between temporal scales of data sets and behavioural processes?

SPRIGHT: A Fast and Robust Framework for Sparse Walsh-Hadamard Transform

Standardized drought indices: A novel uni- and multivariate approach

The Modelling of Random Phenomena

The Stable Fixtures Problem with Payments

Thermodynamic Identities and Symmetry Breaking in Short-Range Spin Glasses

Transition to chaos in random neuronal networks

Universal targets for homomorphisms of edge-colored graphs