Probabilistic Self-Stabilization

By using concrete scenarios, we present and discuss a new concept of probabilistic Self-Stabilization in Distributed Systems.

Nonparametric Bayesian Regression on Manifolds via Brownian Motion

This paper proposes a novel framework for manifold-valued regression and establishes its consistency as well as its contraction rate. It assumes a predictor with values in the interval $[0,1]$ and response with values in a compact Riemannian manifold $M$. This setting is useful for applications such as modeling dynamic scenes or shape deformations, where the visual scene or the deformed objects can be modeled by a manifold. The proposed framework is nonparametric and uses the heat kernel (and its associated Brownian motion) on manifolds as an averaging procedure. It directly generalizes the use of the Gaussian kernel (as a natural model of additive noise) in vector-valued regression problems. In order to avoid explicit dependence on estimates of the heat kernel, we follow a Bayesian setting, where Brownian motion on $M$ induces a prior distribution on the space of continuous functions $C([0,1], M)$. For the case of discretized Brownian motion, we establish the consistency of the posterior distribution in terms of the $L_{q}$ distances for any $1 \leq q < \infty$. Most importantly, we establish contraction rate of order $O(n^{-1/4+\epsilon})$ for any fixed $\epsilon>0$, where $n$ is the number of observations. For the continuous Brownian motion we establish weak consistency.

The generalization of Latin hypercube sampling

Latin hypercube sampling (LHS) is generalized in terms of a spectrum of stratified sampling (SS) designs referred to as partially stratified sample (PSS) designs. True SS and LHS are shown to represent the extremes of the PSS spectrum. The variance of PSS estimates is derived along with some asymptotic properties. PSS designs are shown to reduce variance associated with variable interactions, whereas LHS reduces variance associated with main effects. Challenges associated with the use of PSS designs and their limitations are discussed. To overcome these challenges, the PSS method is coupled with a new method called Latinized stratified sampling (LSS) that produces sample sets that are simultaneously SS and LHS. The LSS method is equivalent to an Orthogonal Array based LHS under certain conditions but is easier to obtain. Utilizing an LSS on the subspaces of a PSS provides a sampling strategy that reduces variance associated with both main effects and variable interactions and can be designed specially to minimize variance for a given problem. Several high-dimensional numerical examples highlight the strengths and limitations of the method. The Latinized partially stratified sampling method is then applied to identify the best sample strategy for uncertainty quantification on a plate buckling problem.

The sample complexity of weighted sparse approximation

For Gaussian sampling matrices, we provide bounds on the minimal number of measurements $m$ required to achieve robust weighted sparse recovery guarantees in terms of how well a given prior model for the sparsity support aligns with the true underlying support. Our main contribution is that for a sparse vector $\mathbf{x} \in \mathbb{R}^N$ supported on an unknown set $\mathcal{S} \subset \{1,2, \dots, N\}$ with $|\mathcal{S}|\leq k$, if $\mathcal{S}$ has $weighted ~cardinality ~\omega(\mathcal{S}) := \sum_{j \in \mathcal{S}} \omega_j^2$, and if the weights on $\mathcal{S}^c$ exhibit mild growth, $\omega_j^2 \geq \gamma \log(j/\omega(\mathcal{S}))$ for $\gamma > 0$, then the sample complexity for sparse recovery via weighted $\ell_1$-minimization using weights $\omega_j$ is linear in the weighted sparsity level, and $m = \mathcal{O}(\omega(\mathcal{S})/\gamma)$. This main result is a generalization of special cases including a) the standard sparse recovery setting where all weights $\omega_j \equiv 1$, and $m = \mathcal{O}\left(k\log\left(N/k\right)\right)$; b) the setting where the support is known a priori, and $m = \mathcal{O}(k)$; and c) the setting of sparse recovery with prior information, and $m$ depends on how well the weights are aligned with the support set $\mathcal{S}$. We further extend the results in case c) to the setting of additive noise. Our results are $nonuniform$ that is they apply for a fixed support, unknown a priori, and the weights on $\mathcal{S}$ do not all have to be smaller than the weights on $\mathcal{S}^c$ for our recovery results to hold.

Linear Contextual Bandits with Global Constraints and Objective

We consider the linear contextual bandit problem with global convex constraints and a concave objective function. In each round, the outcome of pulling an arm is a vector, that depends linearly on the context of that arm. The global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. This problem turns out to be a common generalization of classic linear contextual bandits (linContextual) [Auer 2003], bandits with concave rewards and convex knapsacks (BwCR) [Agrawal, Devanur 2014], and the online stochastic convex programming (OSCP) problem [Agrawal, Devanur 2015]. We present algorithms with near-optimal regret bounds for this problem. Our bounds compare favorably to results on the unstructured version of the problem [Agrawal et al. 2015, Badanidiyuru et al. 2014] where the relation between the contexts and the outcomes could be arbitrary, but the algorithm only competes against a fixed set of policies.

Selective inference with a randomized response

Inspired by sample splitting and the reusable holdout introduced in the field of differential privacy, we consider selective inference with a randomized response. Using a randomized response can ensure that the leftover information of Fithian et al.(2014) is bounded below ensuring that selective intervals are better behaved than without randomization. Under independent sampling, we prove a selective (or privatized) central limit theorem that transfers procedures valid under asymptotic normality without selection to their corresponding selective counterparts. We focus on the classical asymptotic setting, leaving the interesting high-dimensional asymptotic questions for future work.

Quantum integrable combinatorics of Schur polynomials

We examine and present new combinatorics for the Schur polynomials from the viewpoint of quantum integrability. We introduce and analyze an integrable six-vertex model which can be viewed as a certain degeneration model from a t-deformed boson model. By a detailed analysis of the wavefunction from the quantum inverse scattering method, we present a novel combinatorial formula which expresses the Schur polynomials by using an additional parameter, which is in the same sense but different from the Tokuyama formula. We also give an algebraic analytic proof for the Cauchy identity and make applications of the domain wall boundary partition functions to the enumeration of alternating sign matrices.

Word-representability of subdivisions of triangular grid graphs

A graph $G=(V,E)$ is word-representable if there exists a word $w$ over the alphabet $V$ such that letters $x$ and $y$, $x\neq y$, alternate in $w$ if and only if $(x,y)\in E$. Halld\'{o}rsson et al.\ have shown that a graph is word-representable if and only if it admits a so-called semi-transitive orientation. A corollary to this result is that any 3-colorable graph is word-representable. Akrobotu et al.\ have shown that a triangulation of a grid graph is word-representable if and only if it is 3-colorable. This result does not hold for triangulations of grid-covered cylinder graphs, namely, there are such word-representable graphs with chromatic number 4. In this paper we show that word-representability of triangulations of grid-covered cylinder graphs with three sectors (resp., more than three sectors) is characterized by avoiding a certain set of six minimal induced subgraphs (resp., wheel graphs $W_5$ and $W_7$).

Variational Bayesian strategies for high-dimensional, stochastic design problems

This paper is concerned with a lesser-studied problem in the context of model-based, uncertainty quantification (UQ), that of optimization/design/control under uncertainty. The solution of such problems is hindered not only by the usual difficulties encountered in UQ tasks (e.g. the high computational cost of each forward simulation, the large number of random variables) but also by the need to solve a nonlinear optimization problem involving large numbers of design variables and potentially constraints. We propose a framework that is suitable for a large class of such problems and is based on the idea of recasting it as a probabilistic inference task. To address this, we propose a Variational Bayesian (VB) formulation and an iterative VB-Expectation-Maximization scheme that is also capable of identifying a low-dimensional set of directions in the design space along which the objective exhibits the largest sensitivity. We demonstrate the validity of proposed approach in the context of two numerical examples involving $\mathcal{O}(10^3)$ random and design variables. In all cases considered the cost of the computations in terms of calls to the forward model was of the order $\mathcal{O}(10^2)$. The accuracy of the approximations provided is assessed by appropriate information-theoretic metrics.\footnote{ This paper is based on the talk given during the international symposium on ‘Big Data and Predictive Computational Modeling’ that took place in 18-21 May 2015 at TUM-IAS, Munich Germany.}

The quaternionic weighted zeta function of a graph

We establish the quaternionic weighted zeta function of a graph and its Study determinant expression. For a graph with quaternionic weights on arcs, we define a zeta function by using an infinite product which is regarded as the Euler product. This is a quaternionic extension of the square of the Ihara zeta function. We show that the new zeta function can be expressed as the exponential of a generating function and that it has two Study determinant expressions, which are crucial for the theory of zeta functions of graphs.

Differentially Private Analysis of Outliers

This paper investigates differentially private analysis of distance-based outliers. The problem of outlier detection is to find a small number of instances that are apparently distant from the remaining instances. On the other hand, the objective of differential privacy is to conceal presence (or absence) of any particular instance. Outlier detection and privacy protection are thus intrinsically conflicting tasks. In this paper, instead of reporting outliers detected, we present two types of differentially private queries that help to understand behavior of outliers. One is the query to count outliers, which reports the number of outliers that appear in a given subspace. Our formal analysis on the exact global sensitivity of outlier counts reveals that regular global sensitivity based method can make the outputs too noisy, particularly when the dimensionality of the given subspace is high. Noting that the counts of outliers are typically expected to be relatively small compared to the number of data, we introduce a mechanism based on the smooth upper bound of the local sensitivity. The other is the query to discovery top-$h$ subspaces containing a large number of outliers. This task can be naively achieved by issuing count queries to each subspace in turn. However, the variation of subspaces can grow exponentially in the data dimensionality. This can cause serious consumption of the privacy budget. For this task, we propose an exponential mechanism with a customized score function for subspace discovery. To the best of our knowledge, this study is the first trial to ensure differential privacy for distance-based outlier analysis. We demonstrated our methods with synthesized datasets and real datasets. The experimental results show that out method achieve better utility compared to the global sensitivity based methods.

An improved EM algorithm for solving MLE in constrained diffusion kurtosis imaging of human brain

The displacement distribution of a water molecular is characterized mathematically as Gaussianity without considering potential diffusion barriers and compartments. However, this is not true in real scenario: most biological tissues are comprised of cell membranes, various intracellular and extracellular spaces, and of other compartments, where the water diffusion is referred to have a non-Gaussian distribution. Diffusion kurtosis imaging (DKI), recently considered to be one sensitive biomarker, is an extension of diffusion tensor imaging, which quantifies the degree of non-Gaussianity of the diffusion. This work proposes an efficient scheme of maximum likelihood estimation (MLE) in DKI: we start from the Rician noise model of the signal intensities. By augmenting a Von-Mises distributed latent phase variable, the Rician likelihood is transformed to a tractable joint density without loss of generality. A fast computational method, an expectation-maximization (EM) algorithm for MLE is proposed in DKI. To guarantee the physical relevance of the diffusion kurtosis we apply the ternary quartic (TQ) parametrization to utilize its positivity, which imposes the upper bound to the kurtosis. A Fisher-scoring method is used for achieving fast convergence of the individual diffusion compartments. In addition, we use the barrier method to constrain the lower bound to the kurtosis. The proposed estimation scheme is conducted on both synthetic and real data with an objective of healthy human brain. We compared the method with the other popular ones with promising performance shown in the results.

The strong renewal theorem with infinite mean via local large deviations

A necessary and sufficient condition is established for an asymptotically stable random walk to satisfy the strong renewal theorem. This result is valid for all alpha in (0, 1), thus completing a result for alpha in (1/2, 1) which was proved in the 1963 paper of Garsia and Lamperti [6].

Efficient Estimation for Diffusions Sampled at High Frequency Over a Fixed Time Interval

Parametric estimation for diffusion processes is considered for high frequency observations over a fixed time interval. The processes solve stochastic differential equations with an unknown parameter in the diffusion coefficient. We find easily verified conditions on approximate martingale estimating functions under which estimators are consistent, rate optimal, and efficient under high frequency (in-fill) asymptotics. The asymptotic distributions of the estimators are shown to be normal variance-mixtures, where the mixing distribution generally depends on the full sample path of the diffusion process over the observation time interval. Utilising the concept of stable convergence, we also obtain the more easily applicable result that for a suitable data dependent normalisation, the estimators converge in distribution to a standard normal distribution. The theory is illustrated by a small simulation study comparing an efficient and a non-efficient estimating function.

The Characterization of planar, 4-connected, K_{2,5}-minor-free graphs

We show that every planar, 4-connected, K2;5-minor- free graph is the square of a cycle of even length at least six.

Implicitly Constrained Semi-Supervised Least Squares Classification

We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. We show this approach can be formulated as a quadratic programming problem and its solution can be found using a simple gradient descent procedure. We prove that, in a certain way, our method never leads to performance worse than the supervised classifier. Experimental results corroborate this theoretical result in the multidimensional case on benchmark datasets, also in terms of the error rate.

A Neighbourhood-Based Stopping Criterion for Contrastive Divergence Learning

Restricted Boltzmann Machines (RBMs) are general unsupervised learning devices to ascertain generative models of data distributions. RBMs are often trained using the Contrastive Divergence learning algorithm (CD), an approximation to the gradient of the data log-likelihood. A simple reconstruction error is often used as a stopping criterion for CD, although several authors \cite{schulz-et-al-Convergence-Contrastive-Divergence-2010-NIPSw, fischer-igel-Divergence-Contrastive-Divergence-2010-ICANN} have raised doubts concerning the feasibility of this procedure. In many cases the evolution curve of the reconstruction error is monotonic while the log-likelihood is not, thus indicating that the former is not a good estimator of the optimal stopping point for learning. However, not many alternatives to the reconstruction error have been discussed in the literature. In this manuscript we investigate simple alternatives to the reconstruction error, based on the inclusion of information contained in neighboring states to the training set, as a stopping criterion for CD learning.

Bayesian inference for diffusion driven mixed-effects models

Stochastic differential equations (SDEs) provide a natural framework for modelling intrinsic stochasticity inherent in many continuous-time physical processes. When such processes are observed in multiple individuals or experimental units, SDE driven mixed-effects models allow the quantification of between (as well as within) individual variation. Performing Bayesian inference for such models, using discrete time data that may be incomplete and subject to measurement error is a challenging problem and is the focus of this paper. We extend a recently proposed MCMC scheme to include the SDE driven mixed-effects framework. Fundamental to our approach is the development of a novel construct that allows for efficient sampling of conditioned SDEs that may exhibit nonlinear dynamics between observation times. We apply the resulting scheme to synthetic data generated from a simple SDE model of orange tree growth, and real data consisting of observations on aphid numbers recorded under a variety of different treatment regimes. In addition, we provide a systematic comparison of our approach with an inference scheme based on a tractable approximation of the SDE, that is, the linear noise approximation.

A distributed, end-to-end verifiable, internet voting system

E-voting systems have emerged as a powerful technology for improving democracy by reducing election cost, increasing voter participation, and even allowing voters to directly verify the entire election procedure. Prior internet voting systems have single points of failure which may result in the compromise of voter secrecy, service availability, or integrity of the election results. In this paper, we present the design, implementation, security analysis, and evaluation of a complete e-voting system that is distributed, privacy-preserving and end-to-end verifiable. An essential component of end-to-end verifiable e-voting systems, is a Bulletin Board (BB), i.e., a publicly accessible repository of information assisting in collecting votes, tabulating the result, and election auditing. In previous works, either the BB is a single point of failure and needs to be trusted, or in case it is distributed, its proper operation during ballot casting can only be verified by using a trusted device that performs cryptographic operations. Our system is the first e-voting system with a distributed BB whose operation is human verifiable, i.e., voters can verify its proper operation without the assistance of special software or trusted devices. A voter can use our system over the web, even when her web client stack is potentially unsafe, without sacrificing her privacy, and still be assured her vote was cast as intended. A voter can even outsource election auditing to third parties, still without sacrificing privacy. Finally, as the number of auditors increases, the probability of election fraud going undetected is diminished exponentially. We provide a rigorous security analysis, proving the safety and liveness properties of the system. We implement a prototype of the complete system, we measure its performance empirically, and demonstrate its ability to handle large-scale elections.

Local majority dynamics on preferential attachment graphs

Suppose in a graph $G$ vertices can be either red or blue. Let $k$ be odd. At each time step, each vertex $v$ in $G$ polls $k$ random neighbours and takes the majority colour. If it doesn’t have $k$ neighbours, it simply polls all of them, or all less one if the degree of $v$ is even. We study this protocol on the preferential attachment model of Albert and Barab\’asi, which gives rise to a degree distribution that has roughly power-law $P(x) \sim \frac{1}{x^{3}}$, as well as generalisations which give exponents larger than $3$. The setting is as follows: Initially each vertex of $G$ is red independently with probability $\alpha < \frac{1}{2}$, and is otherwise blue. We show that if $\alpha$ is sufficiently biased away from $\frac{1}{2}$, then with high probability, consensus is reached on the initial global majority within $O(\log_d \log_d t)$ steps. Here $t$ is the number of vertices and $d \geq 5$ is the minimum of $k$ and $m$ (or $m-1$ if $m$ is even), $m$ being the number of edges each new vertex adds in the preferential attachment generative process. Additionally, our analysis reduces the required bias of $\alpha$ for graphs of a given degree sequence studied by the first author (which includes, e.g., random regular graphs).

Multimodal Deep Learning for Robust RGB-D Object Recognition

Robust object recognition is a crucial ingredient of many robotics applications and a prerequisite for solving challenging tasks such as manipulating or grasping objects. Recently, convolutional neural networks (CNNs) have been shown to outperform conventional computer vision algorithms for object recognition from images in many large-scale recognition tasks. In this paper, we investigate the potential for using CNNs to solve object recognition from combined RGB and depth images. We present a new network architecture with two processing streams that learns descriptive features from both input modalities and can learn to fuse information automatically before classification. We introduce an effective encoding of depth information for CNNs and report state of the art performance on the challenging RGB-D Object dataset, where we achieve a recognition accuracy of 91.3%. Finally, to facilitate usage of our method ‘in the wild’, we present a novel synthetic data augmentation approach for training with depth images that improves recognition accuracy in real-world environments.

The Polylingual Labeled Topic Model

In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple languages with separate topic distributions for each language while restricting the permitted topics of a document to a set of predefined labels. We explore the properties of the model in a two-language setting on a dataset from the social science domain. Our experiments show that our model outperforms LDA and Labeled LDA in terms of their held-out perplexity and that it produces semantically coherent topics which are well interpretable by human subjects.

YARBUS : Yet Another Rule Based belief Update System

We introduce a new rule based system for belief tracking in dialog systems. Despite the simplicity of the rules being considered, the proposed belief tracker ranks favourably compared to the previous submissions on the second and third Dialog State Tracking challenges. The results of this simple tracker allows to reconsider the performances of previous submissions using more elaborate techniques.

The distribution of the spine of a Fleming-Viot type process

We show uniqueness of the spine of a Fleming-Viot particle system under minimal assumptions on the driving process. If the driving process is a continuous time Markov process on a finite space, we show that asymptotically, when the number of particles goes to infinity, the branching rate for the spine is twice that of a generic particle in the system, and every side branch has the distribution of the unconditioned generic branching tree.

Micro Service Cloud Computing Pattern for Next Generation Networks

The falling trend in the revenue of traditional telephony services has attracted attention to new IP based services. The IP Multimedia System (IMS) is a key architecture which provides the necessary platform for delivery of new multimedia services. However, current implementations of IMS do not offer automatic scalability or elastisity for the growing number of customers. Although the cloud computing paradigm has shown many promising characteristics for web applications, it is still failing to meet the requirements for telecommunication applications. In this paper, we present some related cloud computing patterns and discuss their adaptations for implementation of IMS or other telecommunication systems.

Group actions on semimatroids

We initiate the study of group actions on (possibly infinite) semimatroids and geometric semilattices. To every such action is naturally associated an orbit-counting function, a two-variable Tutte polynomial and a poset which, in the realizable case, coincides with the poset of connected components of intersections of the associated toric arrangement. In this structural framework we recover and strongly generalize many enumerative results about arithmetic matroids, arithmetic Tutte polynomials and toric arrangements by finding new combinatorial interpretations beyond the realizable case. In particular, we thus find the first class of natural examples of nonrealizable arithmetic matroids. Moreover, if the group is abelian, then the action gives rise to a family of Z-modules which, under appropriate conditions, defines a matroid over Z. As a stepping stone to our results we also prove an extension of the cryptomorphism between semimatroids and geometric semilattices to the infinite case.

Compressed Data Structures for Dynamic Sequences

We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma=\{\,1,\ldots,\sigma\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: $\mathrm{access}(i,S)$ returns the $i$-th symbol in $S$, $\mathrm{rank}_a(i,S)$ counts how many times a symbol $a$ occurs among the first $i$ positions in $S$, and $\mathrm{select}_a(i,S)$ finds the position where a symbol $a$ occurs for the $i$-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only $nH_k+o(n\log\sigma)$ bits, where $H_k$ is the $k$-th order entropy and $n$ is the string length. Moreover our representation supports extraction of a substring $S[i..i+\ell]$ in optimal $O(\log n/\log\log n + \ell/\log_{\sigma}n)$ time.

Hoeffding’s inequality for sums of weakly dependent random variables

We provide a systematic approach to deal with the following problem. Let $X_1,\ldots,X_n$ be, possibly dependent, $[0,1]$-valued random variables. What is a sharp upper bound on the probability that their sum is significantly larger than their mean? In the case of independent random variables, a fundamental tool for bounding such probabilities is devised by Wassily Hoeffding. In this paper we consider analogues of Hoeffding’s result for sums of dependent random variables for which we have certain information on their dependency structure. We prove a result that yields concentration inequalities for several notions of weak dependence between random variables. Additionally, we obtain a new concentration inequality for sums of, possibly dependent, $[0,1]$-valued random variables, $X_1,\ldots,X_n$, that satisfy the following condition: there exist constants $\gamma \in (0,1)$ and $\delta\in (0,1]$ such that for every subset $A\subseteq \{1,\ldots,n\}$ we have $\mathbb{E}\left[\prod_{i\in A} X_i \prod_{i\notin A}(1-X_i) \right]\leq \gamma^{|A|} \delta^{n-|A|}$, where $|A|$ denotes the cardinality of $A$. Our approach applies to several sums of weakly dependent random variables such as sums of martingale difference sequences, sums of $k$-wise independent random variables and $U$-statistics. Finally, we discuss some applications to the theory of random graphs.

Multi-objective analysis of computational models

Computational models are of increasing complexity and their behavior may in particular emerge from the interaction of different parts. Studying such models becomes then more and more difficult and there is a need for methods and tools supporting this process. Multi-objective evolutionary algorithms generate a set of trade-off solutions instead of a single optimal solution. The availability of a set of solutions that have the specificity to be optimal relative to carefully chosen objectives allows to perform data mining in order to better understand model features and regularities. We review the corresponding work, propose a unifying framework, and highlight its potential use. Typical questions that such a methodology allows to address are the following: what are the most critical parameters of the model? What are the relations between the parameters and the objectives? What are the typical behaviors of the model? Two examples are provided to illustrate the capabilities of the methodology. The features of a flapping-wing robot are thus evaluated to find out its speed-energy relation, together with the criticality of its parameters. A neurocomputational model of the Basal Ganglia brain nuclei is then considered and its most salient features according to this methodology are presented and discussed.

Quantum Algorithm for Triangle Finding in Sparse Graphs

This paper presents a quantum algorithm for triangle finding over sparse graphs that improves over the previous best quantum algorithm for this task by Buhrman et al. [SIAM Journal on Computing, 2005]. Our algorithm is based on the recent $\tilde O(n^{5/4})$-query algorithm given by Le Gall [FOCS 2014] for triangle finding over dense graphs (here $n$ denotes the number of vertices in the graph). We show in particular that triangle finding can be solved with $O(n^{5/4-\epsilon})$ queries for some constant $\epsilon>0$ whenever the graph has at most $O(n^{2-c})$ edges for some constant $c>0$.

On $ωψ$-Perfect Graphs

In this paper we generalize the concept of perfection in graphs to other interesting parameters related to colorations. This idea was introduced by Christen and Selkow in 1979 and Yegnanarayanan in 2001. Let $ a,b \in \{ \omega, \chi, \Gamma, \alpha, \psi \} $ where $ \omega $ is the clique number, $ \chi $ is the chromatic number, $ \Gamma $ is the Grundy number, $ \alpha $ is the achromatic number and $ \psi $ is the pseudoachromatic number. A graph $ G $ is $ ab $-perfect, if for every induced subgraph $ H $, $ a(H)=b(H) $. Hereby we characterize the $ \omega \psi $-perfect graphs.

A Reinforcement Learning Approach to Online Learning of Decision Trees

Online decision tree learning algorithms typically examine all features of a new data point to update model parameters. We propose a novel alternative, Reinforcement Learning- based Decision Trees (RLDT), that uses Reinforcement Learning (RL) to actively examine a minimal number of features of a data point to classify it with high accuracy. Furthermore, RLDT optimizes a long term return, providing a better alternative to the traditional myopic greedy approach to growing decision trees. We demonstrate that this approach performs as well as batch learning algorithms and other online decision tree learning algorithms, while making significantly fewer queries about the features of the data points. We also show that RLDT can effectively handle concept drift.

Mesoscopic higher regularity and subadditivity in elliptic homogenization

We introduce a new method for obtaining quantitative results in stochastic homogenization for linear elliptic equations in divergence form. Unlike previous works on the topic, our method does not use concentration inequalities (such as Poincar\’e or logarithmic Sobolev inequalities in the probability space) and relies instead on a higher ($C^{k}$, $k \geq 1$) regularity theory for solutions of the heterogeneous equation, which is valid on length scales larger than a certain specified mesoscopic scale. This regularity theory, which is of independent interest, allows us to, in effect, localize the dependence of the solutions on the coefficients and thereby accelerate the rate of convergence of the expected energy of the cell problem by a bootstrap argument. The fluctuations of the energy are then tightly controlled using subadditivity. The convergence of the energy gives control of the scaling of the spatial averages of gradients and fluxes (that is, it quantifies the weak convergence of these quantities) which yields, by a new ‘multiscale’ Poincar\’e inequality, quantitative estimates on the sublinearity of the corrector.

Center problem, Abel equation and the Faa di Bruno Hopf algebra for output feedback

A combinatorial interpretation is given of Devlin’s word problem underlying the classical center-focus problem of Poincare for non-autonomous differential equations. It turns out that the canonical polynomials of Devlin are from the point of view of connected graded Hopf algebras intimately related to the graded components of a Hopf algebra antipode applied to the formal power series of Ferfera. The link is made by passing through control theory since the Abel equation, which describes a center, is equivalent to an output feedback equation, and the Hopf algebra of output feedback is derived from the composition of iterated integrals rather than just the products of iterated integrals, which yields the shuffle algebra. This means that the primary algebraic structure at play in Devlin’s approach is actually not the shuffle algebra, but a Faa di Bruno type Hopf algebra, which is defined in terms of the shuffle product but is a distinct algebraic structure.

On the Vanishing of Homology in Random Čech Complexes

We compute the homology of random \v{C}ech complexes over a homogeneous Poisson process on a closed manifold, and show that there are, coarsely, two phase transitions. The first transition is analogous to the Erd\H{o}s-R\’enyi phase transition, where the \v{C}ech complex becomes as connected as the underlying manifold. The second transition is where all the other homology groups are computed correctly (almost simultaneously). Our calculations also suggest a finer measurement of scales, where there is a further refinement to this picture and separation between different homology groups.

Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models initialized with connectionist temporal classification (CTC). In this paper, we present techniques that further improve performance of LSTM RNN acoustic models for large vocabulary speech recognition. We show that frame stacking and reduced frame rate lead to more accurate models and faster decoding. CD phone modeling leads to further improvements. We also present initial results for LSTM RNN models outputting words directly.

Pattern-avoiding access in binary search trees

The dynamic optimality conjecture is perhaps the most fundamental open question about binary search trees (BST). It postulates the existence of an asymptotically optimal online BST, i.e. one that is constant factor competitive with any BST on any input access sequence. The two main candidates for dynamic optimality in the literature are splay trees [Sleator and Tarjan, 1985], and Greedy [Lucas, 1988; Munro, 2000; Demaine et al. 2009] [..] Dynamic optimality is trivial for almost all sequences: the optimum access cost of most length-n sequences is Theta(n log n), achievable by any balanced BST. Thus, the obvious missing step towards the conjecture is an understanding of the ‘easy’ access sequences. [..] The difficulty of proving dynamic optimality is witnessed by highly restricted special cases that remain unresolved; one prominent example is the traversal conjecture [Sleator and Tarjan, 1985], which states that preorder sequences (whose optimum is linear) are linear-time accessed by splay trees; no online BST is known to satisfy this conjecture. In this paper, we prove two different relaxations of the traversal conjecture for Greedy: (i) Greedy is almost linear for preorder traversal, (ii) if a linear-time preprocessing is allowed, Greedy is in fact linear. These statements are corollaries of our more general results that express the complexity of access sequences in terms of a pattern avoidance parameter k. [..] To our knowledge, these are the first upper bounds for Greedy that are not known to hold for any other online BST. To obtain these results we identify an input-revealing property of Greedy. Informally, this means that the execution log partially reveals the structure of the access sequence. This property facilitates the use of rich technical tools from forbidden submatrix theory. [Abridged]

Cycles in the lattice with visible steps and the chromatic zeta function of a graph

We study the probability that a cycle of length k in the lattice [1, n]^s does not contain more lattice points than the k vertices of the cycle. Then we generalize this problem to other con?gurations induced by a given graph H introducting the chromatic zeta fuction of a graph.

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

We introduce and analyze stochastic optimization methods where the input to each gradient update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyze asynchronous implementations of stochastic optimization algorithms.In this framework, asynchronous stochastic optimization algorithms can be thought of as serial methods operating on noisy inputs. Using our perturbed iterate framework, we provide new analyses of the Hogwild! algorithm and asynchronous stochastic coordinate descent, that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on the convergence rates. We proceed to apply our framework to develop and analyze KroMagnon: a novel, parallel, sparse stochastic variance-reduced gradient (SVRG) algorithm. We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.

A few $c_2$ invariants of circulant graphs

The $c_2$ invariant is an arithmetic graph invariant introduced by Brown and Schnetz in order to better understand Feynman integrals. This document looks at the special case where the graph in question is a 4-regular circulant graph with one vertex removed; call such a graph a decompletion of a circulant graph. The $c_2$ invariant for the prime $2$ is computed in the case of the decompletion of circulant graphs $C_n(1,3)$ and $C_{2k+2}(1,k)$. For any prime $p$ and for the previous two families of circulant graphs along with the further families $C_n(1,4)$, $C_n(1,5)$, $C_n(1,6)$, $C_n(2,3)$, $C_n(2,4)$, $C_n(2,5)$, and $C_n(3,4)$, the same technique gives the $c_2$ invariant of the decompletions as the solution to a finite system of recurrence equations.

String Gaussian Processes

In this paper we introduce a novel framework for making exact nonparametric Bayesian inference on latent functions, that is particularly suitable for Big Data tasks. Firstly, we introduce a class of stochastic processes we refer to as string Gaussian processes (string GPs). The local nature of the construction of string GPs provides a principled insight into making exact, scalable, distributed, and flexible nonparametric Bayesian inference. Moreover, string GPs provide a flexible framework for building nonstationary functional priors from popular kernels, allowing for arbitrarily complex local patterns in the data, while ensuring some mild global regularity constraints. Furthermore, string GP priors naturally cope with heterogeneous input data, and the gradient of the learned latent function is readily available for explanatory analysis. Secondly, we provide some theoretical results relating our approach to the standard GP paradigm. In particular, we prove that some string GPs are Gaussian processes, which provides a complementary global perspective on our framework. Finally, we derive a scalable and distributed MCMC scheme for supervised learning tasks under string GP priors. The proposed MCMC scheme has computational time complexity $\mathcal{O}(n)$ and memory requirement $\mathcal{O}(dn)$, where $n$ is the data size and $d$ the dimension of the input space.