In this paper we propose a new class of Dynamic Mixture Models (DAMMs) being able to sequentially adapt the mixture components as well as the mixture composition using information coming from the data. The information driven nature of the proposed class of models allows to exactly compute the full likelihood and to avoid computer intensive simulation schemes. An extensive Monte Carlo experiment reveals that the new proposed model can accurately approximate the more complicated Stochastic Dynamic Mixture Model previously introduced in the literature as well as other kind of models. The properties of the new proposed class of models are discussed through the paper and an application in financial econometrics is reported.
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks– \textit{Internal Covariate Shift}– the current solution has multiple drawbacks. For instance, BN depends on batch statistics for layerwise input normalization during training which makes the estimates of mean and standard deviation of input (distribution) to hidden layers inaccurate due to shifting parameter values (specially during initial training epochs). Another fundamental problem with BN is that it cannot be used with batch-size $1$ during training. We address these (and other) drawbacks of BN by proposing a non-adaptive normalization technique for removing covariate shift, that we call \textit{Normalization Propagation}. Our approach does not depend on batch statistics, but rather uses a data-independent parametric estimate of mean and standard-deviation in every layer; thus being computationally cheaper. We exploit the observation that the pre-activation before Rectified Linear Units follow a Gaussian distribution in deep networks, and that once the first and second order statistics of any given dataset are normalized, we can forward propagate this normalization without the need for recalculating the approximate statistics (using data) for any of the hidden layers.
We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can do evaluations in a variety of scenarios just by varying the inference procedure, without changing the model, thereby comparing the scenarios directly. We compare using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in mono-lingual data, with the alignments to another language resulting in small improvements, even with labeled data for the other language.
Two challenges in the theory and practice of cloud computing are: (1) smart control (allocation) of resources under exploration constraints, on time-varying systems, and (2) understanding and debugging of the performance of complex systems that involve e.g. virtualization. In this paper, we examine how these challenges can be approached using causal models. For challenge (1) we investigate how to infer and use causal models and selection diagrams to design and integrate experiments in a principled way, as well as to cope with partially varying systems. For challenge (2) we examine how to formalize performance attribution and debugging questions by counterfactual probabilities, and how to approximately answer them based on inferred (non-deterministic) graphical causal models.
Given $n$ colored balls, we want to detect if more than $\lfloor n/2\rfloor$ of them have the same color, and if so find one ball with such majority color. We are only allowed to choose two balls and compare their colors, and the goal is to minimize the total number of such operations. A well-known exercise is to show how to find such a ball with only $2n$ comparisons while using only a logarithmic number of bits for bookkeeping. The resulting algorithm is called the Boyer–Moore majority vote algorithm. It is known that any deterministic method needs $\lceil 3n/2\rceil-2$ comparisons in the worst case, and this is tight. However, it is not clear what is the required number of comparisons if we allow randomization. We construct a randomized algorithm which always correctly finds a ball of the majority color (or detects that there is none) using, with high probability, only $7n/6+o(n)$ comparisons. We also prove that the expected number of comparisons used by any such randomized method is at least $1.038n$.
In this paper we present new results for the combinatorics of web diagrams and web worlds. These are discrete objects that arise in the physics of calculating scattering amplitudes in non-abelian gauge theories. Web-colouring and web-mixing matrices (collectively known as web matrices) are indexed by ordered pairs of web-diagrams and contain information relating the number of colourings of the first web diagram that will produce the second diagram. We introduce the black diamond product on power series and show how it determines the web-colouring matrix of disjoint web worlds. Furthermore, we show that combining known physical results with the black diamond product gives a new technique for generating combinatorial identities. Due to the complicated action of the product on power series, the resulting identities appear highly non-trivial. We present two results to explain repeated entries that appear in the web matrices. The first of these shows how diagonal web matrix entries will be the same if the comparability graphs of their associated decomposition posets are the same. The second result concerns general repeated entries in conjunction with a flipping operation on web diagrams. We present a combinatorial proof of idempotency of the web-mixing matrices, previously established using physical arguments only. We also show how the entries of the square of the web-colouring matrix can be achieved by a linear transformation that maps the standard basis for formal power series in one variable to a sequence of polynomials. We look at one parameterized web world that is related to indecomposable permutations and show how determining the web-colouring matrix entries in this case is equivalent to a combinatorics on words problem.