Dynamic Adaptive Mixture Models

In this paper we propose a new class of Dynamic Mixture Models (DAMMs) being able to sequentially adapt the mixture components as well as the mixture composition using information coming from the data. The information driven nature of the proposed class of models allows to exactly compute the full likelihood and to avoid computer intensive simulation schemes. An extensive Monte Carlo experiment reveals that the new proposed model can accurately approximate the more complicated Stochastic Dynamic Mixture Model previously introduced in the literature as well as other kind of models. The properties of the new proposed class of models are discussed through the paper and an application in financial econometrics is reported.


Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks

While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks– \textit{Internal Covariate Shift}– the current solution has multiple drawbacks. For instance, BN depends on batch statistics for layerwise input normalization during training which makes the estimates of mean and standard deviation of input (distribution) to hidden layers inaccurate due to shifting parameter values (specially during initial training epochs). Another fundamental problem with BN is that it cannot be used with batch-size 1 during training. We address these (and other) drawbacks of BN by proposing a non-adaptive normalization technique for removing covariate shift, that we call \textit{Normalization Propagation}. Our approach does not depend on batch statistics, but rather uses a data-independent parametric estimate of mean and standard-deviation in every layer; thus being computationally cheaper. We exploit the observation that the pre-activation before Rectified Linear Units follow a Gaussian distribution in deep networks, and that once the first and second order statistics of any given dataset are normalized, we can forward propagate this normalization without the need for recalculating the approximate statistics (using data) for any of the hidden layers.


A Bayesian Model of Multilingual Unsupervised Semantic Role Induction

We propose a Bayesian model of unsupervised semantic role induction in multiple languages, and use it to explore the usefulness of parallel corpora for this task. Our joint Bayesian model consists of individual models for each language plus additional latent variables that capture alignments between roles across languages. Because it is a generative Bayesian model, we can do evaluations in a variety of scenarios just by varying the inference procedure, without changing the model, thereby comparing the scenarios directly. We compare using only monolingual data, using a parallel corpus, using a parallel corpus with annotations in the other language, and using small amounts of annotation in the target language. We find that the biggest impact of adding a parallel corpus to training is actually the increase in mono-lingual data, with the alignments to another language resulting in small improvements, even with labeled data for the other language.


Causal models for debugging and control in cloud computing

Two challenges in the theory and practice of cloud computing are: (1) smart control (allocation) of resources under exploration constraints, on time-varying systems, and (2) understanding and debugging of the performance of complex systems that involve e.g. virtualization. In this paper, we examine how these challenges can be approached using causal models. For challenge (1) we investigate how to infer and use causal models and selection diagrams to design and integrate experiments in a principled way, as well as to cope with partially varying systems. For challenge (2) we examine how to formalize performance attribution and debugging questions by counterfactual probabilities, and how to approximately answer them based on inferred (non-deterministic) graphical causal models.


Randomized algorithms for finding a majority element

Given n colored balls, we want to detect if more than \lfloor n/2\rfloor of them have the same color, and if so find one ball with such majority color. We are only allowed to choose two balls and compare their colors, and the goal is to minimize the total number of such operations. A well-known exercise is to show how to find such a ball with only 2n comparisons while using only a logarithmic number of bits for bookkeeping. The resulting algorithm is called the Boyer–Moore majority vote algorithm. It is known that any deterministic method needs \lceil 3n/2\rceil-2 comparisons in the worst case, and this is tight. However, it is not clear what is the required number of comparisons if we allow randomization. We construct a randomized algorithm which always correctly finds a ball of the majority color (or detects that there is none) using, with high probability, only 7n/6+o(n) comparisons. We also prove that the expected number of comparisons used by any such randomized method is at least 1.038n.


Web matrices: structural properties and generating combinatorial identities

In this paper we present new results for the combinatorics of web diagrams and web worlds. These are discrete objects that arise in the physics of calculating scattering amplitudes in non-abelian gauge theories. Web-colouring and web-mixing matrices (collectively known as web matrices) are indexed by ordered pairs of web-diagrams and contain information relating the number of colourings of the first web diagram that will produce the second diagram. We introduce the black diamond product on power series and show how it determines the web-colouring matrix of disjoint web worlds. Furthermore, we show that combining known physical results with the black diamond product gives a new technique for generating combinatorial identities. Due to the complicated action of the product on power series, the resulting identities appear highly non-trivial. We present two results to explain repeated entries that appear in the web matrices. The first of these shows how diagonal web matrix entries will be the same if the comparability graphs of their associated decomposition posets are the same. The second result concerns general repeated entries in conjunction with a flipping operation on web diagrams. We present a combinatorial proof of idempotency of the web-mixing matrices, previously established using physical arguments only. We also show how the entries of the square of the web-colouring matrix can be achieved by a linear transformation that maps the standard basis for formal power series in one variable to a sequence of polynomials. We look at one parameterized web world that is related to indecomposable permutations and show how determining the web-colouring matrix entries in this case is equivalent to a combinatorics on words problem.


Simultaneous Inference for High-dimensional Linear Models

Learning Physical Intuition of Block Towers by Example

Statistical mechanics of clonal expansion in lymphocyte networks modelled with slow and fast variables

Joint Learning Templates and Slots for Event Schema Induction

Simplified Relative Citation Ratio for Static Paper Ranking: UFMG/LATIN at WSDM Cup 2016

Cloud Server Benchmarks for Performance Evaluation of New Hardware Architecture

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

Inclusion-exclusion principles for convex hulls and the Euler relation

Learning deep representation of multityped objects and tasks

Neural Architectures for Named Entity Recognition

Cubic arc-transitive $k$-circulants

Hamiltonian prisms on 5-chordal graphs

Numerical CP Decomposition of Some Difficult Tensors

A Unified View of Localized Kernel Learning

Lasso estimation for GEFCom2014 probabilistic electric load forecasting

Reptilings and space-filling curves for acute triangles

In the Search of Optimal Concurrency

Short time full asymptotic expansion of hypoelliptic heat kernel at the cut locus

Pluriassociative algebras II: The polydendriform operad and related operads

Latent class analyisis for reliable measure of inflation expectation in the indian public

Sampling approach to sparse approximation problem: determining degrees of freedom by simulated annealing

Design Heuristic for Parallel Many Server Systems under FCFS-ALIS

TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation

A pairwise comparison approach to ranking in chess team championships

Contextual trace refinement for concurrent objects: Safety and progress

Dynamic Memory Networks for Visual and Textual Question Answering

Short note on the number of 1-ascents in dispersed dyck paths

Estimating Non-Simplified Vine Copulas Using Penalized Splines

Cubic graphs and related triangulations on orientable surfaces

A Generalization of the Power Law Distribution with Nonlinear Exponent

Sequential ranking under random semi-bandit feedback

Almost invariance of distributions for random walks on groups

Vine copula based inference of multivariate event time data

Gibbs Random Fields and Markov Random Fields with Constraints

Distributed $(Δ+1)$-Coloring in Sublogarithmic Rounds

Performance Localisation

Terraces for Small Groups

Arborified multiple zeta values

Mesoscopic eigenvalue statistics of Wigner matrices

Multidimensional Lévy White Noises in Weighted Besov Spaces

Inferential Privacy Guarantees for Differentially Private Mechanisms

Rapidly Mixing Markov Chains: A Comparison of Techniques (A Survey)

Optimized Polynomial Evaluation with Semantic Annotations

Analyzing Games with Ambiguous Player Types using the ${\rm MINthenMAX}$ Decision Model

Random Locations of Periodic Stationary Processes

Delta State Replicated Data Types

Effective use of the PGAS Paradigm: Driving Transformations and Self-Adaptive Behavior in DASH-Applications

Parallel Texts in the Hebrew Bible, New Methods and Visualizations

Text Understanding with the Attention Sum Reader Network

Perfect and $\varepsilon$-Perfect Simulation Methods for the One Dimensional Kac Equation

A Randomized Misfit Approach for Data Reduction in Large-Scale Inverse Problems

X-rank and identifiability for a polynomial decomposition model

Sentiment Analysis in Scholarly Book Reviews

Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

Limiting Distribution of the Rightmost Particle in Catalytic Branching Brownian Motion

Computing Shortest Paths Using A*, Landmarks, and Polygon Inequalities (Abstract)