Joint Greedy Equivalence Search (jointGES)
We consider the problem of jointly estimating multiple related directed acyclic graph (DAG) models based on high-dimensional data from each graph. This problem is motivated by the task of learning gene regulatory networks based on gene expression data from different tissues, developmental stages or disease states. We prove that under certain regularity conditions, the proposed $\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to the adjacency matrices consistent with the data-generating distributions and has the correct sparsity. In particular, we show that this joint estimation procedure leads to a faster convergence rate than estimating each DAG model separately. As a corollary we also obtain high-dimensional consistency results for causal inference from a mix of observational and interventional data. For practical purposes, we propose jointGES consisting of Greedy Equivalence Search (GES) to estimate the union of all DAG models followed by variable selection using lasso to obtain the different DAGs, and we analyze its consistency guarantees. The proposed method is illustrated through an analysis of simulated data as well as epithelial ovarian cancer gene expression data. …

Batch Normalization (BN)
Batch normalization is a technique for improving the performance and stability of artificial neural networks. Batch normalization was introduced in a 2015 paper (https://…/1502.03167.pdf ). It is used to normalize the input layer by adjusting and scaling the activations. It can mitigate the problem of internal covariate shift, where parameter initialization and changes in the distribution of the inputs of each layer affects the learning rate of the network. …

Gittins Index
The Gittins index is a measure of the reward that can be achieved through a given stochastic process with certain properties, namely: the process has an ultimate termination state and evolves with an option, at each intermediate state, of terminating. Upon terminating at a given state, the reward achieved is the sum of the probabilistic expected rewards associated with every state from the actual terminating state to the ultimate terminal state, inclusive. The index is a real scalar.
In applied mathematics, the ‘Gittins index’ is a real scalar value associated to the state of a stochastic process with a reward function and with a probability of termination. It is a measure of the reward that can be achieved by the process evolving from that state on, under the probability that it will be terminated in future. The ‘index policy’ induced by the Gittins index, consisting of choosing at any time the stochastic process with the currently highest Gittins index, is the solution of some stopping problems such as the one of dynamic allocation, where a decision-maker has to maximize the total reward by distributing a limited amount of effort to a number of competing projects, each returning a stochastic reward. If the projects are independent from each other and only one project at a time may evolve, the problem is called multi-armed bandit (one type of Stochastic scheduling problems) and the Gittins index policy is optimal. If multiple projects can evolve, the problem is called Restless bandit and the Gittins index policy is a known good heuristic but no optimal solution exists in general. In fact, in general this problem is NP-complete and it is generally accepted that no feasible solution can be found.
Multi-Armed Bandits and the Gittins Index

Banach Wasserstein GAN
Wasserstein Generative Adversarial Networks (WGANs) can be used to generate realistic samples from complicated image distributions. The Wasserstein metric used in WGANs is based on a notion of distance between individual images, which induces a notion of distance between probability distributions of images. So far the community has considered $\ell^2$ as the underlying distance. We generalize the theory of WGAN with gradient penalty to Banach spaces, allowing practitioners to select the features to emphasize in the generator. We further discuss the effect of some particular choices of underlying norms, focusing on Sobolev norms. Finally, we demonstrate the impact of the choice of norm on model performance and show state-of-the-art inception scores for non-progressive growing GANs on CIFAR-10. …