Bayesian Least-Squares Policy Iteration (BLSPI)
We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. An online variant of BLSPI, called randomised BLSPI (RBLSPI), has also been proposed; it improves its policy based on an incomplete policy evaluation step. In the online setting, the exploration-exploitation dilemma must be addressed, because we try to discover the optimal policy using samples that we collect ourselves. RBLSPI exploits the ability of BLSTD to quantify our uncertainty about the value function. Inspired by Thompson sampling, RBLSPI first samples a value function from a posterior distribution over value functions, and then selects actions based on the sampled value function. The effectiveness and the exploration abilities of RBLSPI are demonstrated experimentally in several environments. …
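
The sample-then-act step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear action-value function and, for simplicity, an independent Gaussian posterior per weight (a diagonal covariance). All names (`sample_weights`, `select_action`, the toy features) are hypothetical.

```python
import random


def sample_weights(mean, std, rng):
    """Draw one weight vector from an independent Gaussian posterior.

    Assumption: the posterior is diagonal, so each weight is sampled on its own.
    """
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def q_value(weights, features):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def select_action(mean, std, features_per_action, rng):
    """Thompson-style selection: sample a value function, then act greedily on it."""
    weights = sample_weights(mean, std, rng)
    values = [q_value(weights, phi) for phi in features_per_action]
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical two-action example with one-hot features per action.
rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0]]

# With zero posterior uncertainty the choice reduces to plain greedy selection.
greedy_action = select_action([1.0, 0.0], [0.0, 0.0], features, rng)

# With equal means and non-zero uncertainty, both actions get tried over time.
explored = {select_action([0.0, 0.0], [1.0, 1.0], features, rng) for _ in range(200)}
```

In this sketch, exploration comes only from the spread of the posterior: as the standard deviations shrink, the sampled value function concentrates around the mean and the behaviour becomes greedy.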

Siamese Deep Forest (SDF)
This paper proposes the Siamese Deep Forest (SDF). It is based on the Deep Forest, or gcForest, proposed by Zhou and Feng, and can be viewed as a modification of gcForest. It can also be regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep forest as a weighted sum of the tree class probabilities, with the weights chosen to reduce distances between similar pairs and to increase them between dissimilar pairs. We show that the weights can be obtained by solving a quadratic optimization problem. The SDF aims to prevent the overfitting that occurs in neural networks when only limited training data are available. Numerical experiments illustrate the proposed distance metric method. …
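
The two ingredients named above, a training set of concatenated pairs and a weighted sum of tree class probabilities, can be sketched as follows. This is only an illustration: it assumes pairs are labelled similar when their class labels match, it takes the weights as given rather than solving the quadratic optimization problem, and the function names are hypothetical.

```python
def concat_pairs(vectors, labels):
    """Build a pair training set from labelled vectors.

    Each item is (concatenated pair, 1 if similar else 0).
    Assumption: a pair is "similar" when both vectors share a class label.
    """
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            pairs.append((vectors[i] + vectors[j], int(labels[i] == labels[j])))
    return pairs


def weighted_class_distribution(tree_probs, weights):
    """Combine per-tree class probabilities as a weighted sum over trees.

    Assumption: weights are non-negative and sum to 1, so the result is
    again a probability distribution over classes.
    """
    n_classes = len(tree_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, tree_probs))
        for c in range(n_classes)
    ]


# Hypothetical toy data: three 2-d vectors, two classes.
pairs = concat_pairs([[0.0, 1.0], [0.1, 0.9], [5.0, 5.0]], ["a", "a", "b"])

# Two trees voting on (similar, dissimilar); uniform weights give a plain average.
uniform = weighted_class_distribution([[0.9, 0.1], [0.5, 0.5]], [0.5, 0.5])
# Shifting weight toward the first tree moves the result toward its vote.
skewed = weighted_class_distribution([[0.9, 0.1], [0.5, 0.5]], [0.8, 0.2])
```

With uniform weights this reduces to the averaging used in an ordinary forest; the SDF's contribution, per the abstract, lies in how the weights are learned, which this sketch does not cover.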

FlexNGIA
From virtual reality and telepresence to augmented reality, holoportation, and remotely controlled robotics, future network applications promise unprecedented development for society, economics and culture by revolutionizing the way we live, learn, work and play. To deploy such futuristic applications and meet their performance requirements, recent trends have stressed the need for the Tactile Internet, an Internet that, according to the International Telecommunication Union, combines ultra-low latency with extremely high availability, reliability and security. Unfortunately, today’s Internet falls short of meeting such stringent requirements because of several fundamental limitations in the design of the current network architecture and communication protocols. This calls for rethinking the network architecture and protocols, and for efficiently harnessing recent advances in virtualization and network softwarization to design the Tactile Internet of the future. In this paper, we start by analyzing the characteristics and requirements of future networking applications. We then highlight the limitations of the traditional network architecture and protocols and their inability to meet these requirements. Afterward, we put forward a novel network architecture adapted to the Tactile Internet called FlexNGIA, a Flexible Next-Generation Internet Architecture. We then describe some use cases in which we discuss the potential mechanisms and control loops that FlexNGIA could offer to ensure the performance and reliability guarantees required by future applications. Finally, we identify the key research challenges in further developing FlexNGIA into a full-fledged architecture for the future Tactile Internet. …

Multi-Discriminator Generative Adversarial Network (MD-GAN)
A recent technical breakthrough in machine learning is the discovery and the many applications of Generative Adversarial Networks (GANs). These generative models are computationally demanding, because a GAN is composed of two deep neural networks and trains on large datasets. A GAN is generally trained on a single server. In this paper, we address the problem of distributing GANs so that they can train over datasets spread across multiple workers. MD-GAN is presented as the first solution to this problem: we propose a novel learning procedure for GANs so that they fit this distributed setup. We then compare the performance of MD-GAN to a version of federated learning adapted to GANs, using the MNIST and CIFAR10 datasets. MD-GAN reduces the learning complexity on each worker node by a factor of two, while providing better performance than federated learning on both datasets. We finally discuss the practical implications of distributing GANs. …