PathNet
For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropogation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm (A3C). …

Silent Choir
The cost of communication is a substantial factor affecting the scalability of many distributed applications. Every message sent can incur a cost in storage, computation, energy and bandwidth. Consequently, reducing the communication costs of distributed applications is highly desirable. The best way to reduce message costs is by communicating without sending any messages whatsoever. This paper initiates a rigorous investigation into the use of silence in synchronous settings, in which processes can fail. We formalize sufficient conditions for information transfer using silence, as well as necessary conditions for particular cases of interest. This allows us to identify message patterns that enable communication through silence. In particular, a pattern called a {\em silent choir} is identified, and shown to be central to information transfer via silence in failure-prone systems. The power of the new framework is demonstrated on the {\em atomic commitment} problem (AC). A complete characterization of the tradeoff between message complexity and round complexity in the synchronous model with crash failures is provided, in terms of lower bounds and matching protocols. In particular, a new message-optimal AC protocol is designed using silence, in which processes decide in~3 rounds in the common case. This significantly improves on the best previously known message-optimal AC protocol, in which decisions were performed in $\Theta(n)$ rounds. …

Multi-Layer Convolutional Sparse Coding (ML-CSC)
The recently proposed Multi-Layer Convolutional Sparse Coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of Convolutional Neural Networks (CNNs). Under this framework, the computation of the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors — or feature maps — from a given input signal. Despite having served as a pivotal connection between CNNs and sparse modeling, a deeper understanding of the ML-CSC is still lacking: there are no pursuit algorithms that can serve this model exactly, nor are there conditions to guarantee a non-empty model. While one can easily obtain signals that approximately satisfy the ML-CSC constraints, it remains unclear how to simply sample from the model and, more importantly, how one can train the convolutional filters from real data. In this work, we propose a sound pursuit algorithm for the ML-CSC model by adopting a projection approach. We provide new and improved bounds on the stability of the solution of such pursuit and we analyze different practical alternatives to implement this in practice. We show that the training of the filters is essential to allow for non-trivial signals in the model, and we derive an online algorithm to learn the dictionaries from real data, effectively resulting in cascaded sparse convolutional layers. Last, but not least, we demonstrate the applicability of the ML-CSC model for several applications in an unsupervised setting, providing competitive results. Our work represents a bridge between matrix factorization, sparse dictionary learning and sparse auto-encoders, and we analyze these connections in detail. …