Probabilistic Binary Neural Network (BLRNet)
Low bit-width weights and activations are an effective way of combating the increasing need for both memory and compute power of Deep Neural Networks. In this work, we present a probabilistic training method for Neural Networks with both binary weights and activations, called BLRNet. By embracing stochasticity during training, we circumvent the need to approximate the gradient of non-differentiable functions such as sign(), while still obtaining a fully Binary Neural Network at test time. Moreover, sampling from the weight distribution allows for anytime ensemble predictions, yielding improved performance and uncertainty estimates. Since all operations in a layer of the BLRNet operate on random variables, we introduce stochastic versions of Batch Normalization and max pooling, which transfer well to a deterministic network at test time. We evaluate the BLRNet on multiple standardized benchmarks. …
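A minimal sketch of the core idea, not the authors' implementation: if each binary weight is parameterized by its probability of being +1, then at test time one can sample several deterministic binary networks and average their outputs for an anytime ensemble prediction. The layer sizes and probability values below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_binary_weights(p, rng):
    """Sample a weight matrix in {-1, +1}, where p[i, j] = P(W[i, j] = +1)."""
    return np.where(rng.random(p.shape) < p, 1.0, -1.0)

def binary_layer(x, w):
    """Binary linear layer with sign() activations (ties broken toward +1)."""
    z = x @ w
    return np.where(z >= 0, 1.0, -1.0)

# Hypothetical weight distribution for a single 4 -> 3 layer.
p = rng.uniform(0.1, 0.9, size=(4, 3))
x = np.where(rng.standard_normal((2, 4)) >= 0, 1.0, -1.0)  # binary inputs

# Anytime ensemble prediction: average the outputs of deterministic binary
# networks sampled from the weight distribution; more samples can be drawn
# whenever the time budget allows.
samples = [binary_layer(x, sample_binary_weights(p, rng)) for _ in range(16)]
ensemble_mean = np.mean(samples, axis=0)
print(ensemble_mean)  # entries near +/-1 indicate low-variance predictions
```

The spread of the sampled outputs around the ensemble mean is what provides the uncertainty estimate the abstract refers to.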
Learning Solving Procedure
It is expected that progress toward true artificial intelligence will be achieved through the emergence of a system that integrates representation learning and complex reasoning (LeCun et al. 2015). In response to this prediction, research has been conducted on implementing the symbolic reasoning of a von Neumann computer in an artificial neural network (Graves et al. 2016; Graves et al. 2014; Reed et al. 2015). However, these studies face many limitations in realizing neural-symbolic integration (Jaeger 2016). Here, we present a new learning paradigm: a learning solving procedure (LSP) that learns the procedure for solving complex problems. This is accomplished not merely by learning input-output data, but by learning algorithms through a solving procedure that obtains the output as a sequence of tasks for a given input problem. The LSP neural network system learns not only simple problems of addition and multiplication, but also the algorithms of complicated problems, such as complex arithmetic expressions, sorting, and the Tower of Hanoi. To realize this, the LSP neural network structure consists of a deep neural network and a long short-term memory network, which are recursively combined. Through experimentation, we demonstrate the efficiency and scalability of LSP and its validity as a mechanism of complex reasoning. …
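The abstract leaves the network architecture to the paper, but the representational idea, producing the answer as a sequence of executable tasks rather than directly, can be illustrated with a toy example. The snippet below encodes sorting as a trace of compare-and-swap tasks, the kind of procedure-level supervision such a system could learn from; all names here are hypothetical and not from the paper.

```python
def sorting_procedure(xs):
    """Solve a sorting problem as a sequence of tasks (bubble sort trace).

    Instead of mapping input -> sorted output directly, the target is the
    procedure itself: a list of ('compare_swap', i, j) tasks that, when
    executed in order, produce the answer.
    """
    xs = list(xs)
    trace = []
    for end in range(len(xs) - 1, 0, -1):
        for i in range(end):
            trace.append(("compare_swap", i, i + 1))
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    trace.append(("halt",))
    return trace, xs

def execute(tasks, xs):
    """Replay a task sequence on a fresh copy of the input."""
    xs = list(xs)
    for task in tasks:
        if task[0] == "compare_swap":
            _, i, j = task
            if xs[i] > xs[j]:
                xs[i], xs[j] = xs[j], xs[i]
    return xs

tasks, _ = sorting_procedure([3, 1, 2])
print(tasks)                      # the solving procedure as a task sequence
print(execute(tasks, [3, 1, 2]))  # [1, 2, 3]
```

Because the supervision is the task sequence itself, a learner that reproduces the trace has, in effect, learned the algorithm rather than memorized input-output pairs.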
Probabilistic Computation Tree Logic (PCTL)
In this paper, we develop approximate dynamic programming methods for stochastic systems modeled as Markov Decision Processes, given both soft performance criteria and hard constraints expressed in a class of probabilistic temporal logic called Probabilistic Computation Tree Logic (PCTL). Our approach consists of two steps: First, we show how to transform a class of PCTL formulas into chance constraints that can be enforced during planning in stochastic systems. Second, by integrating randomized optimization and entropy-regularized dynamic programming, we devise a novel trajectory-sampling-based approximate value iteration method that iteratively solves for an upper bound on the value function while ensuring that the constraints given by the PCTL specifications are satisfied. In particular, we show that with on-policy sampling of trajectories, the gap between the approximate upper bound and the true value function can be made tight. The correctness and efficiency of the method are demonstrated using robotic motion planning examples. …
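As a hedged illustration of the first step (the exact formula class handled by the paper is not specified in the abstract), consider a bounded-reachability property. The PCTL operator on the left reads directly as a chance constraint on trajectories:

$$\mathrm{P}_{\ge 1-\delta}\big[\lozenge^{\le T}\, G\big] \;\Longleftrightarrow\; \Pr_{\tau \sim \pi}\big[\exists\, t \le T:\ s_t \in G\big] \;\ge\; 1-\delta,$$

i.e., the probability that a trajectory $\tau$ under policy $\pi$ reaches the goal set $G$ within $T$ steps must be at least $1-\delta$. A constraint of this form can be estimated, and hence enforced during planning, by the on-policy trajectory sampling the abstract describes.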
Weight-Median Sketch
We introduce a new sub-linear space data structure, the Weight-Median Sketch, which captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. In contrast with related sketches that capture the most commonly occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch but, instead of sketching counts, captures sketched gradient updates to the model parameters. We provide a theoretical analysis of this approach that establishes recovery guarantees in the online learning setting, and we demonstrate substantial empirical improvements in accuracy-memory trade-offs over alternatives, including count-based sketches and feature hashing. …
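The mechanism is concrete enough to sketch: a Count-Sketch table updated with real-valued gradient steps rather than counts, with each feature's weight recovered as a median across rows. The class below is a simplified illustration under those assumptions, not the authors' implementation; all parameter values are arbitrary.

```python
import numpy as np

class WeightMedianSketch:
    """Illustrative Count-Sketch over gradient updates (simplified).

    Each feature hashes to one bucket per row with a random sign; a
    feature's weight estimate is the median of its signed bucket values.
    """

    def __init__(self, rows=5, width=256, dim=10_000, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((rows, width))
        self.bucket = rng.integers(0, width, size=(rows, dim))  # hash per row
        self.sign = rng.choice([-1.0, 1.0], size=(rows, dim))   # sign per row

    def update(self, feature, grad_step):
        """Apply a gradient step to the sketched weight of one feature."""
        for r in range(self.table.shape[0]):
            self.table[r, self.bucket[r, feature]] += self.sign[r, feature] * grad_step

    def estimate(self, feature):
        """Recover the feature's weight as the median over the sketch rows."""
        vals = [self.sign[r, self.bucket[r, feature]]
                * self.table[r, self.bucket[r, feature]]
                for r in range(self.table.shape[0])]
        return float(np.median(vals))

sketch = WeightMedianSketch()
for _ in range(100):
    sketch.update(feature=42, grad_step=0.05)   # heavily updated feature
sketch.update(feature=7, grad_step=0.01)        # lightly updated feature
print(sketch.estimate(42), sketch.estimate(7))  # heavy feature stands out
```

The median across rows is what suppresses hash collisions: a colliding feature corrupts at most a few rows, so the heavily weighted features remain recoverable in sub-linear space.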