Highly Efficient Network (HENet)
In order to enhance the real-time performance of convolutional neural networks (CNNs), more and more researchers are focusing on improving the efficiency of CNNs. Based on an analysis of several CNN architectures, such as ResNet, DenseNet, and ShuffleNet, we combined their advantages and proposed a very efficient model called Highly Efficient Networks (HENet). The new architecture uses an unusual way of combining the group convolution and channel shuffle introduced in ShuffleNet. Inspired by ResNet and DenseNet, we also proposed a new way of using element-wise addition and concatenation connections within each block. To make greater use of feature maps, pooling operations are removed from HENet. Experiments show that our model's efficiency is more than twice that of ShuffleNet on several open-source datasets, such as CIFAR-10/100 and SVHN. …
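The abstract does not give the exact block design, so the following is a minimal sketch (assuming PyTorch) of the ShuffleNet-style group convolution and channel shuffle pattern it builds on, together with a hypothetical block that combines an element-wise residual addition with a DenseNet-style concatenation; the class and parameter names are illustrative, not the authors' implementation.

# Minimal sketch, assuming PyTorch; the block layout is hypothetical.
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # Rearrange channels so information mixes across convolution groups (ShuffleNet-style).
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class ShuffleGroupBlock(nn.Module):
    # Hypothetical block: group conv -> channel shuffle -> group conv,
    # followed by an element-wise residual addition (ResNet-style) and a
    # concatenation of the block input (DenseNet-style).
    def __init__(self, channels, groups=4):
        super().__init__()
        self.groups = groups
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = channel_shuffle(out, self.groups)
        out = self.bn2(self.conv2(out))
        out = self.relu(out + x)              # element-wise addition
        return torch.cat([x, out], dim=1)     # concatenation (doubles the channel count)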

t-Exponential Memory Network
Recent advances in deep learning have brought to the fore models that can make multiple computational steps in the service of completing a task; these are capable of describing long-term dependencies in sequential data. Novel recurrent attention models over possibly large external memory modules constitute the core mechanisms that enable these capabilities. Our work addresses learning subtler and more complex underlying temporal dynamics in language modeling tasks that deal with sparse sequential data. To this end, we improve upon these recent advances by adopting concepts from the field of Bayesian statistics, namely variational inference. Our proposed approach consists of treating the network parameters as latent variables with a prior distribution imposed over them. Our statistical assumptions go beyond the standard practice of postulating Gaussian priors. Indeed, to allow for handling outliers, which are prevalent in long observed sequences of multivariate data, multivariate t-exponential distributions are imposed. On this basis, we proceed to infer corresponding posteriors; these can be used for inference and prediction at test time, in a way that accounts for the uncertainty in the available sparse training data. Specifically, to allow for our approach to best exploit the merits of the t-exponential family, our method considers a new t-divergence measure, which generalizes the concept of the Kullback-Leibler divergence. We perform an extensive experimental evaluation of our approach, using challenging language modeling benchmarks, and illustrate its superiority over existing state-of-the-art techniques. …
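The abstract does not define the t-divergence; for reference, one common formulation from the t-exponential (Tsallis) family literature, stated here as an assumption rather than the authors' exact definition, uses the deformed logarithm and exponential

\[
\ln_t(x) = \frac{x^{1-t} - 1}{1 - t}, \qquad
\exp_t(x) = \bigl[\,1 + (1 - t)\,x\,\bigr]_+^{1/(1-t)}, \qquad t \neq 1,
\]

which recover the ordinary logarithm and exponential as \(t \to 1\). With the escort distribution \(\tilde{p}(x) = p(x)^t \big/ \int p(x')^t \, dx'\), the t-divergence can be written as

\[
D_t\bigl(p \,\|\, q\bigr) = \int \tilde{p}(x)\,\bigl(\ln_t p(x) - \ln_t q(x)\bigr)\, dx,
\]

which reduces to the Kullback-Leibler divergence in the limit \(t \to 1\).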

PaToPaEM
Grid topology and line parameters are essential for grid operation and planning, but this information may be missing or inaccurate in distribution grids. Existing data-driven approaches for recovering such information usually suffer from ignoring 1) input measurement errors and 2) possible state changes among historical measurements. While using the errors-in-variables (EIV) model and letting the parameter and topology estimation interact with each other (PaToPa) addresses input and output measurement error modeling, it only works when all measurements come from a single system state. To solve the two challenges simultaneously, we propose the PaToPaEM framework for joint line parameter and topology estimation with historical measurements from different unknown states. We improve on the static framework, which only works when measurements are from a single state, by further treating state changes in historical measurements as an unobserved latent variable. We then systematically analyze the new mathematical model, decouple the optimization problem, and incorporate the expectation-maximization (EM) algorithm to recover the different hidden states in the measurements. Combining these elements, the PaToPaEM framework enables joint topology and line parameter estimation using noisy measurements from multiple system states. It lays a solid foundation for data-driven system identification in distribution grids. Numerical results validate the practicability of the PaToPaEM framework. …
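To make the E-step/M-step alternation concrete, here is a generic expectation-maximization sketch (assuming NumPy) that clusters measurement snapshots into latent system states while fitting a separate linear model per state. It is a hypothetical illustration of the general idea only, not the PaToPaEM algorithm, which additionally handles errors-in-variables modeling and topology estimation.

# Generic EM sketch for measurements drawn from several unknown states,
# each modeled as its own linear relation Y ~ X @ theta_k with Gaussian noise.
import numpy as np

def em_linear_states(X, Y, n_states, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    m, d = X.shape
    thetas = rng.normal(size=(n_states, d, Y.shape[1]))
    sigma2 = np.ones(n_states)
    weights = np.full(n_states, 1.0 / n_states)

    for _ in range(n_iters):
        # E-step: responsibility of each latent state for each measurement snapshot.
        log_resp = np.zeros((m, n_states))
        for k in range(n_states):
            resid = Y - X @ thetas[k]
            log_resp[:, k] = (np.log(weights[k])
                              - 0.5 * np.sum(resid**2, axis=1) / sigma2[k]
                              - 0.5 * Y.shape[1] * np.log(2 * np.pi * sigma2[k]))
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-fit each state's parameters via responsibility-weighted least squares.
        for k in range(n_states):
            w = resp[:, k]
            Xw = X * w[:, None]
            thetas[k] = np.linalg.lstsq(Xw.T @ X, Xw.T @ Y, rcond=None)[0]
            resid = Y - X @ thetas[k]
            sigma2[k] = max((w @ np.sum(resid**2, axis=1)) / (w.sum() * Y.shape[1] + 1e-12), 1e-8)
            weights[k] = w.mean()
    return thetas, resp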

Universal Approximation Theorem
In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; it does not touch upon the algorithmic learnability of those parameters. One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions. Kurt Hornik showed in 1991 that it is not the specific choice of the activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential of being universal approximators. The output units are always assumed to be linear. For notational convenience, only the single-output case is shown; the general case can easily be deduced from it. …
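For reference, the classical single-output statement (in the bounded, nonconstant, continuous-activation form associated with Hornik's result) can be written as follows: for every continuous function \(f : [0,1]^n \to \mathbb{R}\) and every \(\varepsilon > 0\), there exist an integer \(N\), scalars \(v_i, b_i \in \mathbb{R}\), and vectors \(w_i \in \mathbb{R}^n\) such that the network output

\[
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left( w_i^{\top} x + b_i \right)
\]

satisfies \(|F(x) - f(x)| < \varepsilon\) for all \(x \in [0,1]^n\), where \(\sigma\) is the (nonconstant, bounded, continuous) activation function of the hidden layer.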