Data-free Automatic Acceleration of Convolutional Networks (DAC)
Deploying a deep learning model on mobile/IoT devices is a challenging task. The difficulty lies in the trade-off between computation speed and accuracy: a complex deep learning model with high accuracy runs slowly on resource-limited devices, while a lightweight model that runs much faster loses accuracy. In this paper, we propose a novel decomposition method, namely DAC, that is capable of factorizing an ordinary convolutional layer into two layers with far fewer parameters. DAC computes the weights of the newly generated layers directly from the weights of the original convolutional layer, so no training (or fine-tuning) and no data are needed. The experimental results show that DAC reduces a large number of floating-point operations (FLOPs) while maintaining the high accuracy of a pre-trained model. If a 2% accuracy drop is acceptable, DAC saves 53% of the FLOPs of the VGG16 image classification model on the ImageNet dataset, 29% of the FLOPs of the SSD300 object detection model on the PASCAL VOC2007 dataset, and 46% of the FLOPs of a multi-person pose estimation model on the Microsoft COCO dataset. Compared to other existing decomposition methods, DAC achieves better performance. …
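To make the idea concrete, here is a minimal sketch of a data-free conv-layer factorization. It uses a generic truncated-SVD split (a k×k conv into r "basis" filters followed by a 1×1 recombination), which is not necessarily the paper's exact scheme; the rank `r` is a hypothetical tuning knob.

```python
# Hedged sketch: split a conv layer into two cheaper layers directly from
# its weights, in the spirit of DAC. Generic truncated-SVD factorization,
# not the paper's exact decomposition; no data or fine-tuning involved.
import torch
import torch.nn as nn

def factorize_conv(conv: nn.Conv2d, r: int) -> nn.Sequential:
    C_out, C_in, k, _ = conv.weight.shape
    # Flatten each output filter into a row, then take a truncated SVD.
    W = conv.weight.detach().reshape(C_out, C_in * k * k)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U, S, Vh = U[:, :r], S[:r], Vh[:r, :]

    # First layer: r "basis" filters of the original spatial size.
    first = nn.Conv2d(C_in, r, k, stride=conv.stride,
                      padding=conv.padding, bias=False)
    first.weight.data = Vh.reshape(r, C_in, k, k)

    # Second layer: 1x1 conv recombining the r responses into C_out maps.
    second = nn.Conv2d(r, C_out, 1, bias=conv.bias is not None)
    second.weight.data = (U * S).reshape(C_out, r, 1, 1)
    if conv.bias is not None:
        second.bias.data = conv.bias.detach().clone()
    return nn.Sequential(first, second)

conv = nn.Conv2d(64, 128, 3, padding=1)
fast = factorize_conv(conv, r=32)          # weights computed, never trained
x = torch.randn(1, 64, 56, 56)
print((conv(x) - fast(x)).abs().max())     # approximation error
```

With r well below min(C_out, C_in·k²), the two thin layers need fewer multiply-adds than the original layer, which is where the FLOP savings come from.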
Non-Intrusive Probabilistic Power Flow
In this paper, a novel non-intrusive probabilistic power flow (PPF) analysis method based on low-rank approximation (LRA) is proposed, which can accurately and efficiently estimate the probabilistic characteristics (e.g., mean, variance, probability density function) of the PPF solutions. The method builds a statistically equivalent surrogate for the PPF solutions from a small number of power flow evaluations. By exploiting the retained tensor-product form of the univariate polynomial bases, a sequential correction-updating scheme is applied, making the total number of unknowns linear, rather than exponential, in the number of random inputs. Consequently, the LRA method is particularly promising for high-dimensional problems with a large number of random inputs. Numerical studies on the IEEE 39-bus, 118-bus, and 1354-bus systems show that the proposed method achieves accurate probabilistic characteristics of the PPF solutions with much less computational effort than Monte Carlo simulation. Compared to the polynomial chaos expansion method, the LRA method achieves comparable accuracy while scaling better to higher-dimensional problems. Moreover, the numerical results reveal that the randomness introduced by renewable energy resources and loads may affect the feasibility of dispatch/planning schemes. …
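The following toy sketch illustrates the surrogate-building idea: a canonical low-rank expansion y(x) ≈ Σ_l b_l Π_i v_i,l(x_i), with each univariate factor written in a Legendre basis and fitted by alternating least squares ("correction"), then the rank weights refit ("updating"). The placeholder `power_flow` function, the degrees, and the rank are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of a canonical low-rank surrogate with tensor-product
# univariate polynomial factors. Unknowns per rank term: d*(deg+1),
# i.e. linear in the number of random inputs d.
import numpy as np
from numpy.polynomial.legendre import legvander

rng = np.random.default_rng(0)
d, deg, R, N = 8, 3, 3, 200           # inputs, poly degree, rank, samples

def power_flow(X):                    # stand-in for the expensive solver
    return np.sin(X.sum(axis=1)) + 0.1 * (X ** 2).sum(axis=1)

X = rng.uniform(-1, 1, (N, d))        # a small experimental design
y = power_flow(X)
Phi = [legvander(X[:, i], deg) for i in range(d)]   # N x (deg+1) per input

Z = [[np.ones(deg + 1) / (deg + 1) for _ in range(d)] for _ in range(R)]
b = np.zeros(R)
res = y.copy()
for l in range(R):
    for _ in range(10):               # correction: ALS sweeps over inputs
        for i in range(d):
            other = np.ones(N)
            for j in range(d):
                if j != i:
                    other *= Phi[j] @ Z[l][j]
            A = Phi[i] * other[:, None]
            Z[l][i], *_ = np.linalg.lstsq(A, res, rcond=None)
    # updating: refit the rank weights b_1..b_{l+1} jointly
    terms = np.stack([np.prod([Phi[j] @ Z[k][j] for j in range(d)], axis=0)
                      for k in range(l + 1)], axis=1)
    b[:l + 1], *_ = np.linalg.lstsq(terms, y, rcond=None)
    res = y - terms @ b[:l + 1]

def surrogate(Xnew):
    out = np.zeros(len(Xnew))
    for k in range(R):
        t = np.ones(len(Xnew))
        for j in range(d):
            t *= legvander(Xnew[:, j], deg) @ Z[k][j]
        out += b[k] * t
    return out

Xmc = rng.uniform(-1, 1, (100_000, d))   # cheap Monte Carlo on the surrogate
yhat = surrogate(Xmc)
print(yhat.mean(), yhat.var())
```

Once fitted from the N = 200 solver calls, the surrogate is essentially free to sample, so moments and density estimates come at negligible cost.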
CactusNet
Deep neural networks trained on large datasets learn features that are both generic to the whole dataset and specific to individual classes within it. Learned features tend to be generic in the lower layers and specific in the higher layers of a network. Methods like fine-tuning are possible because a single filter can apply to multiple target classes. Much like in the human brain, this behavior can also be used to cluster and separate classes. However, to the best of our knowledge, there is no metric for how applicable learned features are to specific classes. In this paper, we propose a definition and metric for measuring the applicability of learned features to individual classes, and we use this applicability metric to estimate input applicability and to produce a new method of unsupervised learning that we call the CactusNet. …
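As a rough illustration of what a feature-to-class applicability score could look like, the sketch below rates each filter by its mean activation on a class's samples relative to its mean activation overall. This is one plausible instantiation for intuition only, not the CactusNet's actual definition.

```python
# Hedged sketch: a hypothetical applicability score. Values > 1 mean the
# filter fires more on class c than on average, i.e. is more class-specific.
import torch

def applicability(features: torch.Tensor, labels: torch.Tensor,
                  num_classes: int) -> torch.Tensor:
    """features: (N, F) per-sample filter activations (e.g. global-avg-pooled
    conv maps); returns a (num_classes, F) applicability matrix."""
    acts = features.relu()                       # keep only positive evidence
    overall = acts.mean(dim=0).clamp_min(1e-8)   # (F,) baseline per filter
    scores = torch.zeros(num_classes, features.shape[1])
    for c in range(num_classes):
        scores[c] = acts[labels == c].mean(dim=0) / overall
    return scores

# Toy usage: 1000 samples, 64 filters, 10 classes.
feats = torch.randn(1000, 64)
labels = torch.randint(0, 10, (1000,))
A = applicability(feats, labels, 10)
print(A.shape, A[3].topk(5).indices)   # filters most applicable to class 3
```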
Differentiable ARchitecture Compression (DARC)
In many learning situations, resources at inference time are significantly more constrained than resources at training time. This paper studies a general paradigm, called Differentiable ARchitecture Compression (DARC), that combines model compression and architecture search to learn models that are resource-efficient at inference time. Given a resource-intensive base architecture, DARC utilizes the training data to learn which sub-components can be replaced by cheaper alternatives. The high-level technique can be applied to any neural architecture, and we report experiments on state-of-the-art convolutional neural networks for image classification. For a WideResNet with $97.2\%$ accuracy on CIFAR-10, we improve single-sample inference speed by $2.28\times$ and memory footprint by $5.64\times$, with no accuracy loss. For a ResNet with $79.15\%$ Top-1 accuracy on ImageNet, we improve batch inference speed by $1.29\times$ and memory footprint by $3.57\times$ with a $1\%$ accuracy loss. We also give theoretical Rademacher complexity bounds in simplified cases, showing how DARC avoids overfitting despite over-parameterization. …
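Reading the abstract, the core mechanism can be sketched as a learnable gate per sub-component that chooses between the original operation and a cheaper substitute. The soft-mixture relaxation and the cost penalty below are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: a differentiable choice between an expensive block and a
# cheaper alternative. Training mixes the two branches softly (so gradients
# flow to the gate); inference keeps only the argmax branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressChoice(nn.Module):
    def __init__(self, expensive: nn.Module, cheap: nn.Module,
                 full_cost: float = 1.0, cheap_cost: float = 0.2):
        super().__init__()
        self.ops = nn.ModuleList([expensive, cheap])
        self.register_buffer("costs", torch.tensor([full_cost, cheap_cost]))
        self.logits = nn.Parameter(torch.zeros(2))   # the learnable gate

    def forward(self, x):
        if self.training:                        # differentiable soft mixture
            w = F.softmax(self.logits, dim=0)
            return w[0] * self.ops[0](x) + w[1] * self.ops[1](x)
        return self.ops[int(self.logits.argmax())](x)   # hard pick at inference

    def expected_cost(self):                     # penalty term for the loss
        return (F.softmax(self.logits, dim=0) * self.costs).sum()

# Toy usage: a 3x3 conv vs. a cheaper 1x1 replacement with matching shapes.
block = CompressChoice(nn.Conv2d(64, 64, 3, padding=1), nn.Conv2d(64, 64, 1))
x = torch.randn(2, 64, 32, 32)
loss = block(x).square().mean() + 0.1 * block.expected_cost()
loss.backward()
```

Trading task loss against `expected_cost` is what lets the gates learn where the cheap substitute is good enough, which matches the abstract's "learn which sub-components can be replaced".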