Memorized Sparse Backpropagation (MSBP)
Neural network learning is typically slow because backpropagation must compute full gradients and propagate them back through multiple layers. Despite the success of existing work in accelerating backpropagation through sparseness, the relevant theoretical characteristics remain unexplored, and we empirically find that these methods suffer from the loss of information contained in unpropagated gradients. To tackle these problems, in this work we present a unified sparse backpropagation framework and provide a detailed analysis of its theoretical characteristics. The analysis reveals that, when applied to a multilayer perceptron, our framework essentially performs gradient descent using an estimated gradient close enough to the true gradient, resulting in convergence in probability under certain conditions. Furthermore, a simple yet effective algorithm named memorized sparse backpropagation (MSBP) is proposed to remedy the information loss by storing unpropagated gradients in memory for the next learning step. The experiments demonstrate that the proposed MSBP effectively alleviates the information loss of traditional sparse backpropagation while achieving comparable acceleration. …
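The memorization step lends itself to a compact illustration. Below is a minimal NumPy sketch of the idea described above: at each step the layer's incoming gradient is combined with the gradient left over from the previous step, only the largest entries are propagated, and the remainder is stored for the next update. The names (`MemorizedSparseGrad`, `topk_mask`) and the top-k sparsification rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def topk_mask(g, k):
    """Boolean mask that keeps the k largest-magnitude entries of a 1-D gradient."""
    idx = np.argsort(np.abs(g))[-k:]
    mask = np.zeros_like(g, dtype=bool)
    mask[idx] = True
    return mask

class MemorizedSparseGrad:
    """Memorized sparse backpropagation for one layer's incoming gradient
    (illustrative sketch; class name and top-k rule are assumptions)."""
    def __init__(self, dim, k):
        self.k = k
        self.memory = np.zeros(dim)  # unpropagated gradient from the previous step

    def __call__(self, grad):
        total = grad + self.memory                 # re-inject the stored gradient
        mask = topk_mask(total, self.k)            # sparsify: keep only top-k entries
        propagated = np.where(mask, total, 0.0)    # part passed to the layer below
        self.memory = np.where(mask, 0.0, total)   # remember what was not propagated
        return propagated
```

A plain sparse backpropagation would simply discard `total - propagated`; keeping it in `memory` and adding it back at the next step is the memorization the abstract describes.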

SVD-based NMF Initialization
Due to the iterative nature of most nonnegative matrix factorization (\textsc{NMF}) algorithms, initialization is a key aspect, as it significantly influences both the convergence and the final solution obtained. Many initialization schemes have been proposed for \textsc{NMF}, among which one of the most popular classes of methods is based on the singular value decomposition (SVD). However, these SVD-based initializations do not satisfy a rather natural condition, namely that the error should decrease as the rank of the factorization increases. In this paper, we propose a novel SVD-based \textsc{NMF} initialization that specifically addresses this shortcoming by taking into account the SVD factors that were discarded to obtain a nonnegative initialization. This method, referred to as nonnegative SVD with low-rank correction (NNSVD-LRC), allows us to significantly reduce the initial error at a negligible additional computational cost by using the low-rank structure of the discarded SVD factors. NNSVD-LRC has two other advantages over previous SVD-based initializations: (1) it provably generates sparse initial factors, and (2) it is faster, as it only requires computing a truncated SVD of rank $\lceil r/2 + 1 \rceil$, where $r$ is the factorization rank of the sought NMF decomposition (as opposed to a rank-$r$ truncated SVD for other methods). We show on several standard dense and sparse data sets that our new method competes favorably with state-of-the-art SVD-based initializations for NMF. …
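One way to read the rank-$\lceil r/2 + 1 \rceil$ claim is that each singular triplet $(\sigma_i, u_i, v_i)$ can supply two nonnegative rank-one factors, built from the positive and the negative parts of $u_i$ and $v_i$, so roughly $r/2$ triplets suffice for $r$ nonnegative factors. The sketch below follows that reading; it is an illustrative NNSVD-style initialization, not the full NNSVD-LRC algorithm (in particular, the low-rank correction step is omitted), and the function names are assumptions.

```python
import numpy as np

def _pos(x):
    return np.maximum(x, 0.0)

def _neg(x):
    return np.maximum(-x, 0.0)

def nnsvd_style_init(X, r):
    """Illustrative SVD-based nonnegative initialization for rank-r NMF.
    Each singular triplet yields two nonnegative rank-one factors from the
    positive and negative parts of its singular vectors, so a truncated SVD
    of rank about r/2 + 1 is enough."""
    p = min(int(np.ceil(r / 2)) + 1, min(X.shape))
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    U, S, Vt = U[:, :p], S[:p], Vt[:p, :]

    W = np.zeros((X.shape[0], r))
    H = np.zeros((r, X.shape[1]))

    # Leading triplet: sign-consistent for a nonnegative X (Perron-Frobenius),
    # so its absolute value gives a nonnegative rank-one factor directly.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])

    j = 1
    for i in range(1, p):
        for u_part, v_part in ((_pos(U[:, i]), _pos(Vt[i, :])),
                               (_neg(U[:, i]), _neg(Vt[i, :]))):
            if j >= r:
                break
            W[:, j] = np.sqrt(S[i]) * u_part
            H[j, :] = np.sqrt(S[i]) * v_part
            j += 1
    return W, H
```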

Tree-CNN
In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as ‘catastrophic forgetting’ and hyper-parameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning. The network grows in a tree-like manner to accommodate new classes of data without losing the ability to identify previously trained classes. The proposed network was tested on the CIFAR-10 and CIFAR-100 datasets and compared against fine-tuning specific layers of a conventional CNN. We obtained comparable accuracies while achieving a 40% and 20% reduction in training effort on CIFAR-10 and CIFAR-100, respectively. The network was able to organize the incoming classes of data into feature-driven super-classes. Our model improves upon existing hierarchical CNN models by adding the capability of self-growth and also yields important observations on feature-selective classification. …
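To make the tree-structured organization concrete, here is a minimal PyTorch sketch of a hierarchical classifier in this spirit: each node owns a small CNN that routes an input either to a child node (a super-class) or to a final class label, and the tree can grow by adding children to a node. The `SmallCNN`/`TreeNode` classes and the growth rule are illustrative assumptions, not the architecture or training procedure used in the paper.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """A small CNN used at every tree node (illustrative, not the paper's exact architecture)."""
    def __init__(self, num_outputs):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_outputs)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class TreeNode:
    """A node of the hierarchical classifier: its CNN routes an image either
    to a child node (a super-class) or to a final class label (a leaf)."""
    def __init__(self, children):
        self.children = list(children)          # TreeNode instances or class-label strings
        self.cnn = SmallCNN(len(self.children))

    def add_child(self, child):
        """Grow the tree for newly arriving classes: add a branch and widen the output layer
        (in practice the old output weights would be copied over before retraining)."""
        self.children.append(child)
        self.cnn.classifier = nn.Linear(32, len(self.children))

    @torch.no_grad()
    def predict(self, x):
        """x: a single-image batch of shape (1, 3, H, W)."""
        branch = self.cnn(x).argmax(dim=1).item()
        child = self.children[branch]
        return child.predict(x) if isinstance(child, TreeNode) else child
```

In this sketch a root node might route between super-class nodes such as "vehicles" and "animals", each of which is itself a `TreeNode` whose children are concrete class labels.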

Neural Markov Logic Network (NMLN)
We introduce Neural Markov Logic Networks (NMLNs), a statistical relational learning system that borrows ideas from Markov logic. Like Markov Logic Networks (MLNs), NMLNs are an exponential-family model for modelling distributions over possible worlds, but unlike MLNs, they do not rely on explicitly specified first-order logic rules. Instead, NMLNs learn an implicit representation of such rules as a neural network that acts as a potential function on fragments of the relational structure. Interestingly, any MLN can be represented as an NMLN. Similarly to the recently proposed Neural Theorem Provers (NTPs) [Rocktäschel and Riedel, 2017], NMLNs can exploit embeddings of constants but, unlike NTPs, they also work well in their absence. This is particularly important for prediction in settings other than the transductive one. We showcase the potential of NMLNs on knowledge-base completion tasks and on the generation of molecular (graph) data. …
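The "neural potential on fragments" can be sketched directly: encode a possible world as a binary tensor of ground facts, and let a small network score every k-constant fragment; the sum of these scores gives the unnormalized log-probability of the world. The encoding and the `FragmentPotential` class below are illustrative assumptions, not the authors' implementation, and in practice fragments would be subsampled rather than enumerated.

```python
import itertools
import torch
import torch.nn as nn

class FragmentPotential(nn.Module):
    """Neural potential over k-constant fragments of a relational structure
    (an illustrative sketch of the NMLN idea, not the authors' implementation)."""
    def __init__(self, num_relations, k, hidden=64):
        super().__init__()
        self.k = k
        # A fragment over k constants is encoded as its k*k*num_relations binary sub-tensor.
        self.net = nn.Sequential(
            nn.Linear(k * k * num_relations, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, world):
        """world: (n, n, num_relations) 0/1 tensor of ground binary facts.
        Returns the unnormalized log-probability of the possible world:
        the sum of the neural potential over all k-constant fragments.
        (A practical implementation would subsample fragments instead of enumerating them.)"""
        n = world.shape[0]
        score = torch.zeros(())
        for idx in itertools.permutations(range(n), self.k):
            frag = world[list(idx)][:, list(idx)]  # sub-tensor induced by the chosen constants
            score = score + self.net(frag.reshape(-1).float()).squeeze()
        return score
```

Training such a model would then follow the usual exponential-family recipe, e.g., maximizing the log-likelihood with a sampling-based estimate of the normalization term.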