The ability to accurately detect and classify objects at varying pixel sizes in cluttered scenes is crucial to many Navy applications. However, the detection performance of existing state-of-the-art approaches such as convolutional neural networks (CNNs) degrades when they are applied to such cluttered, multi-object detection tasks. We conjecture that spatial relationships between objects in an image could be exploited to significantly improve detection accuracy, an approach that, to the best of our knowledge, had not been considered by any existing technique at the time the research was conducted. We introduce a detection and classification technique called Spatially Related Detection with Convolutional Neural Networks (SPARCNN) that learns a probabilistic representation of inter-object spatial configurations from training images and exploits it to produce more effective region proposals for use with state-of-the-art CNNs. Our empirical evaluation of SPARCNN on the VOC 2007 dataset shows that it increases classification accuracy by 8% when compared to a region proposal technique that does not exploit spatial relations. More importantly, we obtained a larger performance boost of 18.8% when task difficulty in the test set is increased by including highly obscured objects and increased image clutter.
Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation, as well as multi-microphone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
Recurrent neural networks (RNNs) represent the state of the art in translation, image captioning, and speech recognition. They are also capable of learning algorithmic tasks such as long addition, copying, and sorting from a set of training examples. We demonstrate that RNNs can learn decryption algorithms (the mappings from ciphertext to plaintext) for three polyalphabetic ciphers (Vigenère, Autokey, and Enigma). Most notably, we demonstrate that an RNN with a 3000-unit Long Short-Term Memory (LSTM) cell can learn the decryption function of the Enigma machine. We argue that our model learns efficient internal representations of these ciphers 1) by exploring activations of individual memory neurons and 2) by comparing memory usage across the three ciphers. To be clear, our work is not aimed at ‘cracking’ the Enigma cipher. However, we do show that our model can perform elementary cryptanalysis by running known-plaintext attacks on the Vigenère and Autokey ciphers. Our results indicate that RNNs can learn algorithmic representations of black box polyalphabetic ciphers and that these representations are useful for cryptanalysis.
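For readers unfamiliar with the Vigenère and Autokey ciphers named in the abstract above, the following minimal Python sketch shows the classical textbook definitions of both ciphers and the idea behind a known-plaintext attack. It is not the RNN model from the abstract. The function names and the restriction to the 26 uppercase letters A-Z are assumptions made here for illustration.

```python
import string

# Assumption for this sketch: messages and keys use only the letters A-Z.
ALPHABET = string.ascii_uppercase


def _shift(letter: str, key_letter: str, sign: int) -> str:
    """Shift one letter forward (sign=+1) or backward (sign=-1) by a key letter."""
    return ALPHABET[(ALPHABET.index(letter) + sign * ALPHABET.index(key_letter)) % 26]


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Vigenere: the key is repeated cyclically to form the keystream."""
    return "".join(_shift(p, key[i % len(key)], +1) for i, p in enumerate(plaintext))


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    return "".join(_shift(c, key[i % len(key)], -1) for i, c in enumerate(ciphertext))


def autokey_encrypt(plaintext: str, key: str) -> str:
    """Autokey: the keystream is the key followed by the plaintext itself."""
    keystream = (key + plaintext)[: len(plaintext)]
    return "".join(_shift(p, k, +1) for p, k in zip(plaintext, keystream))


def autokey_decrypt(ciphertext: str, key: str) -> str:
    """Each recovered plaintext letter is appended to the keystream as we go."""
    keystream = list(key)
    plaintext = []
    for i, c in enumerate(ciphertext):
        p = _shift(c, keystream[i], -1)
        plaintext.append(p)
        keystream.append(p)
    return "".join(plaintext)


def recover_keystream(plaintext: str, ciphertext: str) -> str:
    """Known-plaintext attack: keystream letter = ciphertext - plaintext (mod 26)."""
    return "".join(_shift(c, p, -1) for p, c in zip(plaintext, ciphertext))


if __name__ == "__main__":
    message, key = "ATTACKATDAWN", "LEMON"
    cipher = vigenere_encrypt(message, key)
    print(cipher, vigenere_decrypt(cipher, key), recover_keystream(message, cipher))
```

For the Vigenère cipher the recovered keystream is the key repeated, so a single known plaintext/ciphertext pair longer than the key reveals it. For Autokey, the first letters of the recovered keystream are the key.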
Estimating the number of principal components is one of the fundamental problems in many scientific fields such as signal processing (or the spiked covariance model). In this paper, we first derive the asymptotic expansion of the log-likelihood function and its infimum. Then we select the number of signals $k$ by maximizing the infimum of the log-likelihood function (MIL). We demonstrate that the MIL is consistent under the condition that $\mathrm{SNR}/\sqrt{4(p-k+1)\log\log n/n}$ is uniformly bounded away from below by $1$. By re-examining the BIC (or the MDL) for the spiked covariance model, we find that the BIC is not suitable for the spiked covariance model, unless $\mathrm{SNR}/\sqrt{4(p-k+1)\log n/n}$ is uniformly bounded away from below by $1$. Compared with the BIC, the MIL is consistent at much lower SNR. Numerical studies demonstrate our theoretical results.
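To make the comparison between the two consistency conditions above concrete, the following small Python sketch evaluates the two normalizing quantities, $\sqrt{4(p-k+1)\log\log n/n}$ for the MIL and $\sqrt{4(p-k+1)\log n/n}$ for the BIC. The function names and the example values of $n$, $p$, and $k$ are assumptions chosen for illustration. The sketch only evaluates the formulas stated in the abstract and does not implement the MIL criterion itself.

```python
import math


def mil_snr_scale(n: int, p: int, k: int) -> float:
    """sqrt(4 (p - k + 1) log log n / n); requires n > e so that log log n > 0."""
    return math.sqrt(4 * (p - k + 1) * math.log(math.log(n)) / n)


def bic_snr_scale(n: int, p: int, k: int) -> float:
    """sqrt(4 (p - k + 1) log n / n), the corresponding quantity for the BIC."""
    return math.sqrt(4 * (p - k + 1) * math.log(n) / n)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only to illustrate the gap.
    p, k = 50, 3
    for n in (10**3, 10**4, 10**5):
        mil, bic = mil_snr_scale(n, p, k), bic_snr_scale(n, p, k)
        print(f"n={n:>6}  MIL scale={mil:.4f}  BIC scale={bic:.4f}  ratio={mil / bic:.3f}")
```

The ratio of the two quantities is $\sqrt{\log\log n/\log n}$, which does not depend on $p$ or $k$ and is below 1 for every $n > e$. This is the sense in which the MIL condition tolerates a lower SNR than the BIC condition.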
We propose a novel model-based method for social network clustering. More precisely, we cluster a set of entities in a social network into disjoint communities based on a newly adopted distance function. Our model not only allows mixed membership for each entity, but also provides reliable statistical inference on network structure. We design a Bayesian algorithm based on Gibbs sampling to estimate the membership parameters. We evaluate the performance of our algorithm by applying our model to two social network data sets, the Zachary club data and the bottlenose dolphin network data. Concluding remarks and future work are discussed briefly at the end.
In this paper, we develop clique-based methods for social network clustering. The quality of a clustering result is measured by a novel clique-based index, which is adapted from the modularity index proposed in [Newman 2006]. We design an effective algorithm based on recursive bipartition to maximize the objective function of the proposed index. Noting that optimizing the objective function is NP-hard when the network size or the parameter space is large, we relax the problem via an implicitly restarted Lanczos method from numerical linear algebra. One contribution of our method is that the proposed index of each community in the clustering result is higher than a predefined threshold, $p$, which is completely controlled by users. When the threshold is unknown or not given, we implement a tree-based strategy and propose a localized clustering algorithm that considers a localized threshold for each subnetwork to maximize the overall clique score of the final clustering result. Finally, we use simulation experiments based on the stochastic block model to demonstrate the accuracy and efficiency of our algorithms, numerically and graphically.
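As background for the abstract above, the following Python sketch shows one step of spectral bipartition for the standard modularity index of [Newman 2006], which the abstract names as the starting point for its clique-based index. It does not implement the clique-based index itself, whose definition is not given here. The function name is an assumption, and a dense eigendecomposition from NumPy stands in for the implicitly restarted Lanczos method mentioned in the abstract.

```python
import numpy as np


def modularity_bipartition(adjacency):
    """Split an undirected graph in two using the leading eigenvector of the
    modularity matrix B = A - d d^T / (2m). Returns (labels, modularity)."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()  # equals 2m, twice the number of edges
    B = A - np.outer(degrees, degrees) / two_m

    # eigh returns eigenvalues in ascending order, so the last column
    # is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(B)
    leading = eigvecs[:, -1]

    # Relaxation step: the +/-1 assignment is read off the signs of the
    # real-valued leading eigenvector.
    labels = np.where(leading >= 0, 1.0, -1.0)
    modularity = labels @ B @ labels / (2.0 * two_m)  # Q = s^T B s / (4m)
    return labels, modularity


if __name__ == "__main__":
    # Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    A = np.zeros((6, 6))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    labels, q = modularity_bipartition(A)
    print(labels, q)
```

A recursive scheme would apply the same step to each resulting group and stop when no split improves the index. How the stopping rule interacts with the threshold $p$ is specific to the method in the abstract and is not sketched here.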
In recent years, the Linked Data Cloud has grown to more than 100 billion facts pertaining to a multitude of domains. However, accessing this information has remained significantly challenging for lay users. Approaches to problems such as Question Answering on Linked Data and Link Discovery have notably played a role in increasing information access. These approaches are often based on handcrafted and/or statistical models derived from data observation. Recently, deep learning architectures based on neural networks, called seq2seq, have been shown to achieve state-of-the-art results at translating sequences into sequences. In this direction, we propose Neural SPARQL Machines, end-to-end deep architectures to translate any natural language expression into sentences encoding SPARQL queries. Our preliminary results, restricted to selected DBpedia classes, show that Neural SPARQL Machines are a promising approach for Question Answering on Linked Data, as they can deal with known problems such as vocabulary mismatch and perform graph pattern composition.
The ubiquitous Lanczos method can approximate $f(A)x$ for any symmetric $n \times n$ matrix $A$, vector $x$, and function $f$. In exact arithmetic, the method's error after $k$ iterations is bounded by the error of the best degree-$k$ polynomial uniformly approximating $f(x)$ on the range $[\lambda_{\min}(A), \lambda_{\max}(A)]$. However, despite decades of work, it has been unclear whether this powerful guarantee holds in finite precision. We resolve this problem, proving that when $\max_{x \in [\lambda_{\min}, \lambda_{\max}]}|f(x)| \le C$, Lanczos essentially matches the exact arithmetic guarantee if computations use roughly $\log(nC\|A\|)$ bits of precision. Our proof extends work of Druskin and Knizhnerman [DK91], leveraging the stability of the classic Chebyshev recurrence to bound the stability of any polynomial approximating $f(x)$. We also study the special case of $f(A) = A^{-1}$, where stronger guarantees hold. In exact arithmetic, Lanczos performs as well as the best polynomial approximating $1/x$ at each of $A$'s eigenvalues, rather than on the full eigenvalue range. In seminal work, Greenbaum gives an approach to extending this bound to finite precision: she proves that finite precision Lanczos and the related CG method match any polynomial approximating $1/x$ in a tiny range around each eigenvalue [Gre89]. For $A^{-1}$, this bound appears stronger than ours. However, we exhibit matrices with condition number $\kappa$ where exact arithmetic Lanczos converges in $\mathrm{polylog}(\kappa)$ iterations, but Greenbaum's bound predicts $\Omega(\kappa^{1/5})$ iterations. It thus cannot offer significant improvement over the $O(\kappa^{1/2})$ bound achievable via our result. Our analysis raises the question of whether convergence in fewer than $\mathrm{poly}(\kappa)$ iterations can be expected in finite precision, even for matrices with clustered, skewed, or otherwise favorable eigenvalue distributions.
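The method discussed in the abstract above can be illustrated with a minimal NumPy sketch of the textbook Lanczos approximation $f(A)x \approx \|x\|\, Q\, f(T)\, e_1$, where $Q$ holds the Lanczos vectors and $T$ is the tridiagonal matrix they produce. The function name `lanczos_fa` and the breakdown tolerance are assumptions made for illustration. The sketch uses no reorthogonalization and runs in ordinary double precision, so it does not reproduce the finite precision analysis of the abstract.

```python
import numpy as np


def lanczos_fa(A, x, f, k):
    """Approximate f(A) @ x for symmetric A using k Lanczos iterations.

    `f` must accept a 1-D array of eigenvalues and apply elementwise.
    Plain three-term recurrence, no reorthogonalization.
    """
    n = x.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)      # diagonal of T
    beta = np.zeros(k - 1)   # off-diagonal of T
    x_norm = np.linalg.norm(x)
    q, q_prev, b = x / x_norm, np.zeros(n), 0.0

    for j in range(k):
        Q[:, j] = q
        w = A @ q - b * q_prev
        alpha[j] = q @ w
        w = w - alpha[j] * q
        if j == k - 1:
            break
        b = np.linalg.norm(w)
        if b < 1e-12:
            # Krylov subspace became invariant: keep only what was built.
            Q, alpha, beta = Q[:, : j + 1], alpha[: j + 1], beta[:j]
            break
        beta[j] = b
        q_prev, q = q, w / b

    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    # f(T) e_1 = V f(Lambda) V^T e_1, and V^T e_1 is the first row of V.
    fT_e1 = evecs @ (f(evals) * evecs[0, :])
    return x_norm * (Q @ fT_e1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, _ = np.linalg.qr(rng.standard_normal((40, 40)))
    eigs = np.linspace(1.0, 2.0, 40)
    A = (M * eigs) @ M.T                      # symmetric, eigenvalues in [1, 2]
    x = rng.standard_normal(40)
    exact = M @ ((1.0 / eigs) * (M.T @ x))    # A^{-1} x via the known eigenbasis
    approx = lanczos_fa(A, x, lambda t: 1.0 / t, 20)
    print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

Only matrix-vector products with $A$ are needed, and the eigendecomposition is taken of the small $k \times k$ matrix $T$ rather than of $A$.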
Matrix inversion is an interesting topic in linear algebra. However, computing the inverse of a given matrix requires substantial computational resources and time when the matrix is large. In this paper, we derive a closed-form inverse for a matrix that has many applications in communication systems. Based on this closed-form inverse, a closed-form expression for the channel capacity of a communication system can be determined in terms of the error rate parameter $\alpha$.