We demonstrate the use of tensor networks for image classification with the TensorNetwork open source library. We explain in detail the encoding of image data into a matrix product state form, and describe how to contract the network in a way that is parallelizable and well-suited to automatic gradients for optimization. Applying the technique to the MNIST and Fashion-MNIST datasets we find out-of-the-box performance of 98% and 88% accuracy, respectively, using the same tensor network architecture. The TensorNetwork library allows us to seamlessly move from CPU to GPU hardware, and we see a factor of more than 10 improvement in computational speed using a GPU.
Neural methods for SA have led to quantitative improvements over previous approaches, but these advances are not always accompanied with a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English and to provide a challenging dataset. We collect the subset of sentences that an (oracle) ensemble of state-of-the-art sentiment classifiers misclassify and then annotate them for 18 linguistic and paralinguistic phenomena, such as negation, sarcasm, modality, etc. The dataset is available at https://…/assessing_and_probing_sentiment. Finally, we provide a case study that demonstrates the usefulness of the dataset to probe the performance of a given sentiment classifier with respect to linguistic phenomena.
Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants, mainly asynchronous and synchronous SGD, are widely used. Asynchronous SGD is communication efficient but suffers from accuracy degradation due to delayed parameter updating. Synchronous SGD becomes communication intensive when the number of nodes increases regardless of its advantage. To address these issues, we introduce Layered SGD (LSGD), a new decentralized synchronous SGD algorithm. LSGD partitions computing resources into subgroups that each contain a communication layer (communicator) and a computation layer (worker). Each subgroup has centralized communication for parameter updates while communication between subgroups is handled by communicators. As a result, communication time is overlapped with I/O latency of workers. The efficiency of the algorithm is tested by training a deep network on the ImageNet classification task.
We present a complete categorical axiomatization of a wide class of dataflow programs. This gives a three-dimensional diagrammatic language for workflows, more expressive than the directed acyclic graphs generally used for this purpose. This calls for an implementation of these representations in data transformation tools.
Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes. Recent attempts to combine the two sources of information only consider local network structure. We extend NODE2VEC, a well-known NE method that considers broader network structure, to also consider textual node descriptors using recurrent neural encoders. Our method is evaluated on link prediction in two networks derived from UMLS. Experimental results demonstrate the effectiveness of the proposed approach compared to previous work.
Many time-series classification problems involve developing metrics that are invariant to temporal misalignment. In human activity analysis, temporal misalignment arises due to various reasons including differing initial phase, sensor sampling rates, and elastic time-warps due to subject-specific biomechanics. Past work in this area has only looked at reducing intra-class variability by elastic temporal alignment. In this paper, we propose a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation. We call this a temporal transformer network (TTN). TTN is an interpretable differentiable module, which can be easily integrated at the front end of a classification network. The module is capable of reducing intra-class variance by generating input-dependent warping functions which lead to rate-robust representations. At the same time, it increases inter-class variance by learning warping functions that are more discriminative. We show improvements over strong baselines in 3D action recognition on challenging datasets using the proposed framework. The improvements are especially pronounced when training sets are smaller.
We describe a knowledge graph derived from Twitter data with the goal of discovering relationships between people, links, and topics. The goal is to filter out noise from Twitter and surface an inside-out view that relies on high quality content. The generated graph contains many relationships where the user can query and traverse the structure from different angles allowing the development of new applications.
The vast increase in amount and complexity of digital content led to a wide interest in ad-hoc retrieval systems in recent years. Complementary, the existence of heterogeneous data sources and retrieval models stimulated the proliferation of increasingly ingenious and effective rank aggregation functions. Although recently proposed rank aggregation functions are promising with respect to effectiveness, existing proposals in the area usually overlook efficiency aspects. We propose an innovative rank aggregation function that is unsupervised, intrinsically multimodal, and targeted for fast retrieval and top effectiveness performance. We introduce the concepts of embedding and indexing of graph-based rank-aggregation representation models, and their application for search tasks. Embedding formulations are also proposed for graph-based rank representations. We introduce the concept of fusion vectors, a late-fusion representation of objects based on ranks, from which an intrinsically rank-aggregation retrieval model is defined. Next, we present an approach for fast retrieval based on fusion vectors, thus promoting an efficient rank aggregation system. Our method presents top effectiveness performance among state-of-the-art related work, while bringing novel aspects of multimodality and effectiveness. Consistent speedups are achieved against the recent baselines in all datasets considered.
The deep layers of modern neural networks extract a rather rich set of features as an input propagates through the network. This paper sets out to harvest these rich intermediate representations for quantization with minimal accuracy loss while significantly reducing the memory footprint and compute intensity of the DNN. This paper utilizes knowledge distillation through teacher-student paradigm (Hinton et al., 2015) in a novel setting that exploits the feature extraction capability of DNNs for higher-accuracy quantization. As such, our algorithm logically divides a pretrained full-precision DNN to multiple sections, each of which exposes intermediate features to train a team of students independently in the quantized domain. This divide and conquer strategy, in fact, makes the training of each student section possible in isolation while all these independently trained sections are later stitched together to form the equivalent fully quantized network. Experiments on various DNNs (LeNet, ResNet-20, SVHN and VGG-11) show that, on average, this approach – called DCQ (Divide and Conquer Quantization) – achieves on average 9.7% accuracy improvement to a state-of-the-art quantized training technique, DoReFa (Zhou et al., 2016) for binary and ternary networks.
Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. In this article, we provide a comprehensive survey of the recent developments in this field. We will focus on the works which use deep learning techniques to estimate depth from one or multiple images. Deep learning, coupled with the availability of large training datasets, have revolutionized the way the depth reconstruction problem is being approached by the research community. In this article, we survey more than 100 key contributions that appeared in the past five years, summarize the most commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we also conjecture what the future may hold for learning-based depth reconstruction research.
This paper describes a new method, HMM gauge likelihood analysis, or GLA, of detecting anomalies in discrete time series using Hidden Markov Models and clustering. At the center of the method lies the comparison of subsequences. To achieve this, they first get assigned to their Hidden Markov Models using the Baum-Welch algorithm. Next, those models are described by an approximating representation of the probability distributions they define. Finally, this representation is then analyzed with the help of some clustering technique or other outlier detection tool and anomalies are detected. Clearly, HMMs could be substituted by some other appropriate model, e.g. some other dynamic Bayesian network. Our learning algorithm is unsupervised, so it does not require the labeling of large amounts of data. The usability of this method is demonstrated by applying it to synthetic and real-world syslog data.
Automatic post-editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system through human post-edits. APE systems are usually trained by complementing human post-edited data with large, artificial data generated through back-translations, a time-consuming process often no easier than training an MT system from scratch. In this paper, we propose an alternative where we fine-tune pre-trained BERT models on both the encoder and decoder of an APE system, exploring several parameter sharing strategies. By only training on a dataset of 23K sentences for 3 hours on a single GPU, we obtain results that are competitive with systems that were trained on 5M artificial sentences. When we add this artificial data, our method obtains state-of-the-art results.
In this paper we propose a method of obtaining points of extreme overfitting – parameters of modern neural networks, at which they demonstrate close to 100 % training accuracy, simultaneously with almost zero accuracy on the test sample. Despite the widespread opinion that the overwhelming majority of critical points of the loss function of a neural network have equally good generalizing ability, such points have a huge generalization error. The paper studies the properties of such points and their location on the surface of the loss function of modern neural networks.
Today, the dominant paradigm for training neural networks involves minimizing task loss on a large dataset. Using world knowledge to inform a model, and yet retain the ability to perform end-to-end training remains an open question. In this paper, we present a novel framework for introducing declarative knowledge to neural network architectures in order to guide training and prediction. Our framework systematically compiles logical statements into computation graphs that augment a neural network without extra learnable parameters or manual redesign. We evaluate our modeling strategy on three tasks: machine comprehension, natural language inference, and text chunking. Our experiments show that knowledge-augmented networks can strongly improve over baselines, especially in low-data regimes.
$k$ nearest neighbor ($k$NN) queries and skyline queries are important operators on multi-dimensional data points. Given a query point, $k$NN query returns the $k$ nearest neighbors based on a scoring function such as a weighted sum of the attributes, which requires predefined attribute weights (or preferences). Skyline query returns all possible nearest neighbors for any monotonic scoring functions without requiring attribute weights but the number of returned points can be prohibitively large. We observe that both $k$NN and skyline are inflexible and cannot be easily customized. In this paper, we propose a novel \emph{eclipse} operator that generalizes the classic $1$NN and skyline queries and provides a more flexible and customizable query solution for users. In eclipse, users can specify rough and customizable attribute preferences and control the number of returned points. We show that both $1$NN and skyline are instantiations of eclipse. To process eclipse queries, we propose a baseline algorithm with time complexity $O(n^22^{d-1})$, and an improved $O(n\log ^{d-1}n)$ time transformation-based algorithm, where $n$ is the number of points and $d$ is the number of dimensions. Furthermore, we propose a novel index-based algorithm utilizing duality transform with much better efficiency. The experimental results on the real NBA dataset and the synthetic datasets demonstrate the effectiveness of the eclipse operator and the efficiency of our eclipse algorithms.