Whats new on arXiv

Michael Laux

5 years ago

Advertisements

Structure fusion based on graph convolutional networks for semi-supervised classification

Suffering from the multi-view data diversity and complexity for semi-supervised classification, most of existing graph convolutional networks focus on the networks architecture construction or the salient graph structure preservation, and ignore the the complete graph structure for semi-supervised classification contribution. To mine the more complete distribution structure from multi-view data with the consideration of the specificity and the commonality, we propose structure fusion based on graph convolutional networks (SF-GCN) for improving the performance of semi-supervised classification. SF-GCN can not only retain the special characteristic of each view data by spectral embedding, but also capture the common style of multi-view data by distance metric between multi-graph structures. Suppose the linear relationship between multi-graph structures, we can construct the optimization function of structure fusion model by balancing the specificity loss and the commonality loss. By solving this function, we can simultaneously obtain the fusion spectral embedding from the multi-view data and the fusion structure as adjacent matrix to input graph convolutional networks for semi-supervised classification. Experiments demonstrate that the performance of SF-GCN outperforms that of the state of the arts on three challenging datasets, which are Cora,Citeseer and Pubmed in citation networks.

Generalized Principal Component Analysis

Generalized principal component analysis (GLM-PCA) facilitates dimension reduction of non-normally distributed data. We provide a detailed derivation of GLM-PCA with a focus on optimization. We also demonstrate how to incorporate covariates, and suggest post-processing transformations to improve interpretability of latent factors.

Measuring the Data Efficiency of Deep Learning Methods

In this paper, we propose a new experimental protocol and use it to benchmark the data efficiency — performance as a function of training set size — of two deep learning algorithms, convolutional neural networks (CNNs) and hierarchical information-preserving graph-based slow feature analysis (HiGSFA), for tasks in classification and transfer learning scenarios. The algorithms are trained on different-sized subsets of the MNIST and Omniglot data sets. HiGSFA outperforms standard CNN networks when the models are trained on 50 and 200 samples per class for MNIST classification. In other cases, the CNNs perform better. The results suggest that there are cases where greedy, locally optimal bottom-up learning is equally or more powerful than global gradient-based learning.

Collecting Indicators of Compromise from Unstructured Text of Cybersecurity Articles using Neural-Based Sequence Labelling

Indicators of Compromise (IOCs) are artifacts observed on a network or in an operating system that can be utilized to indicate a computer intrusion and detect cyber-attacks in an early stage. Thus, they exert an important role in the field of cybersecurity. However, state-of-the-art IOCs detection systems rely heavily on hand-crafted features with expert knowledge of cybersecurity, and require large-scale manually annotated corpora to train an IOC classifier. In this paper, we propose using an end-to-end neural-based sequence labelling model to identify IOCs automatically from cybersecurity articles without expert knowledge of cybersecurity. By using a multi-head self-attention module and contextual features, we find that the proposed model is capable of gathering contextual information from texts of cybersecurity articles and performs better in the task of IOC identification. Experiments show that the proposed model outperforms other sequence labelling models, achieving the average F1-score of 89.0% on English cybersecurity article test set, and approximately the average F1-score of 81.8% on Chinese test set.

Large Scale Adversarial Representation Learning

Adversarially trained generative models (GANs) have recently achieved compelling image synthesis results. But despite early successes in using GANs for unsupervised representation learning, they have since been superseded by approaches based on self-supervision. In this work we show that progress in image generation quality translates to substantially improved representation learning performance. Our approach, BigBiGAN, builds upon the state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator. We extensively evaluate the representation learning and generation capabilities of these BigBiGAN models, demonstrating that these generation-based models achieve the state of the art in unsupervised representation learning on ImageNet, as well as in unconditional image generation.

Cross-classified multilevel models

Cross-classified multilevel modelling is an extension of standard multilevel modelling for non-hierarchical data that have cross-classified structures. Traditional multilevel models involve hierarchical data structures whereby lower level units such as students are nested within higher level units such as schools and where these higher level units may in turn be nested within further groupings or clusters such as school districts, regions, and countries. With hierarchical data structures, there is an exact nesting of each lower level unit in one and only one higher level unit. For example, each student attends one school, each school is located within one school district, and so on. However, social reality is more complicated than this, and so social and behavioural data often do not follow pure or strict hierarchies. Two types of non-hierarchical data structures which often appear in practice are cross-classified and multiple membership structures. In this article, we describe cross-classified data structures and cross-classified hierarchical linear modelling which can be used to analyse them.

A Unified Framework of Online Learning Algorithms for Training Recurrent Neural Networks

We present a framework for compactly summarizing many recent results in efficient and/or biologically plausible online training of recurrent neural networks (RNN). The framework organizes algorithms according to several criteria: (a) past vs. future facing, (b) tensor structure, (c) stochastic vs. deterministic, and (d) closed form vs. numerical. These axes reveal latent conceptual connections among several recent advances in online learning. Furthermore, we provide novel mathematical intuitions for their degree of success. Testing various algorithms on two synthetic tasks shows that performances cluster according to our criteria. Although a similar clustering is also observed for gradient alignment, alignment with exact methods does not alone explain ultimate performance, especially for stochastic algorithms. This suggests the need for better comparison metrics.

Importance of Small Probability Events in Big Data: Information Measures, Applications, and Challenges

In many applications (e.g., anomaly detection and security systems) of smart cities, rare events dominate the importance of the total information of big data collected by Internet of Things (IoTs). That is, it is pretty crucial to explore the valuable information associated with the rare events involved in minority subsets of the voluminous amounts of data. To do so, how to effectively measure the information with importance of the small probability events from the perspective of information theory is a fundamental question. This paper first makes a survey of some theories and models with respect to importance measures and investigates the relationship between subjective or semantic importance and rare events in big data. Moreover, some applications for message processing and data analysis are discussed in the viewpoint of information measures. In addition, based on rare events detection, some open challenges related to information measures, such as smart cities, autonomous driving, and anomaly detection in IoTs, are introduced which can be considered as future research directions.

Extraction and Analysis of Fictional Character Networks: A Survey

A character network is a graph extracted from a narrative, in which vertices represent characters and edges correspond to interactions between them. A number of narrative-related problems can be addressed automatically through the analysis of character networks, such as summarization, classification, or role detection. Character networks are particularly relevant when considering works of fictions (e.g. novels, plays, movies, TV series), as their exploitation allows developing information retrieval and recommendation systems. However, works of fiction possess specific properties making these tasks harder. This survey aims at presenting and organizing the scientific literature related to the extraction of character networks from works of fiction, as well as their analysis. We first describe the extraction process in a generic way, and explain how its constituting steps are implemented in practice, depending on the medium of the narrative, the goal of the network analysis, and other factors. We then review the descriptive tools used to characterize character networks, with a focus on the way they are interpreted in this context. We illustrate the relevance of character networks by also providing a review of applications derived from their analysis. Finally, we identify the limitations of the existing approaches, and the most promising perspectives.

Algorithms of Robust Stochastic Optimization Based on Mirror Descent Method

We propose an approach to construction of robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients. For such algorithms, we establish sub-Gaussian confidence bounds under weak assumptions about the tails of the noise distribution in convex and strongly convex settings. Robust estimates of the accuracy of general stochastic algorithms are also proposed.

Prior Activation Distribution (PAD): A Versatile Representation to Utilize DNN Hidden Units

In this paper, we introduce the concept of Prior Activation Distribution (PAD) as a versatile and general technique to capture the typical activation patterns of hidden layer units of a Deep Neural Network used for classification tasks. We show that the combined neural activations of such a hidden layer have class-specific distributional properties, and then define multiple statistical measures to compute how far a test sample’s activations deviate from such distributions. Using a variety of benchmark datasets (including MNIST, CIFAR10, Fashion-MNIST & notMNIST), we show how such PAD-based measures can be used, independent of any training technique, to (a) derive fine-grained uncertainty estimates for inferences; (b) provide inferencing accuracy competitive with alternatives that require execution of the full pipeline, and (c) reliably isolate out-of-distribution test samples.

Network Embedding: on Compression and Learning

Recently, network embedding that encodes structural information of graphs into a vector space has become popular for network analysis. Although recent methods show promising performance for various applications, the huge sizes of graphs may hinder a direct application of existing network embedding method to them. This paper presents NECL, a novel efficient Network Embedding method with two goals. 1) Is there an ideal Compression of a network? 2) Will the compression of a network significantly boost the representation Learning of the network? For the first problem, we propose a neighborhood similarity based graph compression method that compresses the input graph to get a smaller graph without losing any/much information about the global structure of the graph and the local proximity of the vertices in the graph. For the second problem, we use the compressed graph for network embedding instead of the original large graph to bring down the embedding cost. NECL is a general meta-strategy to improve the efficiency of all of the state-of-the-art graph embedding algorithms based on random walks, including DeepWalk and Node2vec, without losing their effectiveness. Extensive experiments on large real-world networks validate the efficiency of NECL method that yields an average improvement of 23 – 57% embedding time, including walking and learning time without decreasing classification accuracy as evaluated on single and multi-label classification tasks on real-world graphs such as DBLP, BlogCatalog, Cora and Wiki.

Genetic Network Architecture Search

We propose a method for learning the neural network architecture that based on Genetic Algorithm (GA). Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent(SGD) which allows the sharing of weights across all architecture solutions. The method uses GA to design a sub-graph of Convolution cell which maximizes the accuracy on the validation-set. Through experiments, we demonstrate this methods performance on both CIFAR10 and CIFAR100 dataset with an accuracy of 96% and 80.1%. The code and result of this work available in GitHub:https://…/GeneticNAS.

Attentive Multi-Task Deep Reinforcement Learning

Sharing knowledge between tasks is vital for efficient learning in a multi-task setting. However, most research so far has focused on the easier case where knowledge transfer is not harmful, i.e., where knowledge from one task cannot negatively impact the performance on another task. In contrast, we present an approach to multi-task deep reinforcement learning based on attention that does not require any a-priori assumptions about the relationships between tasks. Our attention network automatically groups task knowledge into sub-networks on a state level granularity. It thereby achieves positive knowledge transfer if possible, and avoids negative transfer in cases where tasks interfere. We test our algorithm against two state-of-the-art multi-task/transfer learning approaches and show comparable or superior performance while requiring fewer network parameters.

Multi-lingual Intent Detection and Slot Filling in a Joint BERT-based Model

Intent Detection and Slot Filling are two pillar tasks in Spoken Natural Language Understanding. Common approaches adopt joint Deep Learning architectures in attention-based recurrent frameworks. In this work, we aim at exploiting the success of ‘recurrence-less’ models for these tasks. We introduce Bert-Joint, i.e., a multi-lingual joint text classification and sequence labeling framework. The experimental evaluation over two well-known English benchmarks demonstrates the strong performances that can be obtained with this model, even when few annotated data is available. Moreover, we annotated a new dataset for the Italian language, and we observed similar performances without the need for changing the model.

Visus: An Interactive System for Automatic Machine Learning Model Building and Curation

While the demand for machine learning (ML) applications is booming, there is a scarcity of data scientists capable of building such models. Automatic machine learning (AutoML) approaches have been proposed that help with this problem by synthesizing end-to-end ML data processing pipelines. However, these follow a best-effort approach and a user in the loop is necessary to curate and refine the derived pipelines. Since domain experts often have little or no expertise in machine learning, easy-to-use interactive interfaces that guide them throughout the model building process are necessary. In this paper, we present Visus, a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. We describe the framework used to ground our design choices and a usage scenario enabled by Visus. Finally, we discuss the feedback received in user testing sessions with domain experts.

Hybridized Threshold Clustering for Massive Data

As the size

of datasets become massive, many commonly-used clustering algorithms (for example,

-means or hierarchical agglomerative clustering (HAC) require prohibitive computational cost and memory. In this paper, we propose a solution to these clustering problems by extending threshold clustering (TC) to problems of instance selection. TC is a recently developed clustering algorithm designed to partition data into many small clusters in linearithmic time (on average). Our proposed clustering method is as follows. First, TC is performed and clusters are reduced into single ‘prototype’ points. Then, TC is applied repeatedly on these prototype points until sufficient data reduction has been obtained. Finally, a more sophisticated clustering algorithm is applied to the reduced prototype points, thereby obtaining a clustering on all

data points. This entire procedure for clustering is called iterative hybridized threshold clustering (IHTC). Through simulation results and by applying our methodology on several real datasets, we show that IHTC combined with

-means or HAC substantially reduces the run time and memory usage of the original clustering algorithms while still preserving their performance. Additionally, IHTC helps prevent singular data points from being overfit by clustering algorithms.

On Inductive Biases in Deep Reinforcement Learning

Many deep reinforcement learning algorithms contain inductive biases that sculpt the agent’s objective and its interface to the environment. These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters. In general, there is a trade-off between generality and performance when algorithms use such biases. Stronger biases can lead to faster learning, but weaker biases can potentially lead to more general algorithms. This trade-off is important because inductive biases are not free; substantial effort may be required to obtain relevant domain knowledge or to tune hyper-parameters effectively. In this paper, we re-examine several domain-specific components that bias the objective and the environmental interface of common deep reinforcement learning agents. We investigated whether the performance deteriorates when these components are replaced with adaptive solutions from the literature. In our experiments, performance sometimes decreased with the adaptive components, as one might expect when comparing to components crafted for the domain, but sometimes the adaptive components performed better. We investigated the main benefit of having fewer domain-specific components, by comparing the learning performance of the two systems on a different set of continuous control problems, without additional tuning of either system. As hypothesized, the system with adaptive components performed better on many of the new tasks.

Improved local search for graph edit distance

Graph Edit Distance (GED) measures the dissimilarity between two graphs as the minimal cost of a sequence of elementary operations transforming one graph into another. This measure is fundamental in many areas such as structural pattern recognition or classification. However, exactly computing GED is NP-hard. Among different classes of heuristic algorithms that were proposed to compute approximate solutions, local search based algorithms provide the tightest upper bounds for GED. In this paper, we present K-REFINE and RANDPOST. K-REFINE generalizes and improves an existing local search algorithm and performs particularly well on small graphs. RANDPOST is a general warm start framework that stochastically generates promising initial solutions to be used by any local search based GED algorithm. It is particularly efficient on large graphs. An extensive empirical evaluation demonstrates that both K-REFINE and RANDPOST perform excellently in practice.

An Approximate Bayesian Approach to Surprise-Based Learning

Surprise-based learning allows agents to adapt quickly in non-stationary stochastic environments. Most existing approaches to surprise-based learning and change point detection assume either implicitly or explicitly a simple, hierarchical generative model of observation sequences that are characterized by stationary periods separated by sudden changes. In this work we show that exact Bayesian inference gives naturally rise to a surprise-modulated trade-off between forgetting and integrating the new observations with the current belief. We demonstrate that many existing approximate Bayesian approaches also show surprise-based modulation of learning rates, and we derive novel particle filters and variational filters with update rules that exhibit surprise-based modulation. Our derived filters have a constant scaling in observation sequence length and particularly simple update dynamics for any distribution in the exponential family. Empirical results show that these filters estimate parameters better than alternative approximate approaches and reach comparative levels of performance to computationally more expensive algorithms. The theoretical insight of casting various approaches under the same interpretation of surprise-based learning, as well as the proposed filters, may find useful applications in reinforcement learning in non-stationary environments and in the analysis of animal and human behavior.

Share this: