Inferring new facts from existing knowledge graphs (KG) with explainable reasoning processes is a significant problem and has received much attention recently. However, few studies have focused on relation types unseen in the original KG, given only one or a few instances for training. To bridge this gap, we propose CogKR for one-shot KG reasoning. The one-shot relational learning problem is tackled through two modules: the summary module summarizes the underlying relationship of the given instances, based on which the reasoning module infers the correct answers. Motivated by the dual process theory in cognitive science, in the reasoning module, a cognitive graph is built by iteratively coordinating retrieval (System 1, collecting relevant evidence intuitively) and reasoning (System 2, conducting relational reasoning over collected information). The structural information offered by the cognitive graph enables our model to aggregate pieces of evidence from multiple reasoning paths and explain the reasoning process graphically. Experiments show that CogKR substantially outperforms previous state-of-the-art models on one-shot KG reasoning benchmarks, with relative improvements of 24.3%-29.7% on MRR. The source code is available at https://…/CogKR.
Natural languages are complexly structured entities. They exhibit characterising regularities that can be exploited to link them one another. In this work, I compare two morphological aspects of languages: Written Patterns and Sentence Structure. I show how languages spontaneously group by similarity in both analyses and derive an average language distance. Finally, exploiting Sentence Structure I developed an Artificial Neural Network capable of distinguishing languages suggesting that not only word roots but also grammatical sentence structure is a characterising trait which alone suffice to identify them.
Many massive data are assembled through collections of information of a large number of individuals in a population. The analysis of such data, especially in the aspect of individualized inferences and solutions, has the potential to create significant value for practical applications. Traditionally, inference for an individual in the data set is either solely relying on the information of the individual or from summarizing the information about the whole population. However, with the availability of big data, we have the opportunity, as well as a unique challenge, to make a more effective individualized inference that takes into consideration of both the population information and the individual discrepancy. To deal with the possible heterogeneity within the population while providing effective and credible inferences for individuals in a data set, this article develops a new approach called the individualized group learning (iGroup). The iGroup approach uses local nonparametric techniques to generate an individualized group by pooling other entities in the population which share similar characteristics with the target individual. Three general cases of iGroup are discussed, and their asymptotic performances are investigated. Both theoretical results and empirical simulations reveal that, by applying iGroup, the performance of statistical inference on the individual level are ensured and can be substantially improved from inference based on either solely individual information or entire population information. The method has a broad range of applications. Two examples in financial statistics and maritime anomaly detection are presented.
With the support of the blockchain systems, the cryptocurrency has changed the world of virtual assets. Digital games, especially those with massive multi-player scenarios, will be significantly impacted by this novel technology. However, there are insufficient academic studies on this topic. In this work, we filled the blank by surveying the state-of-the-art blockchain games. We discuss the blockchain integration for games and then categorize existing blockchain games from the aspects of their genres and technical platforms. Moreover, by analyzing the industrial trend with a statistical approach, we envision the future of blockchain games from technological and commercial perspectives.
We consider the topic modeling problem for large datasets. For this problem, Latent Dirichlet Allocation (LDA) with a collapsed Gibbs sampler optimization is the state-of-the-art approach in terms of topic quality. However, LDA is a slow approach, and running it on large datasets is impractical even with modern hardware. In this paper we propose to fit topics directly to the co-occurances data of the corpus. In particular, we introduce an extension of a mixture model, the Full Dependence Mixture (FDM), which arises naturally as a model of a second moment under general generative assumptions on the data. While there is some previous work on topic modeling using second moments, we develop a direct stochastic optimization procedure for fitting an FDM with a single Kullback Leibler objective. While moment methods in general have the benefit that an iteration no longer needs to scale with the size of the corpus, our approach also allows us to leverage standard optimizers and GPUs for the problem of topic modeling. We evaluate the approach on synthetic and semi-synthetic data, as well as on the SOTU and Neurips Papers corpora, and show that the approach outperforms LDA, where LDA is run on both full and sub-sampled data.
The reliance on deep learning algorithms has grown significantly in recent years. Yet, these models are highly vulnerable to adversarial attacks, which introduce visually imperceptible perturbations into testing data to induce misclassifications. The literature has proposed several methods to combat such adversarial attacks, but each method either fails at high perturbation values, requires excessive computing power, or both. This letter proposes a computationally efficient method for defending the Fast Gradient Sign (FGS) adversarial attack by simultaneously denoising and compressing data. Specifically, our proposed defense relies on training a fully connected multi-layer Denoising Autoencoder (DAE) and using its encoder as a defense against the adversarial attack. Our results show that using this dimensionality reduction scheme is not only highly effective in mitigating the effect of the FGS attack in multiple threat models, but it also provides a 2.43x speedup in comparison to defense strategies providing similar robustness against the same attack.
The majority of modern deep learning models are able to interpolate the data: the empirical loss can be driven near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning. Specifically, we use it to compute an adaptive learning-rate given a stochastic gradient direction. This results in the Adaptive Learning-rates for Interpolation with Gradients (ALI-G) algorithm. ALI-G retains the advantages of SGD, which are low computational cost and provable convergence in the convex setting. But unlike SGD, the learning-rate of ALI-G can be computed inexpensively in closed-form and does not require a manual schedule. We provide a detailed analysis of ALI-G in the stochastic convex setting with explicit convergence rates. In order to obtain good empirical performance in deep learning, we extend the algorithm to use a maximal learning-rate, which gives a single hyper-parameter to tune. We show that employing such a maximal learning-rate has an intuitive proximal interpretation and preserves all convergence guarantees. We provide experiments on a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. We empirically show that ALI-G outperforms adaptive gradient methods such as Adam, and provides comparable performance with SGD, although SGD benefits from manual learning rate schedules. We release PyTorch and Tensorflow implementations of ALI-G as standalone optimizers that can be used as a drop-in replacement in existing code (code available at https://…/ali-g ).
Scientific computations or measurements may result in huge volumes of data. Often these can be thought of representing a real-valued function on a high-dimensional domain, and can be conceptually arranged in the format of a tensor of high degree in some truncated or lossy compressed format. We look at some common post-processing tasks which are not obvious in the compressed format, as such huge data sets can not be stored in their entirety, and the value of an element is not readily accessible through simple look-up. The tasks we consider are finding the location of maximum or minimum, or minimum and maximum of a function of the data, or finding the indices of all elements in some interval — i.e. level sets, the number of elements with a value in such a level set, the probability of an element being in a particular level set, and the mean and variance of the total collection. The algorithms to be described are fixed point iterations of particular functions of the tensor, which will then exhibit the desired result. For this, the data is considered as an element of a high degree tensor space, although in an abstract sense, the algorithms are independent of the representation of the data as a tensor. All that we require is that the data can be considered as an element of an associative, commutative algebra with an inner product. Such an algebra is isomorphic to a commutative sub-algebra of the usual matrix algebra, allowing the use of matrix algorithms to accomplish the mentioned tasks. We allow the actual computational representation to be a lossy compression, and we allow the algebra operations to be performed in an approximate fashion, so as to maintain a high compression level. One such example which we address explicitly is the representation of data as a tensor with compression in the form of a low-rank representation.
Fine-grained Entity Typing is a tough task which suffers from noise samples extracted from distant supervision. Thousands of manually annotated samples can achieve greater performance than millions of samples generated by the previous distant supervision method. Whereas, it’s hard for human beings to differentiate and memorize thousands of types, thus making large-scale human labeling hardly possible. In this paper, we introduce a Knowledge-Constraint Typing Annotation Tool (KCAT), which is efficient for fine-grained entity typing annotation. KCAT reduces the size of candidate types to an acceptable range for human beings through entity linking and provides a Multi-step Typing scheme to revise the entity linking result. Moreover, KCAT provides an efficient Annotator Client to accelerate the annotation process and a comprehensive Manager Module to analyse crowdsourcing annotations. Experiment shows that KCAT can significantly improve annotation efficiency, the time consumption increases slowly as the size of type set expands.
This paper focuses on the end-to-end abstractive summarization of a single product review without supervision. We assume that a review can be described as a discourse tree, in which the summary is the root, and the child sentences explain their parent in detail. By recursively estimating a parent from its children, our model learns the latent discourse tree without an external parser and generates a concise summary. We also introduce an architecture that ranks the importance of each sentence on the tree to support summary generation focusing on the main review point. The experimental results demonstrate that our model is competitive with or outperforms other unsupervised approaches. In particular, for relatively long reviews, it achieves a competitive or better performance than supervised models. The induced tree shows that the child sentences provide additional information about their parent, and the generated summary abstracts the entire review.
Multilingual writers and speakers often alternate between two languages in a single discourse, a practice called ‘code-switching’. Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is more readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text, we achieve significant improvements in sentiment labeling accuracy (1.5%, 5.11%, 7.20%) for three different language pairs (English-Hindi, English-Spanish and English-Bengali). We also get significant gains for hate speech detection: 4% improvement using only synthetic text and 6% if augmented with real text.
This paper describes a C++ library that compiles neural network models at runtime into machine code that performs inference. This approach in general promises to achieve the best performance possible since it is able to integrate statically known properties of the network directly into the code. In our experiments on the NAO V6 platform, it outperforms existing implementations significantly on small networks, while being inferior on large networks. The library was already part of the B-Human code release 2018, but has been extended since and is now available as a standalone version that can be integrated into any C++14 code base.
Noise modeling lies in the heart of many image processing tasks. However, existing deep learning methods for noise modeling generally require clean and noisy image pairs for model training; these image pairs are difficult to obtain in many realistic scenarios. To ameliorate this problem, we propose a self-consistent GAN (SCGAN), that can directly extract noise maps from noisy images, thus enabling unsupervised noise modeling. In particular, the SCGAN introduces three novel self-consistent constraints that are complementary to one another, viz.: the noise model should produce a zero response over a clean input; the noise model should return the same output when fed with a specific pure noise input; and the noise model also should re-extract a pure noise map if the map is added to a clean image. These three constraints are simple yet effective. They jointly facilitate unsupervised learning of a noise model for various noise types. To demonstrate its wide applicability, we deploy the SCGAN on three image processing tasks including blind image denoising, rain streak removal, and noisy image super-resolution. The results demonstrate the effectiveness and superiority of our method over the state-of-the-art methods on a variety of benchmark datasets, even though the noise types vary significantly and paired clean images are not available.
Building performance discrepancies between building design and operation are one of the causes that lead many new designs fail to achieve their goals and objectives. One of main factors contributing to the discrepancy is occupant behaviors. Occupants responding to a new design are influenced by several factors. Existing building performance models (BPMs) ignore or partially address those factors (called contextual factors) while developing BPMs. To potentially reduce the discrepancies and improve the prediction accuracy of BPMs, this paper proposes a computational framework for learning mixture models by using Generative Adversarial Networks (GANs) that appropriately combining existing BPMs with knowledge on occupant behaviors to contextual factors in new designs. Immersive virtual environments (IVEs) experiments are used to acquire data on such behaviors. Performance targets are used to guide appropriate combination of existing BPMs with knowledge on occupant behaviors. The resulting model obtained is called an augmented BPM. Two different experiments related to occupant lighting behaviors are shown as case study. The results reveal that augmented BPMs significantly outperformed existing BPMs with respect to achieving specified performance targets. The case study confirms the potential of the computational framework for improving prediction accuracy of BPMs during design.
The scale of Internet-connected systems has increased considerably, and these systems are being exposed to cyber attacks more than ever. The complexity and dynamics of cyber attacks require protecting mechanisms to be responsive, adaptive, and large-scale. Machine learning, or more specifically deep reinforcement learning (DRL), methods have been proposed widely to address these issues. By incorporating deep learning into traditional RL, DRL is highly capable of solving complex, dynamic, and especially high-dimensional cyber defense problems. This paper presents a survey of DRL approaches developed for cyber security. We touch on different vital aspects, including DRL-based security methods for cyber-physical systems, autonomous intrusion detection techniques, and multi-agent DRL-based game theory simulations for defense strategies against cyber attacks. Extensive discussions and future research directions on DRL-based cyber security are also given. We expect that this comprehensive review provides the foundations for and facilitates future studies on exploring the potential of emerging DRL to cope with increasingly complex cyber security problems.
Hierarchical Reinforcement Learning is a promising approach to long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process and the training of a higher level that controls the skills in a new task. Treating the skills as fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills, and continuously adapt them along with the higher level even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient, as well as an unbiased latent-dependent baseline. We introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy simultaneously. Second, we propose a method of training time-abstractions that improves the robustness of the obtained skills to environment changes. Code and results are available at sites.google.com/view/hippo-rl .
Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, viewed by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a “dog’ can be seen, heard, and felt). We hypothesize that a powerful representation is one that models view-invariant factors. Based on this hypothesis, we investigate a contrastive coding scheme, in which a representation is learned that aims to maximize mutual information between different views but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. The resulting learned representations perform above the state of the art for downstream tasks such as object classification, compared to formulations based on predictive learning or single view reconstruction, and improve as more views are added. Code and reference implementations are released on our project page: http://…/.
Training deep generative models with maximum likelihood remains a challenge. The typical workaround is to use variational inference (VI) and maximize a lower bound to the log marginal likelihood of the data. Variational auto-encoders (VAEs) adopt this approach. They further amortize the cost of inference by using a recognition network to parameterize the variational family. Amortized VI scales approximate posterior inference in deep generative models to large datasets. However it introduces an amortization gap and leads to approximate posteriors of reduced expressivity due to the problem known as posterior collapse. In this paper, we consider expectation maximization (EM) as a paradigm for fitting deep generative models. Unlike VI, EM directly maximizes the log marginal likelihood of the data. We rediscover the importance weighted auto-encoder (IWAE) as an instance of EM and propose a new EM-based algorithm for fitting deep generative models called reweighted expectation maximization (REM). REM learns better generative models than the IWAE by decoupling the learning dynamics of the generative model and the recognition network using a separate expressive proposal found by moment matching. We compared REM to the VAE and the IWAE on several density estimation benchmarks and found it leads to significantly better performance as measured by log-likelihood.
We present a novel deep zero-shot learning (ZSL) model for inferencing human-object-interaction with verb-object (VO) query. While the previous ZSL approaches only use the semantic/textual information to be fed into the query stream, we seek to incorporate and embed the semantics into the visual representation stream as well. Our approach is powered by Semantics-to-Space (S2S) architecture where semantics derived from the residing objects are embedded into a spatial space. This architecture allows the co-capturing of the semantic attributes of the human and the objects along with their location/size/silhouette information. As this is the first attempt to address the zero-shot human-object-interaction inferencing with VO query, we have constructed a new dataset, Verb-Transferability 60 (VT60). VT60 provides 60 different VO pairs with overlapping verbs tailored for testing ZSL approaches with VO query. Experimental evaluations show that our approach not only outperforms the state-of-the-art, but also shows the capability of consistently improving performance regardless of which ZSL baseline architecture is used.
Few-shot learning is a challenging problem where the system is required to achieve generalization from only few examples. Meta-learning tackles the problem by learning prior knowledge shared across a distribution of tasks, which is then used to quickly adapt to unseen tasks. Model-agnostic meta-learning (MAML) algorithm formulates prior knowledge as a common initialization across tasks. However, forcibly sharing an initialization brings about conflicts between tasks and thus compromises the quality of the initialization. In this work, by observing that the extent of compromise differs among tasks and between layers of a neural network, we propose a new initialization idea that employs task-dependent layer-wise attenuation, which we call selective forgetting. The proposed attenuation scheme dynamically controls how much of prior knowledge each layer will exploit for a given task. The experimental results demonstrate that the proposed method mitigates the conflicts and provides outstanding performance as a result. We further show that the proposed method, named L2F, can be applied and improve other state-of-the-art MAML-based frameworks, illustrating its generalizability.