The need for maintenance is based on the wear of components of machinery. If this need can be defined reliably beforehand so that no unpredicted failures take place then the maintenance actions can be carried out economically with mini-mum disturbances to production. There are two basic challenges in solving the above. First understanding the development of wear and failures, and second managing the measurement and diagnosis of such parameters that can reveal the development of wear. In principle the development of wear and failures can be predicted through monitoring time, load or wear as such. Moni-toring time is not very efficient, as there are only limited numbers of components that suffer from aging which as such is the result of chemical wear i.e. changes in the material. In most cases the loading of components influences their wear. In principle the loading can be stable or varying in nature. Of these two cases the varying load case is much more challenging than the stable one. The monitoring of wear can be done either directly e.g. optical methods or indirectly e.g. vibration. Monitoring actual wear is naturally the most reliable approach, but it often means that additional investments are needed. The paper discusses how the monitoring of wear and need for maintenance can be done based on the use of Cyber Physical Systems.
This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning. The paper’s goal is to build towards a common framework for data licensing akin to the licensing of open source software. Increased transparency and resolving conceptual ambiguities in existing licensing language are two noted benefits of the approach proposed in the paper. In parallel, such benefits may help foster fairer and more efficient markets for data through bringing about clearer tools and concepts that better define how data can be used in the fields of AI and ML. The paper’s approach is summarized in a new family of data license language – \textit{the Montreal Data License (MDL)}. Alongside this new license, the authors and their collaborators have developed a web-based tool to generate license language espousing the taxonomies articulated in this paper.
We present a data-driven method for separating complex, multiscale systems into their constituent time-scale components using a recursive implementation of dynamic mode decomposition (DMD). Local linear models are built from windowed subsets of the data, and dominant time scales are discovered using spectral clustering on their eigenvalues. This approach produces time series data for each identified component, which sum to a faithful reconstruction of the input signal. It differs from most other methods in the field of multiresolution analysis (MRA) in that it 1) accounts for spatial and temporal coherencies simultaneously, making it more robust to scale overlap between components, and 2) yields a closed-form expression for local dynamics at each scale, which can be used for short-term prediction of any or all components. Our technique is an extension of multi-resolution dynamic mode decomposition (mrDMD), generalized to treat a broader variety of multiscale systems and more faithfully reconstruct their isolated components. In this paper we present an overview of our algorithm and its results on two example physical systems, and briefly discuss some advantages and potential forecasting applications for the technique.
In this paper we tackle the problem of unsupervised domain adaptation for the task of semantic segmentation, where we attempt to transfer the knowledge learned upon synthetic datasets with ground-truth labels to real-world images without any annotation. With the hypothesis that the structural content of images is the most informative and decisive factor to semantic segmentation and can be readily shared across domains, we propose a Domain Invariant Structure Extraction (DISE) framework to disentangle images into domain-invariant structure and domain-specific texture representations, which can further realize image-translation across domains and enable label transfer to improve segmentation performance. Extensive experiments verify the effectiveness of our proposed DISE model and demonstrate its superiority over several state-of-the-art approaches.
Linked Open Data (LOD) is the publicly available RDF data in the Web. Each LOD entity is identfied by a URI and accessible via HTTP. LOD encodes globalscale knowledge potentially available to any human as well as artificial intelligence that may want to benefit from it as background knowledge for supporting their tasks. LOD has emerged as the backbone of applications in diverse fields such as Natural Language Processing, Information Retrieval, Computer Vision, Speech Recognition, and many more. Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, the reuse of such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data quality. As aptly stated by Heath et al. Linked Data might be outdated, imprecise, or simply wrong’: there arouses a necessity to investigate the problem of linked data validity. This work reports a collaborative effort performed by nine teams of students, guided by an equal number of senior researchers, attending the International Semantic Web Research School (ISWS 2018) towards addressing such investigation from different perspectives coupled with different approaches to tackle the issue.
Analogous to living ecosystems in nature, web services form an artificial ecosystem consisting of many tags and their associated media, such as photographs, movies, and web pages created by human users. Concerning biological ecosystems, we regard tag as a species and human as a hidden environmental resource. We subsequently analyze the evolution of the web services, in particular social tagging systems, with respect to the self-organization of new tags. The evolution of new combinations of tags is analyzed as the open-ended evolution (OEE) index. The tag meaning is computed by the types of associated tags; tags that vary their meanings temporally exist. We argue that such tags are the examples of OEE.
In a wide array of areas, algorithms are matching and surpassing the performance of human experts, leading to consideration of the roles of human judgment and algorithmic prediction in these domains. The discussion around these developments, however, has implicitly equated the specific task of prediction with the general task of automation. We argue here that automation is broader than just a comparison of human versus algorithmic performance on a task; it also involves the decision of which instances of the task to give to the algorithm in the first place. We develop a general framework that poses this latter decision as an optimization problem, and we show how basic heuristics for this optimization problem can lead to performance gains even on heavily-studied applications of AI in medicine. Our framework also serves to highlight how effective automation depends crucially on estimating both algorithmic and human error on an instance-by-instance basis, and our results show how improvements in these error estimation problems can yield significant gains for automation as well.
Domain adaptation is critical for learning in new and unseen environments. With domain adversarial training, deep networks can learn disentangled and transferable features that effectively diminish the dataset shift between the source and target domains for knowledge transfer. In the era of Big Data, the ready availability of large-scale labeled datasets has stimulated wide interest in partial domain adaptation (PDA), which transfers a recognizer from a labeled large domain to an unlabeled small domain. It extends standard domain adaptation to the scenario where target labels are only a subset of source labels. Under the condition that target labels are unknown, the key challenge of PDA is how to transfer relevant examples in the shared classes to promote positive transfer, and ignore irrelevant ones in the specific classes to mitigate negative transfer. In this work, we propose a unified approach to PDA, Example Transfer Network (ETN), which jointly learns domain-invariant representations across the source and target domains, and a progressive weighting scheme that quantifies the transferability of source examples while controlling their importance to the learning task in the target domain. A thorough evaluation on several benchmark datasets shows that our approach achieves state-of-the-art results for partial domain adaptation tasks.
In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier’s robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.
Several important security issues of Deep Neural Network (DNN) have been raised recently associated with different applications and components. The most widely investigated security concern of DNN is from its malicious input, a.k.a adversarial example. Nevertheless, the security challenge of DNN’s parameters is not well explored yet. In this work, we are the first to propose a novel DNN weight attack methodology called Bit-Flip Attack (BFA) which can crush a neural network through maliciously flipping extremely small amount of bits within its weight storage memory system (i.e., DRAM). The bit-flip operations could be conducted through well-known Row-Hammer attack, while our main contribution is to develop an algorithm to identify the most vulnerable bits of DNN weight parameters (stored in memory as binary bits), that could maximize the accuracy degradation with a minimum number of bit-flips. Our proposed BFA utilizes a Progressive Bit Search (PBS) method which combines gradient ranking and progressive search to identify the most vulnerable bit to be flipped. With the aid of PBS, we can successfully attack a ResNet-18 fully malfunction (i.e., top-1 accuracy degrade from 69.8% to 0.1%) only through 13 bit-flips out of 93 million bits, while randomly flipping 100 bits merely degrades the accuracy by less than 1%.
We critically assess mainstream accounting and finance research applying methods from computational linguistics (CL) to study financial discourse. We also review common themes and innovations in the literature and assess the incremental contributions of work applying CL methods over manual content analysis. Key conclusions emerging from our analysis are: (a) accounting and finance research is behind the curve in terms of CL methods generally and word sense disambiguation in particular; (b) implementation issues mean the proposed benefits of CL are often less pronounced than proponents suggest; (c) structural issues limit practical relevance; and (d) CL methods and high quality manual analysis represent complementary approaches to analyzing financial discourse. We describe four CL tools that have yet to gain traction in mainstream AF research but which we believe offer promising ways to enhance the study of meaning in financial discourse. The four tools are named entity recognition (NER), summarization, semantics and corpus linguistics.
Enforcing distributions of latent variables in neural networks is an active subject. It is vital in all kinds of generative models, where we want to be able to interpolate between points in the latent space, or sample from it. Modern generative AutoEncoders (AE) like WAE, SWAE, CWAE add a regularizer to the standard (deterministic) AE, which allows to enforce Gaussian distribution in the latent space. Enforcing different distributions, especially topologically nontrivial, might bring some new interesting possibilities, but this subject seems unexplored so far. This article proposes a new approach to enforce uniform distribution on d-dimensional torus. We introduce a circular spring loss, which enforces minibatch points to be equally spaced and satisfy cyclic boundary conditions. As example of application we propose multiple-path morphing. Minimal distance geodesic between two points in uniform distribution on latent space of angles becomes a line, however, torus topology allows us to choose such lines in alternative ways, going through different edges of $[-\pi,\pi]^d$. Further applications to explore can be for example trying to learn real-life topologically nontrivial spaces of features, like rotations to automatically recognize 2D rotation of an object in picture by training on relative angles, or even 3D rotations by additionally using spherical features – this way morphing should be close to object rotation.
Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges.
Few-shot learning in image classification aims to learn a classifier to classify images when only few training examples are available for each class. Recent work has achieved promising classification performance, where an image-level feature based measure is usually used. In this paper, we argue that a measure at such a level may not be effective enough in light of the scarcity of examples in few-shot learning. Instead, we think a local descriptor based image-to-class measure should be taken, inspired by its surprising success in the heydays of local invariant features. Specifically, building upon the recent episodic training mechanism, we propose a Deep Nearest Neighbor Neural Network (DN4 in short) and train it in an end-to-end manner. Its key difference from the literature is the replacement of the image-level feature based measure in the final layer by a local descriptor based image-to-class measure. This measure is conducted online via a $k$-nearest neighbor search over the deep local descriptors of convolutional feature maps. The proposed DN4 not only learns the optimal deep local descriptors for the image-to-class measure, but also utilizes the higher efficiency of such a measure in the case of example scarcity, thanks to the exchangeability of visual patterns across the images in the same class. Our work leads to a simple, effective, and computationally efficient framework for few-shot learning. Experimental study on benchmark datasets consistently shows its superiority over the related state-of-the-art, with the largest absolute improvement of $17\%$ over the next best. The source code can be available from \UrlFont{https://…/DN4.git}.
This paper presents a novel framework for designing support vector machines (SVMs), which does not impose restriction on the SVM kernel to be positive-definite and allows the user to define memory constraint in terms of fixed template vectors. This makes the framework scalable and enables its implementation for low-power, high-density and memory constrained embedded application. An efficient hardware implementation of the same is also discussed, which utilizes novel low power memtransistor based cross-bar architecture, and is robust to device mismatch and randomness. We used memtransistor measurement data, and showed that the designed SVMs can achieve state-of-the-art classification accuracy on both synthetic and real-world benchmark datasets. This framework would be beneficial for design of SVM based wake-up systems for internet of things (IoTs) and edge devices where memtransistors can be used to optimize system’s energy-efficiency and perform in-memory matrix-vector multiplication (MVM).
This paper proposes a reinforcement learning approach for traffic control with the adaptive horizon. To build the controller for the traffic network, a Q-learning-based strategy that controls the green light passing time at the network intersections is applied. The controller includes two components: the regular Q-learning controller that controls the traffic light signal, and the adaptive controller that continuously optimizes the action space for the Q-learning algorithm in order to improve the efficiency of the Q-learning algorithm. The regular Q-learning controller uses the control cost function as a reward function to determine the action to choose. The adaptive controller examines the control cost and updates the action space of the controller by determining the subset of actions that are most likely to obtain optimal results and shrinking the action space to that subset. Uncertainties in traffic influx and turning rate are introduced to test the robustness of the controller under a stochastic environment. Compared with those with model predictive control (MPC), the results show that the proposed Q-learning-based controller outperforms the MPC method by reaching a stable solution in a shorter period and achieves lower control costs. The proposed Q-learning-based controller is also robust under 30% traffic demand uncertainty and 15% turning rate uncertainty.
Combinatorial generalization – the ability to understand and produce novel combinations of already familiar elements – is considered to be a core capacity of the human mind and a major challenge to neural network models. A significant body of research suggests that conventional neural networks can’t solve this problem unless they are endowed with mechanisms specifically engineered for the purpose of representing symbols. In this paper we introduce a novel way of representing symbolic structures in connectionist terms – the vectors approach to representing symbols (VARS), which allows training standard neural architectures to encode symbolic knowledge explicitly at their output layers. In two simulations , we show that out-of-the-box neural networks not only can learn to produce VARS representations, but in doing so they achieve combinatorial generalization. This adds to other recent work that has shown improved combinatorial generalization under specific training conditions, and raises the question of whether special mechanisms are indeed needed to support symbolic processing.
Extracting key information from documents, such as receipts or invoices, and preserving the interested texts to structured data is crucial in the document-intensive streamline processes of office automation in areas that includes but not limited to accounting, financial, and taxation areas. To avoid designing expert rules for each specific type of document, some published works attempt to tackle the problem by learning a model to explore the semantic context in text sequences based on the Named Entity Recognition (NER) method in the NLP field. In this paper, we propose to harness the effective information from both semantic meaning and spatial distribution of texts in documents. Specifically, our proposed model, Convolutional Universal Text Information Extractor (CUTIE), applies convolutional neural networks on gridded texts where texts are embedded as features with semantical connotations. We further explore the effect of employing different structures of convolutional neural network and propose a fast and portable structure. We demonstrate the effectiveness of the proposed method on a dataset with up to 6,980 labelled receipts, without any pre-training or post-processing, achieving state of the art performance that is much higher than BERT but with only 1/10 parameters and without requiring the 3,300M word dataset for pre-training. Experimental results also demonstrate that the CUTIE being able to achieve state of the art performance with much smaller amount of training data.
We describe a method for utilizing the known structure of input data to make learning more efficient. Our work is in the domain of programming languages, and we use deep neural networks to do program analysis. Computer programs include a lot of structural information (such as loop nests, conditional blocks, and data scopes), which is pertinent to program analysis. In this case, the neural network has to learn to recognize the structure, and also learn the target function for the problem. However, the structural information in this domain is readily accessible to software with the availability of compiler tools and parsers for well-defined programming languages. Our method for utilizing the known structure of input data includes: (1) pre-processing the input data to expose relevant structures, and (2) constructing neural networks by incorporating the structure of the input data as an integral part of the network design. The method has the effect of modularizing the neural network which helps break down complexity, and results in more efficient training of the overall network. We apply this method to an example code analysis problem, and show that it can achieve higher accuracy with a smaller network size and fewer training examples. Further, the method is robust, performing equally well on input data with different distributions.
This study investigates the effects Markov Chain Monte Carlo (MCMC) sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is restricted to the family unnormalized probability densities for which the negative log density (or energy function) is a ConvNet. In general, we find that the majority of techniques used to stabilize training in previous studies can the opposite effect. Stable ML learning with a ConvNet potential can be achieved with only a few hyper-parameters and no regularization. With this minimal framework, we identify a variety of ML learning outcomes depending on the implementation of MCMC sampling. On one hand, we show that it is easy to train an energy-based model which can sample realistic images with short-run Langevin. ML can be effective and stable even when MCMC samples have much higher energy than true steady-state samples throughout training. Based on this insight, we introduce an ML method with noise initialization for MCMC, high-quality short-run synthesis, and the same budget as ML with informative MCMC initialization such as CD or PCD. Unlike previous models, this model can obtain realistic high-diversity samples from a noise signal after training with no auxiliary models. On the other hand, models learned with highly non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from the data. We show that it is much harder to train an energy-based model where long-run and steady-state MCMC samples have realistic appearance. To our knowledge, long-run MCMC samples of all previous models result in unrealistic images. With correct tuning of Langevin noise, we train the first models for which long-run and steady-state MCMC samples are realistic images.
We consider deep feedforward neural networks with rectified linear units from a signal processing perspective. In this view, such representations mark the transition from using a single (data-driven) linear representation to utilizing a large collection of affine linear representations tailored to particular regions of the signal space. This paper provides a precise description of the individual affine linear representations and corresponding domain regions that the (data-driven) neural network associates to each signal of the input space. In particular, we describe atomic decompositions of the representations and, based on estimating their Lipschitz regularity, suggest some conditions that can stabilize learning independent of the network depth. Such an analysis may promote further theoretical insight from both the signal processing and machine learning communities.
Despite the great successes of machine learning, it can have its limits when dealing with insufficient training data. A potential solution is to incorporate additional knowledge into the training process which leads to the idea of informed machine learning. We present a research survey and structured overview of various approaches in this field. We aim to establish a taxonomy which can serve as a classification framework that considers the kind of additional knowledge, its representation,and its integration into the machine learning pipeline. The evaluation of numerous papers on the bases of the taxonomy uncovers key methods in this field.
The realistic modeling intended to quantify precisely some biological mechanisms is a task requiering a lot of a priori knowledge and generally leading to heavy mathematical models. On the other hand, the structure of the classical Machine Learning algorithms, such as Neural Networks, limits their flexibility and the possibility to take into account the existence of complex underlying phenomena, such as delay, saturation and accumulation. The aim of this paper is to reach a compromise between precision, parsimony and flexibility to design an efficient biomimetic predictive tool extracting knowledge from livestock data. To achieve this, we build a Mathematical Model based on Partial Differential Equations (PDE) embarking the mathematical expression of biological determinants. We made the hypothesis that all the physico-chemical phenomena occurring in animal body can be summarized by the evolution of a global information. Therefore the developed PDE system describes the evolution and the action of an information circulating in an Avatar of the Real Animal. This Avatar outlines the dynamics of the biological reactions of animal body in the framework of a specific problem. Each PDE contains parameters corresponding to biological-like factors which can be learnt from data by the developed Statistical Learning Tool.
Quality product descriptions are critical for providing competitive customer experience in an E-commerce platform. An accurate and attractive description not only helps customers make an informed decision but also improves the likelihood of purchase. However, crafting a successful product description is tedious and highly time-consuming. Due to its importance, automating the product description generation has attracted considerable interests from both research and industrial communities. Existing methods mainly use templates or statistical methods, and their performance could be rather limited. In this paper, we explore a new way to generate the personalized product description by combining the power of neural networks and knowledge base. Specifically, we propose a KnOwledge Based pEronalized (or KOBE) product description generation model in the context of E-commerce. In KOBE, we extend the encoder-decoder framework, the Transformer, to a sequence modeling formulation using self-attention. In order to make the description both informative and personalized, KOBE considers a variety of important factors during text generation, including product aspects, user categories, and knowledge base, etc. Experiments on real-world datasets demonstrate that the proposed method out-performs the baseline on various metrics. KOBE can achieve an improvement of 9.7% over state-of-the-arts in terms of BLEU. We also present several case studies as the anecdotal evidence to further prove the effectiveness of the proposed approach. The framework has been deployed in Taobao, the largest online E-commerce platform in China.
Debugging Cyber-Physical System (CPS) models can be extremely complex. Indeed, only the detection of a failure is insuffcient to know how to correct a faulty model. Faults can propagate in time and in space producing observable misbehaviours in locations completely different from the location of the fault. Understanding the reason of an observed failure is typically a challenging and laborious task left to the experience and domain knowledge of the designer. \n In this paper, we propose CPSDebug, a novel approach that by combining testing, specification mining, and failure analysis, can automatically explain failures in Simulink/Stateflow models. We evaluate CPSDebug on two case studies, involving two use scenarios and several classes of faults, demonstrating the potential value of our approach.
Due to its fast retrieval and storage efficiency capabilities, hashing has been widely used in nearest neighbor retrieval tasks. By using deep learning based techniques, hashing can outperform non-learning based hashing in many applications. However, there are some limitations to previous learning based hashing methods (e.g., the learned hash codes are not discriminative due to the hashing methods being unable to discover rich semantic information and the training strategy having difficulty optimizing the discrete binary codes). In this paper, we propose a novel learning based hashing method, named \textbf{\underline{A}}symmetric \textbf{\underline{D}}eep \textbf{\underline{S}}emantic \textbf{\underline{Q}}uantization (\textbf{ADSQ}). \textbf{ADSQ} is implemented using three stream frameworks, which consists of one \emph{LabelNet} and two \emph{ImgNets}. The \emph{LabelNet} leverages three fully-connected layers, which is used to capture rich semantic information between image pairs. For the two \emph{ImgNets}, they each adopt the same convolutional neural network structure, but with different weights (i.e., asymmetric convolutional neural networks). The two \emph{ImgNets} are used to generate discriminative compact hash codes. Specifically, the function of the \emph{LabelNet} is to capture rich semantic information that is used to guide the two \emph{ImgNets} in minimizing the gap between the real-continuous features and discrete binary codes. By doing this, \textbf{ADSQ} can make full use of the most critical semantic information to guide the feature learning process and consider the consistency of the common semantic space and Hamming space. Results from our experiments demonstrate that \textbf{ADSQ} can generate high discriminative compact hash codes and it outperforms current state-of-the-art methods on three benchmark datasets, CIFAR-10, NUS-WIDE, and ImageNet.
We present a training system, which can provably defend significantly larger neural networks than previously possible, including ResNet-34 and DenseNet-100. Our approach is based on differentiable abstract interpretation and introduces two novel concepts: (i) abstract layers for fine-tuning the precision and scalability of the abstraction, (ii) a flexible domain specific language (DSL) for describing training objectives that combine abstract and concrete losses with arbitrary specifications. Our training method is implemented in the DiffAI system.
Topics models, such as LDA, are widely used in Natural Language Processing. Making their output interpretable is an important area of research with applications to areas such as the enhancement of exploratory search interfaces and the development of interpretable machine learning models. Conventionally, topics are represented by their n most probable words, however, these representations are often difficult for humans to interpret. This paper explores the re-ranking of topic words to generate more interpretable topic representations. A range of approaches are compared and evaluated in two experiments. The first uses crowdworkers to associate topics represented by different word rankings with related documents. The second experiment is an automatic approach based on a document retrieval task applied on multiple domains. Results in both experiments demonstrate that re-ranking words improves topic interpretability and that the most effective re-ranking schemes were those which combine information about the importance of words both within topics and their relative frequency in the entire corpus. In addition, close correlation between the results of the two evaluation approaches suggests that the automatic method proposed here could be used to evaluate re-ranking methods without the need for human judgements.
Time series forecasting is one of the challenging problems for humankind. Traditional forecasting methods using mean regression models have severe shortcomings in reflecting real-world fluctuations. While new probabilistic methods rush to rescue, they fight with technical difficulties like quantile crossing or selecting a prior distribution. To meld the different strengths of these fields while avoiding their weaknesses as well as to push the boundary of the state-of-the-art, we introduce ForGAN – one step ahead probabilistic forecasting with generative adversarial networks. ForGAN utilizes the power of the conditional generative adversarial network to learn the data generating distribution and compute probabilistic forecasts from it. We argue how to evaluate ForGAN in opposition to regression methods. To investigate probabilistic forecasting of ForGAN, we create a new dataset and demonstrate our method abilities on it. This dataset will be made publicly available for comparison. Furthermore, we test ForGAN on two publicly available datasets, namely Mackey-Glass dataset and Internet traffic dataset (A5M) where the impressive performance of ForGAN demonstrate its high capability in forecasting future values.
Insufficient or even unavailable training data of emerging classes is a big challenge of many classification tasks, including text classification. Recognising text documents of classes that have never been seen in the learning stage, so-called zero-shot text classification, is therefore difficult and only limited previous works tackled this problem. In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. Four kinds of semantic knowledge (word embeddings, class descriptions, class hierarchy, and a general knowledge graph) are incorporated into the proposed framework to deal with instances of unseen classes effectively. Experimental results show that each and the combination of the two phases achieve the best overall accuracy compared with baselines and recent approaches in classifying real-world texts under the zero-shot scenario.
Deep neural networks are known to suffer from catastrophic forgetting in class-incremental learning, where the performance on previous tasks drastically degrades when learning a new task. To alleviate this effect, we propose to leverage a continuous and large stream of unlabeled data in the wild. In particular, to leverage such transient external data effectively, we design a novel class-incremental learning scheme with (a) a new distillation loss, termed global distillation, (b) a learning strategy to avoid overfitting to the most recent task, and (c) a sampling strategy for the desired external data. Our experimental results on various datasets, including CIFAR and ImageNet, demonstrate the superiority of the proposed methods over prior methods, particularly when a stream of unlabeled data is accessible: we achieve up to 9.3% of relative performance improvement compared to the state-of-the-art method.