While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies. In addition, the semantic structures within given sequential data can be interpreted by visualizing temporal dependencies learned from the method. The proposed method, called Temporal Dependency Network (TDN), represents a video as a temporal graph whose node represents a frame of the video and whose edge represents the temporal dependency between two frames of a variable distance. The temporal dependency structure of semantic is discovered by learning parameterized kernels of graph convolutional methods. We evaluate the proposed method on the large-scale video dataset, Youtube-8M. By visualizing the temporal dependency structures as experimental results, we show that the suggested method can find the temporal dependency structures of video semantic.
United State possesses the largest economy in the world, and thus serves as an ideal representation to investigate the motility model for the developed country. Early surveillance of the causes of death is critical which can allow the preparation of preventive steps against critical disease such as dengue fever. Studies reported that some search queries, especially those diseases related terms on Google Trends are essential. To this end, we include either main cause of death or the extended or the more general terminologies from Google Trends to decode the mortality related terms using the Wiener Cascade Model. Using time series and Wavelet scalogram of search terms, the patterns of search queries are categorized into different levels of periodicity. The results include (1) the decoding trend, (2) the features importance, and (3) the accuracy of the decoding patterns. Three scenarios regard predictors include the use of (1) all 19 features, (2) the top ten most periodic predictors, or (3) the ten predictors with highest weighting. All search queries spans from December 2013 – December 2018. The results show that search terms with either higher weight and annual periodic pattern contribute more in forecasting the word die; however, only predictors with higher weight are valuable to forecast the word death.
In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. It may be a static graph with dynamic node attributes (e.g. time-series), or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with a good precision using a memory network. The presented approach is scalable and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by the real-world events that impact the network dynamics. Besides, the structure of the clusters and the analysis of the time evolution associated with the detected events reveals interesting facts on how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets.
In this paper we present EvalNE, a Python toolbox for evaluating network embedding methods on link prediction tasks. Link prediction is one of the most popular choices for evaluating the quality of network embeddings. However, the complexity of this task requires a carefully designed evaluation pipeline in order to provide consistent, reproducible and comparable results. EvalNE simplifies this process by providing automation and abstraction of tasks such as hyper-parameter tuning and model validation, edge sampling and negative edge sampling, computation of edge embeddings from node embeddings, and evaluation metrics. The toolbox allows for the evaluation of any off-the-shelf embedding method without the need to write extra code. Moreover, it can also be used for evaluating any other link prediction method, and integrates several link prediction heuristics as baselines.
A person dependent network, called an AlterEgo net, is proposed for development. The networks are created per person. It receives at input an object descriptions and outputs a simulation of the internal person’s representation of the objects. The network generates a textual stream resembling the narrative stream of consciousness depicting multitudinous thoughts and feelings related to a perceived object. In this way, the object is described not by a ‘static’ set of its properties, like a dictionary, but by the stream of words and word combinations referring to the object. The network simulates a person’s dialogue with a representation of the object. It is based on an introduced algorithmic scheme, where perception is modeled by two interacting iterative cycles, reminding one respectively the forward and backward propagation executed at training convolution neural networks. The ‘forward’ iterations generate a stream representing the ‘internal world’ of a human. The ‘backward’ iterations generate a stream representing an internal representation of the object. People perceive the world differently. Tuning AlterEgo nets to a specific person or group of persons, will allow simulation of their thoughts and feelings. Thereby these nets is potentially a new human augmentation technology for various applications.
We apply the Ordered Weighted Averaging (OWA) operator in multi-criteria decision-making. To satisfy different kinds of uncertainty, measure based dominance has been presented to gain the order of different criterion. However, this idea has not been applied in fuzzy system until now. In this paper, we focus on the situation where the linguistic satisfactions are fuzzy measures instead of the exact values. We review the concept of OWA operator and discuss the order mechanism of fuzzy number. Then we combine with measure-based dominance to give an overall score of each alternatives. An example is illustrated to show the whole procedure.
Current clinical practice for monitoring patients’ health follows either regular or heuristic-based lab test (e.g. blood test) scheduling. Such practice not only gives rise to redundant measurements accruing cost, but may even cause unnecessary patient discomfort. From the computational perspective, heuristic-based test scheduling might lead to reduced accuracy of clinical forecasting models. A data-driven measurement scheduling is likely to lead to both more accurate predictions and less measurement costs. We address the scheduling problem using deep reinforcement learning (RL) and propose a general and scalable framework to achieve high predictive gain and low measurement cost, by scheduling fewer, but strategically timed tests. Using simulations we show that our policy outperforms heuristic-based measurement scheduling with higher predictive gain and lower cost. We then learn a scheduling policy for mortality forecasting in the real-world clinical dataset (MIMIC3). Our policy decreases the total number of measurements by 31% without reducing the predictive performance, or improves 3 times more predictive gain with the same number of measurements using off-policy policy evaluation.
We provide excess risk guarantees for statistical learning in the presence of an unknown nuisance component. We analyze a two-stage sample splitting meta-algorithm that takes as input two arbitrary estimation algorithms: one for the target model and one for the nuisance model. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the first stage error on the excess risk bound achieved by the meta-algorithm is of second order. Our general theorem is agnostic to the particular algorithms used for the target and nuisance and only makes an assumption on their individual performance. This enables the use of a plethora of existing results from statistical learning and machine learning literature to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class. When the nuisance and target parameters belong to arbitrary classes, we characterize conditions on the metric entropy such that oracle rates—rates of the same order as if we knew the nuisance model—are achieved. We also analyze the rates achieved by specific estimation algorithms such as variance-penalized empirical risk minimization, neural network estimation and sparse high-dimensional linear model estimation. We highlight the applicability of our results via four applications of primary importance: 1) heterogeneous treatment effect estimation, 2) offline policy optimization, 3) domain adaptation, and 4) learning with missing data.
Sometimes it is not enough for a DNN to produce an outcome. For example, in applications such as healthcare, users need to understand the rationale of the decisions. Therefore, it is imperative to develop algorithms to learn models with good interpretability (Doshi-Velez 2017). An important factor that leads to the lack of interpretability of DNNs is the ambiguity of neurons, where a neuron may fire for various unrelated concepts. This work aims to increase the interpretability of DNNs on the whole image space by reducing the ambiguity of neurons. In this paper, we make the following contributions: 1) We propose a metric to evaluate the consistency level of neurons in a network quantitatively. 2) We find that the learned features of neurons are ambiguous by leveraging adversarial examples. 3) We propose to improve the consistency of neurons on adversarial example subset by an adversarial training algorithm with a consistent loss.
When trained on multimodal image datasets, normal Generative Adversarial Networks (GANs) are usually outperformed by class-conditional GANs and ensemble GANs, but conditional GANs is restricted to labeled datasets and ensemble GANs lack efficiency. We propose a novel GAN variant called virtual conditional GAN (vcGAN) which is not only an ensemble GAN with multiple generative paths while adding almost zero network parameters, but also a conditional GAN that can be trained on unlabeled datasets without explicit clustering steps or objectives other than the adversary loss. Inside the vcGAN’s generator, a learnable “analog-to-digital converter (ADC)’ module maps a slice of the inputted multivariate Gaussian noise to discrete/digital noise (virtual label), according to which a selector selects the corresponding generative path to produce the sample. All the generative paths share the same decoder network while in each path the decoder network is fed with a concatenation of a different pre-computed amplified one-hot vector and the inputted Gaussian noise. We conducted a lot of experiments on several balanced/imbalanced image datasets to demonstrate that vcGAN converges faster and achieves improved Frech\’et Inception Distance (FID). In addition, we show the training byproduct that the ADC in vcGAN learned the categorical probability of each mode and that each generative path generates samples of specific mode, which enables class-conditional sampling. Codes are available at \url{https://…/vcgan}
This paper aims to use term clustering to build a modular ontology according to core ontology from domain-specific text. The acquisition of semantic knowledge focuses on noun phrase appearing with the same syntactic roles in relation to a verb or its preposition combination in a sentence. The construction of this co-occurrence matrix from context helps to build feature space of noun phrases, which is then transformed to several encoding representations including feature selection and dimensionality reduction. In addition, the content has also been presented with the construction of word vectors. These representations are clustered respectively with K-Means and Affinity Propagation (AP) methods, which differentiate into the term clustering frameworks. Due to the randomness of K-Means, iteration efforts are adopted to find the optimal parameter. The frameworks are evaluated extensively where AP shows dominant effectiveness for co-occurred terms and NMF encoding technique is salient by its promising facilities in feature compression.
This work lists and describes the main recent strategies for building fixed-length, dense and distributed representations for words, based on the distributional hypothesis. These representations are now commonly called word embeddings and, in addition to encoding surprisingly good syntactic and semantic information, have been proven useful as extra features in many downstream NLP tasks.
During active learning, an effective stopping method allows users to limit the number of annotations, which is cost effective. In this paper, a new stopping method called Predicted Change of F Measure will be introduced that attempts to provide the users an estimate of how much performance of the model is changing at each iteration. This stopping method can be applied with any base learner. This method is useful for reducing the data annotation bottleneck encountered when building text classification systems.
Stacking is a general approach for combining multiple models toward greater predictive accuracy. It has found various application across different domains, ensuing from its meta-learning nature. Our understanding, nevertheless, on how and why stacking works remains intuitive and lacking in theoretical insight. In this paper, we use the stability of learning algorithms as an elemental analysis framework suitable for addressing the issue. To this end, we analyze the hypothesis stability of stacking, bag-stacking, and dag-stacking and establish a connection between bag-stacking and weighted bagging. We show that the hypothesis stability of stacking is a product of the hypothesis stability of each of the base models and the combiner. Moreover, in bag-stacking and dag-stacking, the hypothesis stability depends on the sampling strategy used to generate the training set replicates. Our findings suggest that 1) subsampling and bootstrap sampling improve the stability of stacking, and 2) stacking improves the stability of both subbagging and bagging.
We consider the problem of selective prediction (also known as reject option) in deep neural networks, and introduce SelectiveNet, a deep neural architecture with an integrated reject option. Existing rejection mechanisms are based mostly on a threshold over the prediction confidence of a pre-trained network. In contrast, SelectiveNet is trained to optimize both classification (or regression) and rejection simultaneously, end-to-end. The result is a deep neural network that is optimized over the covered domain. In our experiments, we show a consistently improved risk-coverage trade-off over several well-known classification and regression datasets, thus reaching new state-of-the-art results for deep selective classification.
Humans are capable of attributing latent mental contents such as beliefs, or intentions to others. The social skill is critical in everyday life to reason about the potential consequences of their behaviors so as to plan ahead. It is known that humans use this reasoning ability recursively, i.e. considering what others believe about their own beliefs. In this paper, we start from level-$1$ recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents’ conditional policy, to which each agent finds the best response and then improve their own policy. We develop decentralized-training-decentralized-execution algorithms, PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenario when there is one Nash equilibrium. Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about how the opponents believe about what the agent believes. We expect our work to contribute a new idea of modeling the opponents to the multi-agent reinforcement learning community.