We utilize commonsense knowledge bases to address the problem of real-time multimodal analysis. In particular, we focus on multimodal sentiment analysis, which involves the simultaneous analysis of different modalities, e.g., speech and video, for emotion and polarity detection. Our approach takes advantage of the massively parallel processing power of modern GPUs to speed up feature extraction from the different modalities. In addition, to extract important textual features from multimodal sources, we generate domain-specific graphs based on commonsense knowledge and apply GPU-based graph traversal for fast feature detection. Then, extreme learning machine (ELM) classifiers are applied to build the sentiment analysis model from the extracted features. We conduct our experiments on the YouTube dataset and achieve an accuracy of 78%, outperforming all previous systems. In terms of processing speed, our method improves feature extraction by several orders of magnitude compared to CPU-based counterparts.
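The sentiment model above is built on ELM classifiers. The following is a minimal CPU-only NumPy sketch of a basic single-hidden-layer ELM, not the authors' GPU implementation. The abstract does not give the activation function or the hidden-layer size, so the `tanh` activation, the Gaussian random weights, and the function names `elm_fit` and `elm_predict` are assumptions.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit a basic extreme learning machine (ELM) classifier.

    Hidden-layer weights are drawn at random and never trained; only the
    output weights are solved in closed form with the Moore-Penrose pseudoinverse.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (fixed)
    b = rng.normal(size=n_hidden)                # random hidden biases (fixed)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    beta = np.linalg.pinv(H) @ T                 # least-squares output weights
    return W, b, beta, classes

def elm_predict(model, X):
    """Predict the class whose output score is largest."""
    W, b, beta, classes = model
    scores = np.tanh(X @ W + b) @ beta
    return classes[np.argmax(scores, axis=1)]
```

Training reduces to one pseudoinverse, with no iterative backpropagation. This is what makes ELMs cheap to fit once the features have been extracted.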
Analyzing job hopping behavior is important for understanding the job preferences and career progression of working individuals. When analyzed at the workforce population level, job hop analysis yields insights into talent flow among different jobs and organizations. Traditionally, surveys of job seekers and employers are used to study job hop behavior. Beyond surveys, job hop behavior can also be studied in a highly scalable and timely manner using a data-driven approach, in response to the fast-changing job landscape. Fortunately, the advent of online professional networks (OPNs) has made it possible to perform large-scale analysis of talent flow. In this paper, we present a new data analytics framework to analyze the talent flow patterns of close to 1 million working professionals from three different countries/regions using their publicly accessible profiles in an established OPN. As OPN data are originally generated for professional networking applications, our proposed framework re-purposes the same data for a different analytics task. Prior to performing job hop analysis, we devise a job title normalization procedure to reduce the amount of noise in the OPN data. We then devise several metrics to measure the amount of work experience required to take up a job, to determine the existence duration of the job (also known as the job age), and to measure the correlation between these metrics and the propensity to hop. We also study how job hop behavior is related to job promotion/demotion. Lastly, we perform connectivity analysis at the job and organization levels to derive insights on talent flow as well as job and organizational competitiveness.
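One simple way to begin the organization-level connectivity analysis is to build a directed, weighted talent-flow graph from career histories and compute each organization's net inflow. The sketch below does this under stated assumptions. The input format (a hypothetical person ID mapped to a chronological list of organization names) and the net-inflow measure are illustrative choices, not metrics taken from the source. Normalized job titles could be used in place of organization names for the job-level analysis.

```python
from collections import Counter, defaultdict

def talent_flow(careers):
    """Count directed job hops between organizations and each organization's net inflow.

    `careers` maps a (hypothetical) person ID to a chronologically ordered
    list of organization names taken from that person's profile.
    """
    hops = Counter()
    for orgs in careers.values():
        for src, dst in zip(orgs, orgs[1:]):
            if src != dst:          # consecutive entries at the same organization are not hops
                hops[(src, dst)] += 1
    net = defaultdict(int)          # inflow minus outflow per organization
    for (src, dst), n in hops.items():
        net[dst] += n
        net[src] -= n
    return hops, dict(net)

careers = {"p1": ["A", "B", "C"], "p2": ["A", "B"], "p3": ["C", "B"]}
hops, net = talent_flow(careers)
```

In this toy example, organization B gains two people on net and A loses two. Edge weights in `hops` could then feed standard graph connectivity measures.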
Recommender system objectives can be broadly characterized as modeling user preferences over a short- or long-term time horizon. A large body of previous research studied long-term recommendation through dimensionality reduction techniques applied to historical user-item interactions. A recently introduced session-based recommendation setting highlighted the importance of modeling short-term user preferences. In this task, Recurrent Neural Networks (RNNs) have been shown to be successful at capturing the nuances of users' interactions within a short time window. In this paper, we evaluate RNN-based models on both short-term and long-term recommendation tasks. Our experimental results suggest that RNNs are capable of predicting immediate as well as distant user interactions. We also find that the best-performing configuration is a stacked RNN with layer normalization and tied item embeddings.
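The best-performing configuration combines three ideas: stacking, layer normalization, and tied item embeddings. The NumPy forward pass below shows how they fit together. It is an untrained sketch, not the paper's model. It uses simple tanh (Elman) cells for brevity, although the paper's cell type may differ. Layer normalization is applied to each layer's pre-activation without learned gain or bias. "Tied" means the output layer reuses the item-embedding matrix `E` instead of having its own weights. The class and method names are hypothetical.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias here)."""
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + eps)

class TiedStackedRNN:
    """Forward pass of a stacked Elman RNN whose output layer reuses the item embeddings."""

    def __init__(self, n_items, dim, n_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(scale=0.1, size=(n_items, dim))  # item embeddings, shared with output
        self.Wx = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_layers)]
        self.Wh = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_layers)]

    def next_item_scores(self, session):
        """Return a score for every item as the next interaction after `session` (item IDs)."""
        hs = [np.zeros(self.E.shape[1]) for _ in self.Wx]
        x = None
        for item in session:
            x = self.E[item]                       # input embedding lookup
            for l in range(len(self.Wx)):
                pre = x @ self.Wx[l] + hs[l] @ self.Wh[l]
                hs[l] = np.tanh(layer_norm(pre))   # layer-normalized recurrent update
                x = hs[l]                          # output of layer l feeds layer l + 1
        return self.E @ x                          # tied output layer: dot product with each item embedding

model = TiedStackedRNN(n_items=10, dim=8)
scores = model.next_item_scores([1, 4, 2])
```

Tying removes a separate `n_items x dim` output matrix, which roughly halves the item-related parameters.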
Selecting the best algorithm for a new problem is an expensive and difficult task. However, there are automatic solutions to this problem: using Metalearning, which takes advantage of problem characteristics (i.e., metafeatures), one can predict the relative performance of algorithms. In the Collaborative Filtering scope, recent works have proposed diverse metafeatures describing several dimensions of this problem. Despite interesting and effective findings, it is still unknown whether these are the most effective metafeatures. Hence, this work proposes a new set of graph metafeatures, which approach the Collaborative Filtering problem from a Graph Theory perspective. Furthermore, to understand whether metafeatures from multiple dimensions are a better fit, we investigate the effects of comprehensive metafeatures: a selection of the best metafeatures from all existing Collaborative Filtering metafeatures. The impact of the most representative metafeatures is investigated in a controlled experimental setup. Another contribution is the use of a Pareto-Efficient ranking procedure to create multicriteria metatargets. These new rankings of algorithms, which take into account multiple evaluation measures, make it possible to explore the algorithm selection problem in a fairer and more detailed way. According to the experimental results, the graph metafeatures are a good alternative to the metafeatures from related work. However, the feature selection procedure used to create the comprehensive metafeatures proved ineffective, since it yields no gain in predictive performance. Finally, an extensive metaknowledge analysis was conducted to identify the most influential metafeatures.
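A common way to turn several evaluation measures into a single ranking is non-dominated sorting (front peeling). The abstract does not detail its Pareto-Efficient procedure, so this sketch is only an assumption about how such a multicriteria metatarget could be built. Algorithm names and scores are made up, and higher scores are treated as better for every measure.

```python
def dominates(a, b):
    """True if score vector `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_ranks(scores):
    """Assign each algorithm a Pareto front index (1 = non-dominated) by repeatedly peeling fronts."""
    remaining = dict(scores)
    ranks, front = {}, 1
    while remaining:
        current = [a for a in remaining
                   if not any(dominates(remaining[b], remaining[a]) for b in remaining if b != a)]
        for a in current:
            ranks[a] = front
            del remaining[a]
        front += 1
    return ranks

# Hypothetical (measure_1, measure_2) scores for four CF algorithms on one dataset.
scores = {"ALS": (0.30, 0.80), "BPR": (0.35, 0.70), "Pop": (0.20, 0.60), "KNN": (0.25, 0.75)}
ranks = pareto_ranks(scores)
```

ALS and BPR trade off the two measures, so they share rank 1 rather than being forced into an arbitrary order. This is the sense in which a multicriteria ranking can be "fairer" than ranking by a single measure.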
With the improvement of medical data capturing, vast amounts of continuous patient monitoring data, e.g., electrocardiogram (ECG), real-time vital signs and medications, have become available for clinical decision support in intensive care units (ICUs). However, such data are increasingly challenging to model, due to the high density of the monitoring data, heterogeneous data types and the requirement for interpretable models. Integrating these high-density monitoring data with discrete clinical events (including diagnoses, medications and labs) is challenging but potentially rewarding, since the richness and granularity of such multimodal data increase the possibilities for accurately detecting complex problems and predicting outcomes (e.g., length of stay and mortality). We propose the Recurrent Attentive and Intensive Model (RAIM) for jointly analyzing continuous monitoring data and discrete clinical events. RAIM introduces an efficient attention mechanism for continuous monitoring data (e.g., ECG), which is guided by discrete clinical events (e.g., medication usage). We apply RAIM to predicting physiological decompensation and length of stay in critically ill ICU patients. In evaluations on the MIMIC-III Waveform Database Matched Subset, our final model obtains an AUC-ROC score of 90.18% for predicting decompensation and an accuracy of 86.82% for forecasting length of stay, outperforming our six baseline models.
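To illustrate what "attention over monitoring data guided by clinical events" can mean, the sketch below derives the attention query from an embedding of discrete events and the keys from per-time-step monitoring features. It then takes a softmax-weighted average over time. This is generic scaled dot-product attention, an assumption made for illustration and not RAIM's exact formulation. The projection matrices `W_q` and `W_k` are hypothetical parameters.

```python
import numpy as np

def event_guided_attention(monitor, event, W_q, W_k):
    """Attend over time steps of continuous monitoring data, guided by a clinical-event vector.

    monitor: (T, d_m) features per time step (e.g., ECG windows)
    event:   (d_e,) embedding of discrete clinical events (e.g., medications)
    W_q, W_k: hypothetical projections mapping both inputs into a shared space
    """
    q = event @ W_q                         # query derived from the clinical events
    k = monitor @ W_k                       # one key per time step
    logits = k @ q / np.sqrt(q.shape[0])    # scaled dot-product scores
    w = np.exp(logits - logits.max())       # numerically stable softmax
    w /= w.sum()                            # attention weights over time
    return w @ monitor, w                   # context vector and weights

rng = np.random.default_rng(0)
ctx, w = event_guided_attention(rng.normal(size=(6, 4)), rng.normal(size=3),
                                rng.normal(size=(3, 5)), rng.normal(size=(4, 5)))
```

The weights `w` indicate which time steps the events point to. This is one reason attention is often used to support interpretability.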
There is a trend towards using very large deep neural networks (DNNs) to improve the accuracy of complex machine learning tasks. However, the size of DNN models that can be explored today is limited by the amount of GPU device memory. This paper presents Tofu, a system for partitioning very large DNN models across multiple GPU devices. Tofu is designed for a tensor-based dataflow system: for each operator in the dataflow graph, it partitions the operator's input/output tensors and parallelizes its execution across workers. Tofu can automatically discover how each operator can be partitioned by analyzing its semantics, expressed in a simple specification language. Tofu uses a search algorithm based on dynamic programming to determine the best partition strategy for each operator in the entire dataflow graph. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves better performance than alternative approaches for training very large models on multiple GPUs.
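The abstract only states that the partition search uses dynamic programming. As a simplified sketch, assume the dataflow graph is a linear chain of operators. Assume also a made-up cost model: a per-operator cost for each strategy, plus a re-partitioning cost whenever consecutive operators use different strategies. Under those assumptions the search becomes a Viterbi-style recursion. Tofu itself handles general graphs, and the strategy names and costs below are hypothetical.

```python
def best_partition_plan(op_costs, switch_cost):
    """Choose one partition strategy per operator in a chain, minimizing total cost.

    op_costs: list of dicts; op_costs[i][s] is the (hypothetical) cost of operator i under strategy s
    switch_cost(s, t): cost of re-partitioning a tensor from strategy s to strategy t
    Returns (total_cost, plan).
    """
    # best[s] = (cost, plan) of the cheapest plan for the operators so far that ends in strategy s
    best = {s: (c, [s]) for s, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        best = {
            t: min((prev_cost + switch_cost(s, t) + c, plan + [t])
                   for s, (prev_cost, plan) in best.items())
            for t, c in costs.items()
        }
    return min(best.values())

op_costs = [{"row": 4, "col": 6}, {"row": 5, "col": 2}, {"row": 3, "col": 3}]
switch_cost = lambda s, t: 0 if s == t else 3
cost, plan = best_partition_plan(op_costs, switch_cost)
```

The recursion visits each (operator, strategy) pair once per predecessor strategy. Its cost is therefore linear in chain length rather than exponential, as it would be when enumerating every combination.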
Active Learning (AL) is a learning task in which learners interactively query the labels of sampled unlabeled instances in order to minimize the amount of human supervision required for training. In theoretical studies, learners approximate the version space, which covers all possible classification hypotheses, by a bounded convex body and try to shrink its volume into a half-space by a given cut size. However, only the hypersphere with finite VC dimension has obtained formal approximation guarantees, which hold when the classes in Euclidean space are separable with a margin. In this paper, we approximate the version space by a structured hypersphere that covers most of the hypotheses, and then divide the available AL sampling approaches into two kinds of strategies: Outer Volume Sampling and Inner Volume Sampling. After providing provable guarantees for the performance of AL in version space, we aggregate the two kinds of volumes to eliminate their sampling biases by finding the optimal inscribed hyperspheres in the enclosing space of the outer volume. To reach the version space from Euclidean space, we propose a theoretical bridge called the Volume-based Model, which increases the target independence of sampling. In the non-linear feature space spanned by a kernel, we use sequential optimization to globally optimize the original space into a sparse space by halving the size of the kernel space. Then, an Expectation Maximization (EM) model, which returns the local center, helps us find a local representation. To describe this process, we propose an easy-to-implement algorithm called Volume-based AL (VAL).
Ensemble learning for anomaly detection in data structured as complex networks has barely been studied, due to the inconsistent performance of complex network characteristics and the lack of an inherent objective function. In this paper, we propose IFSAD, a new two-phase ensemble method for anomaly detection based on intuitionistic fuzzy sets (IFSs), and apply it to the problem of detecting abnormal behavior in temporal complex networks. First, it constructs an intuitionistic fuzzy set for each single network characteristic, which quantifies the degree of membership, non-membership and hesitation of that characteristic with respect to the defined linguistic variables, so that even uninformative or noisy characteristics become part of the detection. To build an objective intuitionistic fuzzy relationship, we propose a Gaussian distribution-based membership function which gives a variable hesitation degree. Then, for multiple network characteristics, the intuitionistic fuzzy weighted geometric operator is adopted to fuse multiple IFSs and avoid inconsistency among the characteristics. Next, the score function and precision function are used to sort the fused IFSs. Finally, we carried out extensive experiments on several complex network datasets for anomaly detection, and the results demonstrate the superiority of our method over state-of-the-art approaches, validating its effectiveness.
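The fusion and sorting steps can be sketched with the standard intuitionistic fuzzy weighted geometric (IFWG) operator. For values $(\mu_j, \nu_j)$ with weights $w_j$ summing to 1, it gives $\mu = \prod_j \mu_j^{w_j}$ and $\nu = 1 - \prod_j (1-\nu_j)^{w_j}$. The sketch also uses the usual score function $S = \mu - \nu$ and accuracy function $H = \mu + \nu$, which is assumed here to be what the abstract calls the precision function. The Gaussian membership function, with its hesitation term peaking where $\mu = 0.5$, is a hypothetical stand-in: the abstract does not give IFSAD's actual formula.

```python
import math

def gaussian_ifs(x, center, sigma, hesitation_scale=0.2):
    """Hypothetical Gaussian-based intuitionistic fuzzy value (mu, nu) for a characteristic value x.

    mu follows a Gaussian curve; the hesitation degree pi varies with x and is
    largest where mu = 0.5; nu takes the remaining mass so that mu + nu + pi = 1.
    """
    mu = math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
    pi = hesitation_scale * 4 * mu * (1 - mu)
    nu = max(0.0, 1 - mu - pi)
    return mu, nu

def ifwg(values, weights):
    """Intuitionistic fuzzy weighted geometric (IFWG) operator over (mu, nu) pairs."""
    mu = math.prod(m ** w for (m, _), w in zip(values, weights))
    nu = 1 - math.prod((1 - n) ** w for (_, n), w in zip(values, weights))
    return mu, nu

def score(ifv):
    """Score function S = mu - nu (primary sort key)."""
    return ifv[0] - ifv[1]

def accuracy(ifv):
    """Accuracy function H = mu + nu (tie-breaker)."""
    return ifv[0] + ifv[1]

def rank_nodes(fused):
    """Sort nodes by fused IFS value, most anomalous (highest score, then accuracy) first."""
    return sorted(fused, key=lambda node: (score(fused[node]), accuracy(fused[node])), reverse=True)
```

A geometric (product-based) operator is dominated by low membership values. As a result, a node scores as anomalous only when most weighted characteristics agree.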
Network data has rapidly emerged as an important and active area of statistical methodology. In this paper we consider the problem of anomaly detection in networks. Given a large background network, we seek to detect whether a small anomalous subgraph is present in the network and, if so, which nodes constitute it. We propose an inferential tool based on egonets to answer this question. The proposed method is computationally efficient, naturally amenable to parallel computing, and easily extends to a wide variety of network models. We demonstrate through simulation studies that the egonet method works well under a wide variety of network models. We also report empirical results from applying the egonet method to several well-studied benchmark datasets.
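As a rough illustration of egonet-based scanning, and not the paper's test statistic, the sketch below counts the edges among each node's neighbours. These are the only random part of an egonet, since the spokes to the centre exist by construction. Each count is compared with its expectation under an Erdos-Renyi null whose edge probability is estimated from the whole graph, and the node with the largest z-score is reported. Both the null model and the z-score are assumptions. Each egonet is scored independently, which is what makes this kind of scan easy to parallelize.

```python
import itertools
import math

def egonet_scan(adj):
    """Return the node whose egonet is most unexpectedly dense, and its z-score.

    adj: dict mapping node -> set of neighbours (undirected graph, no self-loops).
    Null model (an assumption): Erdos-Renyi with edge probability p estimated globally.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    p = m / (n * (n - 1) / 2)                    # global edge density
    if not 0 < p < 1:
        raise ValueError("edge density must be strictly between 0 and 1")
    best_node, best_z = None, -math.inf
    for v, nbrs in adj.items():                  # each egonet is scored independently
        d = len(nbrs)
        pairs = d * (d - 1) / 2
        if pairs == 0:
            continue
        inner = sum(1 for a, b in itertools.combinations(nbrs, 2) if b in adj[a])
        z = (inner - pairs * p) / math.sqrt(pairs * p * (1 - p))
        if z > best_z:
            best_node, best_z = v, z
    return best_node, best_z
```

For a ring of 12 nodes with a 5-clique planted on nodes 0 to 4, the highest z-scores come from the clique members whose neighbours all lie inside the clique.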
We consider the clustering aggregation problem, in which we are given a set of clusterings and want to find an aggregated clustering that minimizes the sum of mismatches to the input clusterings. In the binary case (each clustering is a bipartition), this problem was known to be NP-hard under Turing reductions. We strengthen this result by providing a polynomial-time many-one reduction. Our result also implies that, unless the Exponential Time Hypothesis fails, there is no algorithm that solves every instance $I$ with $n$ elements in $2^{o(n)} \cdot |I|^{O(1)}$ time. On the positive side, we show that the problem is fixed-parameter tractable with respect to the number of input clusterings.
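To make the objective concrete, the sketch below counts mismatches as pairs of elements that are together in one clustering but separated in the other. It then finds an optimal aggregated clustering by exhaustive search over all set partitions. This brute force takes time that grows faster than exponentially in $n$ and is only meant for tiny instances. It is not the fixed-parameter algorithm from the paper, and the function names are illustrative.

```python
from itertools import combinations

def disagreements(c1, c2):
    """Number of element pairs co-clustered in one clustering but separated in the other.

    Clusterings are label lists: c[i] is the cluster label of element i.
    """
    return sum((c1[i] == c1[j]) != (c2[i] == c2[j])
               for i, j in combinations(range(len(c1)), 2))

def set_partitions(n):
    """Yield every clustering of n elements once, as a canonical label list."""
    def grow(labels, k):                 # k = number of clusters opened so far
        if len(labels) == n:
            yield list(labels)
            return
        for lab in range(k + 1):         # reuse an existing cluster or open a new one
            yield from grow(labels + [lab], max(k, lab + 1))
    yield from grow([], 0)

def aggregate(clusterings):
    """Exhaustively find a clustering minimizing the total number of mismatches."""
    n = len(clusterings[0])
    return min(set_partitions(n),
               key=lambda c: sum(disagreements(c, other) for other in clusterings))

inputs = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]   # three bipartitions of 4 elements
best = aggregate(inputs)
```

Here the pairwise majority votes are not transitive (0 with 1, 1 with 2, and 2 with 3 are together, but 0 and 2 are apart), so no clustering agrees with every majority. The optimum `[0, 0, 1, 1]` costs 6 mismatches.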
Assume that there are multiple data streams (channels, sensors) and that in each stream the process of interest produces generally dependent and non-identically distributed observations. When the process is in a normal mode (in-control), the (pre-change) distribution is known, but when the process becomes abnormal there is parametric uncertainty, i.e., the post-change (out-of-control) distribution is known only partially, up to a parameter. Both the change point and the post-change parameter are unknown. Moreover, the change affects an unknown subset of streams, so the number of affected streams and their locations are not known in advance. A good change-point detection procedure should detect the change as soon as possible after its occurrence while controlling the risk of false alarms. We consider a Bayesian setup with a given prior distribution of the change point and propose two sequential mixture-based change detection rules: one mixes a Shiryaev-type statistic over both the unknown subset of affected streams and the unknown post-change parameter, and the other mixes a Shiryaev-Roberts-type statistic. These rules generalize the mixture detection procedures studied by Tartakovsky (2018) in the single-stream case. We provide sufficient conditions under which the proposed multistream change detection procedures are first-order asymptotically optimal with respect to moments of the detection delay as the probability of false alarm approaches zero.
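The mixture Shiryaev-Roberts-type rule can be sketched under strong simplifying assumptions that are not from the source. Observations are assumed i.i.d. N(0, 1) before the change and N(θ, 1) after it in the affected streams. The mixing weights are uniform over all non-empty subsets of streams and over a finite grid of θ values. One recursion $R_t = (1 + R_{t-1})\Lambda_t$ runs for each (subset, θ) pair, and the rule stops when the weighted mixture crosses a threshold. The Shiryaev-type variant with a geometric prior would additionally rescale each step by the prior's parameter. The general non-i.i.d. setting of the paper is not covered.

```python
import math
from itertools import combinations

def mixture_sr_stop(obs, thetas, threshold):
    """Stopping time of a mixture Shiryaev-Roberts-type rule over affected subsets and parameters.

    obs: list of time steps, each a list with one observation per stream.
    thetas: finite grid of candidate post-change means (uniform mixing weights).
    Returns the first 1-based time at which the mixture statistic reaches `threshold`, or None.
    """
    n_streams = len(obs[0])
    subsets = [s for r in range(1, n_streams + 1) for s in combinations(range(n_streams), r)]
    R = {(s, th): 0.0 for s in subsets for th in thetas}   # one SR statistic per (subset, theta)
    weight = 1.0 / len(R)
    for t, x in enumerate(obs, start=1):
        for (s, th) in R:
            # log-likelihood ratio of N(th, 1) vs N(0, 1), summed over the streams in s
            llr = sum(th * x[i] - th * th / 2 for i in s)
            R[(s, th)] = (1.0 + R[(s, th)]) * math.exp(llr)
        if weight * sum(R.values()) >= threshold:
            return t
    return None

quiet = [[0.0, 0.0, 0.0]] * 50
shifted = [[0.0, 0.0, 0.0]] * 20 + [[0.0, 2.0, 0.0]] * 20   # change in stream 1 at t = 21
```

With `thetas=[1.0, 2.0]` and `threshold=1000.0`, the quiet sequence never raises an alarm. The shifted sequence is detected a few steps after the change, mainly through the (stream 1, θ = 2) component.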
Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) direct transfer based on bilingual word embeddings as strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well with costly human translations and with cheap machine translations. Our code and data are available at \url{http://…/coling2018-xling_argument_mining}.
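Annotation projection transfers labels from a source-language sentence to its translation through word alignments. The sketch below shows the core step for token-level BIO tags, under these assumptions: the alignment pairs come from some external word aligner (a hypothetical input here), and when several source tokens align to one target token, the one with the smallest source index wins. A final pass repairs BIO consistency, since projection can leave an `I-` tag without a preceding tag of the same type. The function name and tag names are illustrative.

```python
def project_labels(src_labels, alignment, tgt_len, default="O"):
    """Project token-level BIO labels from a source sentence onto its translation.

    src_labels: one label per source token (e.g., "B-Claim", "I-Claim", "O")
    alignment: (source_index, target_index) word-alignment pairs from an external aligner
    Unaligned target tokens receive `default`.
    """
    tgt = [default] * tgt_len
    for i, j in sorted(alignment):         # ascending source index: first projection wins
        if tgt[j] == default:
            tgt[j] = src_labels[i]
    # Repair BIO: an I- tag that does not continue a span of the same type starts a new span.
    for j, lab in enumerate(tgt):
        prev = tgt[j - 1] if j > 0 else default
        if lab.startswith("I-") and prev[2:] != lab[2:]:
            tgt[j] = "B-" + lab[2:]
    return tgt

src = ["B-Claim", "I-Claim", "I-Claim", "O"]
alignment = [(0, 1), (1, 2), (2, 4), (3, 0)]
projected = project_labels(src, alignment, tgt_len=5)
```

In this example, reordering in the translation splits the claim span in two, and the repair pass turns the stranded `I-Claim` into `B-Claim`.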
Matrix completion aims to reconstruct a data matrix from observations of a small number of its entries. Usually a single matrix is considered, for example a rating matrix in a recommendation system. However, in practice, data are often obtained from multiple sources, which results in a collection of matrices rather than a single one. In this work, we consider the problem of collective matrix completion with multiple and heterogeneous matrices, whose entries can be count, binary, continuous, etc. We first investigate the setting where, for each source, the matrix entries are sampled from an exponential family distribution. Then, we relax the exponential family assumption on the noise and investigate the distribution-free case, in which we do not assume any specific model for the observations. The estimation procedures are based on minimizing the sum of a goodness-of-fit term and a nuclear norm penalty on the whole collective matrix. We prove that the proposed estimators achieve fast rates of convergence in both settings, and we corroborate our results with numerical experiments.
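The "goodness-of-fit plus nuclear norm" estimator can be illustrated in its simplest special case. We assume squared loss for every block (Gaussian noise) and blocks that share rows, so that they are placed side by side. The penalized objective $\tfrac12\|P_\Omega(X - Y)\|_F^2 + \lambda\|X\|_*$ can then be minimized by proximal gradient descent, whose proximal step is singular value soft-thresholding. The paper's estimators allow other exponential-family or distribution-free fit terms, which this sketch does not cover. Unobserved entries of the input blocks may hold any finite value, because they are masked out.

```python
import numpy as np

def svt(M, lam):
    """Singular value soft-thresholding: the proximal operator of lam * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def collective_complete(blocks, masks, lam=1.0, n_iter=200):
    """Nuclear-norm-penalized completion of several matrices that share the same rows.

    Simplifying assumption: squared loss on every block. masks[k] is True where
    blocks[k] is observed. Returns the completed blocks.
    """
    Y = np.hstack(blocks)            # collective matrix: blocks placed side by side
    Omega = np.hstack(masks)         # observation pattern of the collective matrix
    X = np.zeros_like(Y)
    for _ in range(n_iter):
        # proximal gradient step (step size 1) on 0.5 * ||P_Omega(X - Y)||^2 + lam * ||X||_*
        X = svt(X - Omega * (X - Y), lam)
    splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
    return np.split(X, splits, axis=1)
```

Penalizing the whole collective matrix, rather than each block separately, lets the shared row structure of one source inform the missing entries of another.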
Deep Learning is an established, state-of-the-art Machine Learning tool for fitting a function given large data sets of examples. However, in regression tasks, the straightforward application of Deep Learning models provides only a point estimate of the target and does not take into account the uncertainty of the prediction. This is a serious limitation for tasks where communicating an erroneous prediction carries a risk. In this paper we tackle the real-world problem of forecasting customers' impending financial expenses and income, with the predicted monetary amounts displayed in a mobile app. In this context, we investigate whether Deep Learning models benefit from a heteroscedastic model of the variance of the network's output. Experimentally, we achieve higher accuracy than non-trivial baselines. More importantly, we introduce a mechanism to discard low-confidence predictions, so that they are not shown to users. This should help enhance the user experience of our product.
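A common way to model heteroscedastic output variance is to give the network two outputs, a mean and a log-variance, and train it on the Gaussian negative log-likelihood. The abstract specifies neither the loss nor the discarding rule. The sketch therefore assumes the log-variance parameterization and a hypothetical cut-off `max_std` on the predicted standard deviation, and it shows only the per-example computations, not a network.

```python
import math

def gaussian_nll(y, mean, log_var):
    """Per-example negative log-likelihood of y under N(mean, exp(log_var)).

    Training a two-output network (mean, log_var) on this loss lets the
    predicted variance depend on the input, i.e. be heteroscedastic.
    """
    return 0.5 * (math.log(2 * math.pi) + log_var + (y - mean) ** 2 / math.exp(log_var))

def visible_predictions(preds, max_std):
    """Keep only (mean, log_var) predictions whose predicted std is at most max_std."""
    return [(m, lv) for m, lv in preds if math.exp(0.5 * lv) <= max_std]

# Two hypothetical forecasts: std 5 (shown) and std 20 (hidden) with a cut-off of 10.
preds = [(120.0, math.log(25.0)), (80.0, math.log(400.0))]
shown = visible_predictions(preds, max_std=10.0)
```

The log-variance term penalizes claiming high uncertainty everywhere, while the scaled squared error penalizes claiming low uncertainty on hard examples. Training on this loss is meant to make the predicted variance usable as a confidence signal for filtering.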