Extreme Multi-Label Learning using Distributional Semantics (ExMLDS) google
We present a novel and scalable label embedding framework for large-scale multi-label learning, a.k.a. ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip-Gram Negative Sampling (SGNS) approach widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data. The framework can also be easily extended to incorporate auxiliary information such as label-label correlations; this is crucial especially when many labels are missing from the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and state-of-the-art methods for large-scale multi-label learning. …
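A minimal sketch (not the authors' code) of the SGNS-as-matrix-factorization view applied to labels: treat each instance's label set as a "sentence" of label tokens, build a shifted positive PMI matrix over label co-occurrences, and factorize it with truncated SVD to obtain label embeddings. The toy data, function name, and parameters are illustrative assumptions.

```python
import numpy as np

def label_embeddings_via_sppmi(label_sets, num_labels, dim=2, neg_samples=1):
    # Count label-label co-occurrences within each instance's label set.
    cooc = np.zeros((num_labels, num_labels))
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    cooc[i, j] += 1.0
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((cooc * total) / (row * col))
    # Shifted positive PMI: the matrix SGNS implicitly factorizes (Levy & Goldberg, 2014).
    sppmi = np.maximum(pmi - np.log(neg_samples), 0.0)
    sppmi[~np.isfinite(sppmi)] = 0.0
    # Truncated SVD yields the label embeddings.
    U, S, _ = np.linalg.svd(sppmi)
    return U[:, :dim] * np.sqrt(S[:dim])

# Toy usage: 4 labels, each instance tagged with a small label subset.
embs = label_embeddings_via_sppmi([[0, 1], [0, 1, 2], [2, 3], [1, 2]], num_labels=4)
print(embs.shape)  # (4, 2)
```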

Multi-Layer K-Means (MLKM) google
Data-target association is an important step in multi-target localization for the intelligent operation of unmanned systems in numerous applications such as search and rescue, traffic management, and surveillance. The objective of this paper is to present an innovative data association learning approach named multi-layer K-means (MLKM), which leverages the advantages of several existing machine learning approaches, including K-means, K-means++, and deep neural networks. To enable accurate data association from different sensors for efficient target localization, MLKM relies on the clustering capabilities of K-means++ structured in a multi-layer framework with an error-correction feature motivated by the backpropagation mechanism well known in deep learning research. To show the effectiveness of the MLKM method, numerous simulations are conducted to compare its performance with K-means, K-means++, and deep neural networks. …
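A minimal sketch (not the paper's implementation) of the layered idea behind MLKM: cluster each sensor's measurements with k-means++ in a first layer, then cluster the per-sensor centroids in a second layer to associate measurements from different sensors with the same target. The error-correction step described in the paper is omitted; the data and parameter names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def mlkm_associate(sensor_measurements, num_targets, seed=0):
    """sensor_measurements: list of (n_i, d) arrays, one per sensor."""
    # Layer 1: per-sensor clustering with k-means++ initialization.
    layer1_centroids = []
    for meas in sensor_measurements:
        km = KMeans(n_clusters=num_targets, init="k-means++", n_init=10, random_state=seed)
        km.fit(meas)
        layer1_centroids.append(km.cluster_centers_)
    # Layer 2: cluster the stacked centroids to associate them across sensors.
    stacked = np.vstack(layer1_centroids)
    km2 = KMeans(n_clusters=num_targets, init="k-means++", n_init=10, random_state=seed)
    assignments = km2.fit_predict(stacked)
    # assignments gives, for each first-layer centroid, its associated target index.
    return assignments.reshape(len(sensor_measurements), num_targets), km2.cluster_centers_

# Toy usage: two sensors observing two targets near (0, 0) and (5, 5).
rng = np.random.default_rng(0)
s1 = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
s2 = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
assoc, targets = mlkm_associate([s1, s2], num_targets=2)
print(assoc)    # which target each sensor-level centroid maps to
print(targets)  # fused target position estimates
```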

KV-Index google
Time series data have exploded due to the popularity of new applications such as data center management and IoT. Time series data management systems (TSDBs) have emerged to store and query these large volumes of time series data. Subsequence matching is critical in many time series mining algorithms, and many approaches have been proposed. However, the shift to distributed storage systems and the resulting performance gap make these approaches incompatible with TSDBs. To fill this gap, we propose a new index structure, KV-index, and the corresponding matching algorithm, KV-match. KV-index is a file-based structure that can be easily implemented on local files, HDFS, or HBase tables. The KV-match algorithm probes the index efficiently with a few sequential scans. Moreover, two optimization techniques, window reduction and window reordering, are proposed to further accelerate the processing. To support queries of arbitrary length, we extend KV-match to KV-match$_{DP}$, which utilizes multiple indexes of varied lengths to process a query simultaneously. A two-dimensional dynamic programming algorithm is proposed to find the optimal query segmentation. We implement our approach on both local files and HBase tables, and conduct extensive experiments on synthetic and real-world datasets. Results show that our index is of comparable size to the popular tree-style index while our query processing is orders of magnitude more efficient. …
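A minimal sketch of a key-value style window index in the spirit of KV-index (not the authors' implementation): slide fixed-length windows over the series, key each window by its quantized mean, and store window offsets under that key; a query probes only the rows whose key range can contain matches, mimicking the few sequential scans of KV-match. The window length, quantization step, and tolerance are illustrative assumptions.

```python
from collections import defaultdict
import numpy as np

def build_kv_index(series, window_len=32, step=0.5):
    index = defaultdict(list)  # key: quantized window mean -> list of window offsets
    for pos in range(len(series) - window_len + 1):
        mean = series[pos:pos + window_len].mean()
        index[int(np.floor(mean / step))].append(pos)
    return index

def probe(index, query, window_len=32, step=0.5, eps=1.0):
    """Return candidate offsets whose window mean lies within eps of the query's mean."""
    qmean = query[:window_len].mean()
    lo = int(np.floor((qmean - eps) / step))
    hi = int(np.floor((qmean + eps) / step))
    candidates = []
    for key in range(lo, hi + 1):  # a few sequential "row" scans over adjacent keys
        candidates.extend(index.get(key, []))
    return sorted(candidates)

# Toy usage on a synthetic series.
rng = np.random.default_rng(1)
series = np.sin(np.linspace(0, 50, 2000)) + rng.normal(0, 0.05, 2000)
idx = build_kv_index(series)
cands = probe(idx, series[100:164], eps=0.2)
print(len(cands), cands[:5])
```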