SuperM-Tree google
A common approach to implementing similarity search applications is the usage of distance functions, where small distances indicate high similarity. In the case of metric distance functions, metric index structures can be used to accelerate nearest neighbor queries. On the other hand, many applications ask for approximate subsequences or subsets, e.g. searching for a similar partial sequence of a gene, for a similar scene in a movie, or for a similar object in a picture which is represented by a set of multidimensional features. Metric index structures such as the M-Tree cannot be utilized for these tasks because of the symmetry of the metric distance functions. In this work, we propose the SuperM-Tree as an extension of the M-Tree where approximate subsequence and subset queries become nearest neighbor queries. In order to do this, we introduce metric subset spaces as a generalized concept of metric spaces. Various metric distance functions can be extended to metric subset distance functions, e.g. the Euclidean distance (on windows), the Hausdorff distance (on subsets), the Edit distance and the Dog-Keeper distance (on subsequences). We show that these examples subsume the applications mentioned above. …

Termite google
Data-driven analysis is important in virtually every modern organization. Yet, most data is underutilized because it remains locked in silos inside of organizations; large organizations have thousands of databases, and billions of files that are not integrated together in a single, queryable repository. Despite 40+ years of continuous effort by the database community, data integration still remains an open challenge. In this paper, we advocate a different approach: rather than trying to infer a common schema, we aim to find another common representation for diverse, heterogeneous data. Specifically, we argue for an embedding (i.e., a vector space) in which all entities, rows, columns, and paragraphs are represented as points. In the embedding, the distance between points indicates their degree of relatedness. We present Termite, a prototype we have built to learn the best embedding from the data. Because the best representation is learned, this allows Termite to avoid much of the human effort associated with traditional data integration tasks. On top of Termite, we have implemented a Termite-Join operator, which allows people to identify related concepts, even when these are stored in databases with different schemas and in unstructured data such as text files, webpages, etc. Finally, we show preliminary evaluation results of our prototype via a user study, and describe a list of future directions we have identified. …

MortonNet google
We present a self-supervised task on point clouds, in order to learn meaningful point-wise features that encode local structure around each point. Our self-supervised network, named MortonNet, operates directly on unstructured/unordered point clouds. Using a multi-layer RNN, MortonNet predicts the next point in a point sequence created by a popular and fast Space Filling Curve, the Morton-order curve. The final RNN state (coined Morton feature) is versatile and can be used in generic 3D tasks on point clouds. In fact, we show how Morton features can be used to significantly improve performance (+3% for 2 popular semantic segmentation algorithms) in the task of semantic segmentation of point clouds on the challenging and large-scale S3DIS dataset. We also show how MortonNet trained on S3DIS transfers well to another large-scale dataset, vKITTI, leading to an improvement over state-of-the-art of 3.8%. Finally, we use Morton features to train a much simpler and more stable model for part segmentation in ShapeNet. Our results show how our self-supervised task results in features that are useful for 3D segmentation tasks, and generalize well to other datasets. …

Sparse Unscented Kalman Filter (sparse UKF) google
Several variations of the Kalman filter algorithm, such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), are widely used in science and engineering applications. In this paper, we introduce two algorithms of sparsity-based Kalman filters, namely the sparse UKF and the progressive EKF. The filters are designed specifically for problems with very high dimensions. Different from various types of ensemble Kalman filters (EnKFs) in which the error covariance is approximated using a set of dense ensemble vectors, the algorithms developed in this paper are based on sparse matrix approximations of error covariance. The new algorithms enjoy several advantages. The error covariance has full rank without being limited by a set of ensembles. In addition to the estimated states, the algorithms provide updated error covariance for the next assimilation cycle. The sparsity of error covariance significantly reduces the required memory size for the numerical computation. In addition, the granularity of the sparse error covariance can be adjusted to optimize the parallelization of the algorithms. …