Balanced and Conformal Optimized Prediction set (BCOPS)
We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction set) that constructs a prediction set C(x) which tries to optimize out-of-sample performance, aiming to include the correct class as often as possible, but also detecting outliers x, for which the method returns no prediction (corresponding to C(x) equal to the empty set). BCOPS combines supervised-learning algorithms with the method of conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite-sample coverage guarantee without distributional assumptions. We also develop a variant of BCOPS in the online setting where we optimize the misclassification loss averaged over a proxy of the out-of-sample distribution. We also describe new methods for the evaluation of out-of-sample performance with mismatched data. We prove asymptotic consistency and efficiency of the proposed methods under suitable assumptions and illustrate our methods on real data examples. …
DeepLink
Recently, link prediction has attracted more attentions from various disciplines such as computer science, bioinformatics and economics. In this problem, unknown links between nodes are discovered based on numerous information such as network topology, profile information and user generated contents. Most of the previous researchers have focused on the structural features of the networks. While the recent researches indicate that contextual information can change the network topology. Although, there are number of valuable researches which combine structural and content information, but they face with the scalability issue due to feature engineering. Because, majority of the extracted features are obtained by a supervised or semi supervised algorithm. Moreover, the existing features are not general enough to indicate good performance on different networks with heterogeneous structures. Besides, most of the previous researches are presented for undirected and unweighted networks. In this paper, a novel link prediction framework called ‘DeepLink’ is presented based on deep learning techniques. In contrast to the previous researches which fail to automatically extract best features for the link prediction, deep learning reduces the manual feature engineering. In this framework, both the structural and content information of the nodes are employed. The framework can use different structural feature vectors, which are prepared by various link prediction methods. It considers all proximity orders that are presented in a network during the structural feature learning. We have evaluated the performance of DeepLink on two real social network datasets including Telegram and irBlogs. On both datasets, the proposed framework outperforms several structural and hybrid approaches for link prediction problem. …
Bayesian Nonparametric Principal Component Analysis (BNP-PCA)
Principal component analysis (PCA) is very popular to perform dimension reduction. The selection of the number of significant components is essential but often based on some practical heuristics depending on the application. Only few works have proposed a probabilistic approach able to infer the number of significant components. To this purpose, this paper introduces a Bayesian nonparametric principal component analysis (BNP-PCA). The proposed model projects observations onto a random orthogonal basis which is assigned a prior distribution defined on the Stiefel manifold. The prior on factor scores involves an Indian buffet process to model the uncertainty related to the number of components. The parameters of interest as well as the nuisance parameters are finally inferred within a fully Bayesian framework via Monte Carlo sampling. A study of the (in-)consistence of the marginal maximum a posteriori estimator of the latent dimension is carried out. A new estimator of the subspace dimension is proposed. Moreover, for sake of statistical significance, a Kolmogorov-Smirnov test based on the posterior distribution of the principal components is used to refine this estimate. The behaviour of the algorithm is first studied on various synthetic examples. Finally, the proposed BNP dimension reduction approach is shown to be easily yet efficiently coupled with clustering or latent factor models within a unique framework. …
Unsupervised Open Domain Recognition (UODR)
We address the unsupervised open domain recognition (UODR) problem, where categories in labeled source domain S is only a subset of those in unlabeled target domain T. The task is to correctly classify all samples in T including known and unknown categories. UODR is challenging due to the domain discrepancy, which becomes even harder to bridge when a large number of unknown categories exist in T. Moreover, the classification rules propagated by graph CNN (GCN) may be distracted by unknown categories and lack generalization capability. To measure the domain discrepancy for asymmetric label space between S and T, we propose Semantic-Guided Matching Discrepancy (SGMD), which first employs instance matching between S and T, and then the discrepancy is measured by a weighted feature distance between matched instances. We further design a limited balance constraint to achieve a more balanced classification output on known and unknown categories. We develop Unsupervised Open Domain Transfer Network (UODTN), which learns both the backbone classification network and GCN jointly by reducing the SGMD, enforcing the limited balance constraint and minimizing the classification loss on S. UODTN better preserves the semantic structure and enforces the consistency between the learned domain invariant visual features and the semantic embeddings. Experimental results show superiority of our method on recognizing images of both known and unknown categories. …
If you did not already know
30 Tuesday Mar 2021
Posted What is ...
in