Collaborative filtering analyzes user preferences for items (e.g., books, movies, restaurants, academic papers) by exploiting the similarity patterns across users. In implicit feedback settings, all the items, including the ones that a user did not consume, are taken into consideration. But this assumption does not accord with the common sense understanding that users have a limited scope and awareness of items. For example, a user might not have heard of a certain paper, or might live too far away from a restaurant to experience it. In the language of causal analysis, the assignment mechanism (i.e., the items that a user is exposed to) is a latent variable that may change for various user/item combinations. In this paper, we propose a new probabilistic approach that directly incorporates user exposure to items into collaborative filtering. The exposure is modeled as a latent variable and the model infers its value from data. In doing so, we recover one of the most successful state-of-the-art approaches as a special case of our model, and provide a plug-in method for conditioning exposure on various forms of exposure covariates (e.g., topics in text, venue locations). We show that our scalable inference algorithm outperforms existing benchmarks in four different domains both with and without exposure covariates.
Deep convolutional neural networks have become the gold standard for image recognition tasks, demonstrating many current state-of-the-art results and even achieving near-human level performance on some tasks. Despite this fact it has been shown that their strong generalisation qualities can be fooled to misclassify previously correctly classified natural images and give erroneous high confidence classifications to nonsense synthetic images. In this paper we extend that work, by presenting a straightforward way to perturb an image in such a way as to cause it to acquire any other label from within the dataset while leaving this perturbed image visually indistinguishable from the original.
We present the R-package mgm for the estimation of mixed graphical models underlying multivariate probability distributions over variables with different domains in high-dimensional data. Our method goes beyond existing methods in that it is the first general method to combine variables with categorical, count-measure and continuous domain, while modeling all variables on their proper domain, which avoids possible loss of information due to transformations. In addition to the presenting the estimation function, we provide a function to sample from pairwise mixed distributions and apply our method to a medical dataset.
We consider the problem of interpretable density estimation for high dimensional categorical data. In one or two dimensions, we would naturally consider histograms (bar charts) for simple density estimation problems. However, histograms do not scale to higher dimensions in an interpretable way, and one cannot usually visualize a high dimensional histogram. This work presents an alternative to the histogram for higher dimensions that can be directly visualized. These density models are in the form of a cascaded set of conditions (a tree structure), where each node in the tree is estimated to have constant density. We present two algorithms for this task, where the first one allows the user to specify the number of desired leaves in the tree as a Bayesian prior. The second algorithm allows the user to specify the desired number of branches within the prior. Our results indicate that the new approach yields sparser trees than other approaches that achieve similar test performance.