Lanczos Latent Factor Recommender (LLFR) google
The purpose if this master’s thesis is to study and develop a new algorithmic framework for Collaborative Filtering to produce recommendations in the top-N recommendation problem. Thus, we propose Lanczos Latent Factor Recommender (LLFR); a novel ‘big data friendly’ collaborative filtering algorithm for top-N recommendation. Using a computationally efficient Lanczos-based procedure, LLFR builds a low dimensional item similarity model, that can be readily exploited to produce personalized ranking vectors over the item space. A number of experiments on real datasets indicate that LLFR outperforms other state-of-the-art top-N recommendation methods from a computational as well as a qualitative perspective. Our experimental results also show that its relative performance gains, compared to competing methods, increase as the data get sparser, as in the Cold Start Problem. More specifically, this is true both when the sparsity is generalized – as in the New Community Problem, a very common problem faced by real recommender systems in their beginning stages, when there is not sufficient number of ratings for the collaborative filtering algorithms to uncover similarities between items or users – and in the very interesting case where the sparsity is localized in a small fraction of the dataset – as in the New Users Problem, where new users are introduced to the system, they have not rated many items and thus, the CF algorithm can not make reliable personalized recommendations yet. …

Set Aggregating Network (SAN) google
We construct a general unified framework for learning representation of structured data, i.e. data which cannot be represented as the fixed-length vectors (e.g. sets, graphs, texts or images of varying sizes). The key factor is played by an intermediate network called SAN (Set Aggregating Network), which maps a structured object to a fixed length vector in a high dimensional latent space. Our main theoretical result shows that for sufficiently large dimension of the latent space, SAN is capable of learning a unique representation for every input example. Experiments demonstrate that replacing pooling operation by SAN in convolutional networks leads to better results in classifying images with different sizes. Moreover, its direct application to text and graph data allows to obtain results close to SOTA, by simpler networks with smaller number of parameters than competitive models. …

Dimensional Collapse google
… When I stared at the plot, I ask myself, why not map the x-axis information of the points to the very first one according to the y-axis ‘connections’. When everything goes well and all done, all the grey points should be mapped along the red arrows to the first marks of the groups, and there should be only 4 marks leave on x-axis: a, b, d and g, instead of 9 marks in the first place. And the y-axis information, after contributing all the ‘connection rules’, can be put away now, since the left x-axis marks are exactly what I want: the final flags. It is why I like to call it ‘Dimensional Collapse’. … …

Morphed Learning google
The concern of potential privacy violation has prevented efficient use of big data for improving deep learning based applications. In this paper, we propose Morphed Learning, a privacy-preserving technique for deep learning based on data morphing that, allows data owners to share their data without leaking sensitive privacy information. Morphed Learning allows the data owners to send securely morphed data and provides the server with an Augmented Convolutional layer to train the network on morphed data without performance loss. Morphed Learning has these three features: (1) Strong protection against reverse-engineering on the morphed data; (2) Acceptable computational and data transmission overhead with no correlation to the depth of the neural network; (3) No degradation of the neural network performance. Theoretical analyses on CIFAR-10 dataset and VGG-16 network show that our method is capable of providing 10^89 morphing possibilities with only 5% computational overhead and 10% transmission overhead under limited knowledge attack scenario. Further analyses also proved that our method can offer same resilience against full knowledge attack if more resources are provided. …