Gossip-bAseD sub-GradiEnT SVM (GADGET SVM) google
In the era of big data, an important weapon in a machine learning researcher’s arsenal is a scalable Support Vector Machine (SVM) algorithm. SVMs are extensively used for solving classification problems. Traditional algorithms for learning SVMs often scale super linearly with training set size which becomes infeasible very quickly for large data sets. In recent years, scalable algorithms have been designed which study the primal or dual formulations of the problem. This often suggests a way to decompose the problem and facilitate development of distributed algorithms. In this paper, we present a distributed algorithm for learning linear Support Vector Machines in the primal form for binary classification called Gossip-bAseD sub-GradiEnT (GADGET) SVM. The algorithm is designed such that it can be executed locally on nodes of a distributed system. Each node processes its local homogeneously partitioned data and learns a primal SVM model. It then gossips with random neighbors about the classifier learnt and uses this information to update the model. Extensive theoretical and empirical results suggest that this anytime algorithm has performance comparable to its centralized and online counterparts. …

Multi Preference Closure (MP-closure) google
The paper describes a preferential approach for dealing with exceptions in KLM preferential logics, based on the rational closure. It is well known that the rational closure does not allow an independent handling of the inheritance of different defeasible properties of concepts. Several solutions have been proposed to face this problem and the lexicographic closure is the most notable one. In this work, we consider an alternative closure construction, called the Multi Preference closure (MP-closure), that has been first considered for reasoning with exceptions in DLs. Here, we reconstruct the notion of MP-closure in the propositional case and we show that it is a natural variant of Lehmann’s lexicographic closure. Abandoning Maximal Entropy (an alternative route already considered but not explored by Lehmann) leads to a construction which exploits a different lexicographic ordering w.r.t. the lexicographic closure, and determines a preferential consequence relation rather than a rational consequence relation. We show that, building on the MP-closure semantics, rationality can be recovered, at least from the semantic point of view, resulting in a rational consequence relation which is stronger than the rational closure, but incomparable with the lexicographic closure. We also show that the MP-closure is stronger than the Relevant Closure. …

Episodic Memory Reader (EMR) google
We consider a novel question answering (QA) task where the machine needs to read from large streaming data (long documents or videos) without knowing when the questions will be given, in which case the existing QA methods fail due to lack of scalability. To tackle this problem, we propose a novel end-to-end reading comprehension method, which we refer to as Episodic Memory Reader (EMR) that sequentially reads the input contexts into an external memory, while replacing memories that are less important for answering unseen questions. Specifically, we train an RL agent to replace a memory entry when the memory is full in order to maximize its QA accuracy at a future timepoint, while encoding the external memory using the transformer architecture to learn representations that considers relative importance between the memory entries. We validate our model on a real-world large-scale textual QA task (TriviaQA) and a video QA task (TVQA), on which it achieves significant improvements over rule-based memory scheduling policies or an RL-based baseline that learns the query-specific importance of each memory independently. …

BinGAN google
In this paper, we propose a novel regularization method for Generative Adversarial Networks, which allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). We employ the dimensionality reduction that takes place in the intermediate layers of the discriminator network and train binarized low-dimensional representation of the penultimate layer to mimic the distribution of the higher-dimensional preceding layers. To achieve this, we introduce two loss terms that aim at: (i) reducing the correlation between the dimensions of the binarized low-dimensional representation of the penultimate layer i. e. maximizing joint entropy) and (ii) propagating the relations between the dimensions in the high-dimensional space to the low-dimensional space. We evaluate the resulting binary image descriptors on two challenging applications, image matching and retrieval, and achieve state-of-the-art results. …