Gated Transfer Network (GTN)
Deep neural networks have led to a series of breakthroughs in computer vision given sufficient annotated training datasets. For novel tasks with limited labeled data, the prevalent approach is to transfer the knowledge learned in the pre-trained models to the new tasks by fine-tuning. Classic model fine-tuning utilizes the fact that well trained neural networks appear to learn cross domain features. These features are treated equally during transfer learning. In this paper, we explore the impact of feature selection in model fine-tuning by introducing a transfer module, which assigns weights to features extracted from pre-trained models. The proposed transfer module proves the importance of feature selection for transferring models from source to target domains. It is shown to significantly improve upon fine-tuning results with only marginal extra computational cost. We also incorporate an auxiliary classifier as an extra regularizer to avoid over-fitting. Finally, we build a Gated Transfer Network (GTN) based on our transfer module and achieve state-of-the-art results on six different tasks. …

Probabilistic approach to neural ARchitecture SEarCh (PARSEC)
In neural architecture search (NAS), the space of neural network architectures is automatically explored to maximize predictive accuracy for a given task. Despite the success of recent approaches, most existing methods cannot be directly applied to large scale problems because of their prohibitive computational complexity or high memory usage. In this work, we propose a Probabilistic approach to neural ARchitecture SEarCh (PARSEC) that drastically reduces memory requirements while maintaining state-of-the-art computational complexity, making it possible to directly search over more complex architectures and larger datasets. Our approach only requires as much memory as is needed to train a single architecture from our search space. This is due to a memory-efficient sampling procedure wherein we learn a probability distribution over high-performing neural network architectures. Importantly, this framework enables us to transfer the distribution of architectures learnt on smaller problems to larger ones, further reducing the computational cost. We showcase the advantages of our approach in applications to CIFAR-10 and ImageNet, where our approach outperforms methods with double its computational cost and matches the performance of methods with costs that are three orders of magnitude larger. …

String Attractors
String attractors are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. …

Multi-Probe Count
An important question that arises in the study of high dimensional vector representations learned from data is: given a set $\mathcal{D}$ of vectors and a query $q$, estimate the number of points within a specified distance threshold of $q$. We develop two estimators, LSH Count and Multi-Probe Count that use locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes, and demonstrate their effectiveness in experiments on a standard word embedding dataset. …