CharBot
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this work, we present a novel DGA called CharBot which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date. …

We consider the problem of ranking a set of items from pairwise comparisons in the presence of features associated with the items. Recent works have established that $O(n\log(n))$ samples are needed to rank well when there is no feature information present. However, this might be sub-optimal in the presence of associated features. We introduce a new probabilistic preference model called feature-Bradley-Terry-Luce (f-BTL) model that generalizes the standard BTL model to incorporate feature information. We present a new least squares based algorithm called fBTL-LS which we show requires much lesser than $O(n\log(n))$ pairs to obtain a good ranking — precisely our new sample complexity bound is of $O(\alpha\log \alpha)$, where $\alpha$ denotes the number of `independent items’ of the set, in general $\alpha << n$. Our analysis is novel and makes use of tools from classical graph matching theory to provide tighter bounds that sheds light on the true complexity of the ranking problem, capturing the item dependencies in terms of their feature representations. This was not possible with earlier matrix completion based tools used for this problem. We also prove an information theoretic lower bound on the required sample complexity for recovering the underlying ranking, which essentially shows the tightness of our proposed algorithms. The efficacy of our proposed algorithms are validated through extensive experimental evaluations on a variety of synthetic and real world datasets. …