Yellowbrick google
Yellowbrick is a suite of visual diagnostic tools called ‘Visualizers’ that extend the scikit-learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines scikit-learn with matplotlib in the best tradition of the scikit-learn documentation, but to produce visualizations for your machine learning workflow! For complete documentation on the Yellowbrick API, a gallery of available visualizers, the contributor’s guide, tutorials and teaching resources, frequently asked questions, and more, please visit our documentation at http://www.scikit-yb.org. …
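Below is a minimal sketch of the typical Visualizer workflow, using Yellowbrick's ResidualsPlot with an illustrative scikit-learn dataset and estimator (both choices are mine, not prescribed by the project): the Visualizer wraps an estimator, fits and scores like any scikit-learn model, and then renders the diagnostic plot.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import ResidualsPlot

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

viz = ResidualsPlot(Ridge())   # the Visualizer wraps a scikit-learn estimator
viz.fit(X_train, y_train)      # fit delegates to the wrapped model
viz.score(X_test, y_test)      # score also draws the test residuals
viz.show()                     # finalize and display the matplotlib figure
```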

Holdout Randomization Test (HRT) google
We consider the problem of feature selection using black-box predictive models. For example, high-throughput devices in science are routinely used to gather thousands of features for each sample in an experiment. The scientist must then sift through the many candidate features to find explanatory signals in the data, such as which genes are associated with sensitivity to a prospective therapy. Often, predictive models are used for this task: the model is fit, error on held-out data is measured, and strongly performing models are assumed to have discovered some fundamental properties of the system. A model-specific heuristic is then used to inspect the model parameters and rank important features, with top features reported as ‘discoveries.’ However, such heuristics provide no statistical guarantees and can produce unreliable results. We propose the holdout randomization test (HRT) as a principled approach to feature selection using black-box predictive models. The HRT is similar to a permutation test, where each random reshuffling is a draw from the complete conditional distribution of the feature being tested. The HRT is model agnostic and produces a valid $p$-value for each feature, enabling control over the false discovery rate (or Type I error) for any predictive model. Further, the HRT is computationally efficient and, in simulations, has greater power than a competing knockoffs-based approach. Code is available at https://…/hrt.
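As a concrete illustration, here is a hedged sketch of the HRT for a single feature: fit the model once, measure held-out risk, then repeatedly resample the tested feature from its complete conditional and re-measure. The conditional model for $x_j$ given $x_{-j}$ is assumed Gaussian and fit by linear regression purely for simplicity; the paper allows any conditional model, and the data and names below are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def hrt_pvalue(model, X, y, j, K=200, rng=None):
    """HRT p-value for feature j on a holdout set, assuming a Gaussian
    complete conditional fitted by linear regression (a simplifying choice)."""
    rng = np.random.default_rng(rng)
    t_obs = mean_squared_error(y, model.predict(X))    # observed holdout risk
    X_rest = np.delete(X, j, axis=1)
    cond = LinearRegression().fit(X_rest, X[:, j])     # models E[x_j | x_-j]
    mu = cond.predict(X_rest)
    sigma = np.std(X[:, j] - mu)
    null_ts = np.empty(K)
    for k in range(K):
        X_null = X.copy()
        X_null[:, j] = mu + sigma * rng.standard_normal(len(X))  # draw x_j | x_-j
        null_ts[k] = mean_squared_error(y, model.predict(X_null))
    # A useful feature makes the real risk beat the null risks, so small
    # p-values arise when few null statistics are <= the observed one.
    return (1 + np.sum(null_ts <= t_obs)) / (K + 1)

# Toy check: only feature 0 carries signal, so only it should get a small p.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.1 * rng.normal(size=500)
X_tr, y_tr, X_ho, y_ho = X[:300], y[:300], X[300:], y[300:]
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print([round(hrt_pvalue(model, X_ho, y_ho, j, rng=1), 3) for j in range(5)])
```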

Multi-hop Assortativity google
Several social, medical, engineering, and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when the latter are available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on its structure and atomic types, or discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as the number of triangles, or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of fingerprints to characterize networks. These fingerprints in turn allow us to recover the functionality of a network with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. Results reveal that our assortativity-based features are competitive, providing highly accurate results and often outperforming state-of-the-art methods on the network classification task …
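The sketch below illustrates one simplified reading of the idea: the Pearson correlation of a scalar node attribute between the two endpoints of a length-$k$ random walk. The function name and the random-walk formulation are assumptions made for illustration, not the paper's exact definition of the fingerprint.

```python
import numpy as np
import networkx as nx

def multi_hop_assortativity(G, attr, k):
    """Pearson correlation of a scalar node attribute between the endpoints
    of a length-k random walk (a simplified multi-hop assortativity)."""
    nodes = list(G)
    x = np.array([G.nodes[n][attr] for n in nodes], dtype=float)
    A = nx.to_numpy_array(G, nodelist=nodes)
    P = A / A.sum(axis=1, keepdims=True)   # random-walk transition matrix
    Pk = np.linalg.matrix_power(P, k)      # k-hop transition probabilities
    pi = A.sum(axis=1) / A.sum()           # stationary distribution (undirected)
    w = pi[:, None] * Pk                   # joint law of (walk start, walk end)
    mx = (w.sum(axis=1) * x).sum()         # mean attribute at walk start
    my = (w.sum(axis=0) * x).sum()         # mean attribute at walk end
    cov = (w * np.outer(x - mx, x - my)).sum()
    sx = np.sqrt((w.sum(axis=1) * (x - mx) ** 2).sum())
    sy = np.sqrt((w.sum(axis=0) * (x - my) ** 2).sum())
    return cov / (sx * sy)

# Fingerprint over path lengths 1..3, using node degree as the metadata.
G = nx.karate_club_graph()
nx.set_node_attributes(G, dict(G.degree()), "deg")
print([round(multi_hop_assortativity(G, "deg", k), 3) for k in (1, 2, 3)])
```

Varying $k$ turns a single scalar (classical one-hop assortativity) into a vector of values, which is the kind of fingerprint a downstream classifier can consume.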

RecurJac google
The Jacobian matrix (or the gradient for single-output networks) is directly related to many important properties of neural networks, such as the function landscape, stationary points, (local) Lipschitz constants, and robustness to adversarial attacks. In this paper, we propose a recursive algorithm, RecurJac, to compute both upper and lower bounds for each element in the Jacobian matrix of a neural network with respect to the network’s input, where the network can contain a wide range of activation functions. As a byproduct, we can efficiently obtain a (local) Lipschitz constant, which plays a crucial role in neural network robustness verification, as well as in the training stability of GANs. Experiments show that (local) Lipschitz constants produced by our method are of better quality than those of previous approaches, thus providing better robustness verification results. Our algorithm has polynomial time complexity, and its computation time is reasonable even for relatively large networks. Additionally, we use our bounds on the Jacobian matrix to characterize the landscape of the neural network, for example, to determine whether there exist stationary points in a local neighborhood. Source code is available at https://…/RecurJac-Jacobian-Bounds.
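To make element-wise Jacobian bounding concrete, here is a deliberately coarse interval-style bound for a one-hidden-layer ReLU network over an input box. This is a naive scheme in the spirit of the problem RecurJac addresses, not the recursive RecurJac algorithm itself; the network shape and all names are illustrative.

```python
import numpy as np

def jacobian_bounds_relu(W1, b1, W2, x_lo, x_hi):
    """Coarse element-wise bounds on the Jacobian of f(x) = W2 @ relu(W1 @ x + b1)
    over the box [x_lo, x_hi]: classify each hidden unit as provably on,
    provably off, or uncertain, then bound its contribution accordingly."""
    Wp, Wn = np.maximum(W1, 0), np.minimum(W1, 0)
    z_lo = Wp @ x_lo + Wn @ x_hi + b1      # pre-activation lower bounds
    z_hi = Wp @ x_hi + Wn @ x_lo + b1      # pre-activation upper bounds
    # Unit k contributes W2[i, k] * W1[k, j] to J[i, j] when it is active.
    C = W2[:, :, None] * W1[None, :, :]    # shape (out, hidden, in)
    active = (z_lo >= 0)[None, :, None]    # provably on: ReLU slope = 1
    dead = (z_hi <= 0)[None, :, None]      # provably off: ReLU slope = 0
    J_hi = np.where(dead, 0.0, np.where(active, C, np.maximum(C, 0.0))).sum(axis=1)
    J_lo = np.where(dead, 0.0, np.where(active, C, np.minimum(C, 0.0))).sum(axis=1)
    return J_lo, J_hi

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2 = rng.normal(size=(2, 8))
x0 = rng.normal(size=4)
J_lo, J_hi = jacobian_bounds_relu(W1, b1, W2, x0 - 0.1, x0 + 0.1)
# A local Lipschitz upper bound (infinity norm) follows from the element bounds.
L = np.abs(np.stack([J_lo, J_hi])).max(axis=0).sum(axis=1).max()
print("local Lipschitz upper bound:", L)
```

RecurJac's contribution is precisely to tighten what this naive scheme leaves loose, by recursively bounding the Jacobians of subnetworks rather than treating every uncertain unit independently.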