Personalized Embedding Propagation (PEP)
Neural message passing algorithms for semi-supervised classification on graphs have recently achieved great success. However, these methods only consider nodes that are a few propagation steps away and the size of this utilized neighborhood cannot be easily extended. In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. We utilize this propagation procedure to construct personalized embedding propagation (PEP) and its approximation, PEP$_\text{A}$. Our model’s training time is on par or faster and its number of parameters on par or lower than previous models. It leverages a large, adjustable neighborhood for classification and can be combined with any neural network. We show that this model outperforms several recently proposed methods for semi-supervised classification on multiple graphs in the most thorough study done so far for GCN-like models. …

EnsembleNet
Ensembling is a universally useful approach to boost the performance of machine learning models. However, individual models in an ensemble are typically trained independently in separate stages, without information access about the overall ensemble. In this paper, model ensembles are treated as first-class citizens, and their performance is optimized end-to-end with parameter sharing and a novel loss structure that improves generalization. On large-scale datasets including ImageNet, Youtube-8M, and Kinetics, we demonstrate a procedure that starts from a strongly performing single deep neural network, and constructs an EnsembleNet that has both a smaller size and better performance. Moreover, an EnsembleNet can be trained in one stage just like a single model without manual intervention. …

Digital Twin
Digital twin refers to a digital replica of physical assets, processes and systems that can be used for various purposes. The digital representation provides both the elements and the dynamics of how an Internet of Things device operates and lives throughout its life cycle. Digital Twins integrate artificial intelligence, machine learning and software analytics with data to create living digital Simulation models that update and change as their physical counterparts’ change. A digital twin continuously learns and updates itself from multiple sources to represent their near real-time status, working condition or position. This learning system, learns from itself, using sensor data that conveys various aspects of its operating condition; from human experts, such as engineers with deep and relevant industry domain knowledge; from other similar machines; from other similar fleets of machines; and from the larger systems and environment in which it may be a part of. A digital twin also integrates historical data from past machine usage to factor into its digital model. …

SciTokens
The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely-used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems. …