**Towards Machine Learning Mathematical Induction**

Mathematical induction lies at the heart of mathematics and computer science. However, automated theorem proving of inductive problems is still limited in its power. In this abstract, we first summarize our progress in automating inductive theorem proving for Isabelle/HOL. Then, we present MiLkMaId, our approach to suggesting promising applications of mathematical induction without completing a proof search.

**seq2graph: Discovering Dynamic Dependencies from Multivariate Time Series with Multi-level Attention**

Discovering temporal lagged and inter-dependencies in multivariate time series data is an important task. However, in many real-world applications, such as commercial cloud management, manufacturing predictive maintenance, and portfolios performance analysis, such dependencies can be non-linear and time-variant, which makes it more challenging to extract such dependencies through traditional methods such as Granger causality or clustering. In this work, we present a novel deep learning model that uses multiple layers of customized gated recurrent units (GRUs) for discovering both time lagged behaviors as well as inter-timeseries dependencies in the form of directed weighted graphs. We introduce a key component of Dual-purpose recurrent neural network that decodes information in the temporal domain to discover lagged dependencies within each time series, and encodes them into a set of vectors which, collected from all component time series, form the informative inputs to discover inter-dependencies. Though the discovery of two types of dependencies are separated at different hierarchical levels, they are tightly connected and jointly trained in an end-to-end manner. With this joint training, learning of one type of dependency immediately impacts the learning of the other one, leading to overall accurate dependencies discovery. We empirically test our model on synthetic time series data in which the exact form of (non-linear) dependencies is known. We also evaluate its performance on two real-world applications, (i) performance monitoring data from a commercial cloud provider, which exhibit highly dynamic, non-linear, and volatile behavior and, (ii) sensor data from a manufacturing plant. We further show how our approach is able to capture these dependency behaviors via intuitive and interpretable dependency graphs and use them to generate highly accurate forecasts.

**Regression Based Bayesian Approach for Nonparanormal Graphical Models**

A nonparanormal graphical model is a semiparametric generalization of a Gaussian graphical model for continuous variables in which it is assumed that the variables follow a Gaussian graphical model only after some unknown smooth monotone transformations. We consider a Bayesian approach in the nonparanormal graphical model in which we put priors on the unknown transformations through random series based on B-splines. We use a regression formulation to construct the likelihood through the Cholesky decomposition on the underlying precision matrix of the transformed variables and put shrinkage priors on the regression coefcients. We apply a plug-in variational Bayesian algorithm for learning the sparse precision matrix and compare the performance to a posterior Gibbs sampling scheme in a simulation study. We finally apply the proposed methods to a real data set.

**Deep Program Reidentification: A Graph Neural Network Solution**

Program or process is an integral part of almost every IT/OT system. Can we trust the identity/ID (e.g., executable name) of the program? To avoid detection, malware may disguise itself using the ID of a legitimate program, and a system tool (e.g., PowerShell) used by the attackers may have the fake ID of another common software, which is less sensitive. However, existing intrusion detection techniques often overlook this critical program reidentification problem (i.e., checking the program’s identity). In this paper, we propose an attentional multi-channel graph neural network model (DeepRe-ID) to verify the program’s identity based on its system behaviors. The key idea is to leverage the representation learning of the program behavior graph to guide the reidentification process. We formulate the program reidentification as a graph classification problem and develop an effective multi-channel attentional graph embedding algorithm to solve it. Extensive experiments — using real-world enterprise monitoring data and real attacks — demonstrate the effectiveness of DeepRe-ID across multiple popular metrics and the robustness to the normal dynamic changes like program version upgrades.

**SIMD-X: Programming and Processing of Graph Algorithms on GPUs**

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However, graph algorithms such as breadth-first search and k-core, often fail to take full advantage of GPUs, due to irregularity in memory access and control flow. To address this challenge, we have developed SIMD-X, for programming and processing of single instruction multiple, complex, data on GPUs. Specifically, the new Active-Compute-Combine (ACC) model not only provides ease of programming to programmers, but more importantly creates opportunities for system-level optimizations. To this end, SIMD-X utilizes just-in-time task management which filters out inactive vertices at runtime and intelligently maps various tasks to different amount of GPU cores in pursuit of workload balancing. In addition, SIMD-X leverages push-pull based kernel fusion that, with the help of a new deadlock-free global barrier, reduces a large number of computation kernels to very few. Using SIMD-X, a user can program a graph algorithm in tens of lines of code, while achieving 3?, 6?, 24?, 3? speedup over Gunrock, Galois, CuSha, and Ligra, respectively.

**Top-N-Rank: A Scalable List-wise Ranking Method for Recommender Systems**

We propose Top-N-Rank, a novel family of list-wise Learning-to-Rank models for reliably recommending the N top-ranked items. The proposed models optimize a variant of the widely used cumulative discounted gain (DCG) objective function which differs from DCG in two important aspects: (i) It limits the evaluation of DCG only on the top N items in the ranked lists, thereby eliminating the impact of low-ranked items on the learned ranking function; and (ii) it incorporates weights that allow the model to leverage multiple types of implicit feedback with differing levels of reliability or trustworthiness. Because the resulting objective function is non-smooth and hence challenging to optimize, we consider two smooth approximations of the objective function, using the traditional sigmoid function and the rectified linear unit (ReLU). We propose a family of learning-to-rank algorithms (Top-N-Rank) that work with any smooth objective function. Then, a more efficient variant, Top-N-Rank.ReLU, is introduced, which effectively exploits the properties of ReLU function to reduce the computational complexity of Top-N-Rank from quadratic to linear in the average number of items rated by users. The results of our experiments using two widely used benchmarks, namely, the MovieLens data set and the Amazon Video Games data set demonstrate that: (i) The `top-N truncation’ of the objective function substantially improves the ranking quality of the top N recommendations; (ii) using the ReLU for smoothing the objective function yields significant improvement in both ranking quality as well as runtime as compared to using the sigmoid; and (iii) Top-N-Rank.ReLU substantially outperforms the well-performing list-wise ranking methods in terms of ranking quality.

**Delta Embedding Learning**

Learning from corpus and learning from supervised NLP tasks both give useful semantics that can be incorporated into a good word representation. We propose an embedding learning method called Delta Embedding Learning, to learn semantic information from high-level supervised tasks like reading comprehension, and combine it with an unsupervised word embedding. The simple technique not only improved the performance of various supervised NLP tasks, but also simultaneously learns improved universal word embeddings out of these tasks.

**Deep Learning on Graphs: A Survey**

Deep learning has been shown successful in a number of domains, ranging from acoustics, images to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, a significant amount of research efforts have been devoted to this area, greatly advancing graph analyzing techniques. In this survey, we comprehensively review different kinds of deep learning methods applied to graphs. We divide existing methods into three main categories: semi-supervised methods including Graph Neural Networks and Graph Convolutional Networks, unsupervised methods including Graph Autoencoders, and recent advancements including Graph Recurrent Neural Networks and Graph Reinforcement Learning. We then provide a comprehensive overview of these methods in a systematic manner following their history of developments. We also analyze the differences of these methods and how to composite different architectures. Finally, we briefly outline their applications and discuss potential future directions.

**On the Dimensionality of Word Embedding**

In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This bias-variance trade-off sheds light on many empirical observations which were previously unexplained, for example the existence of an optimal dimensionality. Moreover, new insights and discoveries, like when and how word embeddings are robust to over-fitting, are revealed. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding.

**Text data mining and data quality management for research information systems in the context of open data and open science**

In the implementation and use of research information systems (RIS) in scientific institutions, text data mining and semantic technologies are a key technology for the meaningful use of large amounts of data. It is not the collection of data that is difficult, but the further processing and integration of the data in RIS. Data is usually not uniformly formatted and structured, such as texts and tables that cannot be linked. These include various source systems with their different data formats such as project and publication databases, CERIF and RCD data model, etc. Internal and external data sources continue to develop. On the one hand, they must be constantly synchronized and the results of the data links checked. On the other hand, the texts must be processed in natural language and certain information extracted. Using text data mining, the quality of the metadata is analyzed and this identifies the entities and general keywords. So that the user is supported in the search for interesting research information. The information age makes it easier to store huge amounts of data and increase the number of documents on the internet, in institutions’ intranets, in newswires and blogs is overwhelming. Search engines should help to specifically open up these sources of information and make them usable for administrative and research purposes. Against this backdrop, the aim of this paper is to provide an overview of text data mining techniques and the management of successful data quality for RIS in the context of open data and open science in scientific institutions and libraries, as well as to provide ideas for their application. In particular, solutions for the RIS will be presented.

**Efficient Model-Free Reinforcement Learning Using Gaussian Process**

Efficient Reinforcement Learning usually takes advantage of demonstration or good exploration strategy. By applying posterior sampling in model-free RL under the hypothesis of GP, we propose Gaussian Process Posterior Sampling Reinforcement Learning(GPPSTD) algorithm in continuous state space, giving theoretical justifications and empirical results. We also provide theoretical and empirical results that various demonstration could lower expected uncertainty and benefit posterior sampling exploration. In this way, we combined the demonstration and exploration process together to achieve a more efficient reinforcement learning.

**DRONE: a Distributed gRaph cOmputiNg Engine**

Nowadays, in big data era, social networks, graph database, knowledge graph, electronic commerce and etc. demand efficient and scalable capability to process ever increasingly volume of graph-structured data. To meet the challenge, two mainstream distributed programming models, vertex-centric VC and subgraph-centric (SC) were proposed. Compared to the VC model, the SC model converges faster with less communication overhead on well-partitioned graphs, and is easy to program with due to the ‘think like a graph’ philosophy. However, edge-cut method causes significant performance bottleneck for preprocessing large graphs, especially power-law graphs. Although the edge-cut method is considered as a natural choice of subgraph-centric model for graph partitioning, and adopted by Giraph++, Blogel, GRAPE. Thus, the SC model is less competitive in practice. In this paper, we present an innovative distributed graph computing framework, DRONE(Distributed gRaph cOmputiNg Engine). It combines the subgraph-centric model and the vertex-cut graph partitioning strategy. Experiments show that DRONE outperform the state-of-art distributed graph computing engines on real-world graphs and synthetic power-law graphs. DRONE is capable to scale up to process one-trillion-edges synthetic power-law graphs, which is orders of magnitude larger than previously reported by existing SC-based frameworks.

**The Empusa code generator: bridging the gap between the intended and the actual content of RDF resources**

The RDF data model facilitates integration of diverse data available in structured and semi-structured formats. To obtain an RDF graph with a low amount of errors and internal redundancy, the chosen ontology must be consistently applied. However, with each addition of new diverse data the ontology must evolve thereby increasing its complexity, which could lead to accumulation of unintended erroneous composites. Thus, there is a need for a gatekeeping system that compares the intended content described in the ontology with the actual content of the resource. Here we present Empusa, a tool that has been developed to facilitate the creation of composite RDF resources from disparate sources. Empusa can be used to convert a schema into an associated application programming interface (API) that can be used to perform data consistency checks and generates Markdown documentation to make persistent URLs resolvable. In this way, the use of Empusa ensures consistency within and between the ontology (OWL), the Shape Expressions (ShEx) describing the graph structure, and the content of the resource.

**Deep Anomaly Detection with Outlier Exposure**

It is important to detect and handle anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data commonly used by deep learning systems are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This approach enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments in vision and natural language processing settings, we find that Outlier Exposure significantly improves the detection performance. Our approach is even applicable to density estimation models and anomaly detectors for large-scale images. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.

**Metrics for Explainable AI: Challenges and Prospects**

The question addressed in this paper is: If we present to a user an AI system that explains how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? In other words, how do we know that an explanainable AI system (XAI) is any good? Our focus is on the key concepts of measurement. We discuss specific methods for evaluating: (1) the goodness of explanations, (2) whether users are satisfied by explanations, (3) how well users understand the AI systems, (4) how curiosity motivates the search for explanations, (5) whether the user’s trust and reliance on the AI are appropriate, and finally, (6) how the human-XAI work system performs. The recommendations we present derive from our integration of extensive research literatures and our own psychometric evaluations.