Gabor Convolutional Networks (GCNs, Gabor CNNs)
Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features with the capability of dealing with spatial transformations. However, such excellent properties have not been well explored in the popular deep convolutional neural networks (DCNNs). In this paper, we propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of deep learned features to orientation and scale changes. By only manipulating the basic element of DCNNs based on Gabor filters, i.e., the convolution operator, GCNs can be easily implemented and are compatible with any popular deep learning architecture. Experimental results demonstrate the superior capability of our algorithm in recognizing objects where scale and rotation changes occur frequently. The proposed GCNs have far fewer learnable network parameters and are thus easier to train with an end-to-end pipeline. To encourage further developments, the source code is released on GitHub.
GaborNet  The article describes a system for image recognition using deep convolutional neural networks. A modified network architecture is proposed that focuses on improving convergence and reducing training complexity. The filters in the first layer of the network are constrained to fit the Gabor function. The parameters of the Gabor functions are learnable and are updated by standard backpropagation techniques. The system was implemented in Python, tested on several datasets, and outperformed common convolutional networks.
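To make the Gabor function in the two entries above concrete, here is a minimal sketch that evaluates a real-valued Gabor kernel on a small grid. It uses the common textbook parameterisation (Gaussian envelope times cosine carrier); the function name `gabor_kernel` and the parameter names `sigma`, `theta`, `wavelength`, `gamma`, and `psi` are assumptions made for illustration and are not taken from either paper. In a learnable setting these parameters would be updated by backpropagation; here they are fixed numbers.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so that the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the orientation theta.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope; gamma controls the aspect ratio.
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
            # Sinusoidal carrier along the rotated x axis.
            carrier = math.cos(2.0 * math.pi * x_r / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A small bank with four orientations (0, 45, 90, 135 degrees).
bank = [gabor_kernel(5, 2.0, k * math.pi / 4, 4.0) for k in range(4)]
```

Each kernel in `bank` could serve as a fixed or initial first-layer filter; varying `theta` and `wavelength` covers different orientations and scales.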
Galam Model  Consider a community where initially, each individual is positive or negative regarding a reform proposal. In each round, individuals gather randomly in fixed rooms of different sizes, and all individuals in a room agree on the majority opinion in the room (with ties broken in favor of the negative opinion). The Galam model—introduced in statistical physics, specifically sociophysics—approximates this basic random process. Phase Transition in Democratic Opinion Dynamics 
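The random process described above can be simulated directly. The following is a minimal sketch, assuming opinions are encoded as 1 (positive) and 0 (negative) and that the room sizes add up to the population size; the function names `galam_round` and `simulate` are hypothetical.

```python
import random


def galam_round(opinions, room_sizes, rng):
    """One round: seat people randomly, then each room adopts its majority opinion."""
    if sum(room_sizes) != len(opinions):
        raise ValueError("room sizes must add up to the population size")
    people = opinions[:]
    rng.shuffle(people)
    updated, start = [], 0
    for size in room_sizes:
        room = people[start:start + size]
        positives = sum(room)
        # A strict majority is needed for the positive opinion; a tie goes negative.
        verdict = 1 if 2 * positives > size else 0
        updated.extend([verdict] * size)
        start += size
    return updated


def simulate(initial_positive_fraction, n_people, room_sizes, rounds, seed=0):
    """Return the fraction of positive opinions before and after each round."""
    rng = random.Random(seed)
    n_pos = round(initial_positive_fraction * n_people)
    opinions = [1] * n_pos + [0] * (n_people - n_pos)
    history = [n_pos / n_people]
    for _ in range(rounds):
        opinions = galam_round(opinions, room_sizes, rng)
        history.append(sum(opinions) / n_people)
    return history
```

Running `simulate` for a range of initial fractions shows how the tie-breaking rule favours the negative opinion when rooms of even size are present.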
Galaxy Learning  The recent rapid development of artificial intelligence (AI, mainly driven by machine learning research, especially deep learning) has achieved phenomenal success in various applications. However, to further apply AI technologies in real-world contexts, several significant issues regarding the AI ecosystem should be addressed. We identify the main issues as data privacy, ownership, and exchange, which are difficult to solve with the current centralized paradigm of machine learning training methodology. As a result, we propose a novel model training paradigm based on blockchain, named Galaxy Learning, which aims to train a model with distributed data and to reserve data ownership for the data owners. In this new paradigm, encrypted models are moved around instead, and are federated once trained. Model training, as well as the communication, is achieved with blockchain and its smart contracts. Pricing of training data is determined by its contribution, and therefore it is not about the exchange of data ownership. In this position paper, we describe the motivation, paradigm, design, and challenges as well as opportunities of Galaxy Learning.
Gale-Shapley Algorithm  The Gale-Shapley algorithm is a solution to the Stable Marriage Problem (SMP). In 1962, David Gale and Lloyd Shapley proved that, for any equal number of men and women, it is always possible to solve the SMP and make all marriages stable. They presented an algorithm to do so. The Gale-Shapley algorithm involves a number of ‘rounds’ (or ‘iterations’). In the first round, (a) each unengaged man proposes to the woman he prefers most, and then (b) each woman replies ‘maybe’ to the suitor she most prefers and ‘no’ to all other suitors. She is then provisionally ‘engaged’ to the suitor she most prefers so far, and that suitor is likewise provisionally engaged to her. In each subsequent round, (a) each unengaged man proposes to the most-preferred woman to whom he has not yet proposed (regardless of whether the woman is already engaged), and then (b) each woman replies ‘maybe’ to the suitor she most prefers (whether her existing provisional partner or someone else) and rejects the rest (again, perhaps including her current provisional partner). The provisional nature of engagements preserves the right of an already-engaged woman to ‘trade up’ (and, in the process, to ‘jilt’ her until-then partner). The runtime complexity of this algorithm is O(n^2), where n is the number of men (equivalently, of women). This algorithm guarantees that:
· Everyone gets married: At the end, there cannot be a man and a woman both unengaged, as he must have proposed to her at some point (since a man will eventually propose to everyone, if necessary) and, being proposed to, she would necessarily be engaged (to someone) thereafter.
· The marriages are stable: Let Alice be a woman and Bob be a man who are both engaged, but not to each other. Upon completion of the algorithm, it is not possible for both Alice and Bob to prefer each other over their current partners. If Bob prefers Alice to his current partner, he must have proposed to Alice before he proposed to his current partner. If Alice accepted his proposal, yet is not married to him at the end, she must have dumped him for someone she likes more, and therefore doesn’t like Bob more than her current partner. If Alice rejected his proposal, she was already with someone she liked more than Bob.
➘ “Stable Marriage Problem” matchingR
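The procedure above can be sketched in a few lines of Python. This sketch processes one proposal at a time from a queue of unengaged men rather than in synchronized rounds, which is a common way to implement the algorithm and yields the same matching; the function names `gale_shapley` and `is_stable` and the sample preference lists are assumptions made for illustration, and the code is independent of the matchingR package.

```python
from collections import deque


def gale_shapley(men_prefs, women_prefs):
    """Return a dict man -> woman; both arguments map a person to a full preference list."""
    # rank[w][m] is the position of m in w's list (lower means more preferred).
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman m will propose to
    fiance = {}                              # woman -> provisional partner
    free_men = deque(men_prefs)
    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = fiance.get(w)
        if current is None:
            fiance[w] = m                    # w was unengaged: she says 'maybe'
        elif rank[w][m] < rank[w][current]:
            fiance[w] = m                    # w trades up and jilts her current partner
            free_men.append(current)
        else:
            free_men.append(m)               # w rejects m; he stays unengaged
    return {m: w for w, m in fiance.items()}


def is_stable(matching, men_prefs, women_prefs):
    """Check that no man and woman prefer each other over their assigned partners."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break  # every woman after this point is less preferred by m
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False
    return True


men = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
women = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}
matching = gale_shapley(men, women)
```

Each man proposes to each woman at most once, so the loop runs at most n^2 times, which matches the complexity stated above.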
Game Theory  Game theory is the study of strategic decision making. Specifically, it is ‘the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.’ An alternative term suggested ‘as a more descriptive name for the discipline’ is interactive decision theory. Game theory is mainly used in economics, political science, and psychology, as well as logic, computer science, and biology. The subject first addressed zero-sum games, in which one person’s gains exactly equal the net losses of the other participant or participants. Today, however, game theory applies to a wide range of behavioral relations, and has developed into an umbrella term for the logical side of decision science, including both humans and non-humans (e.g. computers, animals). Modern game theory began with the idea regarding the existence of mixed-strategy equilibria in two-person zero-sum games and its proof by John von Neumann. Von Neumann’s original proof used the Brouwer fixed-point theorem on continuous mappings into compact convex sets, which became a standard method in game theory and mathematical economics. His paper was followed by the 1944 book Theory of Games and Economic Behavior, co-written with Oskar Morgenstern, which considered cooperative games of several players. The second edition of this book provided an axiomatic theory of expected utility, which allowed mathematical statisticians and economists to treat decision-making under uncertainty. This theory was developed extensively in the 1950s by many scholars. Game theory was later explicitly applied to biology in the 1970s, although similar developments go back at least as far as the 1930s. Game theory has been widely recognized as an important tool in many fields. With the Nobel Memorial Prize in Economic Sciences going to game theorist Jean Tirole in 2014, eleven game theorists have now won the economics Nobel Prize.
John Maynard Smith was awarded the Crafoord Prize for his application of game theory to biology. 
Gamification  Gamification is the use of game thinking and game mechanics in non-game contexts to engage users in solving problems. Gamification has been studied and applied in several domains, with goals such as improving user engagement, physical exercise, return on investment, data quality, timeliness, and learning. A review of research on gamification shows that most studies find positive effects.
Gamma Divergence  The gamma-divergence is a generalization of the Kullback-Leibler divergence with the power index gamma. It employs the power transformation of density functions, instead of the logarithmic transformation employed by the Kullback-Leibler divergence. rsggm
Gamma-Poisson Shrinker (GPS)
➘ “Multi-Item Gamma Poisson Shrinker” openEBGM
GAN Augmentation  One of the biggest issues facing the use of machine learning in medical imaging is the lack of availability of large, labelled datasets. The annotation of medical images is not only expensive and time-consuming but also highly dependent on the availability of expert observers. The limited amount of training data can inhibit the performance of supervised machine learning algorithms, which often need very large quantities of data on which to train to avoid overfitting. So far, much effort has been directed at extracting as much information as possible from what data is available. Generative Adversarial Networks (GANs) offer a novel way to unlock additional information from a dataset by generating synthetic samples with the appearance of real images. This paper demonstrates the feasibility of introducing GAN-derived synthetic data to the training datasets in two brain segmentation tasks, leading to improvements in Dice Similarity Coefficient (DSC) of between 1 and 5 percentage points under different conditions, with the strongest effects seen when fewer than ten training image stacks are available.
GAN Dissection  Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, visualization and understanding of GANs is largely missing. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit, object, and scene level. We first identify a group of interpretable units that are closely related to object concepts with a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. Finally, we examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in the scene. We provide open source interpretation tools to help peer researchers and practitioners better understand their GAN models.
GAN Expectation Maximization (GAN-EM)
The expectation maximization (EM) algorithm finds maximum likelihood solutions for models with latent variables. A typical example is the Gaussian Mixture Model (GMM), which requires a Gaussian assumption; however, natural images are highly non-Gaussian, so a GMM cannot be applied to perform clustering in pixel space. To overcome this limitation, we propose a GAN-based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity. We call this model GAN-EM, which is a framework for image clustering, semi-supervised classification and dimensionality reduction. In the M-step, we design a novel loss function for the discriminator of the GAN to perform maximum likelihood estimation (MLE) on data with soft class label assignments. Specifically, a conditional generator captures the data distribution for $K$ classes, and a discriminator tells whether a sample is real or fake for each class. Since our model is unsupervised, the class label of real data is regarded as a latent variable, which is estimated by an additional network (E-net) in the E-step. The proposed GAN-EM achieves state-of-the-art clustering and semi-supervised classification results on MNIST, SVHN and CelebA, as well as comparable quality of generated images to other recently developed generative models.
GAN Q-learning  Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs) and analyze its performance in simple tabular environments, as well as OpenAI Gym. We empirically show that our algorithm leverages the flexibility and black-box approach of deep learning models while providing a viable alternative to other state-of-the-art methods.
GAN-Based Augmentation Technique (AugmentGAN)
A natural language interface (NLI) to structured queries is intriguing due to its wide industrial applications and high economic value. In this work, we tackle the problem of domain adaptation for NLI with limited data on the target domain. Two important approaches are considered: (a) effective general-knowledge learning on source domain semantic parsing, and (b) data augmentation on the target domain. We present a Structured Query Inference Network (SQIN) to enhance learning for domain adaptation, by separating schema information from NL and decoding SQL in a more structure-aware manner; we also propose a GAN-based augmentation technique (AugmentGAN) to mitigate the lack of target domain data. We report solid results on GeoQuery, Overnight, and WikiSQL to demonstrate state-of-the-art performance for both in-domain and domain-transfer tasks.
GanDef  Machine learning models, especially neural network (NN) classifiers, are widely used in many applications including natural language processing, computer vision and cybersecurity. They provide high accuracy under the assumption of attack-free scenarios. However, this assumption has been defied by the introduction of adversarial examples: carefully perturbed samples of input that are usually misclassified. Many researchers have tried to develop a defense against adversarial examples; however, we are still far from achieving that goal. In this paper, we design a Generative Adversarial Net (GAN) based adversarial training defense, dubbed GanDef, which utilizes a competition game to regulate the feature selection during training. We analytically show that GanDef can train a classifier so it can defend against adversarial examples. Through extensive evaluation on different white-box adversarial examples, the classifier trained by GanDef shows the same level of test accuracy as those trained by state-of-the-art adversarial training defenses. More importantly, GanDef-Comb, a variant of GanDef, could utilize the discriminator to achieve a dynamic trade-off between correctly classifying original and adversarial examples. As a result, it achieves the highest overall test accuracy when the ratio of adversarial examples exceeds 41.7%.
Gandhi-Washington Method (GWM)
Many investigations in empirical software engineering look at sequences of data resulting from development or management processes. In this paper, we propose an analytical approach called the Gandhi-Washington Method (GWM) to investigate the impact of recurring events in software projects. GWM takes an encoding of events and activities provided by a software analyst as input. It uses regular expressions to automatically condense and summarize information and infer treatments. Relating the treatments to the outcome through statistical tests, treatment-outcome constructs are automatically mined from the data. The output of GWM is a set of treatment-outcome constructs. Each treatment in the set of mined constructs is significantly different from the other treatments considering the impact on the outcome and/or is structurally different from other treatments considering the sequence of events. We describe GWM and classes of problems to which GWM can be applied. We demonstrate the applicability of this method for empirical studies on sequences of file editing, code ownership, and release cycle time.
Gang of GANs  Traditional generative adversarial networks (GANs) and many of their variants are trained by minimizing the KL or JS-divergence loss that measures how close the generated data distribution is to the true data distribution. A recent advance called the WGAN, based on the Wasserstein distance, can improve on the KL and JS-divergence based GANs, and alleviate the gradient vanishing, instability, and mode collapse issues that are common in GAN training. In this work, we aim at improving on the WGAN by first generalizing its discriminator loss to a margin-based one, which leads to a better discriminator, and in turn a better generator, and then carrying out a progressive training paradigm involving multiple GANs to contribute to the maximum margin ranking loss so that the GAN at later stages will improve upon early stages. We call this method Gang of GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce the gap between the true data distribution and the generated data distribution by at least half in an optimally trained WGAN. We have also proposed a new way of measuring GAN quality which is based on image completion tasks. We have evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10, and 50K-SSFF, and have seen both visual and quantitative improvement over baseline WGAN.
GANsfer Learning  Medical imaging is a domain which suffers from a paucity of manually annotated data for the training of learning algorithms. Manually delineating pathological regions at a pixel level is a time-consuming process, especially in 3D images, and often requires the time of a trained expert. As a result, supervised machine learning solutions must make do with small amounts of labelled data, despite there often being additional unlabelled data available. Whilst of less value than labelled images, these unlabelled images can contain potentially useful information. In this paper we propose combining both labelled and unlabelled data within a GAN framework, before using the resulting network to produce images for use when training a segmentation network. We explore the task of deep grey matter multi-class segmentation in an AD dataset and show that the proposed method leads to a significant improvement in segmentation results, particularly in cases where the amount of labelled data is restricted. We show that this improvement is largely driven by a greater ability to segment the structures known to be the most affected by AD, thereby demonstrating the benefits of exposing the system to more examples of pathological anatomical variation. We also show how a shift in domain of the training data from young and healthy towards older and more pathological examples leads to better segmentations of the latter cases, and that this leads to a significant improvement in the ability of the computed segmentations to stratify cases of AD.
GAN-test  Generative adversarial networks (GANs) are one of the most popular methods for generating images today. While impressive results have been validated by visual inspection, a number of quantitative criteria have emerged only recently. We argue here that the existing ones are insufficient and need to be adequate for the task at hand. In this paper we introduce two measures based on image classification, GAN-train and GAN-test, which approximate the recall (diversity) and precision (quality of the image) of GANs respectively. We evaluate a number of recent GAN approaches based on these two measures and demonstrate a clear difference in performance. Furthermore, we observe that the increasing difficulty of the dataset, from CIFAR-10 over CIFAR-100 to ImageNet, shows an inverse correlation with the quality of the GANs, as clearly evident from our measures.
GAN-train  Generative adversarial networks (GANs) are one of the most popular methods for generating images today. While impressive results have been validated by visual inspection, a number of quantitative criteria have emerged only recently. We argue here that the existing ones are insufficient and need to be adequate for the task at hand. In this paper we introduce two measures based on image classification, GAN-train and GAN-test, which approximate the recall (diversity) and precision (quality of the image) of GANs respectively. We evaluate a number of recent GAN approaches based on these two measures and demonstrate a clear difference in performance. Furthermore, we observe that the increasing difficulty of the dataset, from CIFAR-10 over CIFAR-100 to ImageNet, shows an inverse correlation with the quality of the GANs, as clearly evident from our measures.
GAPNet  Exploiting fine-grained semantic features on point clouds is still challenging due to their irregular and sparse structure in a non-Euclidean space. Among existing studies, PointNet provides an efficient and promising approach to learn shape features directly on unordered 3D point clouds and has achieved competitive performance. However, local features that are helpful for better contextual learning are not considered. Meanwhile, the attention mechanism shows efficiency in capturing node representations on graph-based data by attending over neighboring nodes. In this paper, we propose a novel neural network for point clouds, dubbed GAPNet, to learn local geometric representations by embedding a graph attention mechanism within stacked Multi-Layer Perceptron (MLP) layers. Firstly, we introduce a GAPLayer to learn attention features for each point by highlighting different attention weights on its neighborhood. Secondly, in order to exploit sufficient features, a multi-head mechanism is employed to allow the GAPLayer to aggregate different features from independent heads. Thirdly, we propose an attention pooling layer over neighbors to capture a local signature aimed at enhancing network robustness. Finally, GAPNet applies stacked MLP layers to attention features and the local signature to fully extract local geometric structures. The proposed GAPNet architecture is tested on the ModelNet40 and ShapeNet part datasets, and achieves state-of-the-art performance in both shape classification and part segmentation tasks.
Gapped k-mer Support Vector Machine  Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome-wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue-specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naive Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. gkmSVM
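To illustrate what a gapped k-mer feature looks like, the following sketch counts, for every window of length `l` in a sequence, all patterns that keep `k` informative positions and replace the rest with a gap symbol. It is a brute-force illustration only: the names `gapped_kmer_counts` and `gkm_kernel`, the `-` gap symbol, and the unnormalised dot-product kernel are assumptions, and the code does not reproduce the efficient tree data structure mentioned in the entry or the gkmSVM package API.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(sequence, l, k):
    """Count gapped k-mers: length-l windows with k kept positions, the rest gapped."""
    counts = Counter()
    for start in range(len(sequence) - l + 1):
        window = sequence[start:start + l]
        for kept in combinations(range(l), k):
            kept_set = set(kept)
            pattern = "".join(c if i in kept_set else "-" for i, c in enumerate(window))
            counts[pattern] += 1
    return counts


def gkm_kernel(seq_a, seq_b, l, k):
    """Unnormalised similarity: dot product of the two gapped k-mer count vectors."""
    a = gapped_kmer_counts(seq_a, l, k)
    b = gapped_kmer_counts(seq_b, l, k)
    return sum(count * b[pattern] for pattern, count in a.items() if pattern in b)
```

For example, `ACG` and `ATG` share no contiguous 3-mer, but with `l=3, k=2` they share the gapped pattern `A-G`, so `gkm_kernel("ACG", "ATG", 3, 2)` is non-zero. This is the robustness to single mismatches that motivates the feature set.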
Gas Station Problem  In the gas station problem we want to find the cheapest path between two vertices of an $n$-vertex graph. Our car has a specific fuel capacity and at each vertex we can fill our car with gas, with the fuel cost depending on the vertex.
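One simple way to make the problem concrete is a shortest-path search over (vertex, fuel level) states. The sketch below runs Dijkstra's algorithm under several simplifying assumptions that are not part of the definition above: fuel comes in integer units, every edge consumes an integer number of units, edges are undirected, and the car starts with an empty tank. The function name `cheapest_trip` and its argument layout are hypothetical, and this is not claimed to be the fastest known algorithm for the problem.

```python
import heapq


def cheapest_trip(n, edges, price, capacity, source, target):
    """Return the minimum fuel cost from source to target, or None if unreachable.

    edges: list of (u, v, fuel_needed); price[v]: cost of one fuel unit at vertex v.
    """
    adj = [[] for _ in range(n)]
    for u, v, d in edges:
        adj[u].append((v, d))
        adj[v].append((u, d))
    best = {(source, 0): 0}          # cheapest known cost for each (vertex, fuel) state
    heap = [(0, source, 0)]
    while heap:
        cost, v, fuel = heapq.heappop(heap)
        if v == target:
            return cost              # the first time the target is popped is optimal
        if cost > best.get((v, fuel), float("inf")):
            continue                 # stale heap entry
        moves = []
        if fuel < capacity:
            moves.append((cost + price[v], v, fuel + 1))   # buy one unit here
        for w, d in adj[v]:
            if d <= fuel:
                moves.append((cost, w, fuel - d))          # drive along the edge
        for new_cost, w, f in moves:
            if new_cost < best.get((w, f), float("inf")):
                best[(w, f)] = new_cost
                heapq.heappush(heap, (new_cost, w, f))
    return None


# Vertex 1 sells cheap fuel, so the detour 0 -> 1 -> 2 beats the direct edge 0 -> 2.
roads = [(0, 1, 2), (1, 2, 2), (0, 2, 4)]
cost = cheapest_trip(3, roads, price=[5, 1, 9], capacity=4, source=0, target=2)
```

The state space has n times (capacity + 1) states, so this approach is only practical when the capacity, measured in fuel units, is small.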
Gated Attention Network (GaAN) 
We propose a new network architecture, Gated Attention Networks (GaAN), for learning on graphs. Unlike the traditional multi-head attention mechanism, which equally consumes all attention heads, GaAN uses a convolutional sub-network to control each attention head’s importance. We demonstrate the effectiveness of GaAN on the inductive node classification problem. Moreover, with GaAN as a building block, we construct the Graph Gated Recurrent Unit (GGRU) to address the traffic speed forecasting problem. Extensive experiments on three real-world datasets show that our GaAN framework achieves state-of-the-art results on both tasks.
Gated Fully Fusion (GFF) 
Semantic segmentation generates a comprehensive understanding of scenes at a semantic level by densely predicting the category of each pixel. High-level features from Deep Convolutional Neural Networks have already demonstrated their effectiveness in semantic segmentation tasks; however, the coarse resolution of high-level features often leads to inferior results for small or thin objects, where detailed information is important but missing. It is natural to consider importing low-level features to compensate for the detailed information lost in high-level representations. Unfortunately, simply combining multi-level features is less effective due to the semantic gap existing among them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates are used to control the propagation of useful information, which significantly reduces noise during fusion. We achieve state-of-the-art results on two challenging scene understanding datasets, i.e., 82.3% mIoU on the Cityscapes test set and 45.3% mIoU on the ADE20K validation set. Code and the trained models will be made publicly available.
Gated Graph Neural Network  Graph-to-Sequence Learning using Gated Graph Neural Networks
Gated Linear Network  This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss. Rather than relying on non-linear transfer functions, our method gains representational power by the use of data conditioning. We state under general conditions a learnable capacity theorem that shows this approach can in principle learn any bounded Borel-measurable function on a compact subset of Euclidean space; the result is stronger than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed.
Gated Path Planning Network  Value Iteration Networks (VINs) are effective differentiable path planning modules that can be used by agents to perform navigation while still maintaining end-to-end differentiability of the entire architecture. Despite their effectiveness, they suffer from several disadvantages including training instability, random seed sensitivity, and other optimization problems. In this work, we reframe VINs as recurrent-convolutional networks, which demonstrates that VINs couple recurrent convolutions with an unconventional max-pooling activation. From this perspective, we argue that standard gated recurrent update equations could potentially alleviate the optimization issues plaguing VINs. The resulting architecture, which we call the Gated Path Planning Network, is shown to empirically outperform VIN on a variety of metrics such as learning speed, hyperparameter sensitivity, iteration count, and even generalization. Furthermore, we show that this performance gap is consistent across different maze transition types and maze sizes, and even show success on a challenging 3D environment, where the planner is only provided with first-person RGB images.
Gated Recurrent Neural Tensor Network  Recurrent Neural Networks (RNNs), which are a powerful scheme for modeling temporal and sequential data, need to capture long-term dependencies in datasets and represent them in hidden layers with a powerful model to capture more information from inputs. For modeling long-term dependencies in a dataset, the gating mechanism concept can help RNNs remember and forget previous information. Representing the hidden layers of an RNN with more expressive operations (i.e., tensor products) helps it learn a more complex relationship between the current input and the previous hidden layer information. These ideas can generally improve RNN performance. In this paper, we propose a novel RNN architecture that combines the concepts of the gating mechanism and the tensor product into a single model. By combining these two concepts into a single RNN, our proposed models learn long-term dependencies by modeling with gating units and obtain more expressive and direct interaction between input and hidden layers using a tensor product on 3-dimensional array (tensor) weight parameters. We use Long Short-Term Memory (LSTM) RNN and Gated Recurrent Unit (GRU) RNN and combine them with a tensor product inside their formulations. Our proposed RNNs, which are called the Long Short-Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and the Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product. We conducted experiments with our proposed models on word-level and character-level language modeling tasks and found that our proposed models significantly improved performance compared to our baseline models.
Gated Recurrent Unit (GRU) 
Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory (LSTM). However, GRUs have been shown to exhibit better performance on smaller datasets. They have fewer parameters than LSTM, as they lack an output gate.
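As a rough illustration of the gating mechanism, here is a single GRU update step written with plain Python lists. It follows one common formulation with an update gate `z`, a reset gate `r`, and a candidate state; the parameter names (`Wz`, `Uz`, `bz`, and so on) and the helper `zero_params` are assumptions for this sketch, and the convention for combining the old state with the candidate (here `(1 - z) * h + z * candidate`) differs between references.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def gru_step(x, h, p):
    """One GRU step: x is the input vector, h the previous hidden state, p the parameters."""
    def affine(W, U, b, hidden):
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, hidden), b)]

    z = [sigmoid(v) for v in affine(p["Wz"], p["Uz"], p["bz"], h)]   # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], p["Ur"], p["br"], h)]   # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], p["Uh"], p["bh"], reset_h)]
    # Interpolate between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def zero_params(n_in, n_hidden):
    """All-zero parameters, useful only for checking the arithmetic by hand."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        params["U" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        params["b" + gate] = [0.0] * n_hidden
    return params


new_h = gru_step([1.0], [0.8, -0.4], zero_params(1, 2))
```

With all-zero parameters both gates equal 0.5 and the candidate is 0, so each hidden value is halved. The cell uses three weight blocks (`z`, `r`, `h`) and has no separate output gate, which is the source of the smaller parameter count mentioned above.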
Gated Transfer Network (GTN) 
Deep neural networks have led to a series of breakthroughs in computer vision given sufficient annotated training datasets. For novel tasks with limited labeled data, the prevalent approach is to transfer the knowledge learned in the pre-trained models to the new tasks by fine-tuning. Classic model fine-tuning utilizes the fact that well-trained neural networks appear to learn cross-domain features. These features are treated equally during transfer learning. In this paper, we explore the impact of feature selection in model fine-tuning by introducing a transfer module, which assigns weights to features extracted from pre-trained models. The proposed transfer module proves the importance of feature selection for transferring models from source to target domains. It is shown to significantly improve upon fine-tuning results with only marginal extra computational cost. We also incorporate an auxiliary classifier as an extra regularizer to avoid over-fitting. Finally, we build a Gated Transfer Network (GTN) based on our transfer module and achieve state-of-the-art results on six different tasks.
GaterNet  The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing. In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). The problem is interesting because the idea of forcing different parts of the model to learn from different types of samples may help us acquire better filters in CNNs, improve the model generalization performance and potentially increase the interpretability of model behavior. We propose a novel yet simple framework called GaterNet, which involves a backbone and a gater network. The backbone network is a regular CNN that performs the major computation needed for making a prediction, while a global gater network is introduced to generate binary gates for selectively activating filters in the backbone network based on each input. Extensive experiments on CIFAR and ImageNet datasets show that our models consistently outperform the original models by a large margin. On CIFAR-10, our model also improves upon state-of-the-art results.
Gather-Excite  While the use of bottom-up local operators in convolutional neural networks (CNNs) matches well some of the statistics of natural images, it may also prevent such models from capturing contextual long-range feature interactions. In this work, we propose a simple, lightweight approach for better context exploitation in CNNs. We do so by introducing a pair of operators: gather, which efficiently aggregates feature responses from a large spatial extent, and excite, which redistributes the pooled information to local features. The operators are cheap, both in terms of number of added parameters and computational complexity, and can be integrated directly in existing architectures to improve their performance. Experiments on several datasets show that gather-excite can bring benefits comparable to increasing the depth of a CNN at a fraction of the cost. For example, we find ResNet-50 with gather-excite operators is able to outperform its 101-layer counterpart on ImageNet with no additional learnable parameters. We also propose a parametric gather-excite operator pair which yields further performance gains, relate it to the recently introduced Squeeze-and-Excitation Networks, and analyse the effects of these changes to the CNN feature activation statistics. 
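The gather and excite operators described above can be illustrated on a single-channel feature map. This is only a minimal sketch: it assumes global average pooling for gather and a sigmoid gate for excite, and the function names `gather` and `excite` are hypothetical rather than taken from the authors' code.

```python
import math

def gather(feature_map):
    """Aggregate responses over the whole spatial extent.

    Global averaging is an assumed choice for this sketch.
    """
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def excite(feature_map, pooled):
    """Redistribute the pooled context by rescaling every local feature.

    The sigmoid gate is an assumption made for illustration.
    """
    gate = 1.0 / (1.0 + math.exp(-pooled))
    return [[v * gate for v in row] for row in feature_map]

feature_map = [[1.0, -1.0], [2.0, -2.0]]
context = gather(feature_map)          # mean of all four responses
output = excite(feature_map, context)  # every feature scaled by the same gate
```

Neither function introduces learnable parameters, which matches the parameter-free variant mentioned in the abstract.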
GaussDB  Huawei GaussDB is an enterprise-class AI-Native distributed database that uses the Massively Parallel Processing (MPP) architecture. GaussDB supports both row- and column-oriented storage and is capable of processing petabytes of data. GaussDB offers a cost-efficient, general-purpose computing platform to manage massive data sets and is compatible with a wide range of data warehousing systems, Business Intelligence (BI) systems, and Decision Support Systems (DSSs). Huawei GaussDB integrates AI technology into the database kernel architecture and algorithms, providing users with distributed databases that offer higher performance, higher availability, and more diverse computing power. 
Gaussian Differential Privacy  Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. But it also has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on Renyi divergences. We propose an alternative relaxation of differential privacy, which we term ‘$f$-differential privacy’, which has a number of appealing properties and avoids some of the difficulties associated with divergence-based relaxations. First, it preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and postprocessing, and notably, a direct way to import existing tools from differential privacy, including privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class which we call ‘Gaussian Differential Privacy’, defined based on the hypothesis testing of two shifted Gaussian distributions. We show that this family is focal by proving a central limit theorem, which shows that the privacy guarantees of any hypothesis-testing based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. We also prove a finite (Berry-Esseen style) version of the central limit theorem, which gives a useful tool for tractably analyzing the exact composition of potentially complicated expressions. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent. 
Gaussian Graphical Model (GGM) 
A Gaussian graphical model is a graph in which all random variables are continuous and jointly Gaussian. This model corresponds to the multivariate normal distribution for N variables. Conditional independence in a Gaussian graphical model is simply reflected in the zero entries of the precision matrix. MGL 
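The link between zero entries of the precision matrix and conditional independence can be checked numerically. The sketch below assumes a hypothetical three-variable chain X0 - X1 - X2 and uses a small hand-written Gauss-Jordan inverse (the helper name `invert` is an illustration, not a library call).

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Assumed precision matrix of a chain X0 - X1 - X2: there is no edge X0 - X2,
# so the (0, 2) entry is zero.
precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
covariance = invert(precision)

# X0 and X2 are marginally correlated (nonzero covariance) ...
marginal_cov_02 = covariance[0][2]
# ... but conditionally independent given X1 (zero precision entry).
conditional_link_02 = precision[0][2]
```

The covariance entry for (X0, X2) is nonzero even though the precision entry is zero, which is why the graph is read off the precision matrix rather than the covariance matrix.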
Gaussian image entropy and piecewise stationary time series analysis (SPEV) 
Vision-based methods for visibility estimation can play a critical role in reducing traffic accidents caused by fog and haze. To overcome the disadvantages of current visibility estimation methods, we present a novel data-driven approach based on Gaussian image entropy and piecewise stationary time series analysis (SPEV). This is the first time that Gaussian image entropy is used for estimating atmospheric visibility. To lessen the impact of landscape and sunshine illuminance on visibility estimation, we used region of interest (ROI) analysis and took into account relative ratios of image entropy to improve estimation accuracy. We assume fog and haze cause blurred images and that fog and haze can be considered as a piecewise stationary signal. We used piecewise stationary time series analysis to construct the piecewise causal relationship between image entropy and visibility. To obtain a real-world visibility measure during fog and haze, a subjective assessment was established through a study with 36 subjects who performed visibility observations. Finally, a total of two million videos were used to train the SPEV model and validate its effectiveness. The videos were collected from the constantly foggy and hazy Tongqi expressway in Jiangsu, China. The contrast model of visibility estimation was used for algorithm performance comparison, and the validation results of the SPEV model were encouraging, as 99.14% of the relative errors were less than 10%. 
Gaussian Markov Random Field (GMRF) 
➘ “Markov Random Field” http://…/GMRFbook 
Gaussian Means (G-Means) 
The G-means algorithm starts with a small number of k-means centers, and grows the number of centers. Each iteration of the algorithm splits into two those centers whose data appear not to come from a Gaussian distribution. Between each round of splitting, we run k-means on the entire dataset and all the centers to refine the current solution. We can initialize with just k = 1, or we can choose some larger value of k if we have some prior knowledge about the range of k. 
Gaussian Mixture GAN (GM-GAN) 
Generative Adversarial Networks (GANs) have been shown to produce realistic-looking synthetic images with remarkable success, yet their performance seems less impressive when the training set is highly diverse. In order to provide a better fit to the target data distribution when the dataset includes many different classes, we propose a variant of the basic GAN model, called Gaussian Mixture GAN (GM-GAN), where the probability distribution over the latent space is a mixture of Gaussians. We also propose a supervised variant which is capable of conditional sample synthesis. In order to evaluate the model’s performance, we propose a new scoring method which separately takes into account two (typically conflicting) measures: diversity vs. quality of the generated data. Through a series of empirical experiments, using both synthetic and real-world datasets, we quantitatively show that GM-GANs outperform baselines, both when evaluated using the commonly used Inception Score, and when evaluated using our own alternative scoring method. In addition, we qualitatively demonstrate how the unsupervised variant of GM-GAN tends to map latent vectors sampled from different Gaussians in the latent space to samples of different classes in the data space. We show how this phenomenon can be exploited for the task of unsupervised clustering, and provide quantitative evaluation showing the superiority of our method for the unsupervised clustering of image datasets. Finally, we demonstrate a feature which further sets our model apart from other GAN models: the option to control the quality-diversity trade-off by altering, post-training, the probability distribution of the latent space. This allows one to sample higher quality and lower diversity samples, or vice versa, according to one’s needs. 
Gaussian Mixture Model (GMM) 
A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model. AdaptGauss 
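The "weighted sum of Gaussian component densities" can be written out directly for the one-dimensional case. The sketch below is illustrative only: the weights, means and standard deviations are made-up values, and parameter estimation by EM or MAP is not shown.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, weights, means, stds):
    """Mixture density: sum of component densities weighted by mixing proportions."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

# Hypothetical two-component mixture; the weights must sum to one.
weights = [0.3, 0.7]
means = [-2.0, 1.0]
stds = [0.5, 1.5]
density_at_zero = gmm_pdf(0.0, weights, means, stds)
```

Because each component integrates to one and the weights sum to one, the mixture is itself a valid probability density.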
Gaussian Multivariance  ➘ “Total Distance Multivariance” 
Gaussian Naive Bayes  When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution. For example, suppose the training data contain a continuous attribute, x. We first segment the data by the class, and then compute the mean and variance of x in each class. 
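The two steps described above (segment by class, then compute the mean and variance of x per class) can be sketched for a single continuous attribute. The data and the function names `fit` and `predict` are hypothetical, and the use of the population variance and empirical class priors are assumptions of this sketch.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def fit(xs, labels):
    """Segment the data by class, then estimate mean, variance and prior per class."""
    by_class = defaultdict(list)
    for x, label in zip(xs, labels):
        by_class[label].append(x)
    return {
        label: (mean(values), pvariance(values), len(values) / len(xs))
        for label, values in by_class.items()
    }

def predict(x, params):
    """Return the class with the highest log prior plus Gaussian log-likelihood."""
    def log_score(class_mean, class_var, prior):
        return (
            math.log(prior)
            - 0.5 * math.log(2.0 * math.pi * class_var)
            - (x - class_mean) ** 2 / (2.0 * class_var)
        )
    return max(params, key=lambda label: log_score(*params[label]))

xs = [1.0, 1.2, 0.8, 3.0, 3.3, 2.7]
labels = ["a", "a", "a", "b", "b", "b"]
params = fit(xs, labels)
```

With several attributes, the per-attribute log-likelihoods would simply be added, which is the "naive" independence assumption.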
Gaussian Process (GP) 
In probability theory and statistics, a Gaussian process is a stochastic process whose realizations consist of random values associated with every point in a range of times (or of space) such that each such random variable has a normal distribution. Moreover, every finite collection of those random variables has a multivariate normal distribution. The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of the normal distribution which is often called the Gaussian distribution. In fact, one way of thinking of a Gaussian process is as an infinite-dimensional generalization of the multivariate normal distribution. 
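Since every finite collection of points has a multivariate normal distribution, a draw from a Gaussian process at a handful of inputs reduces to sampling a multivariate normal. The sketch below assumes a zero mean and a squared-exponential (RBF) kernel, neither of which is prescribed by the definition; all function names are illustrative.

```python
import math
import random

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two inputs (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def covariance_matrix(points, jitter=1e-8):
    """Covariance of the finite collection; jitter keeps the factorization stable."""
    return [
        [rbf_kernel(p, q) + (jitter if i == j else 0.0) for j, q in enumerate(points)]
        for i, p in enumerate(points)
    ]

def cholesky(matrix):
    """Lower-triangular factor L with L * L^T equal to the input matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def sample_gp_prior(points, rng):
    """Draw one zero-mean realization at the given points: f = L z with z ~ N(0, I)."""
    lower = cholesky(covariance_matrix(points))
    z = [rng.gauss(0.0, 1.0) for _ in points]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(points))]

points = [0.0, 0.5, 1.0, 1.5, 2.0]
sample = sample_gp_prior(points, random.Random(0))
```

Evaluating at more points only enlarges the covariance matrix; the process itself is the same object, which is the sense in which it generalizes the multivariate normal to infinitely many dimensions.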
Gaussian Process Autoregressive Regression Model (GPAR) 
Multi-output regression models must exploit dependencies between outputs to maximise predictive performance. The application of Gaussian processes (GPs) to this setting typically yields models that are computationally demanding and have limited representational power. We present the Gaussian Process Autoregressive Regression (GPAR) model, a scalable multi-output GP model that is able to capture nonlinear, possibly input-varying, dependencies between outputs in a simple and tractable way: the product rule is used to decompose the joint distribution over the outputs into a set of conditionals, each of which is modelled by a standard GP. GPAR’s efficacy is demonstrated on a variety of synthetic and real-world problems, outperforming existing GP models and achieving state-of-the-art performance on the tasks with existing benchmarks. 
Gaussian Process Coach (GPC) 
Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. In particular, learning from corrective advice has advantages over evaluative feedback, as it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) inquire feedback samples of maximal utility and 2) adapt the learning rate to the teacher’s learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in final performance, convergence rate and robustness to erroneous feedback in OpenAI Gym continuous control benchmarks, both for simulated and real human teachers. 
Gaussian Process Latent Variable Alignment Learning  We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Learning alignments is an ill-constrained problem as there are many different ways of defining a good alignment. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. We derive a probabilistic model built on non-parametric priors that allows for flexible warps while at the same time providing means to specify interpretable constraints. We show results on several datasets, including different motion capture sequences, and show that the suggested model outperforms the classical algorithmic approaches to the alignment task. 
Gaussian Process Partially Observable Model (GPPOM) 
The inference of the causal relationship between a pair of observed variables is a fundamental problem in science, and most existing approaches are based on one single causal model. In practice, however, observations are often collected from multiple sources with heterogeneous causal models due to certain uncontrollable factors, which renders causal analysis results obtained by a single model questionable. In this paper, we generalize the Additive Noise Model (ANM) to a mixture model, which consists of a finite number of ANMs, and provide the condition of its causal identifiability. To conduct model estimation, we propose the Gaussian Process Partially Observable Model (GPPOM), and incorporate independence enforcement into it to learn the latent parameter associated with each observation. Causal inference and clustering according to the underlying generating mechanisms of the mixture model are addressed in this work. Experiments on synthetic and real data demonstrate the effectiveness of our proposed approach. 
Gaussian Process Posterior Sampling Reinforcement Learning (GPPSTD) 
Efficient Reinforcement Learning usually takes advantage of demonstrations or a good exploration strategy. By applying posterior sampling in model-free RL under the hypothesis of GP, we propose the Gaussian Process Posterior Sampling Reinforcement Learning (GPPSTD) algorithm in continuous state space, giving theoretical justifications and empirical results. We also provide theoretical and empirical results showing that various demonstrations could lower expected uncertainty and benefit posterior sampling exploration. In this way, we combine the demonstration and exploration processes to achieve more efficient reinforcement learning. 
Gaussian Process Regression (GPR) 
Gaussian process regression (GPR) is a finer approach than fitting a fixed parametric form. Rather than claiming f(x) relates to some specific models (e.g. f(x)=mx+c), a Gaussian process can represent f(x) obliquely, but rigorously, by letting the data ‘speak’ more clearly for themselves. GPR is still a form of supervised learning, but the training data are harnessed in a subtler way. As such, GPR is a less ‘parametric’ tool. However, it’s not completely free-form, and if we’re unwilling to make even basic assumptions about f(x), then more general techniques should be considered, including those underpinned by the principle of maximum entropy; Chapter 6 of Sivia and Skilling (2006) offers an introduction. nsgp 
Gaussian-Induced Convolution (GIC) 
Learning representations on graphs plays a crucial role in numerous pattern recognition tasks. Unlike grid-shaped images and videos, on which local convolution kernels can be lattices, graphs are fully coordinate-free on vertices and edges. In this work, we propose a Gaussian-induced convolution (GIC) framework to conduct local convolution filtering on irregular graphs. Specifically, an edge-induced Gaussian mixture model is designed to encode variations of a subgraph region by integrating edge information into weighted Gaussian models, each of which implicitly characterizes one component of subgraph variations. In order to coarsen a graph, we derive a vertex-induced Gaussian mixture model to cluster vertices dynamically according to the connection of edges, which is approximately equivalent to the weighted graph cut. We evaluate our multi-layer graph convolution network on several public graph classification datasets. The extensive experiments demonstrate that our GIC is effective and can achieve state-of-the-art results. 
Gauss-Markov Theorem  In statistics, the Gauss-Markov theorem, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear regression model in which the errors have expectation zero and are uncorrelated and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator. Here ‘best’ means giving the lowest variance of the estimate, as compared to other unbiased, linear estimators. The errors don’t need to be normal, nor do they need to be independent and identically distributed (only uncorrelated and homoscedastic). The hypothesis that the estimator be unbiased cannot be dropped, since otherwise estimators better than OLS exist. See, for example, the James-Stein estimator (which also drops linearity) or ridge regression. 
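For a single predictor, the OLS estimator that the theorem refers to has a closed form. The sketch below computes it on made-up data; the function name `ols_fit` and the sample values are illustrative assumptions.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical noisy observations scattered around a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope = ols_fit(xs, ys)
```

Both estimates are linear functions of the observed ys, which is the class of estimators over which the theorem claims minimum variance.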
Gauss–Newton Algorithm (GNA) 
The Gauss-Newton algorithm is a method used to solve nonlinear least squares problems. It is a modification of Newton’s method for finding a minimum of a function. Unlike Newton’s method, the Gauss-Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required. Nonlinear least squares problems arise for instance in nonlinear regression, where parameters in a model are sought such that the model is in good agreement with available observations. 
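As an illustration, the iteration can be sketched for a two-parameter nonlinear regression. The model y = a * exp(b * x), the synthetic noise-free data, the starting point and the fixed iteration count are all assumptions of this sketch; no damping or line search is included, so convergence is only expected from a reasonable initial guess.

```python
import math

def gauss_newton(xs, ys, a, b, iterations=20):
    """Fit y = a * exp(b * x) by undamped Gauss-Newton steps.

    Only first derivatives of the model are used; each step solves the
    normal equations (J^T J) delta = J^T r for the 2x2 case in closed form.
    """
    for _ in range(iterations):
        residuals = [y - a * math.exp(b * x) for x, y in zip(xs, ys)]
        # Partial derivatives of the model with respect to a and b.
        d_a = [math.exp(b * x) for x in xs]
        d_b = [a * x * math.exp(b * x) for x in xs]
        jtj_aa = sum(u * u for u in d_a)
        jtj_ab = sum(u * v for u, v in zip(d_a, d_b))
        jtj_bb = sum(v * v for v in d_b)
        jtr_a = sum(u * r for u, r in zip(d_a, residuals))
        jtr_b = sum(v * r for v, r in zip(d_b, residuals))
        det = jtj_aa * jtj_bb - jtj_ab * jtj_ab
        a += (jtj_bb * jtr_a - jtj_ab * jtr_b) / det
        b += (jtj_aa * jtr_b - jtj_ab * jtr_a) / det
    return a, b

# Synthetic observations generated from a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = gauss_newton(xs, ys, a=1.5, b=0.4)
```

The matrix J^T J stands in for the Hessian that Newton's method would require, which is why no second derivatives appear.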
gcForest  In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive to deep neural networks. In contrast to deep neural networks which require great effort in hyperparameter tuning, gcForest is much easier to train. In fact, even when gcForest is applied to different data from different domains, excellent performance can be achieved with almost the same hyperparameter settings. The training process of gcForest is efficient and scalable. In our experiments its training time running on a PC is comparable to that of deep neural networks running with GPU facilities, and the efficiency advantage may be more apparent because gcForest is naturally apt to parallel implementation. Furthermore, in contrast to deep neural networks which require large-scale training data, gcForest can work well even when there are only small-scale training data. Moreover, as a tree-based approach, gcForest should be easier for theoretical analysis than deep neural networks. 
GCN with Link Attributes and Sampling Estimation (GCN-LASE) 
Graph Convolutional Networks (GCNs) have proved to be a most powerful architecture in aggregating local neighborhood information for individual graph nodes. Low-rank proximities and node features are successfully leveraged in existing GCNs; however, attributes that graph links may carry are commonly ignored, as almost all of these models simplify graph links into binary or scalar values describing node connectedness. In our paper instead, links are reverted to hypostatic relationships between entities with descriptional attributes. We propose GCN-LASE (GCN with Link Attributes and Sampling Estimation), a novel GCN model taking both node and link attributes as inputs. To adequately capture the interactions between link and node attributes, their tensor product is used as neighbor features, based on which we define several graph kernels and further develop corresponding architectures for LASE. Besides, to accelerate the training process, the sum of features in entire neighborhoods is estimated through the Monte Carlo method, with novel sampling strategies designed for LASE to minimize the estimation variance. Our experiments show that LASE outperforms strong baselines over various graph datasets, and further experiments corroborate the informativeness of link attributes and our model’s ability to adequately leverage them. 
GCNv2  In this paper, we present a deep learning-based network, GCNv2, for generation of keypoints and descriptors. GCNv2 is built on our previous method, GCN, a network trained for 3D projective geometry. GCNv2 is designed with a binary descriptor vector as the ORB feature so that it can easily replace ORB in systems such as ORB-SLAM. GCNv2 significantly improves the computational efficiency over GCN, which was only able to run on desktop hardware. We show how a modified version of ORB-SLAM using GCNv2 features runs on a Jetson TX2, an embedded low-power platform. Experimental results show that GCNv2 retains almost the same accuracy as GCN and that it is robust enough to use for control of a flying drone. 
GCOMB  In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm in the original problem and incrementally constructs a solution. The proposed framework utilizes a Graph Convolutional Network (GCN) to generate node embeddings that predict the potential nodes in the solution set from the entire node set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, robust and scalable to large dynamic networks. 
Gear Training  The training of Deep Neural Networks usually needs tremendous computing resources. Therefore many deep models are trained on large clusters instead of a single machine or GPU. Though most current research tries to run the whole model on all machines by using asynchronous stochastic gradient descent (ASGD), we present a new approach to training deep models in parallel: split the model and then separately train different parts of it at different speeds. 
Gelly  Gelly is a Java Graph API for Flink. It contains a set of methods and utilities which aim to simplify the development of graph analysis applications in Flink. In Gelly, graphs can be transformed and modified using high-level functions similar to the ones provided by the batch processing API. Gelly provides methods to create, transform and modify graphs, as well as a library of graph algorithms. ➚ “Apache Flink” Research and Development Roadmap for Flink Gelly 
GEMRank  Recently, word embedding algorithms have been applied to map the entities of recommender systems, such as users and items, to new feature spaces using textual element-context relations among them. Unlike in many other domains, this approach has not achieved the desired performance in collaborative filtering problems, probably due to the unavailability of appropriate textual data. In this paper we propose a new recommendation framework, called GEMRank, that can be applied when the user-item matrix is the sole available source of information. It uses the concept of profile co-occurrence for defining relations among entities and applies a factorization method for embedding the users and items. GEMRank then feeds the extracted representations to a neural network model to predict user-item like/dislike relations, on which the final recommendations are based. We evaluated GEMRank in an extensive set of experiments against state-of-the-art recommendation methods. The results show that GEMRank significantly outperforms the baseline algorithms in a variety of data sets with different degrees of density. 
GenAttack  Deep neural networks (DNNs) are vulnerable to adversarial examples, even in the black-box case, where the attacker is limited to solely query access. Existing black-box approaches to generating adversarial examples typically require a significant number of queries, either for training a substitute network or estimating gradients from the output scores. We introduce GenAttack, a gradient-free optimization technique which uses genetic algorithms for synthesizing adversarial examples in the black-box setting. Our experiments on the MNIST, CIFAR-10, and ImageNet datasets show that GenAttack can successfully generate visually imperceptible adversarial examples against state-of-the-art image recognition models with orders of magnitude fewer queries than existing approaches. For example, in our CIFAR-10 experiments, GenAttack required roughly 2,568 times fewer queries than the current state-of-the-art black-box attack. Furthermore, we show that GenAttack can successfully attack both the state-of-the-art ImageNet defense, ensemble adversarial training, and non-differentiable, randomized input transformation defenses. GenAttack’s success against ensemble adversarial training demonstrates that its query efficiency enables it to exploit the defense’s weakness to direct black-box attacks. GenAttack’s success against non-differentiable input transformations indicates that its gradient-free nature enables it to be applicable against defenses which perform gradient masking/obfuscation to confuse the attacker. Our results suggest that population-based optimization opens up a promising area of research into effective gradient-free black-box attacks. 
Gene-pool Optimal Mixing Evolutionary Algorithm  GOMEA belongs to the class of Model-Based EAs (MBEAs) and focuses particularly on efficiently learning and exploiting so-called linkage models that describe dependencies between the variables that are used to encode solutions to the optimization problem at hand. Model-based Genetic Programming with GOMEA for Symbolic Regression of Small Expressions 
General Algorithmic Search (GAS) 
In this paper we present a metaheuristic for global optimization called General Algorithmic Search (GAS). Specifically, GAS is a stochastic, single-objective method that evolves a swarm of agents in search of a global extremum. Numerical simulations with a sample of 31 test functions show that GAS outperforms Basin Hopping, Cuckoo Search, and Differential Evolution, especially in concurrent optimization, i.e., when several runs with different initial settings are executed and the first best wins. Python codes of all algorithms and complementary information are available online. 
General Architecture for Text Engineering (GATE) 
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for all sorts of natural language processing tasks, including information extraction in many languages. GATE has been compared to NLTK, R and RapidMiner. As well as being widely used in its own right, it forms the basis of the KIM semantic platform. GATE community and research has been involved in several European research projects including TAO, SEKT, NeOn, MediaCampaign, Musing, ServiceFinder, LIRICS and KnowledgeWeb, as well as many other projects. As of May 28, 2011, 881 people are on the gate-users mailing list at SourceForge.net, and 111,932 downloads from SourceForge are recorded since the project moved to SourceForge in 2005. The paper “GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications” has received over 800 citations in the seven years since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide, include “Building Search Applications: Lucene, LingPipe, and Gate”, by Manu Konchady, and “Introduction to Linguistic Annotation and Text Analytics”, by Graham Wilcock. 
General Filter Convolutional Neural Network (GFNN) 
We applied predefined kernels, also known as filters or masks, developed for image processing to convolutional neural networks. Instead of letting the neural network find its own kernels, we used 41 different general-purpose kernels (blurring, edge detection, sharpening, discrete cosine transformation, etc.) for the first layer of the convolutional neural network. This architecture, named the general filter convolutional neural network (GFNN), can reduce training time by 30% with better accuracy compared to a regular convolutional neural network (CNN). GFNN can also be trained to achieve 90% accuracy with only 500 samples. Furthermore, even though these kernels are not specialized for the MNIST dataset, we achieved 99.56% accuracy without ensembles or any other special algorithms. 
General Graph Representation Learning Framework (DeepGL) 
This paper presents a general graph representation learning framework called DeepGL for learning deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by deriving a set of base features (e.g., graphlet features) and automatically learns a multi-layered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higher order. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that generalize across networks and are therefore useful for graph-based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable features, and is space-efficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of $\mathcal{O}(E)$, and scalable for large networks via an efficient parallel implementation. Compared with the state-of-the-art method, DeepGL is (1) effective for across-network transfer learning tasks and attributed graph representation learning, (2) space-efficient, requiring up to 6x less memory, (3) fast, with up to 182x speedup in runtime performance, and (4) accurate, with an average improvement of 20% or more on many learning tasks. 
General Language Understanding Evaluation Benchmark (GLUE) 
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems. 
General Likelihood Uncertainty Estimation (GLUE) 
The GLUE methodology (Beven and Binley 1992) rejects the idea of one single optimal solution and adopts the concept of equifinality of models, parameters and variables (Beven and Binley 1992; Beven 1993). Equifinality originates from the imperfect knowledge of the system under consideration, and many sets of models, parameters and variables may therefore be considered equal or almost equal simulators of the system. Using the GLUE analysis, the prior set of models, parameters and variables is divided into a set of non-acceptable solutions and a set of acceptable solutions. The GLUE methodology deals with the variable degree of membership of the sets. The degree of membership is determined by assessing the extent to which solutions fit the model, which in turn is determined by subjective likelihood functions. RGLUEANN 
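The split into acceptable and non-acceptable solutions can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the one-parameter model y = k * x, the candidate grid, the Nash-Sutcliffe efficiency as the subjective likelihood measure, and the 0.9 acceptance threshold.

```python
def nash_sutcliffe(observed, simulated):
    """Goodness-of-fit score in (-inf, 1]; used here as the assumed likelihood measure."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def glue_split(candidates, inputs, observed, threshold):
    """Divide candidate parameters into acceptable (weighted) and rejected sets."""
    acceptable, rejected = {}, []
    for k in candidates:
        simulated = [k * x for x in inputs]  # toy model y = k * x
        score = nash_sutcliffe(observed, simulated)
        if score >= threshold:
            acceptable[k] = score
        else:
            rejected.append(k)
    # Normalized scores act as the degree of membership of each acceptable set.
    total = sum(acceptable.values())
    weights = {k: score / total for k, score in acceptable.items()}
    return weights, rejected

inputs = [1.0, 2.0, 3.0, 4.0]
observed = [2.1, 3.9, 6.2, 7.8]
candidates = [1.0, 1.5, 1.9, 2.0, 2.1, 2.5, 3.0]
weights, rejected = glue_split(candidates, inputs, observed, threshold=0.9)
```

Several parameter values survive the threshold rather than a single optimum, which is the equifinality idea in miniature.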
Generalised Method of Codifferential Descent (GMCD) 
This paper is devoted to a detailed convergence analysis of the method of codifferential descent (MCD) developed by Professor V.F. Demyanov for solving a large class of nonsmooth nonconvex optimization problems. We propose a generalization of the MCD that is more suitable for applications than the original method, and that utilizes only a part of a codifferential on every iteration, which allows one to reduce the overall complexity of the method. With the use of some general results on uniformly codifferentiable functions obtained in this paper, we prove the global convergence of the generalized MCD in the infinite dimensional case. Also, we propose and analyse a quadratic regularization of the MCD, which is the first general method for minimizing a codifferentiable function over a convex set. Apart from convergence analysis, we also discuss the robustness of the MCD with respect to computational errors, possible step size rules, and a choice of parameters of the algorithm. At the end of the paper we estimate the rate of convergence of the MCD for a class of nonsmooth nonconvex functions that arises, in particular, in cluster analysis. We prove that under some general assumptions the method converges at a linear rate, and that it converges quadratically provided a certain first-order sufficient optimality condition holds true. 
Generalizable Approaching Policy LEarning (GAPLE) 
We study the problem of learning a generalizable action policy for an intelligent agent to actively approach an object of interest in an indoor environment solely from its visual inputs. While scene-driven or recognition-driven visual navigation has been widely studied, prior efforts suffer severely from limited generalization capability. In this paper, we first argue that the object searching task is environment dependent while the approaching ability is general. To learn a generalizable approaching policy, we present a novel solution dubbed GAPLE which adopts two channels of visual features, depth and semantic segmentation, as the inputs to the policy learning module. The empirical studies conducted on the House3D dataset as well as on a physical platform in a real world scenario validate our hypothesis, and we further provide in-depth qualitative analysis.
Generalizable Approximate Graph Partitioning Framework (GAP) 
Graph partitioning is the problem of dividing the nodes of a graph into balanced partitions while minimizing the edge cut across the partitions. Due to its combinatorial nature, many approximate solutions have been developed, including variants of multi-level methods and spectral clustering. We propose GAP, a Generalizable Approximate Partitioning framework that takes a deep learning approach to graph partitioning. We define a differentiable loss function that represents the partitioning objective and use backpropagation to optimize the network parameters. Unlike baselines that redo the optimization per graph, GAP is capable of generalization, allowing us to train models that produce performant partitions at inference time, even on unseen graphs. Furthermore, because we learn the representation of the graph while jointly optimizing for the partitioning loss function, GAP can be easily tuned for a variety of graph structures. We evaluate the performance of GAP on graphs of varying sizes and structures, including graphs of widely used machine learning models (e.g., ResNet, VGG, and InceptionV3), scale-free graphs, and random graphs. We show that GAP achieves competitive partitions while being up to 100 times faster than the baseline and generalizes to unseen graphs.
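The two quantities in the problem statement, edge cut and balance, can be computed for a hard partition as in the short sketch below. The helper names and the toy graph are illustrative assumptions; this is not the differentiable loss used by GAP, only the discrete objective that such a loss approximates.

```python
def edge_cut(edges, assignment):
    """Count edges whose endpoints fall in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def partition_sizes(assignment, num_parts):
    """Return the number of nodes in each partition (the balance criterion)."""
    sizes = [0] * num_parts
    for part in assignment.values():
        sizes[part] += 1
    return sizes


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # cuts only the bridge
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}    # equally balanced, many cut edges
```

Both assignments are perfectly balanced, so only the edge cut distinguishes them.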
Generalization Error  The generalization error of a machine learning model is a function that measures how well a learning machine generalizes to unseen data. It is measured as the distance between the error on the training set and the test set and is averaged over the entire set of possible training data that can be generated after each iteration of the learning process. It has this name because this function indicates the capacity of a machine that learns with the specified algorithm to infer a rule (or generalize) that is used by the teacher machine to generate data based only on a few examples. 
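As a rough illustration of the train/test distance described above, the sketch below compares a predictor that memorizes its training set with one that has inferred the underlying rule. The data, the predictors and the function names are hypothetical, and the averaging over all possible training sets is omitted.

```python
def error_rate(predict, data):
    """Fraction of (input, label) pairs the predictor gets wrong."""
    return sum(1 for x, y in data if predict(x) != y) / len(data)


def generalization_gap(predict, train, test):
    """Test error minus training error for a single train/test split."""
    return error_rate(predict, test) - error_rate(predict, train)


train = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]
test = [(5, "odd"), (6, "even"), (7, "odd"), (8, "even")]

# A lookup table fits the training set perfectly but guesses on new inputs.
table = dict(train)
memorizer = lambda x: table.get(x, "even")
# A predictor that has inferred the rule generating the labels.
rule = lambda x: "even" if x % 2 == 0 else "odd"

gap_memorizer = generalization_gap(memorizer, train, test)
gap_rule = generalization_gap(rule, train, test)
```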
Generalization Error Analysis  Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy.
Generalization Tower Network (GTN) 
Deep learning (DL) advances state-of-the-art reinforcement learning (RL) by incorporating deep neural networks in learning representations from the input to RL. However, the conventional deep neural network architecture is limited in learning representations for multi-task RL (MTRL), as multiple tasks can refer to different kinds of representations. In this paper, we thus propose a novel deep neural network architecture, namely generalization tower network (GTN), which can achieve MTRL within a single learned model. Specifically, the architecture of GTN is composed of both horizontal and vertical streams. In our GTN architecture, horizontal streams are used to learn representations shared among similar tasks. In contrast, the vertical streams are introduced to be more suitable for handling diverse tasks, and they encode hierarchical shared knowledge of these tasks. The effectiveness of the introduced vertical stream is validated by experimental results. Experimental results on 51 Atari games further verify that our GTN architecture is able to advance the state of the art in MTRL.
Generalized Additive Mixed Model (GAMM) 
gammSlice 
Generalized Additive Models (GAM) 
In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. GAMs were originally developed by Trevor Hastie and Robert Tibshirani to blend properties of generalized linear models with additive models. https://…additivepartibackgroundandrationale GAM: The Predictive Modeling Silver Bullet gamsel 
Generalized Additive Models for Location, Scale and Shape (GAMLSS) 
This paper introduces generalized additive models for location, scale and shape (GAMLSS) as a modeling framework for analyzing treatment effects beyond the mean. By relating each parameter of the response distribution to explanatory variables, GAMLSS model the treatment effect on the whole conditional distribution. Additionally, any non-normal outcome and nonlinear effects of explanatory variables can be incorporated. We elaborate on the combination of GAMLSS with program evaluation methods in economics and provide a practical guide to the usage of GAMLSS by reanalyzing data from the Progresa program. Contrary to expectations, no significant effects of a cash transfer on the conditional inequality level between treatment and control group are found.
Generalized Approximate Survey Propagation (GASP) 
In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal that is observed through a linear transform followed by a componentwise, possibly nonlinear and noisy, channel. In the Bayesian optimal setting, Generalized Approximate Message Passing (GAMP) is known to achieve optimal performance for GLE. However, its performance can significantly degrade whenever there is a mismatch between the assumed and the true generative model, a situation frequently encountered in practice. In this paper, we propose a new algorithm, named Generalized Approximate Survey Propagation (GASP), for solving GLE in the presence of prior or model misspecifications. As a prototypical example, we consider the phase retrieval problem, where we show that GASP outperforms the corresponding GAMP, reducing the reconstruction threshold and, for certain choices of its parameters, approaching Bayesian optimal performance. Furthermore, we present a set of State Evolution equations that exactly characterize the dynamics of GASP in the high-dimensional limit.
Generalized Autoregressive Conditional Heteroscedasticity (GARCH) 
If an autoregressive moving average model (ARMA model) is assumed for the error variance, the model is a generalized autoregressive conditional heteroskedasticity (GARCH, Bollerslev (1986)) model. mfGARCH 
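For concreteness, the widely used GARCH(1,1) special case updates the conditional variance as sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]. The sketch below applies that recursion to a given residual series; the function name and the parameter values are illustrative assumptions, and no estimation is performed.

```python
def garch11_variances(residuals, omega, alpha, beta, initial_variance):
    """Conditional variance path of a GARCH(1,1) model for given residuals.

    Returns a list with one more element than `residuals`: the starting
    variance followed by the variance implied after each residual.
    """
    variances = [initial_variance]
    for eps in residuals:
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    return variances


# A shock of size 1 keeps the variance at 1.0; a zero shock lets it decay.
path = garch11_variances(
    [1.0, 0.0], omega=0.1, alpha=0.1, beta=0.8, initial_variance=1.0
)
```

With alpha + beta < 1 the variance reverts towards omega / (1 - alpha - beta), which equals 1.0 for the values used here.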
Generalized Autoregressive Moving Average Models (GARMA) 
A class of generalized autoregressive moving average (GARMA) models is developed that extends the univariate Gaussian ARMA time series model to a flexible observation-driven model for non-Gaussian time series data. The dependent variable is assumed to have a conditional exponential family distribution given the past history of the process. The model estimation is carried out using an iteratively reweighted least squares algorithm. Properties of the model, including stationarity and marginal moments, are either derived explicitly or investigated using Monte Carlo simulation. The relationship of the GARMA model to other models is shown, including the autoregressive models of Zeger and Qaqish, the moving average models of Li, and the reparameterized generalized autoregressive conditional heteroscedastic GARCH model (providing the formula for its fourth marginal moment not previously derived). The model is demonstrated by the application of the GARMA model with a negative binomial conditional distribution to a well-known time series dataset of poliomyelitis counts. VGAM
Generalized Boosted Regression Models  This R package (gbm) implements extensions to Freund and Schapire’s AdaBoost algorithm and J. Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, logistic, Poisson, Cox proportional hazards partial likelihood, multinomial, t-distribution, AdaBoost exponential loss, Learning to Rank, and Huberized hinge loss. gbm
Generalized Canonical Polyadic Tensor Decomposition (GCP) 
Tensor decomposition is a fundamental unsupervised machine learning method in data science, with applications including network analysis and sensor data processing. This work develops a generalized canonical polyadic (GCP) low-rank tensor decomposition that allows other loss functions besides squared error. For instance, we can use logistic loss or Kullback-Leibler divergence, enabling tensor decomposition for binary or count data. We present a variety of statistically motivated loss functions for various scenarios. We provide a generalized framework for computing gradients and handling missing data that enables the use of standard optimization methods for fitting the model. We demonstrate the flexibility of GCP on several real-world examples including interactions in a social network, neural activity in a mouse, and monthly rainfall measurements in India.
Generalized Complex-Valued Kernel Least-Mean-Square (gCKLMS) 
We propose a novel adaptive kernel-based regression method for complex-valued signals: the generalized complex-valued kernel least-mean-square (gCKLMS). We borrow from the new results on widely linear reproducing kernel Hilbert space (WLRKHS) for nonlinear regression and complex-valued signals, recently proposed by the authors. This paper shows that in the adaptive version of the kernel regression for complex-valued signals we need to include another kernel term, the so-called pseudo-kernel. This new solution is endowed with better representation capabilities in complex-valued fields, since it can efficiently decouple the learning of the real and the imaginary part. Also, we review previous realizations of the complex KLMS algorithm and its augmented version to prove that they can be rewritten as particular cases of the gCKLMS. Furthermore, important conclusions on kernel design are drawn that help to greatly improve the convergence of the algorithms. In the experiments, we revisit the nonlinear channel equalization problem to highlight the better convergence of the gCKLMS compared to previous solutions. Also, the flexibility of the proposed generalized approach is tested in a second experiment with non-independent real and imaginary parts. The results illustrate the significant performance improvements of the gCKLMS approach when the complex-valued signals have different properties for the real and imaginary parts.
Generalized Data Representation (GDR) 
The deep learning (DL) based autoencoder is a promising architecture for implementing end-to-end communication systems. In this paper, we focus on the fundamental problems of DL-based communication systems, including high rate transmission and performance analysis. To address the limited data rate issue, we first consider the error rate constraint and design a transmission algorithm to adaptively select the transmission vectors to maximize the data rate for various channel scenarios. Furthermore, a novel generalized data representation (GDR) scheme is proposed to improve the data rate of DL-based communication systems. Then, we analyze the effect of signal-to-noise ratio (SNR) and mean squared error performance of the proposed DL-based communication systems. Finally, numerical results show that the proposed adaptive transmission and GDR schemes achieve higher data rate and have lower training complexity than the conventional one-hot vector scheme. Both the new schemes and the conventional scheme have comparable block error rate (BLER) performance. According to both theoretical analysis and simulated results, it is suggested that low or wide-range training SNR is beneficial to attain good BLER performance for practical transmission with various channel scenarios.
Generalized Dilation Neural Network  Vanilla convolutional neural networks are known to provide superior performance not only in image recognition tasks but also in natural language processing and time series analysis. One of the strengths of convolutional layers is the ability to learn features about spatial relations in the input domain using various parameterized convolutional kernels. However, in time series analysis learning such spatial relations is not necessarily required nor effective. In such cases, kernels which model temporal dependencies or kernels with broader spatial resolutions are recommended for more efficient training as proposed by dilation kernels. However, the dilation has to be fixed a priori which limits the flexibility of the kernels. We propose generalized dilation networks which generalize the initial dilations in two aspects. First we derive an end-to-end learnable architecture for dilation layers where also the dilation rate can be learned. Second we break up the strict dilation structure, in that we develop kernels operating independently in the input space.
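To make the notion of a fixed dilation concrete, the sketch below applies a one-dimensional kernel whose taps are spaced `dilation` samples apart. It illustrates the conventional a priori dilation that the entry starts from, not the learnable dilation rate it proposes; the function name is an assumption, and the operation is the cross-correlation form commonly called convolution in deep learning.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D convolution with taps spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # difference between the first and last tap

dense = dilated_conv1d(signal, kernel, dilation=1)   # looks 2 samples ahead
dilated = dilated_conv1d(signal, kernel, dilation=2)  # looks 4 samples ahead
```

The same three weights cover a wider stretch of the series as the dilation grows, which is why dilated kernels suit long temporal dependencies.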
Generalized Dirichlet-Process-Means (Generalised DP-Means) 
DP-means clustering was obtained as an extension of K-means clustering. While it is implemented with a simple and efficient algorithm, it can estimate the number of clusters simultaneously. However, DP-means is specifically designed for the average distortion measure. Therefore, it is vulnerable to outliers in data, and it can cause large maximum distortion in clusters. In this work, we extend the objective function of the DP-means to f-separable distortion measures and propose a unified learning algorithm to overcome the above problems by the selection of the function f. Furthermore, the influence function of the estimated cluster center is analyzed to evaluate the robustness against outliers. We show the effectiveness of the generalized method by numerical experiments using real datasets.
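A minimal sketch of the baseline DP-means procedure, as it is commonly described, may help: it behaves like K-means, except that a point whose squared distance to every center exceeds a penalty `lam` opens a new cluster. The function name and the data are illustrative assumptions, and the sketch uses the plain squared-error distortion rather than the f-separable generalization proposed here.

```python
def dp_means(points, lam, max_iter=100):
    """Cluster points; the number of clusters is driven by the penalty `lam`."""
    dim = len(points[0])
    n = len(points)
    # Start from a single cluster located at the global mean.
    centers = [[sum(p[d] for p in points) / n for d in range(dim)]]
    labels = [0] * n
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > lam:
                # Too far from every center: open a new cluster at this point.
                centers.append(list(p))
                best = len(centers) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Move each non-empty cluster's center to the mean of its members.
        for k in range(len(centers)):
            members = [points[i] for i in range(n) if labels[i] == k]
            if members:
                centers[k] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
        if not changed:
            break
    # Drop clusters that ended up empty and renumber the labels.
    used = sorted(set(labels))
    remap = {old: new for new, old in enumerate(used)}
    return [centers[k] for k in used], [remap[k] for k in labels]


points = [[0.0], [0.2], [0.1], [10.0], [10.2], [10.1]]
centers, labels = dp_means(points, lam=4.0)
one_center, one_labels = dp_means(points, lam=1000.0)
```

A small penalty recovers the two groups in the toy data, while a very large penalty keeps everything in one cluster.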
Generalized Discrimination Score  The Generalized Discrimination Score is a generic forecast verification framework which can be applied to any of the following verification contexts: dichotomous, polychotomous (ordinal and nominal), continuous, probabilistic, and ensemble. A comprehensive description of the Generalized Discrimination Score, including all equations used in this package, is provided by Mason and Weigel (2009) <doi:10.1175/MWR-D-10-05069.1> afc
Generalized Dissimilarity Modeling (GDM) 
Generalized dissimilarity modelling (GDM) is a statistical technique for analysing and predicting spatial patterns of turnover in community composition (beta diversity) across large regions. gdm 
Generalized Dual Averaging  We present a new class of algorithms for solving regularized optimization and saddle point problems. We analyse this class of methods for convex optimization and convex-concave saddle point problems and expect that they will be useful for solving non-convex problems as well. For convex and convex-concave problems, our algorithms form a novel class of primal-dual subgradient methods. This new class of methods extends existing methods by utilizing a more general bound on the objective error and duality gap. This leads to methods for which we can control the step size of the proximal update, which is of interest for problems where the sparsity of the iterates is important. We prove that our class of methods is optimal from the point of view of worst-case black-box complexity for convex optimization problems, and derive a version for convex-concave saddle point problems. We also analyse our methods in the stochastic and online settings. Finally, we exhibit a variety of special cases and discuss their usefulness for non-convex optimization.
Generalized Dynamic Principal Components (GDPC) 
Brillinger defined dynamic principal components (DPC) for time series based on a reconstruction criterion. He gave a very elegant theoretical solution and proposed an estimator which is consistent under stationarity. Here, we propose a new, entirely empirical approach to DPC. The main differences from the existing methods, mainly Brillinger's procedure, are (1) the DPC we propose need not be a linear combination of the observations and (2) it can be based on a variety of loss functions including robust ones. Unlike Brillinger, we do not establish any consistency results; however, contrary to Brillinger’s, which has a very strong stationarity flavor, our concept aims at a better adaptation to possible nonstationary features of the series. We also present a robust version of our procedure that allows the DPC to be estimated when the series have outlier contamination. We give iterative algorithms to compute the proposed procedures that can be used with a large number of variables. Our nonrobust and robust procedures are illustrated with real datasets. Supplementary materials for this article are available online. Consistency of Generalized Dynamic Principal Components in Dynamic Factor Models gdpc
Generalized Entropy Agglomeration (GEA) 
Entropy Agglomeration (EA) is a hierarchical clustering algorithm introduced in 2013. Here, we generalize it to define Generalized Entropy Agglomeration (GEA) that can work with multiset blocks and blocks with rational occurrence numbers. We also introduce a numerical categorization procedure to apply GEA to numerical datasets. The software REBUS 2.0 is published with these capabilities: http://…/rebus2 
Generalized Estimating Equation (GEE) 
In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unknown correlation between outcomes. Parameter estimates from the GEE are consistent even when the covariance structure is misspecified, under mild regularity conditions. The focus of the GEE is on estimating the average response over the population (‘population-averaged’ effects) rather than the regression parameters that would enable prediction of the effect of changing one or more covariates on a given individual. GEEs are usually used in conjunction with Huber-White standard error estimates, also known as ‘robust standard error’ or ‘sandwich variance’ estimates. In the case of a linear model with a working independence variance structure, these are known as ‘heteroscedasticity-consistent standard error’ estimators. Indeed, the GEE unified several independent formulations of these standard error estimators in a general framework. GEEs belong to a class of semiparametric regression techniques because they rely on specification of only the first two moments. Under correct model specification and mild regularity conditions, parameter estimates from GEEs are consistent. They are a popular alternative to the likelihood-based generalized linear mixed model, which is more sensitive to variance structure specification. They are commonly used in large epidemiological studies, especially multi-site cohort studies, because they can handle many types of unmeasured dependence between outcomes. Sequential estimation for GEE with adaptive variables and subject selection mmmgee
Generalized Four Moment Theorem (G4MT) 
The universality for the local spiked eigenvalues is a powerful tool for dealing with the asymptotic law for the bulks of spiked eigenvalues of high-dimensional generalized Fisher matrices. In this paper, we focus on a more generalized spiked Fisher matrix, where $\Sigma_1\Sigma_2^{-1}$ is free of the restriction of diagonal independence, and neither the spiked eigenvalues nor the population 4th moments are necessarily required to be bounded. By reducing the matching four moments constraint to a tail probability, we propose a Generalized Four Moment Theorem (G4MT) for the bulks of spiked eigenvalues of high-dimensional generalized Fisher matrices, which shows that the limiting distribution of the spiked eigenvalues of a generalized spiked Fisher matrix is independent of the actual distributions of the samples, provided they satisfy our relaxed assumptions. Furthermore, as an illustration, we also apply the G4MT to the Central Limit Theorem for the spiked eigenvalues of a generalized spiked Fisher matrix, which removes the strict condition of diagonal block independence given in Wang and Yao (2017) and extends their result to wider use without requiring bounded 4th moments or a diagonal block independent structure, which better matches actual cases.
Generalized Gaussian Kernel Adaptive Filtering  The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to keep their symmetric positive-definiteness. Unlike conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels that all have different parameters, which makes the regressor more flexible. The kernel adaptive filtering algorithm is established together with an l1-regularized least squares to avoid overfitting and the increase of dimensionality of the dictionary. Experimental results confirm the validity of the proposed method.
Generalized Graded Unfolding Model (GGUM) 
The generalized graded unfolding model (GGUM) is developed. This model allows for either binary or graded responses and generalizes previous item response models for unfolding in two useful ways. First, it implements a discrimination parameter that varies across items, which allows items to discriminate among respondents in different ways. Second, the GGUM permits response category threshold parameters to vary across items. A marginal maximum likelihood algorithm is implemented to estimate GGUM item parameters, whereas person parameters are derived from an expected a posteriori technique. The applicability of the GGUM to common attitude testing situations is illustrated with real data on student attitudes toward abortion. http://…/gbm2.pdf ScoreGGUM
Generalized Hyperbolic Distributions (GH) 
The generalised hyperbolic distribution (GH) is a continuous probability distribution defined as the normal variance-mean mixture where the mixing distribution is the generalized inverse Gaussian distribution. Its probability density function is given in terms of the modified Bessel function of the second kind. As the name suggests it is of a very general form, being the superclass of, among others, the Student’s t-distribution, the Laplace distribution, the hyperbolic distribution, the normal-inverse Gaussian distribution and the variance-gamma distribution. It is mainly applied to areas that require sufficient probability of far-field behaviour, which it can model due to its semi-heavy tails, a property the normal distribution does not possess. The generalised hyperbolic distribution is often used in economics, with particular application in the fields of modelling financial markets and risk management, due to its semi-heavy tails. This class is closed under linear operations.
Generalized Integration Model  Integrates individuallevel data and summary statistics under a generalized linear model framework. gim 
Generalized Integrative Principal Component Analysis (GIPCA) 
High-dimensional multi-source data are encountered in many fields. Despite recent developments on the integrative dimension reduction of such data, most existing methods cannot easily accommodate data of multiple types (e.g., binary or count-valued). Moreover, multi-source data often have block-wise missing structure, i.e., data in one or more sources may be completely unobserved for a sample. The heterogeneous data types and presence of block-wise missing data pose significant challenges to the integration of multi-source data and further statistical analyses. In this paper, we develop a low-rank method, called Generalized Integrative Principal Component Analysis (GIPCA), for the simultaneous dimension reduction and imputation of multi-source block-wise missing data, where different sources may have different data types. We also devise an adapted BIC criterion for rank estimation. Comprehensive simulation studies demonstrate the efficacy of the proposed method in terms of rank estimation, signal recovery, and missing data imputation. We apply GIPCA to a mortality study. We achieve accurate block-wise missing data imputation and identify intriguing latent mortality rate patterns with sociological relevance.
Generalized Kalman Smoothing  State-space smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example the Rauch-Tung-Striebel and Mayne-Fraser algorithms. Such schemes are equivalent to linear algebraic techniques that minimize a convex quadratic objective function with structure induced by the dynamic model. These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs and abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in the last few years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require use of iterated algorithms, and these have received increased attention from control, signal processing, machine learning, and optimization communities. In this survey we show that the optimization viewpoint provides the control and signal processing community great freedom in the development of novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples.
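As a point of reference for the classical linear-Gaussian case, here is a minimal scalar sketch of a Kalman filter followed by the Rauch-Tung-Striebel backward pass. It assumes the model x[t] = a * x[t-1] + noise with variance q, observed as y[t] = x[t] + noise with variance r; the function and parameter names are illustrative, and none of the nonsmooth generalizations surveyed here are implemented.

```python
def rts_smoother(ys, a, q, r, m0, p0):
    """Scalar Kalman filter plus Rauch-Tung-Striebel smoother.

    m0 and p0 are the prior mean and variance of the state just before
    the first observation. Returns smoothed means and variances followed
    by the filtered means and variances.
    """
    filt_m, filt_p, pred_m, pred_p = [], [], [], []
    m, p = m0, p0
    for y in ys:
        # Predict one step ahead, then correct with the new observation.
        m_pred, p_pred = a * m, a * a * p + q
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        pred_m.append(m_pred)
        pred_p.append(p_pred)
        filt_m.append(m)
        filt_p.append(p)
    smooth_m, smooth_p = filt_m[:], filt_p[:]
    # Backward pass: refine each estimate using information from the future.
    for t in range(len(ys) - 2, -1, -1):
        g = filt_p[t] * a / pred_p[t + 1]
        smooth_m[t] = filt_m[t] + g * (smooth_m[t + 1] - pred_m[t + 1])
        smooth_p[t] = filt_p[t] + g * g * (smooth_p[t + 1] - pred_p[t + 1])
    return smooth_m, smooth_p, filt_m, filt_p


sm, sp, fm, fp = rts_smoother([2.0] * 5, a=1.0, q=0.1, r=1.0, m0=2.0, p0=1.0)
```

Because every step relies on quadratic (Gaussian) penalties, a single large outlier in `ys` would pull the estimates strongly, which is the weakness that motivates the generalized formulations.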
Generalized k-Nearest Neighbor (GkNN) 
Three methods of temporal data upscaling, which may collectively be called the generalized k-nearest neighbor (GkNN) method, are considered. The accuracy of the GkNN simulation of month-by-month yield is considered (where the term yield denotes the dependent variable). The notion of an eventually well distributed time series is introduced, and on the basis of this assumption some properties of the average annual yield and its variance for a GkNN simulation are computed. The total yield over a planning period is determined, and a general framework for considering the GkNN algorithm based on the notion of stochastically dependent time series is described. It is shown that for a sufficiently large training set the GkNN simulation has the same statistical properties as the training data. An example of the application of the methodology is given for the problem of simulating the yield of a rainwater tank given monthly climatic data.
Generalized Lambda Distribution (GLD) 
The generalized lambda distribution is a generic distribution that can be used for various curve fittings or in general mathematical analysis. It is interesting because of the wide variety of distributional shapes it can take on. Methods exist for using this distribution to approximate various other distributions, or for fitting it to an experimental data set. GLDEX
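The distribution is usually specified through its quantile function. The sketch below assumes the Ramberg-Schmeiser parameterization, Q(u) = lam1 + (u^lam3 - (1 - u)^lam4) / lam2, and samples by inverse transform; the function names are illustrative, and the parameter values are a commonly quoted approximation to the standard normal rather than a fitted result.

```python
import random


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Quantile function of the GLD in the Ramberg-Schmeiser form."""
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2


def gld_sample(n, lam1, lam2, lam3, lam4, seed=0):
    """Draw n values by pushing uniform variates through the quantile function.

    Assumes positive lam3 and lam4 so that the endpoints are finite.
    """
    rng = random.Random(seed)
    return [gld_quantile(rng.random(), lam1, lam2, lam3, lam4) for _ in range(n)]


# Parameters often quoted as approximating a standard normal (an assumption).
normal_like = (0.0, 0.1975, 0.1349, 0.1349)
median = gld_quantile(0.5, *normal_like)
upper = gld_quantile(0.975, *normal_like)
draws = gld_sample(1000, *normal_like)
```

Changing `lam3` and `lam4` independently skews the shape, which is where the wide variety of attainable distributions comes from.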
Generalized Least Squares Screening (GLSS) 
Variable selection is a widely studied problem in high dimensional statistics, primarily since estimating the precise relationship between the covariates and the response is of great importance in many scientific disciplines. However, most of the theory and methods developed towards this goal for the linear model invoke the assumption of iid sub-Gaussian covariates and errors. This paper analyzes the theoretical properties of Sure Independence Screening (SIS) (Fan and Lv) for high dimensional linear models with dependent and/or heavy tailed covariates and errors. We also introduce a generalized least squares screening (GLSS) procedure which utilizes the serial correlation present in the data. By utilizing this serial correlation when estimating our marginal effects, GLSS is shown to outperform SIS in many cases. For both procedures we prove sure screening properties, which depend on the moment conditions, and the strength of dependence in the error and covariate processes, amongst other factors. Additionally, combining these screening procedures with the adaptive Lasso is analyzed. Dependence is quantified by functional dependence measures (Wu), and the results rely on the use of Nagaev type and exponential inequalities for dependent random variables. We also conduct simulations to demonstrate the finite sample performance of these procedures, and include a real data application of forecasting the US inflation rate.
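The screening step of plain SIS can be sketched as ranking covariates by the absolute value of their marginal correlation with the response and keeping the top few. The function names and the toy data below are illustrative assumptions; the GLSS refinement, which accounts for serial correlation, is not implemented.

```python
import math


def marginal_correlation(column, response):
    """Pearson correlation between one covariate and the response."""
    n = len(response)
    mean_x = sum(column) / n
    mean_y = sum(response) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(column, response))
    var_x = sum((x - mean_x) ** 2 for x in column)
    var_y = sum((y - mean_y) ** 2 for y in response)
    return cov / math.sqrt(var_x * var_y)


def sis_screen(columns, response, keep):
    """Indices of the `keep` covariates with the largest |marginal correlation|."""
    scores = [abs(marginal_correlation(c, response)) for c in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# The response depends (almost) only on the first covariate.
columns = [
    [1, 2, 3, 4, 5, 6],
    [1, -1, 1, -1, 1, -1],
    [6, 1, 5, 2, 4, 3],
]
response = [3.1, 5.9, 9.2, 11.8, 15.1, 18.0]
selected = sis_screen(columns, response, keep=1)
```

In practice the retained set is much smaller than the original dimension and is then passed to a method such as the adaptive Lasso.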
Generalized Likelihood Ratio Test (GLRT) 

Generalized Linear Mixed Model (GLMM) 
In statistics, a generalized linear mixed model (GLMM) is a particular type of mixed model. It is an extension to the generalized linear model in which the linear predictor contains random effects in addition to the usual fixed effects. These random effects are usually assumed to have a normal distribution. Fitting such models by maximum likelihood involves integrating over these random effects. In general, these integrals cannot be expressed in analytical form. Various approximate methods have been developed, but none has good properties for all possible models and data sets (ungrouped binary data being particularly problematic). For this reason, methods involving numerical quadrature or Markov chain Monte Carlo have increased in use as increasing computing power and advances in methods have made them more practical. 
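The integral over the random effects can be illustrated for the simplest case, a logistic model with a single normally distributed random intercept per cluster. The sketch below approximates one cluster's marginal likelihood with a trapezoidal rule; the function name, the grid width of eight standard deviations and the grid size are illustrative assumptions (Gauss-Hermite quadrature is the more usual choice in practice).

```python
import math


def cluster_marginal_likelihood(xs, ys, beta0, beta1, sigma, n_grid=2001):
    """Marginal likelihood of one cluster in a random-intercept logistic model.

    Integrates the conditional likelihood over b ~ Normal(0, sigma^2)
    with the trapezoidal rule on [-8 * sigma, 8 * sigma].
    """
    lo, hi = -8.0 * sigma, 8.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        b = lo + i * step
        # Conditional likelihood of the cluster's responses given b.
        lik = 1.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x + b)))
            lik *= p if y == 1 else 1.0 - p
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * lik * density * step
    return total


# One success with a zero linear predictor: by symmetry the answer is 0.5.
single = cluster_marginal_likelihood([0.0], [1], beta0=0.0, beta1=0.0, sigma=1.5)
```

A full fit would multiply such terms across clusters and maximize over the fixed effects and `sigma`, which is where the computational cost arises.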
Generalized Linear Models (GLM) 
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. mdscore,mglmn 
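As a worked instance, the sketch below fits a Poisson GLM with a log link, where the variance of each measurement equals its predicted mean, using iteratively reweighted least squares. It handles one covariate plus an intercept so the weighted normal equations can be solved by hand; the function name and data are illustrative assumptions.

```python
import math


def poisson_irls(x, y, n_iter=25):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi           # linear predictor
            mu = math.exp(eta)           # inverse of the log link
            z = eta + (yi - mu) / mu     # working response
            w = mu                       # working weight (variance = mean)
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least squares system for (b0, b1).
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


# Noise-free means generated with b0 = 0.5 and b1 = 0.3.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
b0_hat, b1_hat = poisson_irls(x, y)
```

Swapping the link and the variance function in the two marked lines yields other members of the GLM family, such as logistic regression.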
Generalized Logistic Distribution  The term generalized logistic distribution is used as the name for several different families of probability distributions. For example, Johnson et al. list four forms, which are listed below. One family described here has also been called the skew-logistic distribution. For other families of distributions that have also been called generalized logistic distributions, see the shifted log-logistic distribution, which is a generalization of the log-logistic distribution.
Generalized LpNorm TwoDimensional Linear Discriminant Analysis (G2DLDA) 
Recent advances show that two-dimensional linear discriminant analysis (2DLDA) is a successful matrix-based dimensionality reduction method. However, 2DLDA may suffer from singularity in theory and is sensitive to outliers. In this paper, a generalized Lp-norm 2DLDA framework with regularization for an arbitrary $p>0$ is proposed, named G2DLDA. G2DLDA makes two main contributions. First, the model uses an arbitrary Lp-norm to measure the between-class and within-class scatter, so a proper $p$ can be selected to achieve robustness. Second, by introducing an extra regularization term, G2DLDA achieves better generalization performance and solves the singularity problem. In addition, G2DLDA can be solved through a series of convex problems with an equality constraint, each of which has a closed-form solution. Its convergence is guaranteed theoretically when $1\leq p\leq2$. Preliminary experimental results on three contaminated human face databases show the effectiveness of the proposed G2DLDA. 
Generalized Mallows Model Latent Dirichlet Allocation (GMMLDA) 
Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged through one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMMLDA, a Bayesian unsupervised model based on topic modeling, to analyze the document intent structure together with order information. Our model is flexible: it can incorporate annotations and perform supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings; the results show the superiority of our model over several state-of-the-art baselines. 
Generalized Matrix Chain Algorithm  In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task in which human experts often invest days or even weeks of work. The standard matrix chain problem consists in finding the parenthesization of a matrix product $M := A_1 A_2 \cdots A_n$ that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 9. The motivation for this work comes from the fact that, despite great advances in the development of compilers, the task of mapping linear algebra problems to optimized kernels must still be done manually. In order to relieve the user from this complex task, new techniques for the compilation of linear algebra expressions have to be developed. 
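For reference, the standard matrix chain problem mentioned above can be solved with the textbook dynamic program sketched here. This is only the classical baseline, not the generalized algorithm of the paper; the function name `matrix_chain_order` and the cost model (one scalar multiplication per inner-product term) are illustrative assumptions.

```python
def matrix_chain_order(dims):
    """Classical matrix chain dynamic program.

    Matrix A_i has shape dims[i-1] x dims[i], for i = 1..n.
    Returns (minimal number of scalar multiplications, parenthesization).
    """
    n = len(dims) - 1
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    split = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # length of the sub-chain
        for i in range(1, n - length + 2):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):             # split point: (A_i..A_k)(A_k+1..A_j)
                c = cost[i][k] + cost[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if c < cost[i][j]:
                    cost[i][j], split[i][j] = c, k

    def paren(i, j):
        # Rebuild the optimal parenthesization from the recorded split points.
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({paren(i, k)} {paren(k + 1, j)})"

    return cost[1][n], paren(1, n)

if __name__ == "__main__":
    print(matrix_chain_order([10, 30, 5, 60]))
```

As the entry notes, minimizing scalar operations is only a proxy: the fastest evaluation order in wall-clock time may differ once real kernels are involved.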
Generalized Matrix Splitting Algorithm (GMSA) 
Composite function minimization captures a wide spectrum of applications in both computer vision and machine learning. It includes bound constrained optimization, $\ell_1$ norm regularized optimization, and $\ell_0$ norm regularized optimization as special cases. This paper proposes and analyzes a new Generalized Matrix Splitting Algorithm (GMSA) for minimizing composite functions. It can be viewed as a generalization of the classical Gauss-Seidel method and the Successive Over-Relaxation method for solving linear systems. Our algorithm is derived from a novel triangle operator mapping, which can be computed exactly using a new generalized Gaussian elimination procedure. We establish the global convergence, convergence rate, and iteration complexity of GMSA for convex problems. In addition, we discuss several important extensions of GMSA. Finally, we validate the performance of our proposed method on three particular applications: nonnegative matrix factorization, $\ell_0$ norm regularized sparse coding, and the $\ell_1$ norm regularized Dantzig selector problem. Extensive experiments show that our method achieves state-of-the-art performance in terms of both efficiency and efficacy. 
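As background for the two classical methods that GMSA is said to generalize, the sketch below solves a linear system with Successive Over-Relaxation; setting `omega=1.0` gives Gauss-Seidel. It illustrates only these classical iterations, not GMSA itself, and the name `sor_solve` and the fixed iteration count are assumptions.

```python
def sor_solve(A, b, omega=1.0, iterations=200):
    """Solve A x = b with Successive Over-Relaxation (SOR).

    omega = 1.0 is the Gauss-Seidel method. For symmetric positive
    definite A the iteration converges for 0 < omega < 2.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Use the newest available values of the other coordinates.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel_value = (b[i] - sigma) / A[i][i]
            # Relax between the old value and the Gauss-Seidel update.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel_value
    return x

if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(sor_solve(A, b))             # Gauss-Seidel
    print(sor_solve(A, b, omega=1.2))  # over-relaxed variant
```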
Generalized Maximum Entropy Estimation  We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zeroinformation moment closure method. 
Generalized Method of Wavelet Moments (GMWM) 
gmwm 
Generalized MinMax (GMM) 
We develop some theoretical results for a robust similarity measure named ‘generalized min-max’ (GMM). This similarity has direct applications in machine learning as a positive definite kernel and can be efficiently computed via probabilistic hashing. Owing to the discrete nature, the hashed values can also be used for efficient near neighbor search. We prove the theoretical limit of GMM and the consistency result, assuming that the data follow an elliptical distribution, which is a very general family of distributions and includes the multivariate $t$-distribution as a special case. The consistency result holds as long as the data have bounded first moment (an assumption which essentially holds for datasets commonly encountered in practice). Furthermore, we establish the asymptotic normality of GMM. Compared to the ‘cosine’ similarity which is routinely adopted in current practice in statistics and machine learning, the consistency of GMM requires much weaker conditions. Interestingly, when the data follow the $t$-distribution with $\nu$ degrees of freedom, GMM typically provides a better measure of similarity than ‘cosine’ roughly when $\nu<8$ (which is already very close to normal). These theoretical results will help explain the recent success of GMM in learning tasks. 
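The entry does not spell out the similarity itself. A minimal sketch follows, assuming the definition commonly given for the GMM kernel: each coordinate is split into its positive and negative parts, and the similarity is the sum of coordinate-wise minima divided by the sum of coordinate-wise maxima. The function name and the convention of returning 0 for two all-zero vectors are illustrative assumptions.

```python
def gmm_similarity(u, v):
    """Generalized min-max similarity between two real-valued vectors.

    Assumed definition: map each coordinate x to the nonnegative pair
    (x, 0) if x > 0 else (0, -x), then take sum(min) / sum(max).
    """
    def split(vec):
        out = []
        for x in vec:
            out.extend((x, 0.0) if x > 0 else (0.0, -x))
        return out

    a, b = split(u), split(v)
    numerator = sum(min(p, q) for p, q in zip(a, b))
    denominator = sum(max(p, q) for p, q in zip(a, b))
    # Two all-zero vectors have no defined ratio; 0.0 is a convention here.
    return numerator / denominator if denominator > 0 else 0.0

if __name__ == "__main__":
    print(gmm_similarity([1.0, -2.0], [2.0, -1.0]))
```

Because the transformed vectors are nonnegative, the value always lies in [0, 1], and it equals 1 when the two vectors coincide.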
Generalized Modularity  The network embedding problem aims to map nodes that are similar to each other to vectors in a Euclidean space that are close to each other. Like centrality analysis (ranking) and community detection, network embedding is in general considered as an illposed problem, and its solution may depend on a person’s view on this problem. In this book chapter, we adopt the framework of sampled graphs that treat a person’s view as a sampling method for a network. The modularity for a sampled graph, called the generalized modularity in the book chapter, is a similarity matrix that has a specific probabilistic interpretation. One of the main contributions of this book chapter is to propose using the generalized modularity matrix for network embedding and show that the network embedding problem can be treated as a trace maximization problem like the community detection problem. Our generalized modularity embedding approach is very general and flexible. In particular, we show that the Laplacian eigenmaps is a special case of our generalized modularity embedding approach. Also, we show that dimensionality reduction can be done by using a particular sampled graph. Various experiments are conducted on real datasets to illustrate the effectiveness of our approach. 
Generalized Multistate Simulation Model  GUIgems,gems 
Generalized Network Dismantling  Finding the set of nodes whose removal or (de)activation can stop the spread of (dis)information, contain an epidemic, or disrupt the functioning of a corrupt/criminal organization is still one of the key challenges in network science. In this paper, we introduce the generalized network dismantling problem, which aims to find, at minimum cost, the set of nodes whose removal fragments a network into subcritical components. For unit costs, our formulation becomes equivalent to the standard network dismantling problem. Our non-unit cost generalization allows for the inclusion of topological cost functions related to node centrality and non-topological features such as the price, protection level or even social value of a node. In order to solve this optimization problem, we propose a method based on the spectral properties of a novel node-weighted Laplacian operator. The proposed method is applicable to large-scale networks with millions of nodes. It outperforms current state-of-the-art methods and opens new directions in understanding the vulnerability and robustness of complex systems. 
Generalized Outlier Robust Extreme Learning Machine (GORELM) 
The popularity of algorithms based on Extreme Learning Machine (ELM), which can be used to train Single Layer Feedforward Neural Networks (SLFN), has increased in the past years. They have been successfully applied to a wide range of classification and regression tasks. The most commonly used methods are the ones based on minimizing the $\ell_2$ norm of the error, which is not well suited to dealing with outliers, especially in regression tasks. The use of the $\ell_1$ norm was proposed in Outlier Robust ELM (ORELM), which is defined for one-dimensional outputs. In this paper, we generalize ORELM to deal with multi-target regression problems, using the $\ell_{2,1}$ norm of the error and Elastic Net theory, which can yield a sparser network. We call the resulting method Generalized Outlier Robust Extreme Learning Machine (GORELM). We use the Alternating Direction Method of Multipliers (ADMM) to solve the resulting optimization problem. An incremental version of GORELM is also proposed. We chose 15 public real-world multi-target regression datasets to test our methods. Our experiments show that they are statistically better than other ELM-based techniques when the data are contaminated with outliers, and equivalent to them otherwise. 
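A small sketch of why an $\ell_{2,1}$ error norm is less sensitive to outliers than a squared $\ell_2$ (Frobenius) error. It assumes the common convention that the $\ell_{2,1}$ norm of an error matrix is the sum of the Euclidean norms of its rows, with one row per sample and one column per target; the entry above does not state which convention the paper uses, and the helper names are hypothetical.

```python
import math

def l21_norm(errors):
    """Sum of the Euclidean norms of the rows (one row per sample)."""
    return sum(math.sqrt(sum(e * e for e in row)) for row in errors)

def squared_frobenius(errors):
    """Sum of all squared entries, i.e. the usual squared l2 error."""
    return sum(e * e for row in errors for e in row)

if __name__ == "__main__":
    clean = [[0.1, 0.1], [0.1, -0.1], [-0.1, 0.1]]
    with_outlier = clean + [[30.0, 40.0]]   # one grossly wrong sample
    # The outlier adds 50 to the l2,1 norm but 2500 to the squared error.
    print(l21_norm(clean), l21_norm(with_outlier))
    print(squared_frobenius(clean), squared_frobenius(with_outlier))
```

The outlying row contributes linearly to the $\ell_{2,1}$ norm and quadratically to the squared error, which is the intuition behind using the former in a robust objective.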
Generalized Power Generalized Weibull Distribution  This paper introduces a new generalization of the power generalized Weibull distribution called the generalized power generalized Weibull distribution. This distribution can also be considered as a generalization of Weibull distribution. The hazard rate function of the new model has nice and flexible properties and it can take various shapes, including increasing, decreasing, upsidedown bathtub and bathtub shapes. Some of the statistical properties of the new model, including quantile function, moment generating function, reliability function, hazard function and the reverse hazard function are obtained. The moments, incomplete moments, mean deviations and Bonferroni and Lorenz curves and the order statistics densities are also derived. The model parameters are estimated by the maximum likelihood method. The usefulness of the proposed model is illustrated by using two applications of reallife data. 
Generalized Probabilistic Principal Component Analysis (GPPCA) 
Principal component analysis (PCA) is a well-established tool in machine learning and data processing. Tipping and Bishop (1999) proposed a probabilistic formulation of PCA (PPCA) by showing that the principal axes in PCA are equivalent to the maximum marginal likelihood estimator of the factor loading matrix in a latent factor model for the observed data, assuming that the latent factors are independently distributed as standard normal distributions. However, the independence assumption may be unrealistic for many scenarios such as modeling multiple time series, spatial processes, and functional data, where the output variables are correlated. In this paper, we introduce the generalized probabilistic principal component analysis (GPPCA) to study the latent factor model of multiple correlated outcomes, where each factor is modeled by a Gaussian process. The proposed method provides a probabilistic solution of the latent factor model with scalable computation. In particular, we derive the maximum marginal likelihood estimator of the factor loading matrix and the predictive distribution of the output. Based on the explicit expression of the precision matrix in the marginal likelihood, the number of computational operations is linear in the number of output variables. Moreover, with the use of the Matérn covariance function, the number of computational operations is also linear in the number of time points for modeling the multiple time series, without any approximation to the likelihood function. We discuss the connection of the GPPCA with other approaches such as the PCA and PPCA, and highlight the advantage of GPPCA in terms of practical relevance, estimation accuracy and computational convenience. Numerical studies confirm the excellent finite-sample performance of the proposed approach. 
Generalized Probability Smoothing  In this work we consider a generalized version of Probability Smoothing, the core elementary model for sequential prediction in the state of the art PAQ family of data compression algorithms. Our main contribution is a code length analysis that considers the redundancy of Probability Smoothing with respect to a Piecewise Stationary Source. The analysis holds for a finite alphabet and expresses redundancy in terms of the total variation in probability mass of the stationary distributions of a Piecewise Stationary Source. By choosing parameters appropriately Probability Smoothing has redundancy $O(S\cdot\sqrt{T\log T})$ for sequences of length $T$ with respect to a Piecewise Stationary Source with $S$ segments. 
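The entry does not state the update rule. The sketch below assumes the basic form of probability smoothing, in which the current estimate is mixed with an indicator of the symbol just seen, $p_{t+1}(s) = \alpha\, p_t(s) + (1-\alpha)\,[s = x_t]$, and accumulates the ideal code length $-\log_2 p_t(x_t)$. The fixed smoothing rate `alpha`, the uniform start, and the function name are assumptions; the generalized version analysed in the paper may vary these.

```python
import math

def smoothing_code_length(sequence, alphabet, alpha=0.9):
    """Sequential prediction by probability smoothing (assumed basic form).

    Returns the total ideal code length in bits and the final estimate.
    """
    p = {s: 1.0 / len(alphabet) for s in alphabet}  # uniform initial estimate
    bits = 0.0
    for x in sequence:
        bits += -math.log2(p[x])                    # cost of coding x under p
        for s in alphabet:
            hit = 1.0 if s == x else 0.0
            p[s] = alpha * p[s] + (1.0 - alpha) * hit  # shift mass toward x
    return bits, p

if __name__ == "__main__":
    print(smoothing_code_length("aaaaaaaabbbbbbbb", "ab", alpha=0.8))
```

A smaller `alpha` forgets the past faster, which helps when the source switches between stationary segments but costs more within a long stationary segment.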
Generalized Procrustes Analysis (GPA) 
Generalized Procrustes analysis (GPA) is a method of statistical analysis that can be used to compare the shapes of objects, or the results of surveys, interviews, or panels. It was developed for analysing the results of freechoice profiling, a survey technique which allows respondents (such as sensory panelists) to describe a range of products in their own words or language. GPA is one way to make sense of freechoice profiling data; other ways can be multiple factor analysis (MFA), or the STATIS method. The method was first published by J. C. Gower in 1975. 
Generalized Resistant Hyperplane Mechanisms  This paper is part of an emerging line of work at the intersection of machine learning and mechanism design, which aims to avoid noise in training data by correctly aligning the incentives of data sources. Specifically, we focus on the ubiquitous problem of linear regression, where strategyproof mechanisms have previously been identified in two dimensions. In our setting, agents have singlepeaked preferences and can manipulate only their response variables. Our main contribution is the discovery of a family of group strategyproof linear regression mechanisms in any number of dimensions, which we call generalized resistant hyperplane mechanisms. The gametheoretic properties of these mechanisms — and, in fact, their very existence — are established through a connection to a discrete version of the Ham Sandwich Theorem. 
Generalized Robust Risk Minimization (GRRM) 
Different types of training data have led to numerous schemes for supervised classification. Current learning techniques are tailored to one specific scheme and cannot handle general ensembles of training data. This paper presents a unifying framework for supervised classification with general ensembles of training data, and proposes the learning methodology of generalized robust risk minimization (GRRM). The paper shows how current and novel supervision schemes can be addressed under the proposed framework by representing the relationship between examples at test and training via probabilistic transformations. The results show that GRRM can handle different types of training data in a unified manner, and enable new supervision schemes that aggregate general ensembles of training data. 
Generalized Sliced Wasserstein Distances (GSW) 
The Wasserstein distance and its variations, e.g., the sliced-Wasserstein (SW) distance, have recently drawn attention from the machine learning community. The SW distance, specifically, was shown to have similar properties to the Wasserstein distance, while being much simpler to compute, and is therefore used in various applications including generative modeling and general supervised/unsupervised learning. In this paper, we first clarify the mathematical connection between the SW distance and the Radon transform. We then utilize the generalized Radon transform to define a new family of distances for probability measures, which we call generalized sliced-Wasserstein (GSW) distances. We also show that, similar to the SW distance, the GSW distance can be extended to a maximum GSW (max-GSW) distance. We then provide the conditions under which GSW and max-GSW distances are indeed distances. Finally, we compare the numerical performance of the proposed distances on several generative modeling tasks, including SW flows and SW autoencoders. 
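To show why the SW distance is simple to compute, here is a minimal Monte Carlo sketch for two equally sized point sets: project both onto random directions, sort the projections, and average the one-dimensional transport cost. It covers only the ordinary SW distance with linear projections; a GSW distance would replace the linear projection with a nonlinear one. The function name, the number of projections, and the equal-size restriction are assumptions.

```python
import math
import random

def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    xs and ys are equally sized lists of points (tuples of floats).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a direction uniformly on the unit sphere.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project, then sort: in 1D, optimal transport matches order statistics.
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.5), (2.0, -1.0), (0.5, 2.0)]
    moved = [(x + 3.0, y) for x, y in pts]
    print(sliced_wasserstein(pts, moved))
```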
Generalized Sparse Additive Model  We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing crossvalidation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework. 
Generalized Structured Component Analysis (GSCA) 
gesca 
Generalized Structural Causal Model (GSCM) 
Structural causal models are a popular tool to describe causal relations in systems in many fields such as economy, the social sciences, and biology. In this work, we show that these models are not flexible enough in general to give a complete causal representation of equilibrium states in dynamical systems that do not have a unique stable equilibrium independent of initial conditions. We prove that our proposed generalized structural causal models do capture the essential causal semantics that characterize these systems. We illustrate the power and flexibility of this extension on a dynamical system corresponding to a basic enzymatic reaction. We motivate our approach further by showing that it also efficiently describes the effects of interventions on functional laws such as the ideal gas law. 
Generalized TimeDependent ROC  Tree-based methods are popular nonparametric tools in studying time-to-event outcomes. In this article, we introduce a novel framework for survival trees and forests, where the trees partition the dynamic survivor population and can handle time-dependent covariates. Using the idea of randomized tests, we develop generalized time-dependent ROC curves to evaluate the performance of survival trees and establish the optimality of the target hazard function with respect to the ROC curve. The tree-growing algorithm is guided by decision-theoretic criteria based on ROC, specifically targeting prediction accuracy. While existing survival trees with time-dependent covariates have practical limitations due to ambiguous prediction, the proposed method provides a consistent prediction of the failure risk. We further extend the survival trees to random forests, where the ensemble is based on martingale estimating equations, in contrast with many existing survival forest algorithms that average the predicted survival or cumulative hazard functions. Simulation studies demonstrate strong performance of the proposed methods. We apply the methods to a study on AIDS for illustration. 
Generalized Value Iteration Network (GVIN) 
In this paper, we introduce a generalized value iteration network (GVIN), which is an end-to-end neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding-based kernel achieves the best performance. We further propose episodic Q-learning, an improvement upon traditional n-step Q-learning that stabilizes training for networks that contain a planning module. Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and real-world street networks, showing that GVIN generalizes well for both arbitrary graphs and unseen graphs of larger scale and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image). 
Generalized Variational Inference (GVI) 
This paper introduces a generalized representation of Bayesian inference. It is derived axiomatically, recovering existing Bayesian methods as special cases. We use it to prove that variational inference (VI) based on the Kullback-Leibler Divergence with a variational family Q produces the uniquely optimal Q-constrained approximation to the exact Bayesian inference problem. Surprisingly, this implies that standard VI dominates any other Q-constrained approximation to the exact Bayesian inference problem. This means that alternative Q-constrained approximations such as VI targeted at minimizing other divergences and Expectation Propagation can produce better posteriors than VI only by implicitly targeting more appropriate Bayesian inference problems. Inspired by this, we introduce Generalized Variational Inference (GVI), a modular approach for instead solving such alternative inference problems explicitly. We explore some applications of GVI, including robustness and better marginals. Lastly, we derive black box GVI and apply it to Bayesian Neural Networks as well as Deep Gaussian Processes, where GVI comprehensively outperforms competing methods. Robust Deep Gaussian Processes 
Generalized Vector Space Model (GVSM) 
The generalized vector space model is a generalization of the vector space model used in information retrieval. Many classifiers, especially those which are related to document or text classification, use the TF-IDF basis of VSM. However, this is where the similarity between the models ends – the generalized model uses the results of the TF-IDF dictionary to generate similarity metrics based on distance or angle difference, rather than centroid-based classification. Wong et al. presented an analysis of the problems that the pairwise orthogonality assumption of the vector space model (VSM) creates. From here they extended the VSM to the generalized vector space model (GVSM). 
Generalized ZeroShot learning (GZSL) 
We propose a novel Generalized Zero-Shot learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. Prior works in this context propose to map high-dimensional visual features to the semantic domain, which we believe contributes to the semantic gap. To bridge the gap, we propose a novel low-dimensional embedding of visual instances that is ‘visually semantic.’ Analogous to semantic data that quantifies the existence of an attribute in the presented instance, components of our visual embedding quantify the existence of a prototypical part-type in the presented instance. In parallel, as a thought experiment, we quantify the impact of noisy semantic data by utilizing a novel visual oracle to visually supervise a learner. These factors, namely semantic noise, visual-semantic gap and label noise, lead us to propose a new graphical model for inference with pairwise interactions between label, semantic data, and inputs. We tabulate results on a number of benchmark datasets demonstrating significant improvement in accuracy over the state of the art under both semantic and visual supervision. 
GeneraltoSpecific Model (GETS) 
This paper discusses the econometric methodology of generaltospecific modeling, in which the modeler simplifies an initially general model that adequately characterizes the empirical evidence within his or her theoretical framework. Central aspects of this approach include the theory of reduction, dynamic specification, model selection procedures, model selection criteria, model comparison, encompassing, computer automation, and empirical implementation. This paper thus reviews the theory of reduction, summarizes the approach of generaltospecific modeling, and discusses the econometrics of model selection, noting that generaltospecific modeling is the practical embodiment of reduction. gets 
GeneRAting TIme Series (GRATIS) 
The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires a diverse collection of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We generate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application. 
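As a rough illustration of how a mixture autoregressive model produces a series, the sketch below draws one mixture component at each step and then applies that component's AR(1) recursion with Gaussian noise. The restriction to AR(1) components, the zero starting value, and all parameter names are assumptions for illustration; GRATIS itself is distributed as an R package and tunes richer MAR models.

```python
import random

def simulate_mar(n, weights, intercepts, coefs, sigmas, seed=0):
    """Simulate n points from a mixture of Gaussian AR(1) components.

    At each step component k is chosen with probability weights[k], then
    x_t = intercepts[k] + coefs[k] * x_{t-1} + sigmas[k] * noise.
    """
    rng = random.Random(seed)
    series = [0.0]                      # arbitrary starting value
    for _ in range(n - 1):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        mean = intercepts[k] + coefs[k] * series[-1]
        series.append(mean + sigmas[k] * rng.gauss(0.0, 1.0))
    return series

if __name__ == "__main__":
    # Two regimes: a calm, persistent one and a noisy, mean-reverting one.
    ts = simulate_mar(200, weights=[0.7, 0.3], intercepts=[0.0, 1.0],
                      coefs=[0.9, -0.5], sigmas=[0.2, 1.5])
    print(ts[:5])
```

Changing the mixing weights and component parameters changes features such as persistence and spikiness of the generated series, which is the kind of control the entry describes.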
Generative Actor Critic (GAC) 
We identify a fundamental problem in policy gradient-based methods in continuous control. As policy gradient methods require the agent’s underlying probability distribution, they limit policy representation to parametric distribution classes. We show that optimizing over such sets results in local movement in the action space and thus convergence to suboptimal solutions. We suggest a novel distributional framework, able to represent arbitrary distribution functions over the continuous action space. Using this framework, we construct a generative scheme, trained using an off-policy actor-critic paradigm, which we call the Generative Actor Critic (GAC). Compared to policy gradient methods, GAC does not require knowledge of the underlying probability distribution, thereby overcoming these limitations. Empirical evaluation shows that our approach is comparable to, and often surpasses, current state-of-the-art baselines in continuous domains. 
Generative Adversarial Autoencoder Network (GAAN) 
We introduce an effective model to overcome the problem of mode collapse when training Generative Adversarial Networks (GAN). Firstly, we propose a new generator objective that is better suited to tackling mode collapse. We also apply an independent autoencoder (AE) to constrain the generator and treat its reconstructed samples as ‘real’ samples, which slows the convergence of the discriminator, reduces the vanishing-gradient problem, and stabilizes the model. Secondly, using the mappings between latent and data spaces provided by the AE, we further regularize the AE by the relative distance between the latent and data samples to explicitly prevent the generator from falling into mode collapse. This idea arose when we found a new way to visualize mode collapse on the MNIST dataset. To the best of our knowledge, our method is the first to propose and successfully apply the relative distance of latent and data samples for stabilizing GAN. Thirdly, our proposed model, namely Generative Adversarial Autoencoder Networks (GAAN), is stable and suffers from neither vanishing gradients nor mode collapse, as empirically demonstrated on synthetic, MNIST, MNIST-1K, CelebA and CIFAR-10 datasets. Experimental results show that our method can approximate multi-modal distributions well and achieves better results than state-of-the-art methods on these benchmark datasets. Our model implementation is published here: https://…/gaan 
Generative Adversarial Capsule Network (CapsuleGAN) 
We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutional GAN at modeling image data distribution on the MNIST dataset of handwritten digits, evaluated on the generative adversarial metric and at semi-supervised image classification. 
Generative Adversarial Imitation Learning (GAIL) 
➘ “RiskAverse Imitation Learning” 
Generative Adversarial Imputation Net (GAIN) 
We propose a novel method for imputing missing data by adapting the wellknown Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms stateoftheart imputation methods. 
Generative Adversarial Mapping Networks (GAMN) 
Generative Adversarial Networks (GANs) have shown impressive performance in generating photo-realistic images. They fit generative models by minimizing a certain distance measure between the real image distribution and the generated data distribution. Several distance measures have been used, such as Jensen-Shannon divergence, $f$-divergence, and Wasserstein distance, and choosing an appropriate distance measure is very important for training the generative network. In this paper, we choose to use the maximum mean discrepancy (MMD) as the distance metric, which has several nice theoretical guarantees. In fact, the generative moment matching network (GMMN) (Li, Swersky, and Zemel 2015) is such a generative model, which contains only one generator network $G$ trained by directly minimizing MMD between the real and generated distributions. However, it fails to generate meaningful samples on challenging benchmark datasets, such as CIFAR-10 and LSUN. To improve on GMMN, we propose to add an extra network $F$, called the mapper. $F$ maps both the real data distribution and the generated data distribution from the original data space to a feature representation space $\mathcal{R}$, and it is trained to maximize MMD between the two mapped distributions in $\mathcal{R}$, while the generator $G$ tries to minimize the MMD. We call the new model generative adversarial mapping networks (GAMNs). We demonstrate that the adversarial mapper $F$ can help $G$ to better capture the underlying data distribution. We also show that GAMN significantly outperforms GMMN, and is also superior to or comparable with other state-of-the-art GAN-based methods on MNIST, CIFAR-10 and LSUN-Bedrooms datasets. 
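For readers unfamiliar with MMD, the following sketch computes the standard biased sample estimate of the squared MMD between two point sets under a Gaussian kernel. It illustrates only the quantity that $G$ minimizes and $F$ maximizes; the kernel choice, the bandwidth, and the function names are assumptions, and no networks are involved.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

if __name__ == "__main__":
    real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    fake_close = [(0.1, 0.0), (0.9, 0.1), (0.0, 1.1)]
    fake_far = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
    print(mmd_squared(real, fake_close), mmd_squared(real, fake_far))
```

In the adversarial setting described above, both sample sets would first be passed through the mapper before the kernel is applied.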
Generative Adversarial Minority Oversampling  Class imbalance is a long-standing problem relevant to a number of real-world applications of deep learning. Oversampling techniques, which are effective for handling class imbalance in classical learning systems, cannot be directly applied to end-to-end deep learning systems. We propose a three-player adversarial game between a convex generator, a multi-class classifier network, and a real/fake discriminator to perform oversampling in deep learning systems. The convex generator generates new samples from the minority classes as convex combinations of existing instances, aiming to fool both the discriminator as well as the classifier into misclassifying the generated samples. Consequently, the artificial samples are generated at critical locations near the peripheries of the classes. This, in turn, adjusts the classifier-induced boundaries in a way which is more likely to reduce misclassification from the minority classes. Extensive experiments on multiple class-imbalanced image datasets establish the efficacy of our proposal. 
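The idea of generating minority samples as convex combinations of existing instances can be sketched as follows. This is only an illustration under stated assumptions: the weights here are drawn at random, whereas the abstract describes a generator that learns them adversarially, and the helper name `convex_sample` is hypothetical.

```python
import random

def convex_sample(minority_points, rng):
    """Return one random convex combination of the given points (hypothetical helper)."""
    # Draw non-negative weights and normalise them so they sum to 1.
    raw = [rng.random() for _ in minority_points]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(minority_points[0])
    # Weighted average, coordinate by coordinate.
    return [sum(w * p[d] for w, p in zip(weights, minority_points))
            for d in range(dim)]

rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy minority-class instances
synthetic = [convex_sample(minority, rng) for _ in range(5)]
```

Because the weights are non-negative and sum to one, every synthetic point stays inside the convex hull of the minority instances, here the triangle spanned by the three toy points.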
Generative Adversarial Network (GAN) 
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. GitXiv 
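For reference, the minimax game described above is usually written as the following value function. The symbols $p_{\text{data}}$ (data distribution) and $p_z$ (noise prior fed to G) are notation assumed here; they do not appear in the entry itself.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

At the solution mentioned in the entry, where D equals 1/2 everywhere, both expectations equal $\log\tfrac{1}{2}$, so $V = -\log 4$.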
Generative Adversarial Network Embedding (GANE) 
Network embedding has recently become a hot research topic; it can provide low-dimensional feature representations for many machine learning applications. Current work focuses on either (1) whether the embedding is designed as an unsupervised learning task by explicitly preserving the structural connectivity in the network, or (2) whether the embedding is a by-product during the supervised learning of a specific discriminative task in a deep neural network. In this paper, we focus on bridging the gap between the two lines of research. We propose to adapt the Generative Adversarial model to perform network embedding, in which the generator tries to generate vertex pairs, while the discriminator tries to distinguish the generated vertex pairs from real connections (edges) in the network. The Wasserstein-1 distance is adopted to train the generator to gain better stability. We develop three variations of models, including GANE, which applies cosine similarity, GANE-O1, which preserves the first-order proximity, and GANE-O2, which tries to preserve the second-order proximity of the network in the low-dimensional embedded vector space. We later prove that GANE-O2 has the same objective function as GANE-O1 when negative sampling is applied to simplify the training process in GANE-O2. Experiments with real-world network datasets demonstrate that our models consistently outperform state-of-the-art solutions, with significant improvements in precision in link prediction, as well as in visualizations and accuracy in clustering tasks. 
Generative Adversarial Network Game (GANG) 
Generative Adversarial Networks (GANs) have become one of the most successful frameworks for unsupervised generative modeling. As GANs are difficult to train, much research has focused on this. However, very little of this research has directly exploited game-theoretic techniques. We introduce Generative Adversarial Network Games (GANGs), which explicitly model a finite zero-sum game between a generator ($G$) and classifier ($C$) that use mixed strategies. The size of these games precludes exact solution methods; therefore, we define resource-bounded best responses (RBBRs), and a resource-bounded Nash Equilibrium (RBNE) as a pair of mixed strategies such that neither $G$ nor $C$ can find a better RBBR. The RBNE solution concept is richer than the notion of ‘local Nash equilibria’ in that it captures not only failures of escaping local optima of gradient descent, but applies to any approximate best response computations, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RBNE. We compare our results to standard GAN setups, and demonstrate that our method deals well with typical GAN problems such as mode collapse, partial mode coverage and forgetting. 
Generative Adversarial Networks With Decoder-Encoder Output Noise (DE-GAN) 
In recent years, research on image generation methods has been developing fast. The auto-encoding variational Bayes method (VAEs) was proposed in 2013, which uses variational inference to learn a latent space from the image database and then generates images using the decoder. The generative adversarial networks (GANs) came out as a promising framework, which uses adversarial training to improve the generative ability of the generator. However, the pictures generated by GANs are generally blurry. The deep convolutional generative adversarial networks (DCGANs) were then proposed to improve the quality of generated images. Since the input noise vectors are randomly sampled from a Gaussian distribution, the generator has to map from a whole normal distribution to the images. This makes DCGANs unable to reflect the inherent structure of the training data. In this paper, we propose a novel deep model, called generative adversarial networks with decoder-encoder output noise (DE-GANs), which takes advantage of both adversarial training and variational Bayesian inference to improve the performance of image generation. DE-GANs use a pre-trained decoder-encoder architecture to map the random Gaussian noise vectors to informative ones and pass them to the generator of the adversarial networks. Since the decoder-encoder architecture is trained on the same images as the generators, the output vectors could carry the intrinsic distribution information of the original images. Moreover, the loss function of DE-GANs is different from that of GANs and DCGANs. A hidden-space loss function is added to the adversarial loss function to enhance the robustness of the model. Extensive empirical results show that DE-GANs can accelerate the convergence of the adversarial training process and improve the quality of the generated images. 
Generative Adversarial Privacy (GAP) 
We present a data-driven framework called generative adversarial privacy (GAP). Inspired by recent advancements in generative adversarial networks (GANs), GAP allows the data holder to learn the privatization mechanism directly from the data. Under GAP, finding the optimal privacy mechanism is formulated as a constrained minimax game between a privatizer and an adversary. We show that for appropriately chosen adversarial loss functions, GAP provides privacy guarantees against strong information-theoretic adversaries. We also evaluate the performance of GAP on multi-dimensional Gaussian mixture models and the GENKI face database. 
Generative Adversarial Self-Imitation Learning (GASIL) 
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via the generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics. 
Generative Adversarial Tree Search (GATS) 
We propose Generative Adversarial Tree Search (GATS), a sample-efficient Deep Reinforcement Learning (DRL) algorithm. While Monte Carlo Tree Search (MCTS) is known to be effective for search and planning in RL, it is often sample-inefficient and therefore expensive to apply in practice. In this work, we develop a Generative Adversarial Network (GAN) architecture to model an environment’s dynamics and a predictor model for the reward function. We exploit collected data from interaction with the environment to learn these models, which we then use for model-based planning. During planning, we deploy a finite-depth MCTS, using the learned model for tree search and a learned Q-value for the leaves, to find the best action. We theoretically show that GATS improves the bias-variance trade-off in value-based DRL. Moreover, we show that the generative model learns the model dynamics using orders of magnitude fewer samples than the Q-learner. In non-stationary settings where the environment model changes, we find the generative model adapts significantly faster than the Q-learner to the new environment. 
Generative Artificial Intelligence  Generative artificial intelligence refers to programs that make it possible for machines to use things like text, audio files and images to create content. 
Generative Autotransporter (GAT) 
In this paper, we aim to introduce the classic Optimal Transport theory to enhance deep generative probabilistic modeling. For this purpose, we design a Generative Autotransporter (GAT) model with explicit distribution optimal transport. In particular, the GAT model has a deep distribution transporter to transfer the target distribution to a specific prior probability distribution, which enables a regular decoder to generate target samples from input data that follows the transported prior distribution. With such a design, the GAT model can be stably trained to generate novel data by merely using a very simple $l_1$ reconstruction loss function with a generalized manifold-based Adam training algorithm. The experiments on two standard benchmarks demonstrate its strong generation ability. 
Generative Determinantal Point Process (GDPP) 
Generative models have proven to be an outstanding tool for representing high-dimensional probability distributions and generating realistic-looking images. A fundamental characteristic of generative models is their ability to produce multi-modal outputs. However, while training, they are often susceptible to mode collapse, which means that the model is limited in mapping the input noise to only a few modes of the true data distribution. In this paper, we draw inspiration from the Determinantal Point Process (DPP) to devise a generative model that alleviates mode collapse while producing higher-quality samples. DPP is an elegant probabilistic measure used to model negative correlations within a subset and hence quantify its diversity. We use a DPP kernel to model the diversity in real data as well as in synthetic data. Then, we devise a generation penalty term that encourages the generator to synthesize data with a similar diversity to real data. In contrast to previous state-of-the-art generative models that tend to use additional trainable parameters or complex training paradigms, our method does not change the original training scheme. Embedded in adversarial training and a variational autoencoder, our Generative Determinantal Point Process approach shows a consistent resistance to mode collapse on a wide variety of synthetic data and natural image datasets including MNIST, CIFAR-10, and CelebA, while outperforming state-of-the-art methods in data efficiency, convergence time, and generation quality. Our code is publicly available. 
Generative Ensemble  Deep generative models are capable of learning probability distributions over large, high-dimensional datasets such as images, video and natural language. Generative models trained on samples from $p(x)$ ought to assign low likelihoods to out-of-distribution (OoD) samples from $q(x)$, making them suitable for anomaly detection applications. We show that in practice, likelihood models are themselves susceptible to OoD errors, and even assign large likelihoods to images from other natural datasets. To mitigate these issues, we propose Generative Ensembles, a model-independent technique for OoD detection that combines density-based anomaly detection with uncertainty estimation. Our method outperforms ODIN and VIB baselines on image datasets, and achieves comparable performance to a classification model on the Kaggle Credit Fraud dataset. 
Generative Exploration and Exploitation (GENE) 
Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE dynamically changes the start state of the agent either to a generated novel state, to encourage the agent to explore the environment, or to a generated rewarding state, to push the agent to exploit the received reward signal. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, whether on-policy or off-policy, single-agent or multi-agent. Empirically, we demonstrate that GENE significantly outperforms existing methods in four challenging tasks with only binary rewards indicating whether or not the task is completed, including Maze, Goal Ant, Pushing, and Cooperative Navigation. The ablation studies verify that GENE can adaptively trade off between exploration and exploitation as learning progresses by automatically adjusting the proportion between generated novel states and rewarding states, which is the key to GENE solving these challenging tasks effectively and efficiently. 
Generative Information Lower BOund (GILBO) 
We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs trained on MNIST and discuss the results. 
Generative Learning Algorithms  Algorithms that try to learn p(y|x) directly (such as logistic regression), or algorithms that try to learn mappings directly from the space of inputs X to the labels {0, 1} (such as the perceptron algorithm), are called discriminative learning algorithms. Here, we’ll talk about algorithms that instead try to model p(x|y) (and p(y)). These algorithms are called generative learning algorithms. For instance, if y indicates whether an example is a dog (0) or an elephant (1), then p(x|y = 0) models the distribution of dogs’ features, and p(x|y = 1) models the distribution of elephants’ features. Naive Bayes Generative Learning Algorithms 
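A generative learner still yields a classifier, because the class posterior follows from the modelled quantities by Bayes' rule. The derivation below is a standard identity added for illustration; the entry itself only states which distributions are modelled.

```latex
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)},
\qquad
p(x) = p(x \mid y = 0)\, p(y = 0) + p(x \mid y = 1)\, p(y = 1)

\arg\max_y \; p(y \mid x) = \arg\max_y \; p(x \mid y)\, p(y)
```

The denominator $p(x)$ does not depend on $y$, so prediction only needs the class-conditional model and the class prior. In the dog/elephant example, a new animal is labelled by whichever class gives the larger product.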
Generative Markov Network (GMN) 
The assumption that data samples are independently identically distributed is the backbone of many learning algorithms. Nevertheless, datasets often exhibit rich structures in practice, and we argue that there exist some unknown orders within the data instances. Aiming to find such orders, we introduce a novel Generative Markov Network (GMN) which we use to extract the order of data instances automatically. Specifically, we assume that the instances are sampled from a Markov chain. Our goal is to learn the transitional operator of the chain as well as the generation order by maximizing the generation probability under all possible data permutations. One of our key ideas is to use neural networks as a soft lookup table for approximating the possibly huge, but discrete transition matrix. This strategy allows us to amortize the space complexity with a single model and make the transitional operator generalizable to unseen instances. To ensure the learned Markov chain is ergodic, we propose a greedy batch-wise permutation scheme that allows fast training. Empirically, we evaluate the learned Markov chain by showing that GMNs are able to discover orders among data instances and also perform comparably well to state-of-the-art methods on the one-shot recognition benchmark task. 
Generative Mixture of Networks  A generative model based on training deep architectures is proposed. The model consists of K networks that are trained together to learn the underlying distribution of a given data set. The process starts with dividing the input data into K clusters and feeding each of them into a separate network. After a few iterations of training the networks separately, we use an EM-like algorithm to train the networks together and update the clusters of the data. We call this model Mixture of Networks. The provided model is a platform that can be used for any deep structure and be trained by any conventional objective function for distribution modeling. As the components of the model are neural networks, it has high capability in characterizing complicated data distributions as well as clustering data. We apply the algorithm on MNIST handwritten digits and Yale face datasets. We also demonstrate the clustering ability of the model using some real-world and toy examples. 
Generative Model  In probability and statistics, a generative model is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences. Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes’ rule. Shannon (1948) gives an example in which a table of frequencies of English word pairs is used to generate a sentence beginning with “representing and speedily is an good”; which is not proper English but which will increasingly approximate it as the table is moved from word pairs to word triplets etc. 
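Shannon's word-pair example can be sketched in a few lines. This is a toy illustration, not Shannon's actual table: the corpus string and the function names `build_bigram_table` and `generate` are assumptions made for the example.

```python
import random
from collections import defaultdict

def build_bigram_table(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, length, rng):
    """Sample a word sequence by repeatedly drawing a successor of the last word."""
    sentence = [start]
    while len(sentence) < length:
        followers = table.get(sentence[-1])
        if not followers:  # the last word never had a successor in the corpus
            break
        sentence.append(rng.choice(followers))
    return sentence

corpus = "the cat sat on the mat and the dog sat on the rug"  # toy corpus
table = build_bigram_table(corpus)
sample = generate(table, "the", 8, random.Random(1))
```

Each successor is stored once per occurrence, so drawing uniformly from the list reproduces the empirical word-pair frequencies. Conditioning on the previous two words instead of one would correspond to the word-triplet table mentioned in the entry.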
Generative Moment Matching Network (GMMN) 
Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). 
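The kernel MMD statistic that takes the place of the discriminator can be sketched as below. This is a minimal pure-Python illustration assuming a Gaussian kernel with a fixed bandwidth and the biased estimator; the function names and toy samples are hypothetical, and a real GMMN would compute this on mini-batches inside an automatic differentiation framework.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    # Within-sample similarity minus twice the cross-sample similarity.
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

same = mmd_squared([[0.0], [1.0]], [[0.0], [1.0]])  # identical samples
far = mmd_squared([[0.0], [1.0]], [[5.0], [6.0]])   # well-separated samples
```

The estimate is zero for identical samples and grows as the two samples move apart, which is why it can serve as a training loss for the generator.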
Generative Moment Matching Network – Generative Adversarial Network (MMD-GAN) 
Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN. The new distance measure in MMD-GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, MMD-GAN significantly outperforms GMMN, and is competitive with other representative GAN works. 
Generative Neural Machine Translation (GNMT) 
We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language-agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training. 
Generative One-Shot Learning (GOL) 
Highly Autonomous Driving (HAD) systems rely on deep neural networks for the visual perception of the driving environment. Such networks are trained on large manually annotated databases. In this work, a semi-parametric approach to one-shot learning is proposed, with the aim of bypassing the manual annotation step required for training perception systems used in autonomous driving. The proposed generative framework, coined Generative One-Shot Learning (GOL), takes as input single one-shot objects, or generic patterns, and a small set of so-called regularization samples used to drive the generative process. New synthetic data is generated as Pareto optimal solutions from one-shot objects using a set of generalization functions built into a generalization generator. GOL has been evaluated on environment perception challenges encountered in autonomous vision. 
Generative Predecessor Models for Imitation Learning (GPRIL) 
We propose Generative Predecessor Models for Imitation Learning (GPRIL), a novel imitation learning algorithm that matches the state-action distribution to the distribution observed in expert demonstrations, using generative models to reason probabilistically about alternative histories of demonstrated states. We show that this approach allows an agent to learn robust policies using only a small number of expert demonstrations and self-supervised interactions with the environment. We derive this approach from first principles and compare it empirically to a state-of-the-art imitation learning method, showing that it outperforms or matches that method's performance on two simulated robot manipulation tasks, and we demonstrate significantly higher sample efficiency by applying the algorithm on a real robot. 
Generative Reversible Network  ➘ “Reversible Neural Network” 
Generative Tensor Network Classification (GTNC) 
Tensor network (TN) has recently triggered extensive interest in developing machine-learning models in quantum many-body Hilbert space. Here we propose a generative TN classification (GTNC) approach for supervised learning. The strategy is to train the generative TN for each class of the samples to construct the classifiers. The classification is implemented by comparing the distance in the many-body Hilbert space. The numerical experiments by GTNC show impressive performance on the MNIST and Fashion-MNIST datasets. The testing accuracy is competitive with the state-of-the-art convolutional neural network while higher than the naive Bayes classifier (a generative classifier) and support vector machine. Moreover, GTNC is more efficient than the existing TN models, which are in general discriminative. By investigating the distances in the many-body Hilbert space, we find that (a) the samples are naturally clustering in such a space; and (b) bounding the bond dimensions of the TNs to finite values corresponds to removing redundant information in the image recognition. These two characteristics make GTNC an adaptive and universal model of excellent performance. 
Generative Topic Embedding  Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two d 
Generative Topographic Map (GTM) 
Generative topographic map (GTM) is a machine learning method that is a probabilistic counterpart of the self-organizing map (SOM), is provably convergent and does not require a shrinking neighborhood or a decreasing step size. It is a generative model: the data is assumed to arise by first probabilistically picking a point in a low-dimensional space, mapping the point to the observed high-dimensional input space (via a smooth function), then adding noise in that space. The parameters of the low-dimensional probability distribution, the smooth map and the noise are all learned from the training data using the expectation-maximization (EM) algorithm. GTM was introduced in 1996 in a paper by Christopher M. Bishop, Markus Svensen, and Christopher K. I. Williams. 
Generator and Responsibility Predictor (GRP) 
Learning from complex demonstrations is challenging, especially when the demonstration consists of different strategies. A popular approach is to use a deep neural network to perform imitation learning. However, the structure of that deep neural network has to be “deep” enough to capture all possible scenarios. Besides the machine learning issue, how humans learn in the sense of physiology has rarely been addressed, and relevant works on spinal cord learning are rarer still. In this work, we develop a novel modular learning architecture, the Generator and Responsibility Predictor (GRP) model, which automatically learns the sub-task policies from an unsegmented controller demonstration and learns to switch between the policies. We also introduce a more physiologically based neural network architecture. We implemented our GRP model and our proposed neural network to form a model that transfers the swing leg control from the brain to the spinal cord. Our result suggests that by using the GRP model the brain can successfully transfer the target swing leg control to the spinal cord and the resulting model can switch between sub-control policies automatically. 
Generic Diffusion Process (genericDP) 
Image restoration problems are typical ill-posed problems where the regularization term plays an important role. A regularization term learned via generative approaches is easy to transfer to various image restoration problems, but offers inferior restoration quality compared with one learned via discriminative approaches. By contrast, a regularization term learned via discriminative approaches is usually trained for a specific image restoration problem, and fails on problems for which it was not trained. To address this issue, we propose a generic diffusion process (genericDP) to handle multiple Gaussian denoising problems based on the Trainable Nonlinear Reaction Diffusion (TNRD) models. Instead of one model, which consists of a diffusion and a reaction term, for one Gaussian denoising problem in TNRD, we enforce multiple TNRD models to share one diffusion term. The trained genericDP model can provide both promising denoising performance and high training efficiency compared with the original TNRD models. We also transfer the trained diffusion term to non-blind deconvolution, which is unseen in the training phase. Experimental results show that the trained diffusion term for multiple Gaussian denoising can be transferred to image non-blind deconvolution as an image prior and provide competitive performance. 
Generic Holdout  Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being influenced by answers to previous queries) a data set containing $n$ samples may support exponentially many queries in $n$. This number reduces to linearly many under naive adaptive data analysis, and even sophisticated remedies such as the Reusable Holdout (Dwork et al. 2015) only allow quadratically many queries in $n$. In this work, we propose a new framework for adaptive science which exponentially improves on this number of queries under a restricted yet scientifically relevant setting, where the goal of the scientist is to find a single (or a few) true hypotheses about the universe based on the samples. Such a setting may describe the search for predictive factors of some disease based on medical data, where the analyst may wish to try a number of predictive models until a satisfactory one is found. Our solution, the Generic Holdout, involves two simple ingredients: (1) a partitioning of the data into an exploration set and a holdout set and (2) a limited exposure strategy for the holdout set. An analyst is free to use the exploration set arbitrarily, but when testing hypotheses against the holdout set, the analyst only learns the answer to the question ‘Is the given hypothesis true (empirically) on the holdout set?’, and no more information, such as ‘how well’ the hypothesis fits the holdout set. The resulting scheme is immediate to analyze, but despite its simplicity we do not believe our method is obvious, as evidenced by the many violations in practice. Our proposal can be seen as an alternative to pre-registration, and allows researchers to get the benefits of adaptive data analysis without the problems of adaptivity. 
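The two ingredients, a data partition and a limited-exposure holdout, can be sketched as follows. This is an illustration under assumptions: the class name `LimitedExposureHoldout`, the 50/50 split, and the idea of deciding "empirically true" by a pass-rate threshold are all hypothetical choices, not details given in the entry.

```python
class LimitedExposureHoldout:
    """Holdout set that answers hypothesis tests with a bare True/False."""

    def __init__(self, holdout_data):
        self._data = list(holdout_data)  # kept private: the analyst never reads it

    def test(self, hypothesis, threshold):
        """Report only whether the hypothesis holds on at least `threshold` of rows."""
        hits = sum(1 for row in self._data if hypothesis(row))
        # The pass rate itself is deliberately not returned, only the verdict.
        return hits / len(self._data) >= threshold

data = list(range(100))                       # toy data set
exploration, holdout = data[:50], data[50:]   # ingredient (1): the partition
gate = LimitedExposureHoldout(holdout)        # ingredient (2): limited exposure

holds = gate.test(lambda x: x >= 50, threshold=0.95)
fails = gate.test(lambda x: x % 2 == 0, threshold=0.95)
```

The analyst can explore `exploration` freely, but each call to `gate.test` leaks a single bit, which is the restriction the entry describes.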
GENESYS  Modern deep learning systems rely on (a) a hand-tuned neural network topology, (b) massive amounts of labeled training data, and (c) extensive training over large-scale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general-purpose intelligent systems which would need to learn autonomously in unknown environments and may not have access to some or any of these three components. Reinforcement learning and evolutionary algorithm (EA) based methods circumvent this problem by continuously interacting with the environment and updating the models based on obtained rewards. However, deploying these algorithms on ubiquitous autonomous agents at the edge (robots/drones) demands extremely high energy-efficiency due to (i) tight power and energy budgets, (ii) continuous/lifelong interaction with the environment, and (iii) intermittent or no connectivity to the cloud to run heavy-weight processing. To address this need, we present GENESYS, an HW-SW prototype of an EA-based learning system that comprises a closed-loop learning engine called EvE and an inference engine called ADAM. EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring hand-optimization or backpropagation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE. GENESYS identifies and leverages multiple avenues of parallelism unique to EAs that we term ‘gene’-level parallelism and ‘population’-level parallelism. We ran GENESYS with a suite of environments from OpenAI gym and observed 2-5 orders of magnitude higher energy-efficiency over state-of-the-art embedded and desktop CPU and GPU systems. 
Genetic Algorithm (GA) 
In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural selection. This heuristic (also sometimes called a metaheuristic) is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. Genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics, pharmacometrics and other fields. 
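The operators named in the genetic algorithm entry (selection, crossover, mutation, with inheritance through elitism) can be seen in a minimal Python sketch. The toy objective (OneMax, i.e. maximizing the number of 1 bits), the function names, and all parameter values are illustrative assumptions, not part of the definition.

```python
import random


def one_max(bits):
    """Toy fitness: the number of 1s in the bit string."""
    return sum(bits)


def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=40,
                      mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Random initial population of bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism: the best individual is inherited unchanged.
        children = [max(pop, key=fitness)[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Single-point crossover.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


best = genetic_algorithm(one_max)
print(one_max(best), best)
```

Because the best individual is always carried over, the best fitness in the population never decreases from one generation to the next.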
Genetic Evolution Network (GEN) 
In this paper, we introduce an alternative approach to deep learning models, namely the GEN (Genetic Evolution Network) model. Instead of building one single deep model, GEN adopts a genetic-evolutionary learning strategy to build a group of unit models generation by generation. Significantly different from the well-known representation learning models with extremely deep structures, the unit models covered in GEN are of a much shallower architecture. In the training process, from each generation, a subset of unit models will be selected based on their performance to evolve and generate the child models in the next generation. GEN has significant advantages compared with existing deep representation learning models in terms of learning effectiveness, efficiency, and interpretability of the learning process and learned results. Extensive experiments have been done on diverse benchmark datasets, and the experimental results have demonstrated the outstanding performance of GEN compared with the state-of-the-art baseline methods in both effectiveness and efficiency. 
Genetic Programming for Reinforcement Learning (GPRL) 
The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straightforward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data. 
Genetic Programming Relevance Vector Machine (GPRVM) 
This paper proposes a hybrid basis function construction method (GPRVM) for the Symbolic Regression (SR) problem, which combines an extended version of Genetic Programming called Kaizen Programming with the Relevance Vector Machine (RVM) to evolve an optimal set of basis functions. Different from traditional evolutionary algorithms where a single individual is a complete solution, our method proposes a solution based on a linear combination of basis functions built from individuals during the evolving process. RVM, which is a sparse Bayesian kernel method, selects suitable functions to constitute the basis. RVM determines the posterior weight of a function by evaluating its quality and sparsity. The solution produced by GPRVM is a sparse Bayesian linear model of the coefficients of many nonlinear functions. Our hybrid approach is focused on nonlinear white-box models, selecting the right combination of functions to build robust predictions without prior knowledge about the data. Experimental results show that GPRVM outperforms conventional methods, which suggests that it is an efficient and accurate technique for solving SR. The computational complexity of GPRVM scales as $O(M^{3})$, where $M$ is the number of functions in the basis set and is typically much smaller than the number $N$ of training patterns. 
Genetic-Evolutionary Adam (GADAM) 
Deep neural network learning can be formulated as a nonconvex optimization problem. Existing optimization algorithms, e.g., Adam, can learn the models fast, but may get stuck in local optima easily. In this paper, we introduce a novel optimization algorithm, namely GADAM (Genetic-Evolutionary Adam). GADAM learns deep neural network models based on a number of unit models, generation by generation: it trains the unit models with Adam, and evolves them into new generations with a genetic algorithm. We will show that GADAM can effectively jump out of local optima in the learning process to obtain better solutions, and prove that GADAM can also achieve very fast convergence. Extensive experiments have been done on various benchmark datasets, and the learning results demonstrate the effectiveness and efficiency of the GADAM algorithm. 
GENI  How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize information available in KGs, or lack the flexibility needed to model complex relationships between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with distinctive challenges involved with predicting node importance in KGs. Our method performs an aggregation of importance scores instead of aggregating node embeddings, via a predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art. 
Genie  To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort. We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code. Genie needs only a small realistic set of input sentences for validating the neural model. Developers write templates to synthesize data; Genie uses crowdsourced paraphrases and data augmentation, along with the synthesized data, to train a semantic parser. We also propose design principles that make VAPL languages amenable to natural language translation. We apply these principles to revise ThingTalk, the language used by the Almond virtual assistant. We use Genie to build the first semantic parser that can support compound virtual assistant commands with unquoted free-form parameters. Genie achieves a 62% accuracy on realistic user inputs. We demonstrate Genie’s generality by showing a 19% and 31% improvement over the previous state of the art on a music skill, aggregate functions, and access control. 
GenOja  In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting. We propose a simple and efficient algorithm, GenOja, for these problems. We prove the global convergence of our algorithm, borrowing ideas from the theory of fast-mixing Markov chains and two-time-scale stochastic approximation, showing that it achieves the optimal rate of convergence. In the process, we develop tools for understanding stochastic processes with Markovian noise which might be of independent interest. 
GenVariScan  Motivation: Advances in next-generation sequencing (NGS) methods have enabled researchers and agencies to collect a wide variety of sequencing data across multiple platforms. The motivation behind such an exercise is to analyze these datasets jointly, in order to gain insights into disease prognosis, treatment, and cure. Clustering of such datasets can provide much-needed insight into biological associations. However, the differing scale and the heterogeneity of the mixed dataset is a hurdle for such analyses. Results: The paper proposes a nonparametric Bayesian approach called GenVariScan for biclustering of high-dimensional mixed data. Generalized Linear Models (GLM) and latent variable approaches are utilized to integrate the mixed dataset. The sparsity-inducing property of the Poisson Dirichlet Process (PDP) is used to identify a lower-dimensional structure of mixed covariates. We apply our method to a Glioblastoma Multiforme (GBM) cancer dataset. We show that cluster detection is a posteriori consistent as the number of covariates and subjects grows. As a byproduct, we derive a working value approach to perform beta regression. 
Geographic Information Systems (GIS) 
A geographic information system (GIS) is a computer system designed to capture, store, manipulate, analyze, manage, and present all types of geographical data. The acronym GIS is sometimes used for geographical information science or geospatial information studies to refer to the academic discipline or career of working with geographic information systems and is a large domain within the broader academic discipline of Geoinformatics. 
Geographic Resources Analysis Support System (GRASS) 
GRASS GIS, commonly referred to as GRASS (Geographic Resources Analysis Support System), is a free and open source Geographic Information System (GIS) software suite used for geospatial data management and analysis, image processing, graphics and maps production, spatial modeling, and visualization. GRASS GIS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. It is a founding member of the Open Source Geospatial Foundation (OSGeo). rgrass7 
GeoJSON  GeoJSON is a format for encoding a variety of geographic data structures. A GeoJSON object may represent a geometry, a feature, or a collection of features. GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Features in GeoJSON contain a geometry object and additional properties, and a feature collection represents a list of features. A complete GeoJSON data structure is always an object (in JSON terms). In GeoJSON, an object consists of a collection of name/value pairs — also called members. For each member, the name is always a string. Member values are either a string, number, object, array or one of the literals: true, false, and null. An array consists of elements where each element is a value as described above. 
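The structure described in the GeoJSON entry can be made concrete with a small Python sketch using only the standard `json` module. The coordinate values and the `name` properties are made-up examples; positions follow the GeoJSON convention of longitude first, then latitude, and a polygon ring repeats its first position at the end to close the ring.

```python
import json

# A Feature pairs a geometry object with free-form properties.
point_feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.405, 52.52]},  # [lon, lat]
    "properties": {"name": "Example point"},
}

polygon_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # One linear ring; the last position equals the first.
        "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]],
    },
    "properties": {"name": "Unit square"},
}

# A FeatureCollection is an object holding a list of features.
collection = {
    "type": "FeatureCollection",
    "features": [point_feature, polygon_feature],
}

encoded = json.dumps(collection, indent=2)   # serialize to GeoJSON text
decoded = json.loads(encoded)                # parse it back into Python objects

geometry_types = [f["geometry"]["type"] for f in decoded["features"]]
print(geometry_types)
```

Every member name is a string, and every value is a string, number, object, array, or literal, so the document round-trips through any JSON parser.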
Geometric Dirichlet Mean  We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the optimization of a geometric loss function, which is a surrogate to the LDA’s likelihood. Our method involves a fast optimization-based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data. 
Geometric Enclosing Network (GEN) 
Training models to generate data has increasingly attracted research attention and become important in modern real-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator $G(\mathbf{z})$ in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multimodal data and the quality of generated data. 
Geometric Generalization Based Zero-Shot Learning Test  Raven’s Progressive Matrices are one of the widely used tests for evaluating a human test taker’s fluid intelligence. Analogously, this paper introduces geometric generalization based zero-shot learning tests to measure the rapid learning ability and the internal consistency of deep generative models. Our empirical analysis of state-of-the-art generative models discerns their ability to generalize concepts across classes. In the process, we introduce Infinit World, an evaluable, scalable, multimodal, lightweight dataset, and the Zero-Shot Intelligence Metric (ZSI). The proposed tests condense human-level spatial and numerical reasoning tasks to their simplistic geometric forms. The dataset is scalable to a theoretical limit of infinity in the numerical features of the generated geometric figures, in image size, and in quantity. We systematically analyze the internal consistency of state-of-the-art models, identify their bottlenecks, and propose a proactive optimization method for few-shot and zero-shot learning. 
Geometric Generative Adversarial Nets (Geometric GAN) 
Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which have inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN, which uses an SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results demonstrate the superior performance of geometric GAN. 
Geometric Illustration of Neural Networks (GINN) 
This informal technical report details the geometric illustration of decision boundaries for ReLU units in a three-layer fully connected neural network. The network is designed and trained to predict pixel intensity from an (x, y) input location. The Geometric Illustration of Neural Networks (GINN) tool was built to visualise and track the points at which ReLU units switch from being active to off (or vice versa) as the network undergoes training. Several phenomena were observed and are discussed herein. This technical report is a supporting document to the blog post with online demos and is available at http://…/. 
Geometric Mean Metric Learning  We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed-form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of interpretability; and (iii) computational speed several orders of magnitude faster than the widely used LMNN and ITML methods. Furthermore, on standard benchmark datasets, our closed-form solution consistently attains higher classification accuracy. 
Geometric Mutual Information (GMI) 
This paper proposes a geometric estimator of dependency between a pair of multivariate samples. The proposed estimator of dependency is based on a randomly permuted geometric graph (the minimal spanning tree) over the two multivariate samples. This estimator converges to a quantity that we call the geometric mutual information (GMI), which is equivalent to the Henze-Penrose divergence [1] between the joint distribution of the multivariate samples and the product of the marginals. The GMI has many of the same properties as standard MI but can be estimated from empirical data without density estimation, making it scalable to large datasets. The proposed empirical estimator of GMI is simple to implement, involving the construction of an MST spanning both the original data and a randomly permuted version of this data. We establish asymptotic convergence of the estimator and convergence rates of the bias and variance for smooth multivariate density functions belonging to a Hölder class. We demonstrate the advantages of our proposed geometric dependency estimator in a series of experiments. 
Geometric Operator Convolutional Neural Network (GOCNN) 
The Convolutional Neural Network (CNN) has been successfully applied in many fields during recent decades; however, it lacks the ability to utilize prior domain knowledge when dealing with many realistic problems. We present a framework called Geometric Operator Convolutional Neural Network (GOCNN) that uses domain knowledge, wherein the kernel of the first convolutional layer is replaced with a kernel generated by a geometric operator function. This framework integrates many conventional geometric operators, which allows it to adapt to a diverse range of problems. Under certain conditions, we theoretically analyze the convergence and the bound of the generalization errors between GOCNNs and common CNNs. Although the geometric operator convolution kernels have fewer trainable parameters than common convolution kernels, the experimental results indicate that GOCNN performs more accurately than common CNN on CIFAR-10/100. Furthermore, GOCNN reduces dependence on the amount of training examples and enhances adversarial stability. In the practical task of medically diagnosing bone fractures, GOCNN obtains a 3% improvement in recall. 
Geometric Program (GP) 
A geometric program (GP) is a type of mathematical optimization problem characterized by objective and constraint functions that have a special form. Recently developed solution methods can solve even large-scale GPs extremely efficiently and reliably; at the same time a number of practical problems, particularly in circuit design, have been found to be equivalent to (or well approximated by) GPs. Putting these two together, we get effective solutions for the practical problems. The basic approach in GP modeling is to attempt to express a practical problem, such as an engineering analysis or design problem, in GP format. In the best case, this formulation is exact; when this is not possible, we settle for an approximate formulation. 
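The "special form" mentioned in the geometric program entry is commonly written as follows. This is the usual textbook standard form rather than text from the entry itself; the symbols $f_i$, $g_j$, $c$, $a_k$, $m$, $p$, and $n$ are introduced here for illustration. The variables $x = (x_1, \dots, x_n)$ are positive, each $g_j$ is a monomial, and each $f_i$ is a posynomial, i.e. a sum of monomials.

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 1, \quad i = 1, \dots, m, \\
                       & g_j(x) = 1, \quad j = 1, \dots, p,
\end{aligned}
\qquad\text{with monomials of the form}\qquad
g(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}, \quad c > 0,\ a_k \in \mathbb{R}.
```

Under the change of variables $y_k = \log x_k$, and after taking logarithms of the objective and constraints, the problem becomes convex, which is why large instances can be solved reliably.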
Geometric Semantic Genetic Programming (GSGP) 
In iterative supervised learning algorithms it is common to reach a point in the search where no further induction seems to be possible with the available data. If the search is continued beyond this point, the risk of overfitting increases significantly. Following the recent developments in inductive semantic stochastic methods, this paper studies the feasibility of using information gathered from the semantic neighborhood to decide when to stop the search. Two semantic stopping criteria are proposed and experimentally assessed in Geometric Semantic Genetic Programming (GSGP) and in the Semantic Learning Machine (SLM) algorithm (the equivalent algorithm for neural networks). The experiments are performed on real-world high-dimensional regression datasets. The results show that the proposed semantic stopping criteria are able to detect stopping points that result in a competitive generalization for both GSGP and SLM. This approach also yields computationally efficient algorithms as it allows the evolution of neural networks in less than 3 seconds on average, and of GP trees in at most 10 seconds. The usage of the proposed semantic stopping criteria in conjunction with the computation of optimal mutation/learning steps also results in small trees and neural networks. 
Geometrically Designed Spline Regression  Geometrically Designed Spline (‘GeDS’) Regression is a nonparametric geometrically motivated method for fitting variable knots spline predictor models in one or two independent variables, in the context of generalized (non)linear models. ‘GeDS’ estimates the number and position of the knots and the order of the spline, assuming the response variable has a distribution from the exponential family. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s0018001506217> and Dimitrova et al. (2017) <https://…/18460>. GeDS 
Geometry Score  One of the biggest challenges in the research of generative adversarial networks (GANs) is assessing the quality of generated samples and detecting various levels of mode collapse. In this work, we construct a novel measure of performance of a GAN by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation. Our algorithm can be applied to datasets of an arbitrary nature and is not limited to visual data. We test the obtained metric on various real-life models and datasets and demonstrate that our method provides new insights into properties of GANs. 
Geometry-Aware Generative Adversarial Network (GAGAN) 
Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures. However, apart from the visual texture, the visual appearance of objects is significantly affected by their shape geometry, information which is not taken into account by existing generative models. This paper introduces the Geometry-Aware Generative Adversarial Network (GAGAN) for incorporating geometric information into the image generation process. Specifically, in GAGAN the generator samples latent variables from the probability space of a statistical shape model. By mapping the output of the generator to a canonical coordinate frame through a differentiable geometric transformation, we enforce the geometry of the objects and add an implicit connection from the prior to the generated object. Experimental results on face generation indicate that the GAGAN can generate realistic images of faces with arbitrary facial attributes such as facial expression, pose, and morphology, that are of better quality compared to current GAN-based methods. Finally, our method can be easily incorporated into and improve the quality of the images generated by any existing GAN architecture. 
GeoSay  Automatic extraction of buildings in remote sensing images is an important but challenging task and finds many applications in different fields such as urban planning, navigation and so on. This paper addresses the problem of building extraction in very high-spatial-resolution (VHSR) remote sensing (RS) images, whose spatial resolution is often up to half a meter and which provide rich information about buildings. Based on the observation that buildings in VHSR-RS images are always more distinguishable in geometry than in the texture or spectral domain, this paper proposes a geometric building index (GBI) for accurate building extraction, by computing the geometric saliency from VHSR-RS images. More precisely, given an image, the geometric saliency is derived from a mid-level geometric representation based on meaningful junctions that can locally describe geometrical structures of images. The resulting GBI is finally measured by integrating the derived geometric saliency of buildings. Experiments on three public and commonly used datasets demonstrate that the proposed GBI achieves state-of-the-art performance and shows impressive generalization capability. Additionally, GBI preserves both the exact position and accurate shape of single buildings compared to existing methods. 
Gephi  Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs. It runs on Windows, Linux and Mac OS X. Gephi is open-source and free. 
GEPPG  In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like novelty search, quality-diversity or goal exploration processes are less sample efficient during exploitation. In this paper, we present the GEPPG approach, taking the best of both worlds by sequentially combining two variants of a goal exploration process and two variants of DDPG. We study the learning performance of these components and their combination on a low-dimensional deceptive reward problem and on the larger HalfCheetah benchmark. Among other things, we show that DDPG fails on the former and that GEPPG obtains performance above the state of the art on the latter. 
GETNET  Change detection (CD) is an important application of remote sensing, which provides timely change information about the large-scale Earth surface. With the emergence of hyperspectral imagery, CD technology has been greatly promoted, as hyperspectral data with high spectral resolution are capable of detecting finer changes than traditional multispectral imagery. Nevertheless, the high dimension of hyperspectral data makes it difficult to implement traditional CD algorithms. Besides, endmember abundance information at the subpixel level is often not fully utilized. In order to better handle the high-dimension problem and explore abundance information, this paper presents a General End-to-end Two-dimensional CNN (GETNET) framework for hyperspectral image change detection (HSICD). The main contributions of this work are threefold: 1) a mixed-affinity matrix that integrates subpixel representation is introduced to mine more cross-channel gradient features and fuse multi-source information; 2) a 2D CNN is designed to learn the discriminative features effectively from multi-source data at a higher level and enhance the generalization ability of the proposed CD algorithm; 3) a new HSICD data set is designed for the objective comparison of different methods. Experimental results on real hyperspectral data sets demonstrate that the proposed method outperforms most state-of-the-art methods. 
GGQID3  Usually, decision tree induction algorithms are limited to working with non-relational data. Given a record, they do not take into account other objects’ attributes even though they can provide valuable information for the learning task. In this paper we present GGQID3, a multi-relational decision tree learning algorithm that uses Generalized Graph Queries (GGQ) as predicates in the decision nodes. GGQs can express complex patterns (including cycles) and they can be refined step by step. Also, they can evaluate structures (not only single records) and perform Regular Pattern Matching. GGQs are built dynamically (pattern mining) during the GGQID3 tree construction process. We will show how to use GGQID3 to perform multi-relational machine learning while keeping complexity under control. Finally, some real examples of automatically obtained classification trees and semantic patterns are shown. 
GGT  Adaptive regularization methods come in diagonal and full-matrix variants. However, only the former have enjoyed widespread adoption in training large-scale deep models. This is due to the computational overhead of manipulating a full matrix in high dimension. In this paper, we show how to make full-matrix adaptive regularization practical and useful. We present GGT, a truly scalable full-matrix adaptive optimizer. At the heart of our algorithm is an efficient method for computing the inverse square root of a low-rank matrix. We show that GGT converges to first-order local minima, providing the first rigorous theoretical analysis of adaptive regularization in nonconvex optimization. In preliminary experiments, GGT trains faster across a variety of synthetic tasks and standard deep learning benchmarks. 
GhostLink  Social influence plays a vital role in shaping a user’s behavior in online communities dealing with items of fine taste like movies, food, and beer. For online recommendation, this implies that users’ preferences and ratings are influenced by other individuals. Given only timestamped reviews of users, can we find out who-influences-whom, and characteristics of the underlying influence network? Can we use this network to improve recommendation? While prior works in social-aware recommendation have leveraged social interaction by considering the observed social network of users, many communities like Amazon, Beeradvocate, and Ratebeer do not have explicit user-user links. Therefore, we propose GhostLink, an unsupervised probabilistic graphical model, to automatically learn the latent influence network underlying a review community, given only the temporal traces (timestamps) of users’ posts and their content. Based on extensive experiments with four real-world datasets with 13 million reviews, we show that GhostLink improves item recommendation by around 23% over state-of-the-art methods that do not consider this influence. As additional use cases, we show that GhostLink can be used to differentiate between users’ latent preferences and influenced ones, as well as to detect influential users based on the learned influence graph. 
GHZ Test  Zero-knowledge and multi-prover systems are both central notions in classical and quantum complexity theory. There is, however, little research in quantum multi-prover zero-knowledge systems. This paper studies complexity-theoretical aspects of the quantum multi-prover zero-knowledge systems. This paper has two results: 1. QMIP* systems with honest zero-knowledge can be converted into general zero-knowledge systems without any assumptions. 2. QMIP* has computational quantum zero-knowledge systems if a natural computational conjecture holds. One of the main tools is a test (called the GHZ test) that uses GHZ states shared by the provers, which prevents the verifier’s attack in the above two results. Another main tool is what we call the Local Hamiltonian based Interactive protocol (LHI protocol). The LHI protocol makes previous research for Local Hamiltonians applicable to check the history state of interactive proofs, and we then apply Broadbent et al.’s zero-knowledge protocol for QMA [BJSW] to quantum multi-prover systems in order to obtain the second result. 
Gibbs Sampling  In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution (i.e. from the joint probability distribution of two or more random variables), when direct sampling is difficult. This sequence can be used to approximate the joint distribution (e.g., to generate a histogram of the distribution); to approximate the marginal distribution of one of the variables, or some subset of the variables (for example, the unknown parameters or latent variables); or to compute an integral (such as the expected value of one of the variables). Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled. 
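The alternating conditional draws described above can be sketched for a toy target. This is a minimal illustration, not a general-purpose sampler: the target (a standard bivariate normal with correlation `rho`), the function names, and the burn-in length are all assumptions chosen for the example. It relies on the known fact that, for this target, each conditional is itself normal: x given y is N(rho*y, 1 - rho^2), and symmetrically for y given x.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Draw samples from a standard bivariate normal with correlation rho
    by alternately sampling each coordinate from its conditional."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5  # std. dev. of each conditional
    x, y = 0.0, 0.0                     # arbitrary starting state
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # x | y ~ N(rho*y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)  # y | x ~ N(rho*x, 1 - rho^2)
        if step >= burn_in:              # discard the warm-up portion
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Empirical Pearson correlation of the sampled (x, y) pairs."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    cov = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
    vx = sum((s[0] - mx) ** 2 for s in samples) / n
    vy = sum((s[1] - my) ** 2 for s in samples) / n
    return cov / (vx * vy) ** 0.5
```

With enough samples, `sample_correlation(gibbs_bivariate_normal(0.8, 20000))` should come out close to 0.8, which is one way to check that the chain approximates the joint distribution.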
Gibbs-Klein Algorithm (GK) 
Sampling from lattice Gaussian distribution has emerged as an important problem in coding, decoding and cryptography. In this paper, the classic Gibbs algorithm from Markov chain Monte Carlo (MCMC) methods is demonstrated to be geometrically ergodic for lattice Gaussian sampling, which means the Markov chain arising from it converges exponentially fast to the stationary distribution. Meanwhile, the exponential convergence rate of the Markov chain is also derived through the spectral radius of the forward operator. Then, a comprehensive analysis of the convergence rate is carried out and two sampling schemes are proposed to further enhance the convergence performance. The first one, referred to as the Metropolis-within-Gibbs (MWG) algorithm, improves the convergence by refining the state space of the univariate sampling. On the other hand, the blocked strategy of the Gibbs algorithm, which performs the sampling over multiple variables at each Markov move, is also shown to yield a better convergence rate than the traditional univariate sampling. In order to perform blocked sampling efficiently, the Gibbs-Klein (GK) algorithm is proposed, which samples block by block using Klein’s algorithm. Furthermore, the validity of the GK algorithm is demonstrated by showing its ergodicity. Simulation results based on MIMO detection are presented to confirm the convergence gain brought by the proposed Gibbs sampling schemes. 
GibbsNet  Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure such as blocked Gibbs sampling that may require many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code which uses an adversarially learned iterative procedure to gradually refine the joint distribution, $p(x, z)$, to better match with the data distribution on each step. GibbsNet is the best of both worlds both in theory and in practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the virtual training criteria global minimum) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and has the ability to do attribute prediction, class-conditional generation, and joint image-attribute modeling in a single model which is not trained for any of these specific tasks. We show empirically that GibbsNet is able to learn a more complex $p(z)$ and show that this leads to improved inpainting and iterative refinement of $p(x, z)$ for dozens of steps and stable generation without collapse for thousands of steps, despite being trained on only a few steps. 
GIDropout  Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training. Inspired by dropout, this paper presents GIDropout, a novel dropout method integrating with global information to improve neural networks for text classification. Unlike the traditional dropout method in which the units are dropped randomly according to the same probability, we aim to use explicit instructions based on global information of the dataset to guide the training process. With GIDropout, the model is supposed to pay more attention to inapparent features or patterns. Experiments demonstrate the effectiveness of the dropout with global information on seven text classification tasks, including sentiment analysis and topic classification. 
Gini Impurity  Used by the CART (classification and regression tree) algorithm, Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset. Gini impurity can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category. 
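The computation described above (summing, over labels, the probability of picking that label times the probability of mislabeling it) can be written in a few lines. A minimal sketch; the function name `gini_impurity` is chosen for this example and is not taken from any particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels:
    sum over classes of p_k * (1 - p_k), where p_k is the class frequency."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for an empty node (an assumption of this sketch)
    counts = Counter(labels)
    return sum((c / n) * (1.0 - c / n) for c in counts.values())
```

A node holding a single class gives 0, and a node split evenly between two classes gives 0.5, matching the statement that the minimum is reached when all cases fall into one category.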
Girvan-Newman Algorithm  The Girvan-Newman algorithm detects communities by progressively removing edges from the original network. The connected components of the remaining network are the communities. Instead of trying to construct a measure that tells us which edges are the most central to communities, the Girvan-Newman algorithm focuses on edges that are most likely “between” communities. 
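One round of this edge-removal procedure can be sketched as follows. The entry does not say how "between" edges are scored; this sketch assumes the usual choice, edge betweenness (the number of shortest paths running through an edge), computed with a breadth-first accumulation for unweighted, undirected graphs without self-loops. All function names are chosen for the example.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an undirected, unweighted graph.
    adj maps each node to the set of its neighbours."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist, paths = {source: 0}, {source: 1.0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([source])
        while queue:  # BFS that also counts shortest paths
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0.0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    preds[v].append(u)
        credit = {v: 0.0 for v in order}
        for v in reversed(order):  # push path credit back toward the source
            for u in preds[v]:
                share = paths[u] / paths[v] * (1.0 + credit[v])
                score[frozenset((u, v))] += share
                credit[u] += share
    return score  # every pair is counted from both ends; fine for ranking


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)
        if not score:
            break  # no edges left to remove
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

For two triangles joined by a single bridge, the bridge carries every shortest path between the two sides, so it is removed first and each triangle becomes its own community. Repeating the split on the result yields progressively finer communities.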
GitHub Gist  Tom Preston-Werner presented the new Gist feature at a punk rock Ruby conference in 2008. Gist builds upon that idea by adding version control for code snippets, easy forking, and SSL encryption for private pastes. Because each “gist” is its own Git repository, multiple code snippets can be contained in a single paste and they can be pushed and pulled using Git. Further, forked code can be pushed back to the original author in the form of a patch, so pastes can become more like mini-projects. The main benefit of forking is that it allows you to freely experiment with changes without affecting the original project. Gist is a simple way to share snippets and pastes with others. All gists are Git repositories, so they are automatically versioned, forkable and usable from Git. gistr 
GitHub Hosted R Repository (ghrr) 
This ghrr (for ‘GitHub Hosted R Repository’) uses drat for both insertion of packages, and usage from R. http://…/#introducing_ghrr drat 
GITNet  In several natural language tasks, labeled sequences are available in separate domains (say, languages), but the goal is to label sequences with mixed domain (such as code-switched text). Or, we may have available models for labeling whole passages (say, with sentiments), which we would like to exploit toward better position-specific label inference (say, target-dependent sentiment annotation). A key characteristic shared across such tasks is that different positions in a primary instance can benefit from different ‘experts’ trained from auxiliary data, but labeled primary instances are scarce, and labeling the best expert for each position entails unacceptable cognitive burden. We propose GITNet, a unified position-sensitive multi-task recurrent neural network (RNN) architecture for such applications. Auxiliary and primary tasks need not share training instances. Auxiliary RNNs are trained over auxiliary instances. A primary instance is also submitted to each auxiliary RNN, but their state sequences are gated and merged into a novel composite state sequence tailored to the primary inference task. Our approach is in sharp contrast to recent multi-task networks like the cross-stitch and sluice network, which do not control state transfer at such fine granularity. We demonstrate the superiority of GIRNet using three applications: sentiment classification of code-switched passages, part-of-speech tagging of code-switched text, and target position-sensitive annotation of sentiment in monolingual passages. In all cases, we establish new state-of-the-art performance beyond recent competitive baselines. 
Gittins Index  The Gittins index is a measure of the reward that can be achieved through a given stochastic process with certain properties, namely: the process has an ultimate termination state and evolves with an option, at each intermediate state, of terminating. Upon terminating at a given state, the reward achieved is the sum of the probabilistic expected rewards associated with every state from the actual terminating state to the ultimate terminal state, inclusive. The index is a real scalar. In applied mathematics, the ‘Gittins index’ is a real scalar value associated to the state of a stochastic process with a reward function and with a probability of termination. It is a measure of the reward that can be achieved by the process evolving from that state on, under the probability that it will be terminated in future. The ‘index policy’ induced by the Gittins index, consisting of choosing at any time the stochastic process with the currently highest Gittins index, is the solution of some stopping problems such as the one of dynamic allocation, where a decision-maker has to maximize the total reward by distributing a limited amount of effort to a number of competing projects, each returning a stochastic reward. If the projects are independent from each other and only one project at a time may evolve, the problem is called the multi-armed bandit (one type of stochastic scheduling problem) and the Gittins index policy is optimal. If multiple projects can evolve, the problem is called the restless bandit and the Gittins index policy is a known good heuristic, but no optimal solution exists in general. In fact, in general this problem is NP-complete and it is generally accepted that no feasible solution can be found. Multi-Armed Bandits and the Gittins Index 
GitXiv  arXiv + Github + Links + Discussion: GitXiv is a space to share links to open computer science projects. Countless Github and arXiv links are floating around the web. It’s hard to keep track of these gems. GitXiv attempts to solve this problem by offering a collaboratively curated feed of projects. Each project is conveniently presented as arXiv + Github + Links + Discussion. Members can submit their findings and let the community rank and discuss them. A regular newsletter makes it easy to stay up-to-date on recent advancements. It’s free and open. 
G-Learning  Model-free reinforcement learning algorithms such as Q-learning perform poorly in the early stages of learning in noisy environments, because much effort is spent on unlearning biased estimates of the state-action function. The bias comes from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning. Moreover, it enables naturally incorporating prior distributions over optimal actions when available. The stochastic nature of G-learning also makes it more cost-effective than Q-learning in noiseless but exploration-risky domains. We illustrate these ideas in several examples where G-learning results in significant improvements of the learning rate and the learning cost. 
Global Distillation  Deep neural networks are known to suffer from catastrophic forgetting in class-incremental learning, where the performance on previous tasks drastically degrades when learning a new task. To alleviate this effect, we propose to leverage a continuous and large stream of unlabeled data in the wild. In particular, to leverage such transient external data effectively, we design a novel class-incremental learning scheme with (a) a new distillation loss, termed global distillation, (b) a learning strategy to avoid overfitting to the most recent task, and (c) a sampling strategy for the desired external data. Our experimental results on various datasets, including CIFAR and ImageNet, demonstrate the superiority of the proposed methods over prior methods, particularly when a stream of unlabeled data is accessible: we achieve a relative performance improvement of up to 9.3% compared to the state-of-the-art method. 
Global Interpreter Lock (GIL) 
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) CPython extensions must be GIL-aware in order to avoid defeating threads. The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck. However, the GIL degrades performance even when it is not a bottleneck. In summary: the system call overhead is significant, especially on multicore hardware. Two threads calling a function may take twice as much time as a single thread calling the function twice. The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads. And it prevents signals from being delivered. 
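The bottleneck case (threads that spend their time interpreting bytecode) can be observed with a small timing experiment. This is a rough sketch, not a benchmark: the function names and the loop size are assumptions, timings vary by machine and Python version, and builds of CPython without a GIL will behave differently.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def run_sequential(n, repeats=2):
    """Time `repeats` calls made one after another on the main thread."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats=2):
    """Time the same calls, each on its own native thread."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print("sequential:", run_sequential(n))
    print("threaded:  ", run_threaded(n))
```

On a GIL-enabled interpreter the threaded run is typically no faster than the sequential one, because only one thread executes bytecode at a time. Replacing `count_down` with a blocking call such as `time.sleep` usually shows the opposite, since the lock is released while waiting.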
Global Reasoning Unit (GloRe Unit) 
Globally modeling and reasoning over relations between regions can be beneficial for many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations by convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed. After reasoning, relation-aware features are distributed back to the original coordinate space for downstream tasks. We further present a highly efficient instantiation of the proposed approach and introduce the Global Reasoning unit (GloRe unit) that implements the coordinate-interaction space mapping by weighted global pooling and weighted broadcasting, and the relation reasoning via graph convolution on a small graph in interaction space. The proposed GloRe unit is lightweight, end-to-end trainable and can be easily plugged into existing CNNs for a wide range of tasks. Extensive experiments show our GloRe unit can consistently boost the performance of state-of-the-art backbone architectures, including ResNet, ResNeXt, SENet and DPN, for both 2D and 3D CNNs, on image classification, semantic segmentation and video action recognition tasks. 
Global Second-order Pooling (GSoP) 
Global Second-order Pooling Neural Networks 
Global Second-order Pooling Neural Network  Deep Convolutional Networks (ConvNets) are fundamental to, besides large-scale visual recognition, a lot of vision tasks. As the primary goal of the ConvNets is to characterize complex boundaries of thousands of classes in a high-dimensional space, it is critical to learn higher-order representations for enhancing non-linear modeling capability. Recently, Global Second-order Pooling (GSoP), plugged at the end of networks, has attracted increasing attention, achieving much better performance than classical, first-order networks in a variety of vision tasks. However, how to effectively introduce higher-order representation in earlier layers for improving non-linear capability of ConvNets is still an open problem. In this paper, we propose a novel network model introducing GSoP across from lower to higher layers for exploiting holistic image information throughout a network. Given an input 3D tensor outputted by some previous convolutional layer, we perform GSoP to obtain a covariance matrix which, after non-linear transformation, is used for tensor scaling along channel dimension. Similarly, we can perform GSoP along spatial dimension for tensor scaling as well. In this way, we can make full use of the second-order statistics of the holistic image throughout a network. The proposed networks are thoroughly evaluated on large-scale ImageNet-1K, and experiments show that they non-trivially outperform their counterparts while achieving state-of-the-art results. 
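The pooling step itself, turning a feature tensor into a channel covariance matrix, can be sketched in plain Python. This is only an illustration of the second-order statistic: the input layout (a list of channels, each flattened over its spatial positions), the 1/N normalisation, and the function name are assumptions of this sketch, and the non-linear transformation and channel scaling that the abstract mentions afterwards are left out.

```python
def channel_covariance(features):
    """Covariance between channels of a feature map.

    features: list of C channels, each a list of N values
              (the H x W spatial positions flattened into one list).
    Returns a C x C matrix as nested lists.
    """
    n = len(features[0])
    means = [sum(channel) / n for channel in features]
    # Centre each channel so that products measure co-variation.
    centered = [[x - m for x in channel]
                for channel, m in zip(features, means)]
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in centered]
            for ci in centered]
```

Entry (i, j) of the result records how channels i and j vary together across the whole image, which is the "holistic" second-order information that first-order (average) pooling discards.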
Global Sensitivity Analysis (GSA) 
This presentation aims to introduce global sensitivity analysis (SA), targeting an audience unfamiliar with the topic, and to give practical hints about the associated advantages and the effort needed. To this effect, we shall review some techniques for sensitivity analysis, including those that are not global, by applying them to a simple example. This will give the audience a chance to contrast each method’s result against the audience’s own expectation of what the sensitivity pattern for the simple model should be. We shall also try to relate the discourse on the relative importance of model input factors to specific questions, such as ‘Which of the uncertain input factor(s) is so non-influential that we can safely fix it/them?’ or ‘If we could eliminate the uncertainty in one of the input factors, which factor should we choose to reduce the variance of the output the most?’ In this way, the selection of the method for sensitivity analysis will be put in relation to the framing of the analysis and to the interpretation and presentation of the results. The choice of the output of interest will be discussed in relation to the purpose of the model-based analysis. The main methods that we present in this lecture are all related with one another, and are the method of Morris for factors’ screening and the variance-based measures. All are model-free, in the sense that their application does not rely on special assumptions on the behaviour of the model (such as linearity, monotonicity and additivity of the relationship between input factor and model output). Monte Carlo filtering will also be discussed to demonstrate the usefulness of global sensitivity analysis in relation to estimation. Global sensitivity analysis: An introduction. Global sensitivity analysis for statistical model parameters 
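A variance-based measure of the kind mentioned above can be illustrated with the first-order index S_i = Var(E[Y | X_i]) / Var(Y), the share of output variance explained by factor i alone. The sketch below is one simple way to estimate it and is not the presentation's own procedure: it assumes independent inputs uniform on [0, 1) and approximates the conditional mean by averaging the output within equal-width bins of X_i. The function name, sample size, and bin count are all choices made for this example.

```python
import random


def first_order_indices(model, n_inputs, n_samples=20000, n_bins=20, seed=0):
    """Estimate first-order sensitivity indices by binning each input.

    model: function taking a list of n_inputs floats and returning a float.
    """
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    mean_y = sum(ys) / n_samples
    var_y = sum((y - mean_y) ** 2 for y in ys) / n_samples
    indices = []
    for i in range(n_inputs):
        sums = [0.0] * n_bins
        counts = [0] * n_bins
        for x, y in zip(xs, ys):
            b = min(int(x[i] * n_bins), n_bins - 1)  # bin of the i-th input
            sums[b] += y
            counts[b] += 1
        # Variance of the per-bin output means approximates Var(E[Y | X_i]).
        var_cond = sum(c * (s / c - mean_y) ** 2
                       for s, c in zip(sums, counts) if c) / n_samples
        indices.append(var_cond / var_y)
    return indices
```

For the linear test model Y = 4*X1 + X2 (with a third, unused input), the exact indices are 16/17, 1/17 and 0, so the estimate should rank X1 as dominant and X3 as a factor that can safely be fixed, which is the first of the two questions quoted in the entry.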
Global Style Token (GST) 
In this work, we propose ‘global style tokens’ (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable ‘labels’ they generate can be used to control synthesis in novel ways, such as varying speed and speaking style, independently of the text content. They can also be used for style transfer, replicating the speaking style of a single audio clip across an entire long-form text corpus. When trained on noisy, unlabeled found data, GSTs learn to factorize noise and speaker identity, providing a path towards highly scalable but robust speech synthesis. 
Global Vectors for Word Representation (GloVe) 
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition. 
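The idea of regressing only on the nonzero co-occurrence counts can be made concrete with the weighted least-squares objective from the GloVe paper. The abstract above does not spell the objective out, so the details here come from the paper rather than from this entry: each nonzero count X_ij contributes f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2, with weighting f(x) = (x / x_max)^alpha below x_max and 1 otherwise, and the paper's defaults x_max = 100, alpha = 3/4. The function names and the dictionary-based sparse representation are assumptions of this sketch.

```python
import math


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(x): damps rare pairs, caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(cooc, w, w_ctx, b, b_ctx):
    """Weighted squared error over the nonzero co-occurrence entries only.

    cooc:   dict mapping (i, j) to a positive co-occurrence count X_ij
    w:      word vectors, w[i] is a list of floats
    w_ctx:  context vectors, w_ctx[j] is a list of floats
    b, b_ctx: word and context bias terms (lists of floats)
    """
    total = 0.0
    for (i, j), x in cooc.items():  # zero entries are never visited
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        err = dot + b[i] + b_ctx[j] - math.log(x)
        total += glove_weight(x) * err * err
    return total
```

Because the loop runs over stored entries only, the cost of one pass scales with the number of nonzero counts rather than with the square of the vocabulary size. Training would minimise this loss with a gradient method, which is omitted here.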
Global-Local Collaborative Attentive Module for Cascaded Image Generation  ➘ “MirrorGAN” 
Globally Improved ANT (GIANT) 
For distributed computing environments, we consider the canonical machine learning problem of empirical risk minimization (ERM) with quadratic regularization, and we propose a distributed and communication-efficient Newton-type optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, and then it sends this direction to the main driver. The driver, then, averages all the ANT directions received from workers to form a Globally Improved ANT (GIANT) direction. GIANT naturally exploits the trade-offs between local computations and global communications in that more local computations result in fewer overall rounds of communications. GIANT is highly communication efficient in that, for $d$-dimensional data uniformly distributed across $m$ workers, it has $4$ or $6$ rounds of communication and $O (d \log m)$ communication complexity per iteration. Theoretically, we show that GIANT’s convergence rate is faster than first-order methods and existing distributed Newton-type methods. From a practical point of view, a highly beneficial feature of GIANT is that it has only one tuning parameter: the iterations of the local solver for computing an ANT direction. This is indeed in sharp contrast with many existing distributed Newton-type methods, as well as popular first-order methods, which have several tuning parameters, and whose performance can be greatly affected by the specific choices of such parameters. In this light, we empirically demonstrate the superior performance of GIANT compared with other competing methods. 
GLocalized Anomaly Detection (GLAD) 
We propose an algorithm called GLAD (GLocalized Anomaly Detection) that allows end-users to retain the use of simple and understandable global anomaly detectors by automatically learning their local relevance to specific data instances using label feedback. The key idea is to place a uniform prior over the input feature space for each member of the anomaly detection ensemble via a neural network trained on unlabeled instances, and tune the weights of the neural network to adjust the local relevance of each ensemble member using all labeled instances. Our experiments on synthetic and real-world data show the effectiveness of GLAD in learning the local relevance of ensemble members and discovering anomalies via label feedback. 
GLORL  We consider the recently proposed reinforcement learning (RL) framework of Contextual Markov Decision Processes (CMDP), where the agent has a sequence of episodic interactions with tabular environments chosen from a possibly infinite set. The parameters of these environments depend on a context vector that is available to the agent at the start of each episode. In this paper, we propose a no-regret online RL algorithm in the setting where the MDP parameters are obtained from the context using generalized linear models (GLMs). The proposed algorithm GLORL relies on efficient online updates and is also memory efficient. Our analysis of the algorithm gives new results in the logit link case and improves previous bounds in the linear case. Our algorithm uses efficient Online Newton Step updates to build confidence sets. Moreover, for any strongly convex link function, we also show a generic conversion from any online no-regret algorithm to confidence sets. 
GLUE Benchmark  The General Language Understanding Evaluation (GLUE) benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems. GLUE consists of: • A benchmark of nine sentence or sentence-pair language understanding tasks built on established existing datasets and selected to cover a diverse range of dataset sizes, text genres, and degrees of difficulty, • A diagnostic dataset designed to evaluate and analyze model performance with respect to a wide range of linguistic phenomena found in natural language, and • A public leaderboard for tracking performance on the benchmark and a dashboard for visualizing the performance of models on the diagnostic set. The format of the GLUE benchmark is model-agnostic, so any system capable of processing sentences and sentence pairs and producing corresponding predictions is eligible to participate. The benchmark tasks are selected so as to favor models that share information across tasks using parameter sharing or other transfer learning techniques. The ultimate goal of GLUE is to drive research in the development of general and robust natural language understanding systems. 
Glue Code  The term glue code is sometimes used to describe implementations of the adapter pattern. It does not serve any use in calculation or computation. Rather it serves as a proxy between otherwise incompatible parts of software, to make them compatible. The standard practice is to keep logic out of the glue code and leave that to the code blocks it connects to. 
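A small adapter makes the idea concrete. Every class and function below is hypothetical and invented for this illustration: an existing component exposes rows as tuples, a consumer expects dictionaries, and the glue code does nothing except translate one shape into the other.

```python
class LegacyInventory:
    """Hypothetical existing component: returns (name, quantity) tuples."""

    def fetch_rows(self):
        return [("bolt", 120), ("nut", 75)]


def total_quantity(source):
    """Hypothetical consumer: expects source.items() to yield dicts
    that carry a 'qty' key. The real computation lives here."""
    return sum(item["qty"] for item in source.items())


class InventoryAdapter:
    """Glue code: maps one interface onto the other and nothing more."""

    def __init__(self, legacy):
        self._legacy = legacy

    def items(self):
        for name, quantity in self._legacy.fetch_rows():
            yield {"name": name, "qty": quantity}
```

Calling `total_quantity(InventoryAdapter(LegacyInventory()))` lets the two parts cooperate without either being modified. In line with the practice described above, the adapter only renames and reshapes fields, while the summing logic stays in the consumer.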
GnnExplainer  Graph Neural Networks (GNNs) are a powerful tool for machine learning on graphs. GNNs combine node feature information with the graph structure by using neural networks to pass messages through edges in the graph. However, incorporating both graph structure and feature information leads to complex non-linear models, and explaining predictions made by GNNs remains a challenging task. Here we propose GnnExplainer, a general model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task (node and graph classification, link prediction). In order to explain a given node’s predicted label, GnnExplainer provides a local interpretation by highlighting relevant features as well as an important subgraph structure by identifying the edges that are most relevant to the prediction. Additionally, the model provides single-instance explanations when given a single prediction as well as multi-instance explanations that aim to explain predictions for an entire class of instances/nodes. We formalize GnnExplainer as an optimization task that maximizes the mutual information between the prediction of the full model and the prediction of the simplified explainer model. We experiment on synthetic as well as real-world data. On synthetic data we demonstrate that our approach is able to highlight relevant topological structures from noisy graphs. We also demonstrate that GnnExplainer provides a better understanding of pre-trained models on real-world tasks. GnnExplainer provides a variety of benefits, from the identification of semantically relevant structures to explain predictions to providing guidance when debugging faulty graph neural network models. 
Gnowee  This paper introduces Gnowee, a modular, Python-based, open-source hybrid metaheuristic optimization algorithm (available from https://…/Gnowee ). Gnowee is designed for rapid convergence to nearly globally optimum solutions for complex, constrained nuclear engineering problems with mixed-integer and combinatorial design vectors and high-cost, noisy, discontinuous, black box objective function evaluations. Gnowee’s hybrid metaheuristic framework is a new combination of a set of diverse, robust heuristics that appropriately balance diversification and intensification strategies across a wide range of optimization problems. This novel algorithm was specifically developed to optimize complex nuclear design problems; the motivating research problem was the design of material stackups to modify neutron energy spectra to specific targeted spectra for applications in nuclear medicine, technical nuclear forensics, nuclear physics, etc. However, there is a wider range of potential applications for this algorithm both within the nuclear community and beyond. To demonstrate Gnowee’s behavior for a variety of problem types, comparisons between Gnowee and several well-established metaheuristic algorithms are made for a set of eighteen continuous, mixed-integer, and combinatorial benchmarks. These results demonstrate Gnowee to have superior flexibility and convergence characteristics over a wide range of design spaces. We anticipate this wide range of applicability will make this algorithm desirable for many complex engineering applications. 
Gnu Regression, Econometrics and Time-series Library (gretl) 
gretl is a cross-platform software package for econometric analysis, written in the C programming language. It is free, open-source software. You may redistribute it and/or modify it under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation. 
GNU Scientific Library (GSL) 
The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite. RcppGSL 
Goal-Oriented Optimal Design of Experiments (GOODE) 
We develop a framework for goal-oriented optimal design of experiments (GOODE) for large-scale Bayesian linear inverse problems governed by PDEs. This framework differs from classical Bayesian optimal design of experiments (ODE) in the following sense: we seek experimental designs that minimize the posterior uncertainty in a predicted quantity of interest (QoI) rather than the estimated parameter itself. This is suitable for scenarios in which the solution of an inverse problem is an intermediate step and the estimated parameter is then used to compute a prediction QoI. In such problems, a GOODE approach has two benefits: the designs can avoid wastage of experimental resources by a targeted collection of data, and the resulting design criteria are computationally easier to evaluate due to the often low dimensionality of prediction QoIs. We present two modified design criteria, A-GOODE and D-GOODE, which are natural analogues of classical Bayesian A- and D-optimal criteria. We analyze the connections to other ODE criteria, and provide interpretations for the GOODE criteria by using tools from information theory. Then, we develop an efficient gradient-based optimization framework for solving the GOODE optimization problems. Additionally, we present comprehensive numerical experiments testing the various aspects of the presented approach. The driving application is the optimal placement of sensors to identify the source of contaminants in a diffusion and transport problem. We enforce sparsity of the sensor placements using an $\ell_1$-norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter. 
Goedel Machine  Can machines design? Can they come up with creative solutions to problems and build tools and artifacts across a wide range of domains? Recent advances in the field of computational creativity and formal Artificial General Intelligence (AGI) provide frameworks for machines with the general ability to design. In this paper we propose to integrate a formal computational creativity framework into the Gödel machine framework. We call this machine a design Gödel machine. Such a machine could solve a variety of design problems by generating novel concepts. In addition, it could change the way these concepts are generated by modifying itself. The design Gödel machine is able to improve its initial design program, once it has proven that a modification would increase its return on the utility function. Finally, we sketch out a specific version of the design Gödel machine which specifically aims at the design of complex software and hardware systems. Future work could be the development of a more formal version of the design Gödel machine and a potential implementation. 
GOGGLES  Generating large labeled training data is becoming the biggest bottleneck in building and deploying supervised machine learning models. Recently, data programming has been proposed in the data management community to reduce the human cost in training data generation. Data programming expects users to write a set of labeling functions, each of which is a weak supervision source that labels a subset of data points with better-than-random accuracy. However, the success of data programming heavily depends on the quality (in terms of both accuracy and coverage) of the labeling functions that users still need to design manually. We propose affinity coding, a new paradigm for fully automatic generation of training data. In affinity coding, the similarity between the unlabeled instances and prototypes that are derived from the same unlabeled instances serves as a signal (or source of weak supervision) for determining class membership. We term this implicit similarity the affinity score. Consequently, we can have as many sources of weak supervision as the number of unlabeled data points, without any human input. We also propose a system called GOGGLES that is an implementation of affinity coding for labeling image datasets. GOGGLES features novel techniques for deriving affinity scores from image datasets based on ‘semantic prototypes’ extracted from convolutional neural nets, as well as an expectation-maximization approach for performing class label inference based on the computed affinity scores. Compared to the state-of-the-art data programming system Snorkel, GOGGLES exhibits 14.88% average improvement in terms of the quality of labels generated for the binary labeling task. The GOGGLES system is open-sourced at https://…/. 
Golomb Ruler Problem  The Golomb ruler problem is defined as follows: Given a positive integer n, locate n marks on a ruler such that the distances between all pairs of distinct marks are different from each other and the total length of the ruler is minimized. The Golomb ruler problem has applications in information theory, astronomy and communications, and it can be seen as a challenge for combinatorial optimization algorithms. Although constructing high-quality rulers is well studied, proving optimality is a far more challenging task. 
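The definition above can be made concrete with a small sketch. The function names `is_golomb` and `optimal_golomb` are hypothetical, and the exhaustive search is only an illustration for small n; it is not one of the combinatorial optimization methods the entry alludes to.

```python
from itertools import combinations


def is_golomb(marks):
    """Return True if all pairwise distances between marks are distinct."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))


def optimal_golomb(n):
    """Brute-force a shortest Golomb ruler with n marks (small n only)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return (0,)
    length = n - 1  # n distinct integer marks need at least this length
    while True:
        # Fix the end marks at 0 and `length`; try every choice of interior marks.
        for inner in combinations(range(1, length), n - 2):
            marks = (0, *inner, length)
            if is_golomb(marks):
                return marks
        length += 1  # no ruler of this length exists, so try a longer one


print(optimal_golomb(4))  # (0, 1, 4, 6)
```

Because lengths are tried in increasing order, the first ruler found is also a proof of optimality for that n, which is exactly the part that becomes expensive as n grows.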
Google AI  At Google AI, we’re conducting research that advances the state of the art in the field, applying AI to products and to new domains, and developing tools to ensure that everyone can access AI. Google’s mission is to organize the world’s information and make it universally accessible and useful. AI is helping us do that in exciting new ways, solving problems for our users, our customers, and the world. AI is making it easier for people to do things every day, whether it’s searching for photos of loved ones, breaking down language barriers in Google Translate, typing emails on the go, or getting things done with the Google Assistant. AI also provides new ways of looking at existing problems, from rethinking healthcare to advancing scientific discovery. 
Google AI Platform  AI Platform makes it easy for machine learning developers, data scientists, and data engineers to take their ML projects from ideation to production and deployment, quickly and cost-effectively. From data engineering to ‘no lock-in’ flexibility, AI Platform’s integrated tool chain helps you build and run your own machine learning applications. AI Platform supports Kubeflow, Google’s open-source platform, which lets you build portable ML pipelines that you can run on-premises or on Google Cloud without significant code changes. And you’ll have access to cutting-edge Google AI technology like TensorFlow, TPUs, and TFX tools as you deploy your AI applications to production. 
Google Brain Project  Google Brain is an unofficial name for a deep learning research project at Google. 
Google Cloud Dataflow  Simplified stream and batch data processing, with equal reliability and expressiveness. Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness, with no more complex workarounds or compromises needed. And with its serverless approach to resource provisioning and management, you have access to virtually limitless capacity to solve your biggest data processing challenges, while paying only for what you use. Cloud Dataflow unlocks transformational use cases across industries, including: · Clickstream, point-of-sale, and segmentation analysis in retail · Fraud detection in financial services · Personalized user experience in gaming · IoT analytics in manufacturing, healthcare, and logistics 
Google Colab  ➚ “Colaboratory” 
Google DeepDream  DeepDream is a computer vision program created by Google engineer Alexander Mordvintsev which uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, thus creating a dreamlike hallucinogenic appearance in the deliberately overprocessed images. Google’s program popularized the term (deep) ‘dreaming’ to refer to the generation of images that produce desired activations in a trained deep network, and the term now refers to a collection of related approaches. 
Google Prediction API  Google’s cloud-based machine learning tools. They use Google’s machine learning algorithms to analyze data and predict future outcomes through a familiar RESTful interface. 
GooStats  GooStats is a software framework that provides a flexible environment and common tools to implement multivariate statistical analysis. The framework is built upon the CERN ROOT, MINUIT and GooFit packages. Running a multivariate analysis in parallel on graphics processing units yields a huge boost in performance and opens new possibilities. The design and benchmark of GooStats are presented in this article along with illustration of its application to statistical problems. 
GOOWE-ML  As data streams become more prevalent, the necessity for online algorithms that mine this transient and dynamic data becomes clearer. Multi-label data stream classification is a supervised learning problem where each instance in the data stream is classified into one or more predefined sets of labels. Many methods have been proposed to tackle this problem, including but not limited to ensemble-based methods. Some of these ensemble-based methods are specifically designed to work with certain multi-label base classifiers; some others employ online bagging schemes to build their ensembles. In this study, we introduce a novel online and dynamically-weighted stacked ensemble for multi-label classification, called GOOWE-ML, that utilizes spatial modeling to assign optimal weights to its component classifiers. Our model can be used with any existing incremental multi-label classification algorithm as its base classifier. We conduct experiments with 4 GOOWE-ML-based multi-label ensembles and 7 baseline models on 7 real-world datasets from diverse areas of interest. Our experiments show that GOOWE-ML ensembles yield consistently better results in terms of predictive performance in almost all of the datasets, with respect to the other prominent ensemble models. 
Gossip-bAseD sub-GradiEnT SVM (GADGET SVM) 
In the era of big data, an important weapon in a machine learning researcher’s arsenal is a scalable Support Vector Machine (SVM) algorithm. SVMs are extensively used for solving classification problems. Traditional algorithms for learning SVMs often scale superlinearly with training set size, which becomes infeasible very quickly for large data sets. In recent years, scalable algorithms have been designed which study the primal or dual formulations of the problem. This often suggests a way to decompose the problem and facilitate development of distributed algorithms. In this paper, we present a distributed algorithm for learning linear Support Vector Machines in the primal form for binary classification called Gossip-bAseD sub-GradiEnT (GADGET) SVM. The algorithm is designed such that it can be executed locally on nodes of a distributed system. Each node processes its local homogeneously partitioned data and learns a primal SVM model. It then gossips with random neighbors about the classifier learnt and uses this information to update the model. Extensive theoretical and empirical results suggest that this anytime algorithm has performance comparable to its centralized and online counterparts. 
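The local-update-then-gossip pattern described above can be sketched as follows. This is a simplified illustration and not the GADGET algorithm itself: plain pairwise averaging with a random peer stands in for the paper's gossip protocol, the step size 1/sqrt(t) is an assumption, and the names `local_subgradient` and `gossip_svm` are hypothetical.

```python
import math
import random


def local_subgradient(w, data, lam):
    """Subgradient of (lam/2)*||w||^2 + mean hinge loss on one node's data."""
    grad = [lam * wi for wi in w]
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1:  # hinge loss is active for this example
            for k, xk in enumerate(x):
                grad[k] -= y * xk / len(data)
    return grad


def gossip_svm(partitions, dim, lam=0.01, rounds=100, seed=0):
    """Train one linear SVM per node, mixing models by random pairwise gossip.

    partitions: one list of (features, label) pairs per node, labels in {-1, +1}.
    Requires at least two nodes.
    """
    rng = random.Random(seed)
    models = [[0.0] * dim for _ in partitions]
    for t in range(1, rounds + 1):
        step = 1.0 / math.sqrt(t)  # assumed diminishing step size
        # Local step: each node descends on its own partition only.
        for i, data in enumerate(partitions):
            grad = local_subgradient(models[i], data, lam)
            models[i] = [wi - step * gi for wi, gi in zip(models[i], grad)]
        # Gossip step: each node averages its model with one random peer.
        for i in range(len(models)):
            j = rng.choice([k for k in range(len(models)) if k != i])
            avg = [(a + b) / 2 for a, b in zip(models[i], models[j])]
            models[i], models[j] = avg, list(avg)
    return models


# Three nodes, each holding a small partition of a separable 2-D problem.
parts = [
    [((2.0, 1.0), 1), ((-1.0, -2.0), -1)],
    [((1.0, 2.0), 1), ((-2.0, -1.0), -1)],
    [((3.0, 3.0), 1), ((-3.0, -3.0), -1)],
]
models = gossip_svm(parts, dim=2)
```

No node ever sees another node's raw data; only weight vectors are exchanged, which is what lets every node return a usable model at any point during training.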
GossipGraD  In this paper, we present GossipGraD, a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from Θ(log(p)) for p compute nodes in well-studied SGD to O(1), 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent overfitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights Landing (KNL) connected with Aries network. We evaluate GossipGraD using the well-studied dataset ImageNet-1K (~250GB), and widely studied neural network topologies such as GoogLeNet and ResNet-50 (current winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)). Our performance evaluation using both KNL and Pascal GPUs indicates that GossipGraD can achieve perfect efficiency for these datasets and their associated neural network topologies. Specifically, for ResNet-50, GossipGraD is able to achieve ~100% compute efficiency using 128 NVIDIA Pascal P100 GPUs, while matching the top-1 classification accuracy published in literature. 
Goulden-Jackson Cluster Method  Finding the generating function for the number of words avoiding, as factors, the members of a prescribed set of ‘dirty words’. 
Gower’s Distance  Idea: compute a distance between 0 and 1 for each variable, then aggregate across variables into a single dissimilarity. gower 
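A minimal sketch of the idea follows, assuming equal weights, no missing values, range-scaled absolute differences for numeric variables and simple mismatch for categorical ones. The function name and arguments are hypothetical and do not reproduce the interface of the gower package.

```python
def gower_distance(x, y, kinds, ranges):
    """Gower distance between two records with mixed variable types.

    kinds[i] is "num" or "cat". ranges[i] is the observed range (max - min)
    of numeric variable i in the data set; it is ignored for categorical ones.
    """
    parts = []
    for xi, yi, kind, rng in zip(x, y, kinds, ranges):
        if kind == "num":
            # Absolute difference scaled by the range, so it lies in [0, 1].
            parts.append(abs(xi - yi) / rng if rng else 0.0)
        else:
            # Categorical variables contribute 0 on a match and 1 otherwise.
            parts.append(0.0 if xi == yi else 1.0)
    return sum(parts) / len(parts)  # aggregate by averaging


a = (30, "red", 1.5)
b = (40, "blue", 1.5)
d = gower_distance(a, b, kinds=("num", "cat", "num"), ranges=(20, None, 2.0))
print(d)  # 0.5
```

Here the three per-variable distances are 0.5, 1.0 and 0.0, and their mean is 0.5.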
GPdoemd  GPdoemd is an open-source Python package for design of experiments for model discrimination that uses Gaussian process surrogate models to approximate and maximise the divergence between marginal predictive distributions of rival mechanistic models. GPdoemd uses the divergence prediction to suggest a maximally informative next experiment. 
GP-DRF  Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such as trees, graphs, and sequences. We introduce the GP-DRF, a novel Bayesian model with an input layer of GPs, followed by DRF layers. The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle a variable-sized input as well as learn deep long-range dependency structures of the data. We provide a novel efficient method to simultaneously infer the posterior of GP’s latent vectors and infer the posterior of DRF’s internal weights and random frequencies. Our experiments show that GP-DRF outperforms the standard GP model and DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, with respect to a Bhattacharyya distance assessment. Source code is available at https://…/GP_DRF. 
GPflowOpt  A novel Python framework for Bayesian optimization known as GPflowOpt is introduced. The package is based on the popular GPflow library for Gaussian processes, leveraging the benefits of TensorFlow including automatic differentiation, parallelization and GPU computations for Bayesian optimization. Design goals focus on a framework that is easy to extend with custom acquisition functions and models. The framework is thoroughly tested and well documented, and provides scalability. The current released version of GPflowOpt includes some standard single-objective acquisition functions, the state-of-the-art max-value entropy search, as well as a Bayesian multi-objective approach. Finally, it permits easy use of custom modeling strategies implemented in GPflow. 
GPipe  GPipe is a scalable pipeline parallelism library that enables learning of giant deep neural networks. It partitions network layers across accelerators and pipelines execution to achieve high hardware utilization. It leverages recomputation to minimize activation memory usage. For example, using partitions over 8 accelerators, it is able to train networks that are 25x larger, demonstrating its scalability. It also guarantees that the computed gradients remain consistent regardless of the number of partitions. It achieves an almost linear speedup without any changes in the model parameters: when using 4x more accelerators, training the same model is up to 3.5x faster. We train a 557-million-parameter AmoebaNet model on ImageNet and achieve a new state-of-the-art 84.3% top-1 / 97.0% top-5 accuracy on ImageNet. Finally, we use this learned model as an initialization for training 7 different popular image classification datasets and obtain results that exceed the best published ones on 5 of them, including pushing the CIFAR-10 accuracy to 99% and CIFAR-100 accuracy to 91.3%. Explained: GPipe – Training Giant Neural Nets using Pipeline Parallelism 
GP-MaL  Exploratory data analysis is a fundamental aspect of knowledge discovery that aims to find the main characteristics of a dataset. Dimensionality reduction, such as manifold learning, is often used to reduce the number of features in a dataset to a manageable level for human interpretation. Despite this, most manifold learning techniques do not explain anything about the original features nor the true characteristics of a dataset. In this paper, we propose a genetic programming approach to manifold learning called GP-MaL which evolves functional mappings from a high-dimensional space to a lower-dimensional space through the use of interpretable trees. We show that GP-MaL is competitive with existing manifold learning algorithms, while producing models that can be interpreted and reused on unseen data. A number of promising future directions of research are found in the process. 
GPU Open Analytics Initiative (GOAI) 
Recently, Continuum Analytics, H2O.ai, and MapD announced the formation of the GPU Open Analytics Initiative (GOAI). GOAI, also joined by BlazingDB, Graphistry and the Gunrock project from the University of California, Davis, aims to create open frameworks that allow developers and data scientists to build applications using standard data formats and APIs on GPUs. Bringing standard analytics data formats to GPUs will allow data analytics to be even more efficient, and to take advantage of the high throughput of GPUs. NVIDIA believes this initiative is a key contributor to the continued growth of GPU computing in accelerated analytics. 
GPyTorch  Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on recent developments in machine learning hardware. We present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified batched version of the conjugate gradients algorithm to derive all terms required for training and inference in a single call. Adapting this algorithm to complex models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative. In addition, BBMM utilizes a specialized preconditioner that substantially speeds up convergence. In experiments, we show that BBMM efficiently utilizes GPU hardware, speeding up GP inference by an order of magnitude on a variety of popular GP models. Additionally, we provide GPyTorch, a new software platform for scalable Gaussian process inference via BBMM, built on PyTorch. 
GQA  We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene graph structures to create 22M diverse reasoning questions, all of which come with functional programs that represent their semantics. We use the programs to gain tight control over the answer distribution and present a new tunable smoothing technique to mitigate language biases. Accompanying the dataset is a suite of new metrics that evaluate essential qualities such as consistency, grounding and plausibility. An extensive analysis is performed for baselines as well as state-of-the-art models, providing fine-grained results for different question types and topologies. Whereas a blind LSTM obtains a mere 42.1%, and strong VQA models achieve 54.1%, human performance tops out at 89.3%, offering ample opportunity for new research to explore. We strongly hope GQA will provide an enabling resource for the next generation of models with enhanced robustness, improved consistency, and deeper semantic understanding for images and language. 
Grabit Model (Grabit) 
We introduce a novel model which is obtained by applying gradient tree boosting to the Tobit model. The so-called Grabit model allows for modeling data that consist of a mixture of a continuous part and discrete point masses at the borders. Examples of this include censored data, fractional response data, corner solution response data, rainfall data, and binary classification data where additional information, that is related to the underlying classification mechanism, is available. In contrast to the Tobit model, the Grabit model can account for general forms of nonlinearities and interactions, it is robust against outliers in covariates and scale invariant to monotonic transformations for the covariates, and its predictive performance is not impaired by multicollinearity. We apply the Grabit model for predicting defaults on loans made to Swiss small and medium-sized enterprises (SME), and we obtain a large improvement in predictive performance compared to other state-of-the-art approaches. 
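To illustrate what "a continuous part and discrete point masses at the borders" means for the loss, here is a sketch of the standard two-limit Tobit negative log-likelihood for a single observation. The function names and defaults are hypothetical; in a Grabit-style model the latent mean `f` would be the output of a boosted tree ensemble rather than a linear predictor, and this sketch does not reproduce the authors' implementation.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_nll(y, f, sigma=1.0, lower=0.0, upper=math.inf):
    """Negative log-likelihood of one observation under a two-limit Tobit model.

    y is the observed (possibly censored) response, f the latent mean,
    sigma the latent standard deviation, lower/upper the censoring borders.
    Extreme arguments can underflow to log(0); a production version would guard this.
    """
    if y <= lower:
        # Point mass at the lower border: P(latent <= lower).
        return -math.log(normal_cdf((lower - f) / sigma))
    if y >= upper:
        # Point mass at the upper border: P(latent >= upper).
        return -math.log(1.0 - normal_cdf((upper - f) / sigma))
    # Continuous part: Gaussian density of the uncensored observation.
    z = (y - f) / sigma
    return 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)


censored = tobit_nll(0.0, 0.0)   # observation sitting on the lower border
observed = tobit_nll(1.0, 1.0)   # uncensored observation at the latent mean
```

The loss is differentiable in `f`, which is what makes it usable as the objective of a gradient boosting procedure.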
Gradient Acceleration in Activation Function (GAAF) 
Dropout has been one of the standard approaches to train deep neural networks, and it is known to regularize large models to avoid overfitting. The effect of dropout has been explained by avoiding co-adaptation. In this paper, however, we propose a new explanation of why dropout works and propose a new technique to design better activation functions. First, we show that dropout is an optimization technique to push the input towards the saturation area of the nonlinear activation function by accelerating gradient information flowing even in the saturation area in backpropagation. Based on this explanation, we propose a new technique for activation functions, gradient acceleration in activation function (GAAF), that accelerates gradients to flow even in the saturation area. Then, input to the activation function can climb onto the saturation area, which makes the network more robust because the model converges on a flat region. Experimental results support our explanation of dropout and confirm that the proposed GAAF technique improves performance with expected properties. 
Gradient Adversarial Training  We propose gradient adversarial training, an auxiliary deep learning framework applicable to different machine learning problems. In gradient adversarial training, we leverage a prior belief that in many contexts, simultaneous gradient updates should be statistically indistinguishable from each other. We enforce this consistency using an auxiliary network that classifies the origin of the gradient tensor, and the main network serves as an adversary to the auxiliary network in addition to performing standard task-based training. We demonstrate gradient adversarial training for three different scenarios: (1) as a defense to adversarial examples, we classify gradient tensors and tune them to be agnostic to the class of their corresponding example; (2) for knowledge distillation, we do binary classification of gradient tensors derived from the student or teacher network and tune the student gradient tensor to mimic the teacher’s gradient tensor; and (3) for multi-task learning, we classify the gradient tensors derived from different task loss functions and tune them to be statistically indistinguishable. For each of the three scenarios we show the potential of the gradient adversarial training procedure. Specifically, gradient adversarial training increases the robustness of a network to adversarial attacks, is able to better distill the knowledge from a teacher network to a student network compared to soft targets, and boosts multi-task learning by aligning the gradient tensors derived from the task-specific loss functions. Overall, our experiments demonstrate that gradient tensors contain latent information about whatever tasks are being trained, and can support diverse machine learning problems when intelligently guided through adversarialization using an auxiliary network. 
Gradient Boost Convolutional Autoencoder with Neural Decision Forest (GrCAN) 
Random forests and deep neural networks are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high-dimensional data. Given that a differentiable neural decision forest can be added to the neural network to fully exploit the benefits of both models, in our work we further combine a convolutional autoencoder with a neural decision forest, where the autoencoder has its advantages in finding the hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve the performance. The idea of gradient boost is to learn and use the residual in the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and gradient boost module at contiguous steps. The extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods. 
Gradient Boosted Feature Selection (GBFS) 
A feature selection algorithm should ideally satisfy four conditions: reliably extract relevant features; be able to identify nonlinear feature interactions; scale linearly with the number of features and dimensions; allow the incorporation of known sparsity structure. In this work we propose a novel feature selection algorithm, Gradient Boosted Feature Selection (GBFS), which satisfies all four of these requirements. The algorithm is flexible, scalable, and surprisingly straightforward to implement as it is based on a modification of Gradient Boosted Trees. We evaluate GBFS on several real-world data sets and show that it matches or outperforms other state-of-the-art feature selection algorithms. Yet it scales to larger data set sizes and naturally allows for domain-specific side information. 
Gradient Boosted Regression Trees (GBRT) 

Gradient Boosting (GBDT,MART,TreeNet,BTE) 
Gradient boosting is a machine learning technique for regression problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. The gradient boosting method can also be used for classification problems by reducing them to regression with a suitable loss function. 
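The stage-wise construction can be sketched in a few lines, assuming squared-error loss, one-dimensional inputs and single-split regression stumps as the weak learners. The function names and the default of 50 rounds with learning rate 0.1 are illustrative choices, not part of any particular library.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump for 1-D inputs under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive ensemble of stumps, one stage at a time."""
    base = sum(ys) / len(ys)  # stage 0: constant model
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For the loss (y - F)^2 / 2, the negative gradient w.r.t. F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


xs = list(range(10))
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
```

Swapping in a different differentiable loss only changes the line that computes `residuals`; the weak learner is still fit to the negative gradient at each stage.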
Gradient Boosting Machine  Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, like being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling. The theoretical information is complemented with descriptive examples and illustrations which cover all the stages of the gradient boosting model design. Considerations on handling the model complexity are discussed. Three practical examples of gradient boosting applications are presented and comprehensively analyzed. gbm 
Gradient Confusion  The goal of this paper is to study why stochastic gradient descent (SGD) is efficient for neural networks, and how neural net design affects SGD. In particular, we investigate how overparameterization — an increase in the number of parameters beyond the number of training data — affects the dynamics of SGD. We introduce a simple concept called gradient confusion. When confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, we show that SGD has better convergence properties than predicted by classical theory. Using theoretical and experimental results, we study how overparameterization affects gradient confusion, and thus the convergence of SGD, on linear models and neural networks. We show that increasing the number of parameters of linear models or increasing the width of neural networks leads to lower gradient confusion, and thus faster and easier model training. We also show how overparameterization by increasing the depth of neural networks results in higher gradient confusion, making deeper models harder to train. Finally, we observe empirically that techniques like batch normalization and skip connections reduce gradient confusion, which helps reduce the training burden of deep networks. 
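One way to make "negatively correlated stochastic gradients" measurable is sketched below. It assumes the confusion bound is the smallest eta >= 0 such that every pair of per-sample gradients has inner product at least -eta; that formalization is stated here as an assumption, and the linear least-squares model and function names are purely illustrative.

```python
from itertools import combinations


def per_sample_gradients(w, xs, ys):
    """Gradients of f_i(w) = 0.5 * (w . x_i - y_i)^2, one per sample."""
    grads = []
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        grads.append([err * xi for xi in x])
    return grads


def gradient_confusion(grads):
    """Smallest eta >= 0 with <g_i, g_j> >= -eta for all pairs i != j."""
    min_dot = min(
        sum(a * b for a, b in zip(gi, gj))
        for gi, gj in combinations(grads, 2)
    )
    return max(0.0, -min_dot)


w = [0.0, 0.0]
xs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Samples 0 and 1 share an input but disagree on the target, so their
# gradients point in opposite directions.
conflicting = gradient_confusion(per_sample_gradients(w, xs, [1.0, -1.0, 1.0]))

# With consistent targets no pair of gradients is negatively correlated.
aligned = gradient_confusion(per_sample_gradients(w, xs, [1.0, 1.0, 1.0]))

print(conflicting, aligned)  # 1.0 0.0
```

A step along one sample's gradient undoes progress on another exactly when their inner product is negative, which is why a large value of this quantity is associated with slower SGD convergence.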
Gradient Deflection  Overparameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and prevents the network from overfitting. In this work, we introduce the gradient gap deviation and the gradient deflection as statistical measures corresponding to the network curvature and the Hessian matrix to analyze variations of network derivatives with respect to input parameters, and investigate how implicit regularization works in ReLU neural networks from both theoretical and empirical perspectives. Our result reveals that the network output between each pair of input samples is properly controlled by random initialization and stochastic gradient descent to keep interpolating between samples almost straight, which results in low complexity of overparameterized neural networks. 
Gradient Episodic Memory  One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state of the art. 
Gradient Gap Deviation  Overparameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and prevents the network from overfitting. In this work, we introduce the gradient gap deviation and the gradient deflection as statistical measures corresponding to the network curvature and the Hessian matrix to analyze variations of network derivatives with respect to input parameters, and investigate how implicit regularization works in ReLU neural networks from both theoretical and empirical perspectives. Our result reveals that the network output between each pair of input samples is properly controlled by random initialization and stochastic gradient descent to keep interpolating between samples almost straight, which results in low complexity of overparameterized neural networks. 
Gradient Normalization  Deep multi-task networks, in which one neural network produces multiple predictive outputs, are more scalable and often better regularized than their single-task counterparts. Such advantages can potentially lead to gains in both speed and performance, but multi-task networks are also difficult to train without finding the right balance between tasks. We present a novel gradient normalization (GradNorm) technique which automatically balances the multi-task loss function by directly tuning the gradients to equalize task training rates. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting over single networks, static baselines, and other adaptive multi-task loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process which incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we hope to demonstrate that direct gradient manipulation affords us great control over the training dynamics of multi-task networks and may be one of the keys to unlocking the potential of multi-task learning. 
Gradient Projection Classical Sketch (GPCS) 

Gradient Projection Iterative Sketch (GPIS) 
We propose a randomized first-order optimization algorithm, Gradient Projection Iterative Sketch (GPIS), and an accelerated variant for efficiently solving large-scale constrained Least Squares (LS). We provide theoretical convergence analysis for both proposed algorithms and demonstrate our methods’ computational efficiency compared to the classical accelerated gradient method and the state-of-the-art variance-reduced stochastic gradient methods through numerical experiments on various large synthetic/real data sets. 
Gradient Regularized Budgeted Boosting  As machine learning transitions increasingly towards real-world applications, controlling the test-time cost of algorithms becomes more and more crucial. Recent work, such as the Greedy Miser and Speedboost, incorporates test-time budget constraints into the training procedure and learns classifiers that provably stay within budget (in expectation). However, so far, these algorithms are limited to the supervised learning scenario where sufficient amounts of labeled data are available. In this paper we investigate the common scenario where labeled data is scarce but unlabeled data is available in abundance. We propose an algorithm that leverages the unlabeled data (through Laplace smoothing) and learns classifiers with budget constraints. Our model, based on gradient boosted regression trees (GBRT), is, to our knowledge, the first algorithm for semi-supervised budgeted learning.
Gradient Scheduling Algorithm With Global Momentum (GSGM) 
Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models. As data processing moves from cloud-centric positions to edge locations, a big challenge for distributed systems is how to handle native and natural non-independent and identically distributed (non-IID) data for training. Previous asynchronous training methods do not perform satisfactorily on non-IID data because the training process fluctuates greatly, which leads to abnormal convergence. We propose a gradient scheduling algorithm with global momentum (GSGM) for distributed asynchronous training on non-IID data. Our key idea is to schedule the gradients contributed by computing nodes based on a white list so that each training node’s update frequency remains even. Furthermore, our new momentum method can solve the biased gradient problem. GSGM can make the model converge effectively and maintain high availability. Experimental results show that for non-IID data training under the same experimental conditions, GSGM on popular optimization algorithms can achieve a 20% increase in training stability with a slight improvement in accuracy on the Fashion-MNIST and CIFAR-10 datasets. Meanwhile, when expanding the distributed scale on the CIFAR-100 dataset, which results in a sparse data distribution, GSGM can achieve a 37% improvement in training stability. Moreover, only GSGM converges well when the number of computing nodes is 30, compared to state-of-the-art distributed asynchronous algorithms.
Gradient Similarity  Deep neural networks are susceptible to small-but-specific adversarial perturbations capable of deceiving the network. This vulnerability can lead to potentially harmful consequences in security-critical applications. To address this vulnerability, we propose a novel metric called \emph{Gradient Similarity} that allows us to capture the influence of training data on test inputs. We show that \emph{Gradient Similarity} behaves differently for normal and adversarial inputs, and enables us to detect a variety of adversarial attacks with a near perfect ROC-AUC of 95-100\%. Even white-box adversaries equipped with perfect knowledge of the system cannot bypass our detector easily. On the MNIST dataset, white-box attacks are either detected with a high ROC-AUC of 87-96\%, or require very high distortion to bypass our detector.
Gradient-Coherent Strong Regularization  Deep neural networks are often prone to overfitting with their numerous parameters, so regularization plays an important role in generalization. L1 and L2 regularizers are common regularization tools in machine learning with their simplicity and effectiveness. However, we observe that imposing strong L1 or L2 regularization on deep neural networks with stochastic gradient descent easily fails, which limits the generalization ability of the underlying neural networks. To understand this phenomenon, we first investigate how and why learning fails when strong regularization is imposed on deep neural networks. We then propose a novel method, gradient-coherent strong regularization, which imposes regularization only when the gradients are kept coherent in the presence of strong regularization. Experiments are performed with multiple deep architectures on three benchmark data sets for image recognition. Experimental results show that our proposed approach indeed endures strong regularization and significantly improves both accuracy and compression, which could not be achieved otherwise.
Gradual Tuning  In this paper we present an alternative strategy for fine-tuning the parameters of a network. We named the technique Gradual Tuning. Once trained on a first task, the network is fine-tuned on a second task by modifying a progressively larger set of the network’s parameters. We test Gradual Tuning on different transfer learning tasks, using networks of different sizes trained with different regularization techniques. The results show that, compared to the usual fine-tuning, our approach significantly reduces catastrophic forgetting of the initial task, while still retaining comparable if not better performance on the new task.
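The idea of modifying a progressively larger set of parameters can be pictured as an unfreezing schedule. The sketch below is only one possible reading: it assumes that layers are unfrozen from the output side backwards and that one more layer is released every fixed number of epochs. Neither choice is stated in the entry above, and the function and layer names are hypothetical.

```python
def trainable_layers(layer_names, epoch, epochs_per_stage):
    """Return the layers that may be updated at a given epoch.

    Assumption (not from the entry): layers are unfrozen starting from
    the output side, one additional layer every `epochs_per_stage` epochs.
    """
    n_unfrozen = min(len(layer_names), 1 + epoch // epochs_per_stage)
    return layer_names[-n_unfrozen:]


layers = ["conv1", "conv2", "fc1", "fc2"]
# Trainable set for each of the first eight epochs of fine-tuning.
schedule = [trainable_layers(layers, epoch, 2) for epoch in range(8)]
```

In a real training loop the returned names would decide which parameter groups receive gradient updates; the rest of the network stays at its first-task values.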
Graduated Symbol Map  A map with symbols that change in size according to the value of the attribute they represent. For example, denser populations might be represented by larger dots, or larger rivers by thicker lines. 
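A small sketch of how symbol sizes might be derived from attribute values. It assumes the common cartographic convention of making the symbol's area, rather than its radius, proportional to the value; the entry above does not prescribe a scaling rule, and the function name, place names and figures are made up for illustration.

```python
import math


def symbol_radius(value, max_value, max_radius=20.0):
    """Radius of a circle whose area is proportional to `value`."""
    if max_value <= 0 or value < 0:
        raise ValueError("values must be non-negative and max_value positive")
    # Area grows with r squared, so the radius scales with the square root.
    return max_radius * math.sqrt(value / max_value)


# Hypothetical populations for three places.
populations = {"A": 1_000_000, "B": 250_000, "C": 40_000}
largest = max(populations.values())
radii = {name: symbol_radius(v, largest) for name, v in populations.items()}
```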
Grafana  Grafana is an open source, feature rich metrics dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus and InfluxDB. The tool for beautiful monitoring and metric analytics & dashboards for Graphite, InfluxDB & Prometheus & More. The analytics platform for all your metrics. Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture. Intro to Grafana: Installation, Configuration, and Building the First Dashboard 
GraFC2T2  Recommending appropriate items to users is crucial in many e-commerce platforms that contain implicit data such as users’ browsing, purchasing and streaming history. One common approach consists of selecting the N most relevant items for each user, for a given N, which is called top-N recommendation. To do so, recommender systems rely on various kinds of information, like item and user features, past interest of users for items, browsing history and trust between users. However, they often use only one or two such pieces of information, which limits their performance. In this paper, we design and implement GraFC2T2, a general graph-based framework to easily combine and compare various kinds of side information for top-N recommendation. It encodes content-based features, temporal and trust information into a complex graph, and uses personalized PageRank on this graph to perform recommendation. We conduct experiments on the Epinions and Ciao datasets, and compare the performance obtained, using the F1-score, Hit ratio and MAP evaluation metrics, to that of systems based on matrix factorization and deep learning. This shows that our framework is convenient for such explorations, and that combining different kinds of information indeed improves recommendation in general.
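To make the personalized PageRank step concrete, here is a minimal power-iteration sketch on a toy user-item graph with one trust link. It is not the GraFC2T2 implementation: the graph, the damping value, the iteration count and the handling of dangling nodes are all illustrative assumptions.

```python
def personalized_pagerank(graph, source, damping=0.85, iterations=50):
    """Power iteration for PageRank with restarts at `source`.

    graph : dict mapping each node to a list of its neighbours
            (every neighbour must also be a key of the dict)
    """
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            if not neighbours:
                # Dangling node: send its mass back to the source.
                nxt[source] += damping * scores[node]
                continue
            share = damping * scores[node] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        # Restart: the random walker jumps back to the source user.
        nxt[source] += 1.0 - damping
        scores = nxt
    return scores


# Toy undirected graph: users u1..u3, items i1..i3, trust link u1-u2.
graph = {
    "u1": ["i1", "u2"],
    "u2": ["u1", "i1", "i2"],
    "u3": ["i2", "i3"],
    "i1": ["u1", "u2"],
    "i2": ["u2", "u3"],
    "i3": ["u3"],
}
scores = personalized_pagerank(graph, "u1")
# Rank the items that u1 has not interacted with yet.
candidates = [n for n in graph if n.startswith("i") and n not in graph["u1"]]
ranked = sorted(candidates, key=scores.get, reverse=True)
```

Item `i2` is reachable through the trusted user `u2`, so it ends up ahead of the more distant `i3`.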
Granger Causality  The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect ‘mere’ correlations, but Clive Granger argued that causality in economics could be reflected by measuring the ability of predicting the future values of a time series using past values of another time series. Since the question of ‘true causality’ is deeply philosophical, econometricians assert that the Granger test finds only ‘predictive causality’. A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y. Granger also stressed that some studies using ‘Granger causality’ testing in areas outside economics reached ‘ridiculous’ conclusions. ‘Of course, many ridiculous papers appeared’, he said in his Nobel Lecture, December 8, 2003. However, it remains a popular method for causality analysis in time series due to its computational simplicity. The original definition of Granger causality does not account for latent confounding effects and does not capture instantaneous and nonlinear causal relationships, though several extensions have been proposed to address these issues. https://…/grangercausalitytest Cointegration & Granger Causality
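The F-test on lagged values described above can be sketched with two ordinary least squares fits: a restricted model that predicts Y from its own lags, and an unrestricted model that adds the lags of X. The code below uses only the standard library and synthetic data, and all helper names are hypothetical. It returns the F statistic only; comparing it with a critical value of the F distribution is left out.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def residual_ss(rows, target):
    """Residual sum of squares of an OLS fit of target on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(x, y, lags=1):
    """F statistic for the hypothesis that x Granger-causes y."""
    times = range(lags, len(y))
    target = [y[t] for t in times]
    # Restricted model: intercept plus lagged values of y only.
    own = [[1.0] + [y[t - k] for k in range(1, lags + 1)] for t in times]
    # Unrestricted model: additionally include lagged values of x.
    both = [row + [x[t - k] for k in range(1, lags + 1)]
            for row, t in zip(own, times)]
    rss_r = residual_ss(own, target)
    rss_u = residual_ss(both, target)
    dof = len(target) - len(both[0])
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic data in which y follows x with a delay of one step.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, 200)]
f_xy = granger_f(x, y)  # x -> y: expected to be large
f_yx = granger_f(y, x)  # y -> x: expected to be small
```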
Granger Causality Network  We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, nonidentifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which, while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorical time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise a novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model.
GRap  Finding the best neural network configuration for a given goal can be challenging, especially when it is not possible to assess the output quality of a network automatically. We present GRap, an interactive interface based on Visual Analytics principles for comparing outputs of multiple RNNs for the same training data. GRap enables an iterative result generation process that allows a user to evaluate the outputs with contextual statistics. 
Graph Adversarial Training (GAT) 
Recent efforts show that neural networks are vulnerable to small but intentional perturbations on input features in visual classification tasks. Due to the additional consideration of connections between examples (e.g., articles linked by citation tend to be in the same class), graph neural networks could be more sensitive to the perturbations, since the perturbations from connected examples exacerbate the impact on a target example. Adversarial Training (AT), a dynamic regularization technique, can resist the worst-case perturbations on input features and is a promising choice to improve model robustness and generalization. However, existing AT methods focus on standard classification and are less effective when training models on graphs, since they do not model the impact from connected examples. In this work, we explore adversarial training on graphs, aiming to improve the robustness and generalization of models learned on graphs. We propose Graph Adversarial Training (GAT), which takes the impact from connected examples into account when learning to construct and resist perturbations. We give a general formulation of GAT, which can be seen as a dynamic regularization scheme based on the graph structure. To demonstrate the utility of GAT, we employ it on a state-of-the-art graph neural network model, the Graph Convolutional Network (GCN). We conduct experiments on two citation graphs (Citeseer and Cora) and a knowledge graph (NELL), verifying the effectiveness of GAT, which outperforms normal training on GCN by 4.51% in node classification accuracy. Codes will be released upon acceptance.
Graph Attention Network (GAttN) 
This paper targets a novel but practical recommendation problem named exact-K recommendation. It differs from traditional top-K recommendation in that it focuses more on (constrained) combinatorial optimization, which recommends a whole set of K items called a card, rather than on ranking optimization, which assumes that ‘better’ items should be put into top positions. We take the first step to give a formal problem definition and reduce it to Maximum Clique Optimization on a graph. To tackle this specific combinatorial optimization problem, which is NP-hard, we propose Graph Attention Networks (GAttN) with a multi-head self-attention encoder and a decoder with an attention mechanism. It can learn the joint distribution of the K items end-to-end and generate an optimal card rather than rank individual items by prediction scores. We then propose Reinforcement Learning from Demonstrations (RLfD), which combines the advantages of behavior cloning and reinforcement learning, making it sufficient and efficient to train the model. Extensive experiments on three datasets demonstrate the effectiveness of our proposed GAttN with RLfD: it outperforms several strong baselines with a relative improvement of 7.7% and 4.7% on average in Precision and Hit Ratio respectively, and achieves state-of-the-art (SOTA) performance for the exact-K recommendation problem.
Graph Attribute Aggregation Network (GAAN) 
Graph convolutional neural networks (GCNNs) have been attracting increasing research attention due to their great potential in inference over graph structures. However, insufficient effort has been devoted to the aggregation methods between different convolution graph layers. In this paper, we introduce a graph attribute aggregation network (GAAN) architecture. Different from conventional pooling operations, a graph-transformation-based aggregation strategy, progressive margin folding (PMF), is proposed for integrating graph features. By distinguishing internal and margin elements, we provide an approach for implementing the folding iteratively. A mechanism is also devised for preserving the local structures during progressive folding. In addition, a hypergraph-based representation is introduced for transferring the aggregated information between different layers. Our experiments on public molecule datasets demonstrate that the proposed GAAN outperforms the existing GCNN models with significant effectiveness.
Graph Augmented Memory Network (GAMENet) 
Recent progress in deep learning is revolutionizing the healthcare domain, including providing solutions to medication recommendation, especially recommending medication combinations for patients with complex health conditions. Existing approaches either do not customize based on patient health history, or ignore existing knowledge on drug-drug interactions (DDI) that might lead to adverse outcomes. To fill this gap, we propose the Graph Augmented Memory Networks (GAMENet), which integrates the drug-drug interaction knowledge graph through a memory module implemented as a graph convolutional network, and models longitudinal patient records as the query. It is trained end-to-end to provide safe and personalized recommendation of medication combinations. We demonstrate the effectiveness and safety of GAMENet by comparing with several state-of-the-art methods on real EHR data. GAMENet outperformed all baselines in all effectiveness measures, and also achieved 3.60% DDI rate reduction from existing EHR data.
Graph Based SemiSupervised Learning (GSSL) 

Graph Bayesian Optimization  Network structure optimization is a fundamental task in complex network analysis. However, almost all the research on Bayesian optimization is aimed at optimizing objective functions with vectorial inputs. In this work, we first present a flexible framework, denoted graph Bayesian optimization, to handle arbitrary graphs in the Bayesian optimization community. By combining the proposed framework with graph kernels, it can take full advantage of implicit graph structural features to supplement explicit features chosen from experience, such as tags of nodes and any attributes of graphs. The proposed framework can identify which features are more important during the optimization process. We apply the framework to solve four problems, including two evaluations and two applications, to demonstrate its efficacy and potential applications.
Graph Branch Distance (GBD) 
Graph similarity search is a common and fundamental operation in graph databases. One of the most popular graph similarity measures is the Graph Edit Distance (GED), mainly because of its broad applicability and high interpretability. Despite its prevalence, exact GED computation has been proved to be NP-hard, which could result in unsatisfactory computational efficiency on large graphs. However, exactly accurate search results are usually unnecessary for real-world applications, especially when responsiveness is far more important than accuracy. Thus, in this paper, we propose a novel probabilistic approach to efficiently estimate GED, which is further leveraged for the graph similarity search. Specifically, we first take branches as elementary structures in graphs, and introduce a novel graph similarity measure by comparing branches between graphs, i.e., Graph Branch Distance (GBD), which can be efficiently calculated in polynomial time. Then, we formulate the relationship between GED and GBD by considering branch variations as the result ascribed to graph edit operations, and model this process by probabilistic approaches. By applying our model, the GED between any two graphs can be efficiently estimated by their GBD, and these estimations are finally utilized in the graph similarity search. Extensive experiments show that our approach has better accuracy, efficiency and scalability than other comparable methods in the graph similarity search over real and synthetic data sets.
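A possible sketch of comparing branches between two graphs. It assumes that the branch of a vertex is its label together with the multiset of labels on its incident edges, and that GBD is the larger vertex count minus the size of the multiset intersection of the two branch collections. These details are not spelled out in the entry above, so treat the definitions, function names and example graphs as assumptions.

```python
from collections import Counter


def branches(vertex_labels, edges):
    """Multiset of branches: (vertex label, sorted incident edge labels)."""
    incident = {v: [] for v in vertex_labels}
    for u, v, label in edges:
        incident[u].append(label)
        incident[v].append(label)
    return Counter((vertex_labels[v], tuple(sorted(incident[v])))
                   for v in vertex_labels)


def graph_branch_distance(g1, g2):
    """Assumed GBD: max vertex count minus the number of shared branches."""
    b1, b2 = branches(*g1), branches(*g2)
    # Counter intersection keeps the minimum multiplicity of each branch.
    common = sum((b1 & b2).values())
    return max(len(g1[0]), len(g2[0])) - common


# Each graph is (vertex labels, list of (u, v, edge label)).
g1 = ({1: "C", 2: "C", 3: "O"}, [(1, 2, "single"), (2, 3, "double")])
g2 = ({1: "C", 2: "C", 3: "N"}, [(1, 2, "single"), (2, 3, "double")])
distance = graph_branch_distance(g1, g2)
```

The two example graphs differ in a single vertex label, so exactly one branch fails to match. Building and intersecting the branch multisets takes time roughly proportional to the graph sizes, which is why this kind of measure is cheap next to exact GED.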
Graph Capsule Network (GCAPSCNN) 
Graph Convolutional Neural Networks (GCNNs) are the most recent exciting advancement in the deep learning field, and their applications are quickly spreading across multiple domains including bioinformatics, chemoinformatics, social networks, natural language processing and computer vision. In this paper, we expose and tackle some of the basic weaknesses of a GCNN model with a capsule idea presented in~\cite{hinton2011transforming} and propose our Graph Capsule Network (GCAPSCNN) model. In addition, we design our GCAPSCNN model especially to solve the graph classification problem, which current GCNN models find challenging. Through extensive experiments, we show that our proposed Graph Capsule Network can significantly outperform both the existing state-of-the-art deep learning methods and graph kernels on graph classification benchmark datasets.
Graph Convolution Embedded Long Short Term Memory Network (GCLSTM) 
Dynamic link prediction is a research hotspot in the complex networks area, especially for its wide applications in biology, social networks, economy and industry. Compared with static link prediction, dynamic link prediction is much more difficult since network structure evolves over time. Currently most research focuses on static link prediction, which cannot achieve the expected performance in dynamic networks. To address low AUC, high Error Rate, and the difficulty of predicting added and removed links, we propose GCLSTM, a Graph Convolution Network (GCN) embedded Long Short Term Memory network (LSTM), for end-to-end dynamic link prediction. To the best of our knowledge, this is the first time that a GCN-embedded LSTM has been put forward for link prediction in dynamic networks. The GCN in this new deep model is capable of node structure learning on the network snapshot for each time slide, while the LSTM is responsible for temporal feature learning across network snapshots. Besides, whereas current dynamic link prediction methods can only handle removed links, GCLSTM can predict both added and removed links at the same time. Extensive experiments are carried out to test its performance in terms of prediction accuracy, Error Rate, add/remove link prediction and key link prediction. The results show that GCLSTM outperforms current state-of-the-art methods.
Graph Convolutional Gaussian Processes  We propose a novel Bayesian nonparametric method to learn translation-invariant relationships on non-Euclidean domains. The resulting graph convolutional Gaussian processes can be applied to problems in machine learning for which the input observations are functions with domains on general graphs. The structure of these models allows for high dimensional inputs while retaining expressibility, as is the case with convolutional neural networks. We present applications of graph convolutional Gaussian processes to images and triangular meshes, demonstrating their versatility and effectiveness, comparing favorably to existing methods, despite being relatively simple models.
Graph Convolutional Network  This paper develops a novel graph convolutional network (GCN) framework for fault location in power distribution networks. The proposed approach integrates multiple measurements at different buses while taking system topology into account. The effectiveness of the GCN model is corroborated by the IEEE 123-bus benchmark system. Simulation results show that the GCN model significantly outperforms other widely used machine learning schemes with very high fault location accuracy. In addition, the proposed approach is robust to measurement noise and errors, missing entries, as well as multiple connection possibilities. Finally, data visualization results of two competing neural networks are presented to explore the mechanism of GCN’s superior performance.
Graph Convolutional Neural Network (Graph CNN) 
Because much of today’s data can be represented as graphs, there has been a demand for generalizing neural network models for graph data. One recent direction that has shown fruitful results, and therefore growing interest, is the usage of graph convolutional neural networks (GCNs). They have been shown to provide a significant improvement on a wide range of tasks in network analysis, one of which is node representation learning. The task of learning low-dimensional node representations has been shown to increase performance on a plethora of other tasks, from link prediction and node classification to community detection and visualization. Simultaneously, signed networks (or graphs having both positive and negative links) have become ubiquitous with the growing popularity of social media. However, since previous GCN models have primarily focused on unsigned networks (or graphs consisting of only positive links), it is unclear how they could be applied to signed networks due to the challenges presented by negative links. The primary challenges are that negative links not only have a different semantic meaning compared to positive links, but their principles are inherently different and they form complex relations with positive links. Therefore we propose a dedicated and principled effort that utilizes balance theory to correctly aggregate and propagate the information across layers of a signed GCN model. We perform empirical experiments comparing our proposed signed GCN against state-of-the-art baselines for learning node representations in signed networks. More specifically, our experiments are performed on four real-world datasets for the classical link sign prediction problem that is commonly used as the benchmark for signed network embedding algorithms.
Graph Convolutional Recurrent Neural Network (GCRNN) 
Graph processes model a number of important problems such as identifying the epicenter of an earthquake or predicting weather. In this paper, we propose a Graph Convolutional Recurrent Neural Network (GCRNN) architecture specifically tailored to deal with these problems. GCRNNs use convolutional filter banks to keep the number of trainable parameters independent of the size of the graph and of the time sequences considered. We also put forward Gated GCRNNs, a time-gated variation of GCRNNs akin to LSTMs. When compared with GNNs and another graph recurrent architecture in experiments using both synthetic and real-world data, GCRNNs significantly improve performance while using considerably fewer parameters.
Graph Cube  In a paper from the University of Illinois at Urbana-Champaign, this time in collaboration with Microsoft and Google, a novel data warehousing model called Graph Cube is introduced. Based on a restricted graph model (e.g., no attributes on edges) introduced as a multidimensional network (with the dimensions being the vertex attributes), they define the notion of an aggregate network (called a cuboid). A graph cube then constitutes the set of all possible aggregations of the original network.
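The aggregation idea can be sketched as follows: a cuboid groups vertices by a chosen subset of their attributes and counts the vertices and edges that fall into each group, and the cube is the collection of cuboids over every subset of dimensions. The sketch assumes an undirected network and simple count aggregates; the function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cuboid(vertices, edges, dims):
    """Aggregate a network on the chosen vertex attributes.

    vertices : dict mapping vertex id to a dict of attribute values
    edges    : iterable of (u, v) pairs, treated as undirected
    dims     : tuple of attribute names kept in the aggregation
    """
    def key(v):
        return tuple(vertices[v][d] for d in dims)

    node_weights = Counter(key(v) for v in vertices)
    # Sort the two endpoints so that (a, b) and (b, a) are merged.
    edge_weights = Counter(tuple(sorted((key(u), key(v)))) for u, v in edges)
    return node_weights, edge_weights


def graph_cube(vertices, edges, all_dims):
    """All cuboids, one per subset of the dimensions."""
    return {dims: cuboid(vertices, edges, dims)
            for r in range(len(all_dims) + 1)
            for dims in combinations(all_dims, r)}


vertices = {
    1: {"gender": "F", "city": "NY"},
    2: {"gender": "M", "city": "NY"},
    3: {"gender": "F", "city": "LA"},
    4: {"gender": "M", "city": "LA"},
}
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
cube = graph_cube(vertices, edges, ("gender", "city"))
by_gender_nodes, by_gender_edges = cube[("gender",)]
```

With two dimensions the cube holds four cuboids, from the fully aggregated network (empty dimension set) to the original level of detail.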
Graph Database  In computing, a graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.
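Index-free adjacency can be pictured with a tiny in-memory structure in which each node holds direct references to its neighbours, so a traversal follows pointers instead of consulting a global index. This is a conceptual sketch only, not how any particular graph database stores data; the class and relationship names are made up.

```python
class Node:
    """A vertex that stores direct references to its neighbours."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.out_edges = []  # list of (relationship type, Node)

    def connect(self, rel_type, other):
        self.out_edges.append((rel_type, other))

    def neighbours(self, rel_type):
        # No global index lookup: follow the references stored here.
        return [node for rel, node in self.out_edges if rel == rel_type]


alice = Node("Alice", age=30)
bob = Node("Bob")
carol = Node("Carol")
alice.connect("KNOWS", bob)
bob.connect("KNOWS", carol)

# Two-hop traversal: people known by the people Alice knows.
friends_of_friends = [n.name
                      for friend in alice.neighbours("KNOWS")
                      for n in friend.neighbours("KNOWS")]
```

The cost of each hop depends only on the number of edges at the current node, not on the total size of the graph.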
Graph Database of Academic Literature (GrapAL) 
We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating scientific literature which satisfies a variety of use cases and information needs requested by researchers. At the core of GrapAL is a Neo4j graph database with an intuitive schema and a simple query language. In this paper, we describe the basic elements of GrapAL, how to use it, and several use cases such as finding experts on a given topic for peer reviewing, discovering indirect connections between biomedical entities, and computing citation-based metrics.
Graph DiffusionEmbedding Network (GDEN) 
We present a novel graph diffusion-embedding network (GDEN) for graph-structured data. GDEN is motivated by our closed-form formulation of regularized feature diffusion on graphs. GDEN integrates both regularized feature diffusion and low-dimensional embedding simultaneously in a unified network model. Moreover, based on GDEN, we can naturally deal with structured data with multiple graph structures. Experiments on semi-supervised learning tasks on several benchmark datasets demonstrate the better performance of the proposed GDEN compared with traditional GCN models.
Graph Dynamical Network  Understanding the dynamical processes that govern the performance of functional materials is essential for the design of next generation materials to tackle global energy and environmental challenges. Many of these processes involve the dynamics of individual atoms or small molecules in condensed phases, e.g. lithium ions in electrolytes, water molecules in membranes, molten atoms at interfaces, etc., which are difficult to understand due to the complexity of local environments. In this work, we develop graph dynamical networks, an unsupervised learning approach for understanding atomic scale dynamics in arbitrary phases and environments from molecular dynamics simulations. Using a toy system, we demonstrate that learning the local dynamics of atoms can be significantly easier than learning the global dynamics of the entire material system. We also apply the method to learn the dynamics of two different systems, silicon atoms at liquid-solid interfaces and lithium ions in amorphous polymer electrolytes, and show that our approach gains important dynamical information that is otherwise difficult to obtain. With the large amounts of molecular dynamics data generated every day in nearly every aspect of materials design, this approach provides a broadly useful, automated tool to understand atomic scale dynamics in material systems.
Graph Element Network  We explore the use of graph neural networks (GNNs) to model spatial processes in which there is a priori graphical structure. Similar to finite element analysis, we assign nodes of a GNN to spatial locations and use a computational process defined on the graph to model the relationship between an initial function defined over a space and a resulting function in the same space. We use GNNs as a computational substrate, and show that the locations of the nodes in space as well as their connectivity can be optimized to focus on the most complex parts of the space. Moreover, this representational strategy allows the learned input-output relationship to generalize over the size of the underlying space and run the same model at different levels of precision, trading computation for accuracy. We demonstrate this method on a traditional PDE problem, a physical prediction problem from robotics, and a problem of learning to predict scene images from novel viewpoints.
Graph Feature Network (GFN) 
Graph Neural Nets (GNNs) have received increasing attention, partially due to their superior performance in many node and graph classification tasks. However, there is a lack of understanding of what they are learning and how sophisticated the learned graph functions are. In this work, we first propose Graph Feature Network (GFN), a simple lightweight neural net defined on a set of graph-augmented features. We then propose a dissection of GNNs on graph classification into two parts: 1) the graph filtering, where graph-based neighbor aggregations are performed, and 2) the set function, where a set of hidden node features are composed for prediction. To test the importance of these two parts separately, we prove and leverage the connection that GFN can be derived by linearizing the graph filtering part of a GNN. Empirically we perform evaluations on common graph classification benchmarks. To our surprise, we find that, despite the simplification, GFN could match or exceed the best accuracies produced by recently proposed GNNs, with a fraction of the computation cost. Our results provide new perspectives on both the functions that GNNs learn and the current benchmarks for evaluating them.
Graph Fourier Transform (GFT) 
In this paper, we propose a new regression-based algorithm to compute Graph Fourier Transform (GFT). Our algorithm allows different regularizations to be included when computing the GFT analysis components, so that the resulting components can be tuned for a specific task. We propose using the lasso penalty in our proposed framework to obtain analysis components with sparse loadings. We show that the components from this proposed {\em sparse GFT} can identify and select correlated signal sources into subgraphs, and perform frequency analysis {\em locally} within these subgraphs of correlated sources. Using real network traffic datasets, we demonstrate that sparse GFT can achieve outstanding performance in an anomaly detection task.
Graph Function Library  A graph abstraction layer with an object-oriented programming interface has been introduced, which enables the implementation of custom graph algorithms, for example within a stored procedure. A set of parameterizable implementations of frequently used algorithms will be provided in the form of a Graph Function Library for application developers to choose from.
Graph Hierarchical Convolutional Recurrent Neural Network (GHCRNN) 
The prediction of urban vehicle flow and speed can greatly facilitate people’s travel and can also provide reasonable advice for the decision-making of relevant government departments. However, due to the spatial, temporal and hierarchical nature of vehicle flow and many influencing factors such as weather, it is difficult to predict. Most existing research methods extract spatial structure information from the road network and time series information from the historical data. However, when extracting spatial features, these methods have higher time and space complexity and incorporate a lot of noise. They are difficult to apply to large graphs and only consider the influence of surrounding connected road nodes on the central node, ignoring a very important hierarchical relationship, namely the similar information of similar node features and road network structures. In response to these problems, this paper proposes the Graph Hierarchical Convolutional Recurrent Neural Network (GHCRNN) model. The model uses GCN (Graph Convolutional Networks) to extract spatial features, GRU (Gated Recurrent Units) to extract temporal features, and learnable pooling to extract hierarchical information, eliminate redundant information and reduce complexity. The model has been verified on vehicle flow and speed data from Shenzhen and Los Angeles, and time and memory consumption are effectively reduced at comparable precision.
Graph Information Criterion (GIC) 
statGraph 
Graph Information Ratio  We introduce the notion of information ratio $\text{Ir}(H/G)$ between two (simple, undirected) graphs $G$ and $H$, defined as the supremum of ratios $k/n$ such that there exists a mapping from the strong product $G^k$ to the strong product $H^n$ that preserves non-adjacency. Operationally speaking, the information ratio is the maximal number of source symbols per channel use that can be reliably sent over a channel with a confusion graph $H$, where reliability is measured w.r.t. a source confusion graph $G$. Various results are provided, including in particular lower and upper bounds on $\text{Ir}(H/G)$ in terms of different graph properties, inequalities and identities for behavior under strong product and disjoint union, relations to graph cores, and notions of graph criticality. Informally speaking, $\text{Ir}(H/G)$ can be interpreted as a measure of similarity between $G$ and $H$. We make this notion precise by introducing the concept of information equivalence between graphs, a more quantitative version of homomorphic equivalence. We then describe a natural partial ordering over the space of information equivalence classes, and endow it with a suitable metric structure that is contractive under the strong product. Various examples and open problems are discussed. 
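The definition above can be made concrete with a small sketch. It assumes the usual strong product (two distinct tuples are adjacent when every coordinate pair is equal or adjacent) and reads "preserves non-adjacency" as: distinct non-adjacent source vertices must map to distinct non-adjacent target vertices. The function names `strong_power` and `preserves_non_adjacency` are hypothetical and chosen only for this illustration.

```python
from itertools import product

def strong_power(vertices, edges, k):
    """Return the vertex list and an adjacency test for the k-fold strong product."""
    edge_set = {frozenset(e) for e in edges}

    def close(a, b):
        # Equal or adjacent in the base graph.
        return a == b or frozenset((a, b)) in edge_set

    power_vertices = list(product(vertices, repeat=k))

    def adjacent(x, y):
        # Distinct tuples whose coordinates are pairwise equal or adjacent.
        return x != y and all(close(a, b) for a, b in zip(x, y))

    return power_vertices, adjacent

def preserves_non_adjacency(f, src_vertices, src_adj, dst_adj):
    """Check that distinct non-adjacent sources map to distinct non-adjacent targets."""
    for x in src_vertices:
        for y in src_vertices:
            if x != y and not src_adj(x, y):
                fx, fy = f[x], f[y]
                if fx == fy or dst_adj(fx, fy):
                    return False
    return True

# G: five isolated vertices (k = 1). H: the 5-cycle, taken to the power n = 2.
c5_vertices = list(range(5))
c5_edges = [(i, (i + 1) % 5) for i in range(5)]
g_vertices, g_adj = strong_power(c5_vertices, [], 1)
h_vertices, h_adj = strong_power(c5_vertices, c5_edges, 2)

good_map = {(i,): (i, (2 * i) % 5) for i in range(5)}  # classical independent set in C5^2
bad_map = {(i,): (i, i) for i in range(5)}             # (0,0) and (1,1) are adjacent

ok = preserves_non_adjacency(good_map, g_vertices, g_adj, h_adj)
bad = preserves_non_adjacency(bad_map, g_vertices, g_adj, h_adj)
```

Under these assumptions `good_map` is a valid mapping from $G^1$ to $H^2$, so the ratio $k/n = 1/2$ is attainable for this pair, while `bad_map` is rejected.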
Graph Kernel Library (GraKeL) 
The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and is built on top of scikit-learn. It is simple to use and can be naturally combined with scikit-learn’s modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available at: https://…/GraKeL. 
Graph Laplacian Mixture Model  Graph learning methods have recently been receiving increasing interest as means to infer structure in datasets. Most of the recent approaches focus on different relationships between a graph and data sample distributions, mostly in settings where all available data relate to the same graph. This is, however, not always the case, as data is often available in mixed form, yielding the need for methods that are able to cope with mixture data and learn multiple graphs. We propose a novel generative model that explains a collection of distinct data naturally living on different graphs. We assume the mapping of data to graphs is not known and investigate the problem of jointly clustering a set of data and learning a graph for each of the clusters. Experiments on both synthetic and real-world datasets demonstrate promising performance both in terms of data clustering and in terms of multiple graph inference from mixture data. 
Graph Learning  The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the datasets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP graph inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data. 
Graph Learning Neural Network  Semi-supervised classification on graph-structured data has received increasing attention, where labels are only available for a small subset of data such as social networks and citation networks. This problem is challenging due to the irregularity of graphs. Graph convolutional neural networks (GCN) have been recently proposed to address such kinds of problems, which feed the graph topology into the network to guide operations such as graph convolution. Nevertheless, in most cases where the graphs are not given, they are empirically constructed manually, which tends to be suboptimal. Hence, we propose Graph Learning Neural Networks (GLNN), which exploits the optimization of graphs (the adjacency matrix in particular) and integrates it into the GCN for semi-supervised node classification. Leveraging on spectral graph theory, this essentially combines both graph learning and graph convolution into a unified framework. Specifically, we represent features of social/citation networks as graph signals, and propose the objective of graph learning from the graph-signal prior, sparsity constraint and properties of a valid adjacency matrix via maximum a posteriori estimation. The optimization objective is then integrated into the loss function of the GCN, leading to joint learning of the adjacency matrix and high-level features. Experimental results show that our proposed GLNN outperforms state-of-the-art approaches over widely adopted social network datasets and citation network datasets. 
Graph Learning-Convolutional Network (GLCN) 
Recently, graph Convolutional Neural Networks (graph CNNs) have been widely used for graph data representation and semi-supervised learning tasks. However, existing graph CNNs generally use a fixed graph which may not be optimal for semi-supervised learning tasks. In this paper, we propose a novel Graph Learning-Convolutional Network (GLCN) for graph data representation and semi-supervised learning. The aim of GLCN is to learn an optimal graph structure that best serves graph CNNs for semi-supervised learning by integrating both graph learning and graph convolution together in a unified network architecture. The main advantage is that in GLCN, both given labels and the estimated labels are incorporated and thus can provide useful ‘weakly’ supervised information to refine (or learn) the graph construction and also to facilitate the graph convolution operation in GLCN for unknown label estimation. Experimental results on seven benchmarks demonstrate that GLCN significantly outperforms state-of-the-art traditional fixed structure based graph CNNs. 
Graph Markov Neural Network (GMNN) 
This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results. 
Graph Matching based Partial Label Learning (GMPLL) 
Partial Label Learning (PLL) aims to learn from data where each training example is associated with a set of candidate labels, among which only one is correct. The key to dealing with this problem is to disambiguate the candidate label sets and obtain the correct assignments between instances and their candidate labels. In this paper, we interpret such assignments as instance-to-label matchings, and reformulate the task of PLL as a matching selection problem. To model this problem, we propose a novel Graph Matching based Partial Label Learning (GMPLL) framework, which incorporates a Graph Matching (GM) scheme owing to its excellent capability of exploiting the instance and label relationship. Meanwhile, since the conventional one-to-one GM algorithm does not satisfy the constraint of the PLL problem that multiple instances may correspond to the same label, we extend a traditional one-to-one probabilistic matching algorithm to the many-to-one constraint, so that the proposed framework accommodates the PLL problem. Moreover, we also propose a relaxed matching prediction model, which can improve the prediction accuracy via the GM strategy. Extensive experiments on both artificial and real-world data sets demonstrate that the proposed method can achieve superior or comparable performance against the state-of-the-art methods. 
Graph Neural Architecture Search Method (GraphNAS) 
Graph Neural Networks (GNNs) have been popularly used for analyzing non-Euclidean data such as social network data and biological data. Despite their success, the design of graph neural networks requires a lot of manual work and domain knowledge. In this paper, we propose a Graph Neural Architecture Search method (GraphNAS for short) that enables automatic search of the best graph neural architecture based on reinforcement learning. Specifically, GraphNAS first uses a recurrent network to generate variable-length strings that describe the architectures of graph neural networks, and then trains the recurrent network with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation data set. Extensive experimental results on node classification tasks in both transductive and inductive learning settings demonstrate that GraphNAS can achieve consistently better performance on the Cora, Citeseer and Pubmed citation networks and on a protein-protein interaction network. On node classification tasks, GraphNAS can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. 
Graph Neural Network (GNN) 
Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function $\tau(G,n) \in \mathbb{R}^m$ that maps a graph $G$ and one of its nodes $n$ into an $m$-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities. 
Graph Neural Process (GNP) 
We introduce Graph Neural Processes (GNP), inspired by the recent work in conditional and latent neural processes. A Graph Neural Process is defined as a Conditional Neural Process that operates on arbitrary graph data. It takes features of sparsely observed context points as input, and outputs a distribution over target points. We demonstrate graph neural processes in edge imputation and discuss benefits and drawbacks of the method for other application areas. One major benefit of GNPs is the ability to quantify uncertainty in deep learning on graph structures. An additional benefit of this method is the ability to extend graph neural networks to inputs of dynamic sized graphs. 
Graph Node-Feature Convolution  Graph convolutional network (GCN) is an emerging neural network approach. It learns a new representation of a node by aggregating the feature vectors of all neighbors, without considering whether the neighbors or features are useful or not. Recent methods have improved on this by sampling a fixed-size set of neighbors, or by assigning different weights to different neighbors in the aggregation process, but features within a feature vector are still treated equally in the aggregation process. In this paper, we introduce a new convolution operation on regular-size feature maps constructed from features of a fixed node bandwidth via sampling to get the first-level node representation, which is then passed to a standard GCN to learn the second-level node representation. Experiments show that our method outperforms competing methods in semi-supervised node classification tasks. Furthermore, our method opens new doors for exploring new GCN architectures, particularly deeper GCN models. 
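The baseline behaviour criticized above, where every neighbor and every feature is weighted equally, can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it uses a plain mean over a node and its neighbors, and it leaves out the learned weight matrix, the normalization and the nonlinearity of an actual GCN layer. The name `aggregate_neighbors` is hypothetical.

```python
def aggregate_neighbors(features, adjacency):
    """Replace each node vector by the mean of its own and its neighbors' vectors."""
    out = {}
    for node, vec in features.items():
        # Every neighbor and every feature dimension gets the same weight.
        group = [vec] + [features[n] for n in adjacency.get(node, [])]
        out[node] = [sum(v[d] for v in group) / len(group) for d in range(len(vec))]
    return out

# Toy graph: node "a" is connected to "b" and "c".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
smoothed = aggregate_neighbors(features, adjacency)
```

Sampling a fixed-size neighbor set or weighting neighbors differently would change how `group` is built and averaged; the entry above goes one step further and treats the individual features differently as well.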
Graph of Graphs (GoG) 
Graphs are general and powerful data representations which can model complex real-world phenomena, ranging from chemical compounds to social networks; however, effective feature extraction from graphs is not a trivial task, and much work has been done in the field of machine learning and data mining. The recent advances in graph neural networks have made automatic and flexible feature extraction from graphs possible and have improved the predictive performance significantly. In this paper, we go further with this line of research and address a more general problem of learning with a graph of graphs (GoG) consisting of an external graph and internal graphs, where each node in the external graph has an internal graph structure. We propose a dual convolutional neural network that extracts node representations by combining the external and internal graph structures in an end-to-end manner. Experiments on link prediction tasks using several chemical network datasets demonstrate the effectiveness of the proposed method. 
Graph Optimized Convolutional Network (GOCN) 
Graph Convolutional Networks (GCNs) have been widely studied for graph data representation and learning tasks. Existing GCNs generally use a single fixed graph, which may be suboptimal for data representation/learning and cannot easily handle multiple graphs. To address these issues, we propose a novel Graph Optimized Convolutional Network (GOCN) for graph data representation and learning. Our GOCN is motivated by our reinterpretation of graph convolution from a regularization/optimization framework. The core idea of GOCN is to formulate graph optimization and graph convolutional representation in a unified framework and thus conduct both of them cooperatively to boost their respective performance in the GCN learning scheme. Moreover, based on the proposed unified graph optimization-convolution framework, we propose a novel Multiple Graph Optimized Convolutional Network (M-GOCN) to naturally address data with multiple graphs. Experimental results demonstrate the effectiveness and benefit of the proposed GOCN and M-GOCN. 
Graph Partition Neural Network  We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally propagating information between the subgraphs. To efficiently partition graphs, we experiment with several partitioning algorithms and also propose a novel variant for fast processing of large scale graphs. We extensively test our model on a variety of semi-supervised node classification tasks. Experimental results indicate that GPNNs are either superior or comparable to state-of-the-art methods on a wide variety of datasets for graph-based semi-supervised classification. We also show that GPNNs can achieve similar performance as standard GNNs with fewer propagation steps. 
Graph Pattern Entity Ranking Model (GRank) 
Knowledge graphs have evolved rapidly in recent years and their usefulness has been demonstrated in many artificial intelligence tasks. However, knowledge graphs often have lots of missing facts. To solve this problem, many knowledge graph embedding models have been developed to populate knowledge graphs and these have shown outstanding performance. However, knowledge graph embedding models are so-called black boxes, and the user does not know how the information in a knowledge graph is processed and the models can be difficult to interpret. In this paper, we utilize graph patterns in a knowledge graph to overcome such problems. Our proposed model, the graph pattern entity ranking model (GRank), constructs an entity ranking system for each graph pattern and evaluates them using a ranking measure. By doing so, we can find graph patterns which are useful for predicting facts. Then, we perform link prediction tasks on standard datasets to evaluate our GRank method. We show that our approach outperforms other state-of-the-art approaches such as ComplEx and TorusE for standard metrics such as HITS@n and MRR. Moreover, our model is easily interpretable because the output facts are described by graph patterns. 
Graph Pattern Polynomial  We study the time complexity of induced subgraph isomorphism problems where the pattern graph is fixed. The earliest known example of an improvement over trivial algorithms is by Itai and Rodeh (1978) who sped up triangle detection in graphs using fast matrix multiplication. This algorithm was generalized by Nešetřil and Poljak (1985) to speed up detection of k-cliques. Improved algorithms are known for certain small-sized patterns. For example, a linear-time algorithm is known for detecting length-4 paths. In this paper, we give the first pattern detection algorithm that improves upon Nešetřil and Poljak’s algorithm for arbitrarily large pattern graphs (not cliques). The algorithm is obtained by reducing the induced subgraph isomorphism problem to the problem of detecting multilinear terms in constant-degree polynomials. We show that the same technique can be used to reduce the induced subgraph isomorphism problem of many pattern graphs to constructing arithmetic circuits computing homomorphism polynomials of these pattern graphs. Using this, we obtain faster combinatorial algorithms (algorithms that do not use fast matrix multiplication) for k-paths and k-cycles. We also obtain faster algorithms for 5-paths and 5-cycles that match the runtime for triangle detection. We show that these algorithms are expressible using polynomial families that we call graph pattern polynomial families. We then define a notion of reduction among these polynomials that allows us to compare the complexity of various pattern detection problems within this framework. For example, we show that the induced subgraph isomorphism polynomial for any pattern that contains a k-clique is harder than the induced subgraph isomorphism polynomial for k-clique. An analogue of this theorem is not known with respect to general algorithmic hardness. 
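The matrix-multiplication idea behind triangle detection mentioned above can be sketched as follows. A graph with adjacency matrix $A$ contains a triangle exactly when some edge $(i,j)$ also has a length-2 path between its endpoints, i.e. $A_{ij} = 1$ and $(A^2)_{ij} > 0$. This sketch uses a naive cubic-time product purely for illustration; the speed-up in the literature comes from substituting a fast matrix multiplication routine, which is not implemented here. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Naive product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def has_triangle(adj):
    """True if the simple undirected graph with 0/1 adjacency matrix adj has a triangle."""
    squared = mat_mul(adj, adj)  # squared[i][j] counts walks of length 2 from i to j
    n = len(adj)
    return any(adj[i][j] and squared[i][j] for i in range(n) for j in range(n))

triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
four_cycle = [[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]]

found_in_triangle = has_triangle(triangle)
found_in_four_cycle = has_triangle(four_cycle)
```

The check assumes a zero diagonal (no self-loops); with self-loops the squared matrix would also count degenerate walks.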
Graph Pooling (gPool) 
We consider the problem of representation learning for graph data. Convolutional neural networks can naturally operate on images, but have significant challenges in dealing with graph data. Given that images are special cases of graphs with nodes lying on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder architectures like U-Nets have been successfully applied to many image pixel-wise prediction tasks, similar methods are lacking for graph data. This is due to the fact that pooling and up-sampling operations are not natural on graph data. To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. The gPool layer adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. We further propose the gUnpool layer as the inverse operation of the gPool layer. The gUnpool layer restores the graph into its original structure using the position information of nodes selected in the corresponding gPool layer. Based on our proposed gPool and gUnpool layers, we develop an encoder-decoder model on graphs, known as the graph U-Nets. Our experimental results on node classification and graph classification tasks demonstrate that our methods achieve consistently better performance than previous models. 
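The selection step of gPool and the position-based restoration of gUnpool can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the projection vector `p` is fixed instead of trained, any gating of the selected features is left out, and unselected nodes are filled with zero vectors on unpooling. The names `gpool` and `gunpool` are hypothetical.

```python
import math

def gpool(features, adjacency, p, k):
    """Keep the k nodes with the largest scalar projection onto p."""
    norm = math.sqrt(sum(w * w for w in p))
    scores = [sum(x * w for x, w in zip(row, p)) / norm for row in features]
    top = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]
    idx = sorted(top)  # keep the original node order
    pooled_features = [features[i] for i in idx]
    pooled_adjacency = [[adjacency[i][j] for j in idx] for i in idx]
    return pooled_features, pooled_adjacency, idx

def gunpool(pooled_features, idx, num_nodes):
    """Put pooled rows back at their recorded positions (zero rows elsewhere: an assumption)."""
    dim = len(pooled_features[0])
    restored = [[0.0] * dim for _ in range(num_nodes)]
    for row, i in zip(pooled_features, idx):
        restored[i] = list(row)
    return restored

# Path graph 0-1-2-3 with two features per node.
features = [[0.1, 1.0], [0.9, 0.0], [0.5, 2.0], [0.7, 3.0]]
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
pooled_x, pooled_a, kept = gpool(features, adjacency, p=[1.0, 0.0], k=2)
restored = gunpool(pooled_x, kept, num_nodes=len(features))
```

Recording `kept` in the pooling step is what lets the unpooling step place each row back at its original position.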
Graph Processing Framework for Large Dynamic Graphs (BLADYG) 
Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, GraphLab, and Trinity. These systems can be divided into two categories: (1) vertex-centric and (2) block-centric approaches. In vertex-centric approaches, each vertex corresponds to a process, and messages are exchanged among vertices. In block-centric approaches, the unit of computation is a block, a connected subgraph of the graph, and message exchanges occur among blocks. In this paper, we consider the issues of scale and dynamism in the case of block-centric approaches. We present BLADYG, a block-centric framework that addresses the issue of dynamism in large-scale graphs. We present an implementation of BLADYG on top of the Akka framework. We experimentally evaluate the performance of the proposed framework. 
Graph Processing over Partitions (GPOP) 
The past decade has seen the development of many shared-memory graph processing frameworks intended to reduce the effort of creating high-performance parallel applications. However, their programming models, based on vertex-centric or edge-centric paradigms, suffer from several issues, such as poor cache utilization, irregular memory accesses, heavy use of synchronization primitives or theoretical inefficiency, that degrade the performance and scalability of applications. Recently, a cache-efficient partition-centric paradigm was proposed for computing PageRank. In this paper, we generalize this approach to develop a novel Partition-centric Programming Model (PPM) that is cache-efficient, scalable and work-efficient. We implement PPM as part of the Graph Processing over Partitions (GPOP) framework, which can efficiently execute a variety of algorithms. GPOP dramatically improves cache performance by exploiting the locality of partitioning. It achieves high scalability by enabling completely lock-free and atomic-free computation. Its built-in analytical performance models enable it to use a hybrid of source-centric and destination-centric communication modes in a way that ensures work-efficiency of each iteration and simultaneously boosts high-bandwidth sequential memory accesses. The GPOP framework completely abstracts away underlying parallelism and programming model details from the user. It provides an easy-to-program set of APIs with the ability to selectively continue the active vertex set across iterations, which is not intrinsically supported by the current frameworks. We extensively evaluate the performance of GPOP for a variety of graph algorithms, using several large datasets. We observe that GPOP incurs up to 8.6x and 5.2x fewer L2 cache misses compared to Ligra and GraphMat, respectively. In terms of execution time, GPOP is up to 19x and 6.1x faster than Ligra and GraphMat, respectively. 
Graph Recurrent Neural Network (GRNN) 
The era of data deluge has sparked the interest in graph-based learning methods in a number of disciplines such as sociology, biology, neuroscience, or engineering. In this paper, we introduce a graph recurrent neural network (GRNN) for scalable semi-supervised learning from multi-relational data. Key aspects of the novel GRNN architecture are the use of multi-relational graphs, the dynamic adaptation to the different relations via learnable weights, and the consideration of graph-based regularizers to promote smoothness and alleviate over-parametrization. Our ultimate goal is to design a powerful learning architecture able to: discover complex and highly nonlinear data associations, combine (and select) multiple types of relations, and scale gracefully with respect to the size of the graph. Numerical tests with real data sets corroborate the design goals and illustrate the performance gains relative to competing alternatives. 
Graph Similarity Computation via Convolutional Neural Network (GSimCNN) 
We introduce GSimCNN (Graph Similarity Computation via Convolutional Neural Networks) for predicting the similarity score between two graphs. As the core operation of graph similarity search, pairwise graph similarity computation is a challenging problem due to the NP-hard nature of computing many graph distance/similarity metrics. We demonstrate our model using the Graph Edit Distance (GED) as the example metric. Experiments on three real graph datasets demonstrate that our model achieves the state-of-the-art performance on graph similarity search. 
Graph sketching-based Massive Data Clustering (DBMSTClu) 
In this paper, we address the problem of recovering arbitrary-shaped data clusters from massive datasets. We present DBMSTClu, a new density-based non-parametric method working on a limited number of linear measurements, i.e. a sketched version of the similarity graph $G$ between the $N$ objects to cluster. Unlike $k$-means, $k$-medians or $k$-medoids algorithms, it does not fail at distinguishing clusters with particular structures. No input parameter is needed, contrary to DBSCAN or the Spectral Clustering method. DBMSTClu as a graph-based technique relies on the similarity graph $G$, which costs theoretically $O(N^2)$ in memory. However, our algorithm follows the dynamic semi-streaming model by handling $G$ as a stream of edge weight updates and sketches it in one pass over the data into a compact structure requiring $O(\operatorname{poly} \operatorname{log} (N))$ space. Thanks to the property of the Minimum Spanning Tree (MST) for expressing the underlying structure of a graph, our algorithm successfully detects the right number of non-convex clusters by recovering an approximate MST from the graph sketch of $G$. We provide theoretical guarantees on the quality of the clustering partition and also demonstrate its advantage over the existing state-of-the-art on several datasets. 
Graph Spanner  A graph spanner is a fundamental graph structure that faithfully preserves the pairwise distances in the input graph up to a small multiplicative stretch. The common objective in the computation of spanners is to achieve the best-known existential size-stretch trade-off efficiently. Classical models and algorithmic analysis of graph spanners essentially assume that the algorithm can read the input graph, construct the desired spanner, and write the answer to the output tape. However, when considering massive graphs containing millions or even billions of nodes, not only the input graph but also the output spanner might be too large for a single processor to store. 
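To make the notion of multiplicative stretch concrete, here is a minimal sketch of the classical greedy spanner construction, which is a textbook algorithm and not the method of the entry above. Edges are scanned by increasing weight, and an edge is kept only if the spanner built so far does not already connect its endpoints within `stretch` times the edge weight. It assumes a single machine that holds the whole graph, which is exactly the setting the entry describes as problematic for massive graphs. The function names are hypothetical.

```python
import heapq

def shortest_distance(adj, source, target, limit):
    """Dijkstra distance from source to target, not expanding nodes beyond limit."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")) or d > limit:
            continue  # stale heap entry, or already farther than we care about
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(edges, stretch):
    """Return a subset of (u, v, weight) edges forming a spanner with the given stretch."""
    adj = {}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if the current spanner distance exceeds stretch * w.
        if shortest_distance(adj, u, v, stretch * w) > stretch * w:
            kept.append((u, v, w))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return kept

triangle_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0)]
spanner_2 = greedy_spanner(triangle_edges, stretch=2)  # third edge is redundant
spanner_1 = greedy_spanner(triangle_edges, stretch=1)  # exact distances need all edges
```

With stretch 2 the detour a-b-c of length 2 is an acceptable substitute for the direct edge a-c, so that edge is dropped; with stretch 1 nothing can be dropped.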
Graph Stream Sketch (GSS) 
A graph stream is a continuous sequence of data items, in which each item indicates an edge, including its two endpoints and edge weight. It forms a dynamic graph that changes with every item in the stream. Graph streams play important roles in cyber security, social networks, cloud troubleshooting systems and other fields. Due to the vast volume and high update speed of graph streams, traditional data structures for graph storage such as the adjacency matrix and the adjacency list are no longer sufficient. However, prior art of graph stream summarization, like CM sketches, gSketches, TCM and gMatrix, either supports limited kinds of queries or suffers from poor accuracy of query results. In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize graph streams, which has linear space cost (O(|E|), where E is the edge set of the graph) and constant update time complexity (O(1)), and supports all kinds of queries over graph streams with controllable errors. Both theoretical analysis and experiment results confirm the superiority of our solution with regard to the time/space complexity and query results’ precision compared with the state-of-the-art. 
Graph Structured Recurrent Neural Network (GSRNN) 
We present a generic framework for spatiotemporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multiscaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting. 
Graph Variogram  Irregularly sampling a spatially stationary random field does not yield a graph stationary signal in general. Based on this observation, we build a definition of graph stationarity based on intrinsic stationarity, a less restrictive definition of classical stationarity. We introduce the concept of graph variogram, a novel tool for measuring spatial intrinsic stationarity at local and global scales for irregularly sampled signals by selecting subgraphs of local neighborhoods. Graph variograms are extensions of variograms used for signals defined on continuous Euclidean space. Our experiments with intrinsically stationary signals sampled on a graph demonstrate that graph variograms yield estimates with small bias of true theoretical models, while being robust to sampling variation of the space. 
Graph Warp Module (GWM) 
Recently, Graph Neural Networks (GNNs) are trending in the machine learning community as a family of architectures that specializes in capturing the features of graph-related datasets, such as those pertaining to social networks and chemical structures. Unlike for other families of the networks, the representation power of GNNs has much room for improvement, and many graph networks to date suffer from the problem of underfitting. In this paper we will introduce a Graph Warp Module, a supernode-based auxiliary network module that can be attached to a wide variety of existing GNNs in order to improve the representation power of the original networks. Through extensive experiments on molecular graph datasets, we will show that our GWM indeed alleviates the underfitting problem for various existing networks, and that it can even help create a network with the state-of-the-art generalization performance. 
Graph Wavelet Neural Network (GWNN) 
We present graph wavelet neural network (GWNN), a novel graph convolutional neural network (CNN), leveraging graph wavelet transform to address the shortcomings of previous spectral graph CNN methods that depend on graph Fourier transform. Different from graph Fourier transform, graph wavelet transform can be obtained via a fast algorithm without requiring matrix eigendecomposition with high computational cost. Moreover, graph wavelets are sparse and localized in vertex domain, offering high efficiency and good interpretability for graph convolution. The proposed GWNN significantly outperforms previous spectral graph CNNs in the task of graph-based semi-supervised classification on three benchmark datasets: Cora, Citeseer and Pubmed. 
Graph Weighted Model (GWM) 
Graph Weighted Models (GWMs) have recently been proposed as a natural generalization of weighted automata over strings and trees to arbitrary families of labeled graphs (and hypergraphs). A GWM generically associates a labeled graph with a tensor network and computes a value by successive contractions directed by its edges. 
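To make the idea of computing a value by successive contractions concrete, the sketch below handles the simplest special case: a string, viewed as a chain-shaped graph, evaluated by a weighted automaton. The automaton, its weights and the function name are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical 2-state weighted automaton over the alphabet {"a", "b"}.
alpha = np.array([1.0, 0.0])   # initial weight vector
omega = np.array([0.0, 1.0])   # final weight vector
A = {
    "a": np.array([[1.0, 1.0], [0.0, 1.0]]),  # one transition matrix per label
    "b": np.array([[1.0, 0.0], [0.0, 1.0]]),
}

def string_value(word):
    """Contract the chain-shaped tensor network from left to right.

    Each matrix product contracts the index shared by two neighbouring
    tensors, i.e. one edge of the chain.
    """
    state = alpha
    for symbol in word:
        state = state @ A[symbol]
    return float(state @ omega)

print(string_value("abaa"))  # with these weights, counts the occurrences of "a"
```

On general graphs the tensors have one index per incident edge and the contraction order follows the edges, but the principle is the same as in this chain.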
Graph2Seq  The celebrated Sequence to Sequence learning (Seq2Seq) and its fruitful variants are powerful models that achieve excellent performance on tasks that map sequences to sequences. However, there are many machine learning tasks whose inputs are naturally represented as graphs, which poses significant challenges to existing Seq2Seq models because the conversion from graph form to a sequence is not lossless. In this work, we present a general end-to-end approach that maps the input graph to a sequence of vectors and then uses an attention-based LSTM to decode the target sequence from these vectors. Specifically, to address the inevitable information loss in data conversion, we introduce a novel graph-to-sequence neural network model that follows the encoder-decoder architecture. Our method first uses an improved graph-based neural network to generate the node and graph embeddings, with a novel aggregation strategy that incorporates edge direction information into the node embeddings. We also propose an attention-based mechanism that aligns node embeddings with the decoding sequence to better cope with large graphs. Experimental results on the bAbI task, the Shortest Path task, and the Natural Language Generation task demonstrate that our model achieves state-of-the-art performance and significantly outperforms other baselines. We also show that, with the proposed aggregation strategy, our model is able to converge quickly to good performance.
graph2vec  Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed-length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrarily sized graphs. graph2vec’s embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they can be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and is competitive with state-of-the-art graph kernels.
Graph-Adaptive Pruning (GAP) 
In this work, we propose a graph-adaptive pruning (GAP) method for efficient inference of convolutional neural networks (CNNs). In this method, the network is viewed as a computational graph, in which the vertices denote the computation nodes and the edges represent the information flow. Through topology analysis, GAP is capable of adapting to different network structures, especially the widely used cross connections and multi-path data flow in recent novel convolutional models. The models can be adaptively pruned at vertex level as well as edge level without any post-processing, so GAP directly yields practical model compression and inference speedup. Moreover, it does not need any customized computation library or hardware support. Fine-tuning is conducted after pruning to restore the model performance. In the fine-tuning step, we adopt a self-taught knowledge distillation (KD) strategy that utilizes information from the original model, through which the performance of the optimized model can be sufficiently improved without introducing any other teacher model. Experimental results show that the proposed GAP achieves promising results in making inference more efficient, e.g., for ResNeXt-29 on CIFAR10, it achieves 13X model compression and 4.3X practical speedup with marginal loss of accuracy.
Graph-based Activity Regularization (GAR) 
In this paper, we propose a novel graph-based approach for semi-supervised learning problems, which considers an adaptive adjacency of the examples throughout the unsupervised portion of the training. Adjacency of the examples is inferred using the predictions of a neural network model which is first initialized by supervised pretraining. These predictions are then updated according to a novel unsupervised objective which regularizes another adjacency, now linking the output nodes. Regularizing the adjacency of the output nodes, inferred from the predictions of the network, creates an easier optimization problem and ultimately turns the predictions of the network into the optimal embedding. The proposed framework thus provides an effective and scalable graph-based solution that is natural to the operational mechanism of deep neural networks. Our results show state-of-the-art performance within semi-supervised learning, with the highest accuracies reported to date in the literature for the SVHN and NORB datasets.
Graph-Based Broad Behavior-Aware Network (GBBAN) 
In this paper, we propose a heuristic recommendation system for interactive news, called the graph-based broad behavior-aware network (GBBAN). Unlike most existing work, our network considers six behaviors that users may potentially perform: unclick, click, like, follow, comment, and share. Further, we introduce the core and coritivity concepts from graph theory into the system to measure the concentration degree of each user’s interests, which we show can improve the performance even further. There are three critical steps in our recommendation system. First, we build a structured user-dependent interaction behavior graph for multi-level and multi-category data as a preprocessing step. This graph constructs the data sources and knowledge information which will be used in GBBAN through representation learning. Second, for each user node on the graph, we calculate its core and coritivity and then add the pair as a new feature associated with this user. According to the definition of core and coritivity, this user-dependent feature provides useful insights into the concentration degree of the user’s interests and affects the trade-off between accuracy and diversity of the personalized recommendation. Last, we represent item (news) information by entity semantics and environment semantics; design a multi-channel convolutional neural network called GCNN to learn the semantic information and an attention-based LSTM to learn the user’s behavior representation; and combine these with the previous concentration feature as input to another two fully connected layers that finish the classification task. The whole network constitutes the final GBBAN. In extensive experiments, our proposed method outperforms baselines and several variants of itself.
Graph-Based Collaborative Filtering (GCF) 
By introducing consumed items as users’ implicit feedback into the matrix factorization (MF) method, SVD++ is one of the most effective collaborative filtering methods for personalized recommender systems. Though powerful, SVD++ has two limitations: (i) only user-side implicit feedback is utilized, whereas item-side implicit feedback, which can also enrich item representations, is not leveraged; (ii) in SVD++, the interacted items are equally weighted when combining the implicit feedback, which cannot accurately reflect a user’s true preferences. To tackle the above limitations, in this paper we propose the Graph-based collaborative filtering (GCF) model, the Weighted Graph-based collaborative filtering (W-GCF) model and the Attentive Graph-based collaborative filtering (A-GCF) model, which (i) generalize the implicit feedback to the item side based on the user-item bipartite graph; (ii) flexibly learn the weights of individuals in the implicit feedback and hence improve the model’s capacity. Comprehensive experiments show that our proposed models outperform state-of-the-art models. For sparse implicit feedback scenarios, additional improvement is achieved by leveraging the step-two implicit feedback information.
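For reference, the SVD++ prediction rule that this entry builds on can be sketched as follows. This is the standard SVD++ estimate, not the GCF models themselves; the argument names are illustrative assumptions. Note how every consumed item contributes with the same weight, which is limitation (ii) above.

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, Y, consumed):
    """SVD++ rating estimate for one (user, item) pair.

    mu: global mean rating; b_u, b_i: user and item biases;
    p_u, q_i: explicit user and item factor vectors;
    Y: matrix of implicit item factors (one row per item);
    consumed: indices of the items the user has interacted with.
    """
    if len(consumed) > 0:
        # All consumed items are weighted equally by |N(u)|^(-1/2)
        implicit = Y[consumed].sum(axis=0) / np.sqrt(len(consumed))
    else:
        implicit = np.zeros_like(p_u)
    return mu + b_u + b_i + q_i @ (p_u + implicit)

Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
pred = svdpp_predict(3.0, 0.5, -0.5, np.array([1.0, 0.0]), np.array([1.0, 1.0]), Y, [0, 1, 2, 3])
print(pred)
```

Replacing the uniform factor with learned or attention-based per-item weights is the kind of change the weighted and attentive variants describe.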
Graph-Based Global Reasoning Network  Globally modeling and reasoning over relations between regions can be beneficial for many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations by convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed. After reasoning, relation-aware features are distributed back to the original coordinate space for downstream tasks. We further present a highly efficient instantiation of the proposed approach and introduce the Global Reasoning unit (GloRe unit) that implements the coordinate-interaction space mapping by weighted global pooling and weighted broadcasting, and the relation reasoning via graph convolution on a small graph in interaction space. The proposed GloRe unit is lightweight, end-to-end trainable and can be easily plugged into existing CNNs for a wide range of tasks. Extensive experiments show that our GloRe unit can consistently boost the performance of state-of-the-art backbone architectures, including ResNet, ResNeXt, SE-Net and DPN, for both 2D and 3D CNNs, on image classification, semantic segmentation and video action recognition tasks.
GraphBLAS  An effort to define standard building blocks for Graph Algorithms in the language of Linear Algebra. 
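The idea of phrasing graph algorithms in linear algebra can be illustrated with breadth-first search written as repeated matrix-vector products over a Boolean (OR, AND) semiring. The sketch below uses plain NumPy rather than any GraphBLAS API, and the function name is an assumption made for illustration.

```python
import numpy as np

def bfs_levels(adj, source):
    """BFS levels via repeated Boolean matrix-vector products.

    adj[i, j] is truthy when there is an edge i -> j.
    Unreachable vertices keep the level -1.
    """
    n = adj.shape[0]
    adj = adj.astype(bool)
    levels = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        levels[frontier] = level
        # One "multiplication" over the (OR, AND) semiring:
        # vertex j is reached if some frontier vertex i has an edge i -> j
        reached = (adj.T & frontier).any(axis=1)
        # Mask out vertices that already have a level
        frontier = reached & (levels == -1)
        level += 1
    return levels

# Edges: 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated
adj = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    adj[i, j] = 1
print(bfs_levels(adj, 0))
```

Swapping the semiring (for example to (min, +)) turns the same loop structure into other algorithms such as shortest paths, which is the design idea behind the standard building blocks.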
GraphBolt  Graphs are found in a plethora of domains, including online social networks, the World Wide Web and the study of epidemics, to name a few. With the advent of greater volumes of information and the need for continuously updated results under temporal constraints, it is necessary to explore novel approaches that further enable performance improvements. In the scope of stream processing over graphs, we research the tradeoffs between result accuracy and the speedup of approximate computation techniques. We believe this to be a natural path towards these performance improvements. Herein we present GraphBolt, through which we conducted our research. It is an innovative model for approximate graph processing, implemented in Apache Flink. We analyze our model and evaluate it with the case study of the PageRank algorithm, perhaps the most famous measure of vertex centrality used to rank websites in search engine results. In light of our model, we discuss the challenges driven by relations between result accuracy and potential performance gains. Our experiments show that GraphBolt can reduce computational time by over 50% while achieving result quality above 95% when compared to results of the traditional version of PageRank without any summarization or approximation techniques. 
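The exact PageRank computation that serves as the accuracy reference in this entry can be sketched with a plain power iteration. This is only the non-approximate baseline, not GraphBolt's approximate model; the damping factor of 0.85 and the uniform handling of dangling vertices are conventional choices assumed here.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a dense adjacency matrix.

    adj[i, j] > 0 means there is a link i -> j.
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Row-stochastic transition matrix; vertices without out-links jump uniformly
    P = np.where(out_deg[:, None] > 0,
                 adj / np.maximum(out_deg, 1)[:, None],
                 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * (rank @ P)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

# Links: 0->1, 0->2, 1->2; vertex 2 has no out-links
adj = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)
print(pagerank(adj))
```

An approximate scheme would update only part of this computation as the graph stream changes and then compare its ranking against the output of a full run like this one.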
GraphCage  Efficient graph processing is challenging because of the irregularity of graph algorithms. Using GPUs to accelerate irregular graph algorithms efficiently is even more difficult, since the GPU’s highly structured SIMT architecture is not a natural fit for irregular applications. Despite extensive previous efforts to carefully map graph algorithms onto the GPU, the performance of graph processing on GPUs is still highly memory-latency bound, leading to low utilization of compute resources. Random memory accesses generated by the sparse graph data structure are the major cause of this significant memory access latency. Simply applying the conventional cache blocking technique proposed for matrix computation has limited benefit due to the significant overhead on the GPU. We propose GraphCage, a cache-centric optimization framework for highly efficient graph processing on GPUs. We first present a throughput-oriented cache blocking scheme (TOCAB) in both push and pull directions. Compared with conventional cache blocking, which suffers from repeated accesses when processing large graphs on GPUs, TOCAB is specifically optimized for the GPU architecture to reduce this overhead and improve memory access efficiency. To integrate our scheme into state-of-the-art implementations without significant overhead, we coordinate TOCAB with load balancing strategies by considering the sparsity of subgraphs. To enable cache blocking for traversal-based algorithms, we consider the benefit and overhead in different iterations with different working set sizes, and apply TOCAB for topology-driven kernels in the pull direction. Evaluation shows that GraphCage can improve performance by 2x to 4x compared to hand-optimized implementations and state-of-the-art frameworks (e.g. CuSha and Gunrock), with less memory consumption than CuSha.
GraphChain  Although blockchain technology is enjoying another breakout year, its core challenges remain unsolved. This paper analyzes the features of the blockchain-based Bitcoin and Bitcoin-NG systems and proposes an improved way of implementing blockchain systems that replaces the original chain structure with a graph data structure, named GraphChain. Each block represents a transaction and contains the balance status of the traders. Additionally, in the Bitcoin system all transactions are packaged by a single miner, which results in a lot of wasted effort; another way to improve resource utilization is therefore to replace the competition among miners with election and parallel mining. The researchers simulated a blockchain with graph structure and parallel mining in Python and proposed a new conceptual graph model that can improve both capacity and performance.
GraphConnect  Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, Dropout and DropConnect are data-independent. This paper proposes a new method, GraphConnect, that is data-dependent, and is motivated by the observation that data of interest lie close to a manifold. The new method encourages the relationships between the learned decisions to resemble a graph representing the manifold structure. Essentially GraphConnect is designed to learn attributes that are present in data samples, in contrast to weight decay, Dropout and DropConnect, which are simply designed to make it more difficult to fit to random error or noise. Empirical Rademacher complexity is used to connect the generalization error of the neural network to spectral properties of the graph learned from the input data. This framework is used to show that GraphConnect is superior to weight decay. Experimental results on several benchmark datasets validate the theoretical analysis, and show that when the number of training samples is small, GraphConnect is able to significantly improve performance over weight decay.
Graphene  We introduce Graphene, an Open IE system whose goal is to generate accurate, meaningful and complete propositions that may facilitate a variety of downstream semantic applications. For this purpose, we transform syntactically complex input sentences into clean, compact structures in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them in order to maintain their semantic relationship. In that way, we preserve the context of the relational tuples extracted from a source sentence, generating a novel lightweight semantic representation for Open IE that enhances the expressiveness of the extracted propositions. 
GraphGAN  The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying the above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces ‘fake’ samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from the ground truth or generated by the generative model. Through the competition between these two models, both can alternately and iteratively boost their performance. Moreover, for the implementation of the generative model, we propose a novel graph softmax to overcome the limitations of the traditional softmax function; it can be proven to satisfy the desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over state-of-the-art baselines.
Graph-Guided Fused LASSO (GFLASSO) 
Let X be a matrix of size n × p, with n observations and p predictors, and Y a matrix of size n × k, with the same n observations and k responses; say, 1390 distinct electronics purchase records in 73 countries, used to predict the ratings of 50 Netflix productions over all 73 countries. Models well poised for modeling pairs of high-dimensional datasets include orthogonal two-way Partial Least Squares (O2PLS), Canonical Correlation Analysis (CCA) and Co-Inertia Analysis (CIA), all of which involve matrix decomposition. Additionally, since these models are based on latent variables (that is, projections based on the original predictors), the computational efficiency comes at a cost of interpretability. However, this trade-off does not always pay off, and can be reversed with the direct prediction of k individual responses from selected features in X, in a unified regression framework that takes into account the relationships among the responses. Mathematically, the GFLASSO borrows the regularization of the LASSO and builds the model on the graph dependency structure underlying Y, as quantified by the k × k correlation matrix (that is, the ‘strength of association’ between responses). As a result, similar (or dissimilar) responses will be explained by a similar (or dissimilar) subset of selected predictors.
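A minimal sketch of how such an objective can be evaluated is given below. It assumes one commonly cited form of the GFLASSO objective: a squared-error loss, a LASSO penalty on the p × k coefficient matrix B, and a fusion penalty that weights each pair of responses by the absolute value of their correlation. The 0.5 scaling of the loss, the thresholding of the correlation matrix R and all names are illustrative assumptions.

```python
import numpy as np

def gflasso_objective(X, Y, B, R, lam, gamma, threshold=0.0):
    """Evaluate a GFLASSO-style objective for coefficients B (p x k).

    R is the k x k correlation matrix of the responses; pairs with
    |r| > threshold are treated as edges of the response graph.
    """
    residual = Y - X @ B
    loss = 0.5 * np.sum(residual ** 2)
    lasso = lam * np.abs(B).sum()
    fusion = 0.0
    k = Y.shape[1]
    for m in range(k):
        for l in range(m + 1, k):
            r = R[m, l]
            if abs(r) > threshold:
                # Positively correlated responses are pulled towards equal
                # coefficients, negatively correlated ones towards opposite signs
                fusion += abs(r) * np.abs(B[:, m] - np.sign(r) * B[:, l]).sum()
    return loss + lasso + gamma * fusion

X = np.eye(2)
B = np.array([[1.0, 1.0], [0.0, 0.0]])
Y = X @ B
R = np.array([[1.0, 0.5], [0.5, 1.0]])
print(gflasso_objective(X, Y, B, R, lam=0.1, gamma=1.0))
```

With identical coefficient columns and a positive correlation, the fusion term vanishes, which is the sense in which similar responses end up explained by a similar subset of predictors.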
GraphH  It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core systems have been proposed recently for processing big graphs using secondary storage, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model in which each worker processes one partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even on a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster than popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs.
Graphical Causal Models  A species of the broader genus of graphical models, especially intended to help with problems of causal inference. 
Graphical Generative Adversarial Network (GraphicalGAN) 
We propose Graphical Generative Adversarial Networks (GraphicalGAN) to model structured data. GraphicalGAN conjoins the power of Bayesian networks on compactly representing the dependency structures among random variables and that of generative adversarial networks on learning expressive dependency functions. We introduce a structured recognition model to infer the posterior distribution of latent variables given observations. We propose two alternative divergence minimization approaches to learn the generative model and recognition model jointly. The first one treats all variables as a whole, while the second one utilizes the structural information by checking the individual local factors defined by the generative model and works better in practice. Finally, we present two important instances of GraphicalGAN, i.e. Gaussian Mixture GAN (GMGAN) and State Space GAN (SSGAN), which can successfully learn the discrete and temporal structures on visual datasets, respectively. 
Graphical Inference in Observed-Hidden Variable Merged Seeded Network (GI-OHMS) 
Discovery of communities in complex networks is a topic of considerable recent interest within the complex systems community. Due to the dynamic and rapidly evolving nature of large-scale networks, like online social networks, the notion of stronger local and global interactions among the nodes in communities has become harder to capture. In this paper, we present a novel graphical inference method, GI-OHMS (Graphical Inference in Observed-Hidden variable Merged Seeded network), to solve the problem of overlapping community detection. The novelty of our approach is in transforming the complex and dense network of interest into an observed-hidden merged seeded (OHMS) network, which preserves the important community properties of the network. We further utilize a graphical inference method (Bayesian Markov Random Field) to extract communities. The superiority of our approach lies in two main observations: 1) the extracted OHMS network excludes many weaker connections, thus leading to a higher accuracy of inference; 2) the graphical inference step operates on a smaller network, thus having much lower execution time. We demonstrate that our method outperforms other baseline algorithms like OSLOM, DEMON, and LEMON in accuracy. To further improve execution time, we provide a multi-threaded implementation and demonstrate significant speedup compared to state-of-the-art algorithms.
Graphical Kernel System (GKS) 
The Graphical Kernel System (GKS) is a document produced by the International Standards Organization (ISO) which defines a common interface to interactive computer graphics for application programs. GKS has been designed by a group of experts representing the national standards institutions of most major industrialized countries. The full standard provides functional specifications for some 200 subroutines which perform graphics input and output in a device independent way. Application programs can thus move freely between different graphics devices and different host computers. For the first time graphics programs have become genuinely portable. 
Graphical Markov Models (GMM) 
A central aspect of statistical science is the assessment of dependence among stochastic variables. The familiar concepts of correlation, regression, and prediction are special cases, and identification of causal relationships ultimately rests on representations of multivariate dependence. Graphical Markov models (GMM) use graphs, either undirected, directed, or mixed, to represent multivariate dependences in a visual and computationally efficient manner. A GMM is usually constructed by specifying local dependences for each variable (equivalently, each node of the graph) in terms of its immediate neighbors and/or parents by means of undirected and/or directed edges. This simple local specification can represent a highly varied and complex system of multivariate dependences by means of the global structure of the graph, thereby obtaining efficiency in modeling, inference, and probabilistic calculations. For a fixed graph (equivalently, a fixed model), the classical methods of statistical inference may be utilized. In many applied domains, however, such as expert systems for medical diagnosis or weather forecasting, or the analysis of gene-expression data, the graph is unknown and is itself the first goal of the analysis. This poses numerous challenges, including the following: · The numbers of possible graphs and models grow super-exponentially in the number of variables. · Distinct graphs G may be Markov equivalent, i.e., statistically indistinguishable. · Conversely, the same graph may possess different Markov interpretations. ggm 
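The super-exponential growth mentioned in the first challenge can be checked numerically. The sketch below counts labelled undirected graphs directly and labelled directed acyclic graphs with Robinson's recurrence; the function names are illustrative.

```python
from math import comb

def count_undirected(n):
    """Labelled undirected graphs on n nodes: each of the n(n-1)/2 edges is present or absent."""
    return 2 ** (n * (n - 1) // 2)

def count_dags(n):
    """Labelled directed acyclic graphs on n nodes, via Robinson's recurrence."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

for n in range(1, 8):
    print(n, count_undirected(n), count_dags(n))
```

Already for a handful of variables, exhaustive enumeration of candidate graphs becomes impractical, which is why structure learning relies on search heuristics or constraint-based tests.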
Graphical Model  A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables. They are commonly used in probability theory, statistics – particularly Bayesian statistics – and machine learning. Generally, probabilistic graphical models use a graphbased representation as the foundation for encoding a complete distribution over a multidimensional space and a graph that is a compact or factorized representation of a set of independences that hold in the specific distribution. Two branches of graphical representations of distributions are commonly used, namely, Bayesian networks and Markov networks. Both families encompass the properties of factorization and independences, but they differ in the set of independences they can encode and the factorization of the distribution that they induce. 
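As a small illustration of how a graph encodes a factorization and the independences it implies, the sketch below builds a three-variable Bayesian network with the chain structure A -> B -> C. The probability tables are made-up numbers used only for the example.

```python
import itertools

# Hypothetical conditional probability tables for binary variables A -> B -> C
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    """Joint probability from the factorization P(A) P(B|A) P(C|B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def prob(**fixed):
    """Probability of a partial assignment, obtained by summing the joint."""
    total = 0.0
    for a, b, c in itertools.product((0, 1), repeat=3):
        assignment = {"a": a, "b": b, "c": c}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total

# C is independent of A given B: both conditionals equal P(C=1 | B=1)
print(prob(a=0, b=1, c=1) / prob(a=0, b=1))
print(prob(a=1, b=1, c=1) / prob(a=1, b=1))
```

The graph needs only 1 + 2 + 2 free parameters here instead of the 7 required by an unrestricted joint over three binary variables, which is the compactness the entry refers to.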
GraphIE  Most modern Information Extraction (IE) systems are implemented as sequential taggers and focus on modelling local dependencies. Nonlocal and nonsequential context is, however, a valuable source of information to improve predictions. In this paper, we introduce GraphIE, a framework that operates over a graph representing both local and nonlocal dependencies between textual units (i.e. words or sentences). The algorithm propagates information between connected nodes through graph convolutions and exploits the richer representation to improve word level predictions. The framework is evaluated on three different tasks, namely social media, textual and visual information extraction. Results show that GraphIE outperforms a competitive baseline (BiLSTM+CRF) in all tasks by a significant margin. 
Graphite  Graphs are a fundamental abstraction for modeling relational data. However, graphs are discrete and combinatorial in nature, and learning representations suitable for machine learning tasks poses statistical and computational challenges. In this work, we propose Graphite, an algorithmic framework for unsupervised learning of representations over nodes in a graph using deep latent variable generative models. Our model is based on variational autoencoders (VAE), and differs from existing VAE frameworks for data modalities such as images, speech, and text in the use of graph neural networks for parameterizing both the generative model (i.e., decoder) and inference model (i.e., encoder). The use of graph neural networks incorporates inductive biases due to the spatial, local structure of graphs directly in the generative model. Moreover, we draw novel connections between graph neural networks and approximate inference via kernel embeddings of distributions. We demonstrate empirically that Graphite outperforms state-of-the-art approaches for the tasks of density estimation, link prediction, and node classification on synthetic and benchmark datasets.
GraphMP  Recent studies showed that single-machine graph processing systems can be as highly competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the high disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge shards on disk. Third, we use a compressed edge cache mechanism to fully utilize the available memory of a machine to reduce the amount of disk accesses for edges. Extensive evaluations have shown that GraphMP could outperform existing single-machine out-of-core systems such as GraphChi, X-Stream and GridGraph by up to 51, and can be as highly competitive as distributed graph engines like Pregel+, PowerGraph and Chaos.
Graphonomy  Prior highly-tuned human parsing models tend to fit each dataset in a specific domain or with discrepant label granularity, and can hardly be adapted to other human parsing tasks without extensive retraining. In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. This poses many fundamental learning challenges, e.g. discovering underlying semantic structures among different label granularities, performing proper transfer learning across different image domains, and identifying and utilizing label redundancies across related tasks. To address these challenges, we propose a new universal human parsing agent, named ‘Graphonomy’, which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information. In particular, Graphonomy first learns and propagates a compact high-level graph representation among the labels within one dataset via Intra-Graph Reasoning, and then transfers semantic information across multiple datasets via Inter-Graph Transfer. Various graph transfer dependencies (e.g., similarity, linguistic knowledge) between different datasets are analyzed and encoded to enhance graph transfer capability. By distilling a universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up complexity. Experimental results show that Graphonomy achieves state-of-the-art results on three human parsing benchmarks as well as advantageous universal human parsing performance.
GraphQL  1. A query language for your API: GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools. 2. Ask for what you need, get exactly that: Send a GraphQL query to your API and get exactly what you need, nothing more and nothing less. GraphQL queries always return predictable results. Apps using GraphQL are fast and stable because they control the data they get, not the server. 3. Get many resources in a single request: GraphQL queries access not just the properties of one resource but also smoothly follow references between them. While typical REST APIs require loading from multiple URLs, GraphQL APIs get all the data your app needs in a single request. Apps using GraphQL can be quick even on slow mobile network connections. 4. Describe what’s possible with a type system: GraphQL APIs are organized in terms of types and fields, not endpoints. Access the full capabilities of your data from a single endpoint. GraphQL uses types to ensure Apps only ask for what’s possible and provide clear and helpful errors. Apps can use types to avoid writing manual parsing code. 5. Evolve your API without versions: Add new fields and types to your GraphQL API without impacting existing queries. Aging fields can be deprecated and hidden from tools. By using a single evolving version, GraphQL APIs give apps continuous access to new features and encourage cleaner, more maintainable server code. 6. Bring your own data and code: GraphQL creates a uniform API across your entire application without being limited by a specific storage engine. Write GraphQL APIs that leverage your existing data and code with GraphQL engines available in many languages. 
You provide functions for each field in the type system, and GraphQL calls them with optimal concurrency. Why GraphQL is the future of APIs 
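The idea of per-field functions and of clients receiving only the fields they ask for can be illustrated with a toy sketch. This is not a GraphQL implementation and uses no GraphQL library: the `USERS` data, the `RESOLVERS` table, the nested-dict query format, and the `execute` helper are all hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory data standing in for "your existing data".
USERS = {
    1: {"name": "Ada", "email": "ada@example.com", "friend_ids": [2]},
    2: {"name": "Bob", "email": "bob@example.com", "friend_ids": []},
}

# One function per field (assumed names); the executor calls only the ones requested.
RESOLVERS = {
    "name": lambda user: user["name"],
    "email": lambda user: user["email"],
    "friends": lambda user: [USERS[i] for i in user["friend_ids"]],
}

def execute(selection: Dict[str, Optional[dict]], obj: dict) -> Dict[str, Any]:
    """Resolve only the requested fields; a nested dict selects sub-fields."""
    result = {}
    for field, sub_selection in selection.items():
        value = RESOLVERS[field](obj)
        if sub_selection is None:
            result[field] = value  # scalar field: return as-is
        else:
            # list-of-objects field: follow the reference and recurse
            result[field] = [execute(sub_selection, child) for child in value]
    return result

# Toy equivalent of asking for a user's name and the names of their friends.
query = {"name": None, "friends": {"name": None}}
response = execute(query, USERS[1])
```

The response contains `name` and `friends` but not `email`, because `email` was never requested; the nested selection follows the reference between users within the same call.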
GraphRec  In recent years, Graph Neural Networks (GNNs), which can naturally integrate node information and topological structure, have been demonstrated to be powerful in learning on graph data. These advantages of GNNs provide great potential to advance social recommendation, since data in social recommender systems can be represented as a user-user social graph and a user-item graph, and learning latent factors of users and items is the key. However, building social recommender systems based on GNNs faces challenges. For example, the user-item graph encodes both interactions and their associated opinions; social relations have heterogeneous strengths; and users are involved in two graphs (e.g., the user-user social graph and the user-item graph). To address the three aforementioned challenges simultaneously, in this paper, we present a novel graph neural network framework (GraphRec) for social recommendations. In particular, we provide a principled approach to jointly capture interactions and opinions in the user-item graph and propose the framework GraphRec, which coherently models two graphs and heterogeneous strengths. Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed framework GraphRec. 
Graph-Regularized Multiview Canonical Correlation Analysis (GMCCA) 
Multiview canonical correlation analysis (MCCA) seeks latent low-dimensional representations encountered with multiview data of shared entities (a.k.a. common sources). However, existing MCCA approaches do not exploit the geometry of the common sources, which may be available a priori, or can be constructed using certain domain knowledge. This prior information about the common sources can be encoded by a graph, and be invoked as a regularizer to enrich the maximum variance MCCA framework. In this context, the present paper’s novel graph-regularized MCCA (GMCCA) approach minimizes the distance between the wanted canonical variables and the common low-dimensional representations, while accounting for graph-induced knowledge of the common sources. Relying on a function capturing the extent to which low-dimensional representations of the multiple views are similar, a generalization bound of GMCCA is established based on Rademacher’s complexity. Tailored for setups where the number of data pairs is smaller than the data vector dimensions, a graph-regularized dual MCCA approach is also developed. To further deal with nonlinearities present in the data, graph-regularized kernel MCCA variants are put forward too. Interestingly, solutions of the graph-regularized linear, dual, and kernel MCCA are all provided in terms of generalized eigenvalue decomposition. Several corroborating numerical tests using real datasets are provided to showcase the merits of the graph-regularized MCCA variants relative to several competing alternatives including MCCA, Laplacian-regularized MCCA, and (graph-regularized) PCA. 
GraphRNN  Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph. Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far. In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models. 
GraphSAGE  As an efficient and scalable graph neural network, GraphSAGE has enabled an inductive capability for inferring unseen nodes or graphs by aggregating subsampled local neighborhoods and by learning in a mini-batch gradient descent fashion. The neighborhood sampling used in GraphSAGE improves computing and memory efficiency when inferring a batch of target nodes with diverse degrees in parallel. Despite this advantage, the default uniform sampling suffers from high variance in training and inference, leading to suboptimal accuracy. We propose a new data-driven sampling approach that reasons about the real-valued importance of a neighborhood with a non-linear regressor and uses that value as a criterion for subsampling neighborhoods. The regressor is learned using value-based reinforcement learning. The implied importance for each combination of vertex and neighborhood is inductively extracted from the negative classification loss output of GraphSAGE. As a result, in an inductive node classification benchmark using three datasets, our method improved on the baseline that uses uniform sampling, outperforming recent variants of graph neural networks in accuracy. 
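The uniform neighborhood sampling that the entry above treats as the baseline can be sketched as follows. This is a minimal illustration in plain Python, assuming an adjacency-list graph and a mean aggregator. The function names and toy data are hypothetical, and the learned weights, the concatenation with the node's own features, and the importance-based sampler proposed in the abstract are all omitted.

```python
import random

def sample_neighbors(adj, node, k, rng):
    """Uniformly sample at most k neighbors of `node` (all of them if the degree is <= k)."""
    neighbors = adj[node]
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)

def mean_aggregate(features, adj, node, k, rng):
    """Average the feature vectors of a uniformly subsampled neighborhood."""
    sampled = sample_neighbors(adj, node, k, rng)
    if not sampled:
        return list(features[node])  # isolated node: fall back to its own features
    dim = len(features[node])
    return [sum(features[n][d] for n in sampled) / len(sampled) for d in range(dim)]

# Toy graph (assumed data): node 0 has two neighbors.
adj = {0: [1, 2], 1: [0], 2: [0]}
features = {0: [1.0, 0.0], 1: [2.0, 2.0], 2: [4.0, 0.0]}
rng = random.Random(0)

full = mean_aggregate(features, adj, 0, k=5, rng=rng)       # uses both neighbors
subsampled = mean_aggregate(features, adj, 0, k=1, rng=rng)  # uses one random neighbor
```

With `k=1` the aggregate depends on which neighbor happens to be drawn, which is the variance problem the abstract attributes to uniform sampling.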
GraphSE^2  In this paper, we propose GraphSE^2, an encrypted graph database for online social network services to address massive data breaches. GraphSE^2 preserves the functionality of social search, a key enabler for quality social network services, where social search queries are conducted on a large-scale social graph and meanwhile perform set and computational operations on user-generated contents. To enable efficient privacy-preserving social search, GraphSE^2 provides an encrypted structural data model to facilitate parallel and encrypted graph data access. It is also designed to decompose complex social search queries into atomic operations and realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE^2 with various queries supported in the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE^2 is practical for querying a social graph with a million users. 
GraphSGAN  We investigate how generative adversarial nets (GANs) can help semi-supervised learning on graphs. We first provide insights on the working principles of adversarial learning over graphs and then present GraphSGAN, a novel approach to semi-supervised learning on graphs. In GraphSGAN, generator and classifier networks play a novel competitive game. At equilibrium, the generator generates fake samples in low-density areas between subgraphs. In order to discriminate fake samples from the real ones, the classifier implicitly takes the density property of subgraphs into consideration. An efficient adversarial learning algorithm has been developed to improve traditional normalized graph Laplacian regularization with a theoretical guarantee. Experimental results on several different genres of datasets show that the proposed GraphSGAN significantly outperforms several state-of-the-art methods. GraphSGAN can also be trained using mini-batches, and thus enjoys the scalability advantage. 
Graph-Sparse Logistic Regression  We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package. 
Graph-Structured Contrastive Loss  We present a fully-supervised method for learning to segment data structured by an adjacency graph. We introduce the graph-structured contrastive loss, a loss function structured by a ground truth segmentation. It promotes learning vertex embeddings which are homogeneous within desired segments, and have high contrast at their interface. Thus, computing a piecewise-constant approximation of such embeddings produces a graph partition close to the objective segmentation. This loss is fully backpropagable, which allows us to learn vertex embeddings with deep learning algorithms. We evaluate our methods on a 3D point cloud oversegmentation task, defining a new state-of-the-art by a large margin. These results are based on the published work of Landrieu and Boussaha 2019. 
Graphtropy  A new conceptual foundation for the notion of ‘information’ is proposed, based on the concept of a ‘distinction graph’: a graph in which two nodes are connected iff they cannot be distinguished by a particular observer. The ‘graphtropy’ of a distinction graph is defined as the average connection probability of two nodes; in the case where the distinction graph is composed of disconnected components that are fully connected subgraphs, this is equivalent to Ellerman’s logical entropy, which has straightforward relationships to Shannon entropy. Probabilistic distinction graphs and probabilistic graphtropy are also considered, as well as connections between graphtropy and thermodynamic and quantum entropy. The semantics of the Second Law of Thermodynamics and the Maximum Entropy Production Principle are unfolded in a novel way, via analysis of the cognitive processes underlying the making of distinction graphs. This evokes an interpretation in which complex intelligence is seen to correspond to states of consciousness with intermediate graphtropy, which are associated with memory imperfections that violate the assumptions leading to derivation of the Second Law. In the case where nodes of a distinction graph are labeled by computable entities, graphtropy is shown to be monotonically related to the average algorithmic information of the nodes (relative to the algorithmic information of the observer). A quantum-mechanical version of distinction graphs is considered, in which distinctions can exist in a superposed state; this leads to graphtropy as a measure of the impurity of a mixed state, and to a concept of ‘quangraphtropy.’ Finally, a novel computational model called Dynamic Distinction Graphs (DDGs) is formulated, via enhancing distinction graphs with additional links expressing causal implications, enabling a distinction-based model of ‘observers.’ 
GraphTSNE  We present GraphTSNE, a novel visualization technique for graph-structured data based on t-SNE. The growing interest in graph-structured data increases the importance of gaining human insight into such datasets by means of visualization. However, among the most popular visualization techniques, classical t-SNE is not suitable for such datasets because it has no mechanism to make use of information from graph connectivity. On the other hand, standard graph visualization techniques, such as Laplacian Eigenmaps, have no mechanism to make use of information from node features. Our proposed method GraphTSNE is able to produce visualizations which account for both graph connectivity and node features. It is based on scalable and unsupervised training of a graph convolutional network on a modified t-SNE loss. By assembling a suite of evaluation metrics, we demonstrate that our method produces desirable visualizations on three benchmark datasets. 
GraphVite  Learning continuous representations of nodes has recently attracted growing interest in both academia and industry, due to their simplicity and effectiveness in a variety of applications. Most existing node embedding algorithms and systems are capable of processing networks with hundreds of thousands or a few million nodes. However, how to scale them to networks that have tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing the algorithm and the system. On the CPU end, augmented edge samples are generated in parallel by random walks in an online fashion on the network, and serve as the training data. On the GPU end, a novel parallel negative sampling is proposed to leverage multiple GPUs to train node embeddings simultaneously, without much data transfer and synchronization. Moreover, an efficient collaboration strategy is proposed to further reduce the synchronization cost between CPUs and GPUs. Experiments on multiple real-world networks show that GraphVite is super efficient. It takes only about one minute for a network with 1 million nodes and 5 million edges on a single machine with 4 GPUs, and takes around 20 hours for a network with 66 million nodes and 1.8 billion edges. Compared to the current fastest system, GraphVite is about 50 times faster without any sacrifice on performance. 
Graphviz  Graphviz (short for Graph Visualization Software) is a package of open-source tools initiated by AT&T Labs Research for drawing graphs specified in DOT language scripts. It also provides libraries for software applications to use the tools. Graphviz is free software licensed under the Eclipse Public License. https://…/viz.js 
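To make "graphs specified in DOT language scripts" concrete, the sketch below builds a small DOT script as plain text in Python. The `to_dot` helper and the node names are hypothetical, no Graphviz library is imported, and the helper assumes node names contain no double quotes.

```python
def to_dot(edges, name="G"):
    """Return a DOT digraph script for a list of (source, target) edges."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quoted identifiers keep names with spaces valid in DOT.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Toy pipeline graph (assumed example data).
dot_text = to_dot([("load", "clean"), ("clean", "model")])
```

Saving `dot_text` to a file such as `pipeline.dot` and running `dot -Tpng pipeline.dot -o pipeline.png` with the Graphviz command-line tools would render the drawing.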
GraphX  GraphX is Apache Spark’s API for graphs and graph-parallel computation. ➚ “Apache Spark” https://…/graphx.pdf 
GrAPL  In this paper, we introduce a new online decision making paradigm that we call Thresholding Graph Bandits. The main goal is to efficiently identify a subset of arms in a multi-armed bandit problem whose means are above a specified threshold. While traditionally in such problems the arms are assumed to be independent, in our paradigm we further suppose that we have access to the similarity between the arms in the form of a graph, allowing us to gain information about the arm means in fewer samples. Such settings play a key role in a wide range of modern decision making problems where rapid decisions need to be made in spite of the large number of options available at each time. We present GrAPL, a novel algorithm for the thresholding graph bandit problem. We demonstrate theoretically that this algorithm is effective in taking advantage of the graph structure when available and the reward function homophily (that strongly connected arms have similar rewards) when favorable. We confirm these theoretical findings via experiments on both synthetic and real data. 
Grasp Quality Spatial Transformer Network (GQ-STN) 
Grasping is a fundamental robotic task needed for the deployment of household robots or furthering warehouse automation. However, few approaches are able to perform grasp detection in real time (frame rate). To this effect, we present Grasp Quality Spatial Transformer Network (GQ-STN), a one-shot grasp detection network. Being based on the Spatial Transformer Network (STN), it produces not only a grasp configuration, but also directly outputs a depth image centered at this configuration. By connecting our architecture to an externally-trained grasp robustness evaluation network, we can train efficiently to satisfy a robustness metric via the backpropagation of the gradient emanating from the evaluation network. This removes the difficulty of training detection networks on sparsely annotated databases, a common issue in grasping. We further propose to use this robustness classifier to compare approaches, as it is more reliable than the traditional rectangle metric. Our GQ-STN is able to detect robust grasps on the depth images of the Dex-Net 2.0 dataset with 92.4% accuracy in a single pass of the network. We finally demonstrate in a physical benchmark that our method can propose robust grasps more often than previous sampling-based methods, while being more than 60 times faster. 
Gravitational Clustering  The downfall of many supervised learning algorithms, such as neural networks, is the inherent need for a large amount of training data. Although there is a lot of buzz about big data, there is still the problem of doing classification from a small dataset. Other methods such as support vector machines, although capable of dealing with few samples, are inherently binary classifiers, and need learning strategies such as One vs All in the case of multi-class classification. In the presence of a large number of classes this can become problematic. In this paper, we present a novel approach to supervised learning through the method of clustering. Unlike traditional methods such as K-Means, Gravitational Clustering does not require the initial number of clusters and automatically builds the clusters. Individual samples can be arbitrarily weighted, and the method requires only a few samples while staying resilient to overfitting. 
Gray-box Adversarial Training  Adversarial samples are perturbed inputs crafted to mislead machine learning systems. A training mechanism called adversarial training, which presents adversarial samples along with clean samples, has been introduced to learn robust models. In order to scale adversarial training to large datasets, these perturbations can only be crafted using fast and simple methods (e.g., gradient ascent). However, it is shown that adversarial training converges to a degenerate minimum, where the model appears to be robust by generating weaker adversaries. As a result, the models are vulnerable to simple black-box attacks. In this paper, we (i) demonstrate the shortcomings of the existing evaluation policy, (ii) introduce novel variants of white-box and black-box attacks, dubbed ‘gray-box adversarial attacks’, based on which we propose a novel evaluation method to assess the robustness of the learned models, and (iii) propose a novel variant of adversarial training, named ‘Gray-box Adversarial Training’, that uses intermediate versions of the models to seed the adversaries. Experimental evaluation demonstrates that the models trained using our method exhibit better robustness compared to both undefended and adversarially trained models. 
Greedy Algorithm  A greedy algorithm is an algorithm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time. For example, a greedy strategy for the traveling salesman problem (which is of a high computational complexity) is the following heuristic: ‘At each stage visit an unvisited city nearest to the current city’. This heuristic need not find a best solution, but terminates in a reasonable number of steps; finding an optimal solution typically requires unreasonably many steps. In mathematical optimization, greedy algorithms solve combinatorial problems having the properties of ➘ “Matroid”s. 
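The nearest-city heuristic quoted in the entry above can be sketched in a few lines. This is an illustrative sketch only: the function name, the choice of Euclidean distance, and the toy coordinates are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_neighbor_tour(cities, start=0):
    """Greedy tour: from the current city, always move to the nearest unvisited city."""
    tour = [start]
    unvisited = set(range(len(cities))) - {start}
    while unvisited:
        current = cities[tour[-1]]
        # Locally optimal choice: the closest city that has not been visited yet.
        nxt = min(unvisited, key=lambda j: math.dist(current, cities[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Four cities on a line (assumed toy data), listed out of order.
cities = [(0, 0), (3, 0), (1, 0), (2, 0)]
tour = nearest_neighbor_tour(cities)
```

Each step is cheap and the loop runs once per city, but nothing guarantees the resulting tour is the shortest one, which matches the entry's caveat that the heuristic need not find a best solution.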
Greedy Algorithm for Robust deNoising (GARD) 

Greedy Neural Architecture Search (GNAS) 
A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures. Typically, conventional deep multi-attribute learning approaches follow the pipeline of manually designing the network architectures based on task-specific expert prior knowledge and careful network tuning, which makes them inflexible for the various complicated scenarios met in practice. To address this problem, we propose an efficient greedy neural architecture search approach (GNAS) to automatically discover the optimal tree-like deep architecture for multi-attribute learning. In a greedy manner, GNAS divides the optimization of the global architecture into the optimizations of individual connections step by step. By iteratively updating the local architectures, the global tree-like architecture converges to one where the bottom layers are shared across relevant attributes and the branches in the top layers encode more attribute-specific features. Experiments on three benchmark multi-attribute datasets show the effectiveness and compactness of neural architectures derived by GNAS, and also demonstrate the efficiency of GNAS in searching neural architectures. 
Greedy Randomized Adaptive Search Procedures (GRASP) 
The greedy randomized adaptive search procedure (also known as GRASP) is a metaheuristic algorithm commonly applied to combinatorial optimization problems. GRASP typically consists of iterations made up of successive constructions of a greedy randomized solution and subsequent iterative improvements of it through a local search. The greedy randomized solutions are generated by adding elements to the problem’s solution set from a list of elements ranked by a greedy function according to the quality of the solution they will achieve. To obtain variability in the candidate set of greedy solutions, well-ranked candidate elements are often placed in a restricted candidate list (also known as RCL), and chosen at random when building up the solution. This kind of greedy randomized construction method is also known as a semi-greedy heuristic, first described in Hart and Shogan (1987). GRASP was first introduced in Feo and Resende (1989). Survey papers on GRASP include Feo and Resende (1995), Pitsoulis and Resende (2002), and Resende and Ribeiro (2003). An annotated bibliography of GRASP can be found in Festa and Resende (2002). 
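The construct-then-improve loop described above can be sketched on a small travelling salesman instance. This is one possible instantiation, not the formulation from the cited papers: the restricted candidate list is taken to be the `rcl_size` nearest unvisited cities, the local search is assumed to be 2-opt, and all function names and parameters are hypothetical.

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def construct(pts, rcl_size, rng):
    """Greedy randomized construction: pick the next city at random from the RCL."""
    tour, unvisited = [0], set(range(1, len(pts)))
    while unvisited:
        current = pts[tour[-1]]
        ranked = sorted(unvisited, key=lambda j: math.dist(current, pts[j]))
        nxt = rng.choice(ranked[:rcl_size])  # RCL = best-ranked candidates
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(tour, pts):
    """Local search: reverse segments while doing so shortens the tour."""
    best, improved = tour[:], True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(cand, pts) < tour_length(best, pts) - 1e-12:
                    best, improved = cand, True
    return best

def grasp(pts, iterations=20, rcl_size=2, seed=0):
    """Repeat construction plus local search and keep the best tour found."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        cand = two_opt(construct(pts, rcl_size, rng), pts)
        if best is None or tour_length(cand, pts) < tour_length(best, pts):
            best = cand
    return best

# Corners of a unit square (assumed toy data), listed in a crossing order.
pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
best_tour = grasp(pts)
```

Setting `rcl_size=1` reduces the construction step to a purely greedy nearest-city rule, while larger values trade greediness for diversity across iterations.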
Greedy Shallow Network  We present a novel greedy approach to obtain a single layer neural network approximation to a target function with the use of a ReLU activation function. In our approach we construct a shallow network by utilizing a greedy algorithm where the set of possible inner weights acts as a parametrization of the prescribed dictionary. To facilitate the greedy selection we employ an integral representation of the network, based on the ridgelet transform, that significantly reduces the cardinality of the dictionary and hence promotes feasibility of the proposed method. Our approach allows for the construction of efficient architectures which can be treated either as improved initializations to be used in place of random-based alternatives, or as fully-trained networks, thus potentially nullifying the need for training and/or calibrating based on backpropagation. Numerical experiments demonstrate the tenability of the proposed concept and its advantages compared to the classical techniques for training and constructing neural networks. 
Greenhouse  Greenhouse is a zero-positive machine learning system for time-series anomaly detection. 
Greenplum Database (GPDB) 
The Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes. The Greenplum project is released under the Apache 2 license. We want to thank all our current community contributors and are really interested in all new potential contributions. For the Greenplum Database community no contribution is too small; we encourage all types of contributions. 
Greenwald and Khanna Algorithm  Copulas for Streaming Data 
greta  greta lets us build statistical models interactively in R, and then sample from them by MCMC. We build greta models with greta array objects, which behave much like R’s array, matrix and vector objects for numeric data. Like those numeric data objects, greta arrays can be manipulated with functions and mathematical operators to create new greta arrays. The key difference between greta arrays and numeric data objects is that when you do something to a greta array, greta doesn’t calculate the values of the new greta array. Instead, it just remembers what operation to do, and works out the size and shape of the result. 
Gretel  We consider the problem of path inference: given a path prefix, i.e., a partially observed sequence of nodes in a graph, we want to predict which nodes are in the missing suffix. In particular, we focus on natural paths occurring as a by-product of the interaction of an agent with a network: a driver on the transportation network, an information seeker in Wikipedia, or a client in an online shop. Our interest is sparked by the realization that, in contrast to shortest-path problems, natural paths are usually not optimal in any graph-theoretic sense, but might still follow predictable patterns. Our main contribution is a graph neural network called Gretel. Conditioned on a path prefix, this network can efficiently extrapolate path suffixes, evaluate path likelihood, and sample from the future path distribution. Our experiments with GPS traces on a road network and user-navigation paths in Wikipedia confirm that Gretel is able to adapt to graphs with very different properties, while also comparing favorably to previous solutions. 
Grey Box Model  In mathematics, statistics, and computational modelling, a grey box model combines a partial theoretical structure with data to complete the model. The theoretical structure may vary from information on the smoothness of results, to models that need only parameter values from data or existing literature. Thus, almost all models are grey box models as opposed to black box where no model form is assumed or white box models that are purely theoretical. Some models assume a special form such as a linear regression or neural network. These have special analysis methods. In particular linear regression techniques are much more efficient than most nonlinear techniques. The model can be deterministic or stochastic (i.e. containing random components) depending on its planned use. 
Grey Machine Learning  A brief introduction to the Grey Machine Learning 
Grey Machine Learning Model based Variable Separable (VSGML) 
The Grey Machine Learning Model based Variable Separable (VSGML) is presented in this paper. The VSGML’s function set is composed of variable separable functions. The Divide-and-Conquer architecture based Radial Basis Function (DCRBF) Network is constructed to implement VSGML. This DCRBF is composed of several sub-RBF networks, each of which takes a subspace as its input. The output of DCRBF is the sum of the sub-RBF networks’ outputs. The algorithm of DCRBF is given and its approximation ability is also discussed in this paper. The experimental results show that the DCRBF outperforms the conventional RBF. A Grey Machine Learning Model with application to time series. Available from: https://…ing_Model_with_application_to_time_series [accessed May 07 2018]. 
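The structure described above, several sub-RBF networks that each see one subspace of the input and whose outputs are summed, can be sketched as a forward pass. This is a sketch under assumptions and not the paper's algorithm: Gaussian basis functions, the parameter layout, and all names are hypothetical, and training is not shown.

```python
import math

def rbf_net(x, centers, weights, width):
    """Output of one Gaussian RBF network: sum_k w_k * exp(-||x - c_k||^2 / (2 * width^2))."""
    total = 0.0
    for center, weight in zip(centers, weights):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total += weight * math.exp(-sq_dist / (2.0 * width ** 2))
    return total

def dcrbf(x, subnets):
    """Sum of sub-RBF outputs; each sub-network reads only its own input coordinates."""
    return sum(
        rbf_net([x[i] for i in indices], centers, weights, width)
        for indices, centers, weights, width in subnets
    )

# Two sub-networks over a 3-dimensional input (assumed toy parameters):
# the first sees coordinates (0, 1), the second sees coordinate (2,).
subnets = [
    ((0, 1), [(0.0, 0.0)], [1.0], 1.0),
    ((2,), [(0.0,)], [1.0], 1.0),
]
output = dcrbf([0.0, 0.0, 0.0], subnets)
```

Because each sub-network depends on a disjoint group of coordinates, the summed output is a variable separable function of the input, which is the function class the entry attributes to VSGML.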
Grid Computing  Grid computing is the use of widely distributed computer resources to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed (thus not physically coupled) than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large. Grids are a form of distributed computing whereby a ‘super virtual computer’ is composed of many networked loosely coupled computers acting together to perform large tasks. For certain applications, distributed or grid computing can be seen as a special type of parallel computing that relies on complete computers (with onboard CPUs, storage, power supplies, network interfaces, etc.) connected to a computer network (private or public) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus. 
Grid Search  The de facto standard way of performing hyperparameter optimization is grid search, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a held-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search. 
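The exhaustive search described in the entry above can be sketched with the standard library alone. The `grid_search` helper, the parameter names, and the toy scoring function are hypothetical; in practice `score_fn` would train a model and return a cross-validation or held-out validation score.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in the grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # performance metric guiding the search
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Manually discretized grid for two assumed hyperparameters.
param_grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}

def toy_score(params):
    """Stand-in for a validation score, peaking at learning_rate=0.1, depth=3."""
    return -(params["learning_rate"] - 0.1) ** 2 - (params["depth"] - 3) ** 2

best_params, best_score = grid_search(param_grid, toy_score)
```

The number of evaluations is the product of the list lengths (here 3 × 3 = 9), so the cost grows quickly as hyperparameters or grid points are added.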
Grid Spectral Mixture Kernel (GSM) 
Gaussian processes (GP) for machine learning have been studied systematically over the past two decades and they are by now widely used in a number of diverse applications. However, GP kernel design and the associated hyperparameter optimization are still hard and to a large extent open problems. In this paper, we consider the task of GP regression for time series modeling and analysis. The underlying stationary kernel can be approximated arbitrarily closely by a newly proposed grid spectral mixture (GSM) kernel, which turns out to be a linear combination of low-rank sub-kernels. In the case where a large number of sub-kernels are used, either the Nyström or the random Fourier feature approximations can be adopted to deal efficiently with the computational demands. The unknown GP hyperparameters consist of the non-negative weights of all sub-kernels as well as the noise variance; their estimation is performed via the maximum-likelihood (ML) estimation framework. Two efficient numerical optimization methods for solving for the unknown hyperparameters are derived, including a sequential majorization-minimization (MM) method and a non-linearly constrained alternating direction method of multipliers (ADMM). The MM matches perfectly with the proven low-rank property of the proposed GSM sub-kernels and turns out to be a stable and efficient solver, while the ADMM has the potential to generate a better local minimum in terms of the test MSE. Experimental results, based on various classic time series data sets, corroborate that the proposed GSM kernel-based GP regression model outperforms several salient competitors of similar kind in terms of prediction mean-squared-error and numerical stability. 
GridNet  This paper presents GridNet, a new Convolutional Neural Network (CNN) architecture for semantic image segmentation (full scene labelling). Classical neural networks are implemented as one stream from the input to the output with subsampling operators applied in the stream in order to reduce the feature map size and to increase the receptive field for the final prediction. However, for semantic image segmentation, where the task consists in providing a semantic class to each pixel of an image, feature map reduction is harmful because it leads to a resolution loss in the output prediction. To tackle this problem, our GridNet follows a grid pattern allowing multiple interconnected streams to work at different resolutions. We show that our network generalizes many well-known networks such as conv-deconv, residual or U-Net networks. GridNet is trained from scratch and achieves competitive results on the Cityscapes dataset. 
Gridster.js  Gridster is a jQuery plugin that allows building intuitive draggable layouts from elements spanning multiple columns. You can even dynamically add and remove elements from the grid. It is on par with sliced bread, or possibly better. MIT licensed. Drag and Drop Visuals in your Interactive Dashboard 
Gromov-Wasserstein Distance  Modeling datasets as metric spaces seems to be natural for some applications and concepts revolving around the Gromov-Hausdorff distance – a notion of distance between compact metric spaces – provide a useful language for expressing properties of data and shape analysis methods. In many situations, however, this is not enough, and one must incorporate other sources of information into the model, with ‘weights’ attached to each point being one of them. This gives rise to the idea of representing data as metric measure spaces, which are metric spaces endowed with a probability measure. In terms of a distance, the Gromov-Hausdorff metric is replaced with the Gromov-Wasserstein metric. 
Grounded Recurrent Neural Network (GRNN) 
In this work, we present the Grounded Recurrent Neural Network (GRNN), a recurrent neural network architecture for multi-label prediction which explicitly ties labels to specific dimensions of the recurrent hidden state (we call this process ‘grounding’). The approach is particularly well-suited for extracting large numbers of concepts from text. We apply the new model to address an important problem in healthcare: understanding what medical concepts are discussed in clinical text. Using a publicly available dataset derived from Intensive Care Units, we learn to label a patient’s diagnoses and procedures from their discharge summary. Our evaluation shows a clear advantage to using our proposed architecture over a variety of strong baselines. 
Group Equivariant Capsule Network  We present group equivariant capsule networks, a framework to introduce guaranteed equivariance and invariance properties to the capsule network idea. We restrict pose vectors and learned transformations to be elements of a group, which allows us to prove equivariance of pose vectors and invariance of activations under application of the group law. Requirements are a modified spatial aggregation method for capsules and a generic routing by agreement algorithm with abstract rules, which we both present in this work. Further, we connect our equivariant capsule networks with work from the field of group convolutional networks, which consist of convolutions that are equivariant under applications of the group law. Through this connection, we are able to provide intuitions of how both methods relate and are able to combine both approaches in one deep neural network architecture, combining the strengths from both fields. The resulting framework allows sparse evaluation of feature maps defined over groups, provides control over specific equivariance and invariance properties and can use routing by agreement instead of pooling operations. It provides interpretable and equivariant representation vectors as output capsules, which disentangle evidence of object existence from its pose. 
Group Fused Multinomial Regression  gfmR 
Group Method of Data Handling (GMDH) 
Group method of data handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models. GMDH is used in such fields as data mining, knowledge discovery, prediction, complex systems modeling, optimization and pattern recognition. GMDH algorithms are characterized by an inductive procedure that sorts out gradually complicated polynomial models and selects the best solution by means of the so-called external criterion. GMDH, GMDH2 
Group Normalization (GN) 
Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems: BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform or compete with its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries. 
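The grouping step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses only the Python standard library, handles a single sample given as a list of channels, and leaves out the learnable per-channel scale and shift. The function name `group_norm` and the toy input are assumptions made for this example.

```python
from statistics import fmean


def group_norm(x, num_groups, eps=1e-5):
    """Normalize one sample given as a list of C channels (each a flat list of activations)."""
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        # Statistics are pooled over every activation in the group,
        # so they do not depend on any other sample in the batch.
        values = [v for channel in group for v in channel]
        mean = fmean(values)
        var = fmean([(v - mean) ** 2 for v in values])  # biased variance
        scale = (var + eps) ** -0.5
        out.extend([[(v - mean) * scale for v in channel] for channel in group])
    return out


# Hypothetical sample: 4 channels, 2 activations each, split into 2 groups.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(sample, num_groups=2)
```

Because the mean and variance are computed per sample, the result for one input is the same whether the batch holds 2 samples or 200, which is the property the abstract emphasizes.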
Grouped Merging Net (GMNet) 
Deep Convolutional Neural Networks (CNNs) are capable of learning unprecedentedly effective features from images. Some researchers have worked to improve parameter efficiency using grouped convolution. However, the relation between the optimal number of convolutional groups and the recognition performance remains an open problem. In this paper, we propose a series of Basic Units (BUs) and a two-level merging strategy to construct deep CNNs, referred to as a joint Grouped Merging Net (GMNet), which can produce joint grouped and reused deep features while maintaining the feature discriminability for classification tasks. Our GMNet architectures with the proposed BU_A (dense connection) and BU_B (straight mapping) lead to a significant reduction in the number of network parameters and obtain performance improvement in image classification tasks. Extensive experiments are conducted to validate the superior performance of the GMNet over the state of the art on benchmark datasets, e.g., MNIST, CIFAR-10, CIFAR-100 and SVHN. 
GroupedLAG (GLAG) 
Gradient-based distributed learning in Parameter Server (PS) computing architectures is subject to random delays due to straggling worker nodes, as well as to possible communication bottlenecks between PS and workers. Solutions have been recently proposed to separately address these impairments based on the ideas of gradient coding, worker grouping, and adaptive worker selection. This paper provides a unified analysis of these techniques in terms of wall-clock time, communication, and computation complexity measures. Furthermore, in order to combine the benefits of gradient coding and grouping in terms of robustness to stragglers with the communication and computation load gains of adaptive selection, novel strategies, named Lazily Aggregated Gradient Coding (LAGC) and GroupedLAG (GLAG), are introduced. Analysis and results show that GLAG provides the best wall-clock time and communication performance, while maintaining a low computational cost, for two representative distributions of the computing times of the worker nodes. 
Group-Fused Graphical Lasso (GFGL) 
We consider the consistency properties of a regularised estimator for the simultaneous identification of both changepoints and graphical dependency structure in multivariate time series. Traditionally, estimation of Gaussian Graphical Models (GGM) is performed in an i.i.d. setting. More recently, such models have been extended to allow for changes in the distribution, but only where changepoints are known a priori. In this work, we study the Group-Fused Graphical Lasso (GFGL), which penalises partial correlations with an L1 penalty while simultaneously inducing block-wise smoothness over time to detect multiple changepoints. We present a proof of consistency for the estimator, both in terms of changepoints and the structure of the graphical models in each segment. 
GroupRemMap Penalty  Expression quantitative trait loci (eQTLs) are genomic loci that regulate expression levels of mRNAs or proteins. Understanding these regulatory relationships provides important clues to biological pathways that underlie diseases. In this paper, we propose a new statistical method, GroupRemMap, for identifying eQTLs. We model the relationship between gene expression and single nucleotide variants (SNVs) through multivariate linear regression models, in which gene expression levels are responses and SNV genotypes are predictors. To handle the high dimensionality as well as to incorporate the intrinsic group structure of SNVs, we introduce a new regularization scheme to (1) control the overall sparsity of the model; (2) encourage the group selection of SNVs from the same gene; and (3) facilitate the detection of trans-hub-eQTLs. We apply the proposed method to the colorectal and breast cancer data sets from The Cancer Genome Atlas (TCGA), and identify several biologically interesting eQTLs. These findings may provide insight into biological processes associated with cancers and generate hypotheses for future studies. groupRemMap 
Growth Curve Analysis (GCA) 
Growth curve analysis (GCA) is a multilevel regression technique designed for analysis of time course or longitudinal data. A major advantage of this approach is that it can be used to simultaneously analyze both group-level effects (e.g., experimental manipulations) and individual-level effects (i.e., individual differences). 
Growth Hacking  Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure. It can be seen as part of the online marketing ecosystem, as in many cases growth hackers are simply good at using techniques such as search engine optimization, website analytics, content marketing and A/B testing, which are already mainstream. Growth hackers focus on low-cost and innovative alternatives to traditional marketing, e.g. utilizing social media and viral marketing instead of buying advertising through more traditional media such as radio, newspaper, and television. Growth hacking is particularly important for startups, as it allows for a ‘lean’ launch that focuses on ‘growth first, budgets second.’ Facebook, Twitter, LinkedIn, AirBnB and Dropbox are all companies that use growth hacking techniques. 
Grubbs Test  Grubbs’ test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normed residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population. outliers 
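As a small worked illustration of the "maximum normed residual" named above, the sketch below computes the two-sided Grubbs statistic G = max|x_i - mean| / s with the sample standard deviation s. It uses only the Python standard library; the function name and the readings are made up for the example, and the critical value that G is compared against (derived from a Student t quantile for the chosen significance level) is deliberately not computed here.

```python
from statistics import mean, stdev


def grubbs_statistic(data):
    """Return (G, index of the most extreme observation) for a univariate sample."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    index, value = max(enumerate(data), key=lambda pair: abs(pair[1] - m))
    return abs(value - m) / s, index


# Hypothetical measurements with one suspicious reading at the end.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 14.5]
g, suspect = grubbs_statistic(readings)
print(f"G = {g:.3f} for observation #{suspect} ({readings[suspect]})")
```

The observation is flagged as an outlier only if G exceeds the tabulated critical value for the sample size; the test also presumes the remaining data are approximately normal, as the definition above states.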
GSimCNN  Graph Edit Distance (GED) computation is a core operation of many widely used graph applications, such as graph classification, graph matching, and graph similarity search. However, computing the exact GED between two graphs is NP-complete. Most current approximate algorithms are based on solving a combinatorial optimization problem, which involves complicated design and high time complexity. In this paper, we propose a novel end-to-end neural network based approach to GED approximation, aiming to alleviate the computational burden while preserving good performance. The proposed approach, named GSimCNN, turns GED computation into a learning problem. Each graph is considered as a set of nodes, represented by learnable embedding vectors. The GED computation is then considered as a two-set matching problem, where a higher matching score leads to a lower GED. A Convolutional Neural Network (CNN) based approach is proposed to tackle the set matching problem. We test our algorithm on three real graph datasets, and our model achieves significant performance enhancement against state-of-the-art approximate GED computation algorithms. 
GSMOTE  Imbalanced learning is an important problem for classification models, which have enjoyed much popularity in many applications. Typically, imbalanced learning algorithms can be partitioned into two types, i.e., data-level approaches and algorithm-level approaches. In this paper, the focus is to develop a robust synthetic minority oversampling technique, which falls under the umbrella of data-level approaches. On one hand, we propose a method to generate synthetic samples in a high-dimensional feature space, instead of a linear sampling space. On the other hand, in the proposed imbalanced learning framework, a Gaussian Mixture Model is employed to distinguish the outliers from minority class instances and filter out the synthetic majority class instances. Last and most importantly, an adaptive optimization method is proposed to optimize these parameters in the sampling process. By doing so, an effective and efficient imbalanced learning framework is developed. 
Guaranteed Sequential Trajectory Optimization (GuSTO) 
Sequential Convex Programming (SCP) has recently seen a surge of interest as a tool for trajectory optimization. However, most available methods lack rigorous performance guarantees and they are often tailored to specific optimal control setups. In this paper, we present GuSTO (Guaranteed Sequential Trajectory Optimization), an algorithmic framework to solve trajectory optimization problems for control-affine systems with drift. GuSTO generalizes earlier SCP-based methods for trajectory optimization (by addressing, for example, goal-set constraints and problems with either fixed or free final time) and enjoys theoretical convergence guarantees in terms of convergence to, at least, a stationary point. The theoretical analysis is further leveraged to devise an accelerated implementation of GuSTO, which originally infuses ideas from indirect optimal control into an SCP context. Numerical experiments on a variety of trajectory optimization setups show that GuSTO generally outperforms current state-of-the-art approaches in terms of success rates, solution quality, and computation times. 
Guided Attention for Sparsity Learning (GASL) 
The main goal of network pruning is to impose sparsity on the neural network by increasing the number of parameters with zero value, in order to reduce the architecture size and achieve a computational speedup. In most previous research, sparsity is imposed stochastically without considering any prior knowledge of the weight distribution or other internal network characteristics. Enforcing too much sparsity may induce an accuracy drop because many important elements might have been eliminated. In this paper, we propose Guided Attention for Sparsity Learning (GASL) to (1) achieve model compression, with fewer elements and a speedup; (2) prevent the accuracy drop by supervising the sparsity operation via a guided attention mechanism; and (3) introduce a generic mechanism that can be adapted to any type of architecture. Our work is aimed at providing a framework based on interpretable attention mechanisms for imposing structured and non-structured sparsity in deep neural networks. For CIFAR-100 experiments, we achieved the state-of-the-art sparsity level and a 2.91x speedup with competitive accuracy compared to the best method. For MNIST and the LeNet architecture we also achieved the highest sparsity and speedup level. 
Guided Complement Entropy (GCE) 
Model robustness has been an important issue, since adding small adversarial perturbations to images is sufficient to drive the model accuracy down to nearly zero. In this paper, we propose a new training objective, ‘Guided Complement Entropy’ (GCE), that has dual desirable effects: (a) neutralizing the predicted probabilities of incorrect classes, and (b) maximizing the predicted probability of the ground-truth class, particularly when (a) is achieved. Training with GCE encourages models to learn latent representations where samples of different classes form distinct clusters, which, we argue, improves the model robustness against adversarial perturbations. Furthermore, compared with state-of-the-art models trained with cross-entropy, the same models trained with GCE achieve significant improvements in robustness against white-box adversarial attacks, both with and without adversarial training. When no attack is present, training with GCE also outperforms cross-entropy in terms of model accuracy. 
Guided Dropout  Dropout is often used in deep neural networks to prevent overfitting. Conventionally, dropout training invokes a random drop of nodes from the hidden layers of a neural network. It is our hypothesis that a guided selection of nodes for intelligent dropout can lead to better generalization as compared to traditional dropout. In this research, we propose ‘guided dropout’ for training deep neural networks, which drops nodes by measuring the strength of each node. We also demonstrate that conventional dropout is a specific case of the proposed guided dropout. Experimental evaluation on multiple datasets including MNIST, CIFAR-10, CIFAR-100, SVHN, and Tiny ImageNet demonstrates the efficacy of the proposed guided dropout. 
Guided Dynamic Routing  Previous studies have shown that neural machine translation (NMT) models can benefit from modeling translated (Past) and untranslated (Future) source contents as recurrent states (Zheng et al., 2018). However, the recurrent process is less interpretable. In this paper, we propose to model Past and Future by Capsule Network (Hinton et al., 2011), which provides an explicit separation of source words into groups of Past and Future by the process of parts-to-wholes assignment. The assignment is learned with a novel variant of the routing-by-agreement mechanism (Sabour et al., 2017), namely Guided Dynamic Routing, in which what to translate at the current decoding step guides the routing process to assign each source word to its associated group represented by a capsule, and to refine the representation of the capsule dynamically and iteratively. Experiments on translation tasks of three language pairs show that our model achieves substantial improvements over both RNMT and Transformer. Extensive analysis further verifies that our method does recognize translated and untranslated content as expected, and produces better and more adequate translations. 
Guided Evolutionary Strategies  Many applications in machine learning require optimizing a function whose true gradient is unknown, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications, or when using synthetic gradients). We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search. We define a search distribution for evolutionary strategies that is elongated along a guiding subspace spanned by the surrogate gradients. This allows us to estimate a descent direction which can then be passed to a first-order optimizer. We analytically and numerically characterize the trade-offs that result from tuning how strongly the search distribution is stretched along the guiding subspace, and we use this to derive a setting of the hyperparameters that works well across problems. Finally, we apply our method to example problems including truncated unrolled optimization and a synthetic gradient problem, demonstrating improvement over both standard evolutionary strategies and first-order methods that directly follow the surrogate gradient. We provide a demo of Guided ES at: https://…/guidedevolutionarystrategies. 
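The idea of a search distribution "elongated along a guiding subspace" can be illustrated with a small standard-library sketch. It is written under stated assumptions rather than taken from the authors' code: the guiding subspace is one-dimensional (a single surrogate gradient), perturbations are drawn with covariance sigma^2 * (alpha/n * I + (1 - alpha) * u u^T), and the descent direction is estimated from antithetic pairs. The function name, the hyperparameter names `sigma`, `alpha`, `beta`, `pairs`, and the toy objective are all choices made for this example.

```python
import math
import random


def guided_es_gradient(f, x, surrogate, sigma=0.1, alpha=0.5, beta=1.0, pairs=20, rng=random):
    """Antithetic ES gradient estimate with perturbations stretched along `surrogate`."""
    n = len(x)
    norm = math.sqrt(sum(s * s for s in surrogate))
    u = [s / norm for s in surrogate]  # unit vector spanning the guiding subspace
    total = [0.0] * n
    for _ in range(pairs):
        z_sub = rng.gauss(0.0, 1.0)  # component along the surrogate direction
        eps = [
            sigma * (math.sqrt(alpha / n) * rng.gauss(0.0, 1.0)
                     + math.sqrt(1.0 - alpha) * z_sub * u[i])
            for i in range(n)
        ]
        f_plus = f([xi + e for xi, e in zip(x, eps)])
        f_minus = f([xi - e for xi, e in zip(x, eps)])
        for i in range(n):
            total[i] += eps[i] * (f_plus - f_minus)
    coeff = beta / (2.0 * sigma ** 2 * pairs)
    return [coeff * t for t in total]


def quadratic(v):
    return sum(t * t for t in v)


point = [1.0, -2.0, 0.5]
noisy_gradient = [1.5, -3.0, 2.0]  # hypothetical surrogate, only roughly aligned with 2 * point
estimate = guided_es_gradient(quadratic, point, noisy_gradient, rng=random.Random(0))
```

Setting `alpha` close to 1 recovers isotropic random search, while values close to 0 concentrate samples along the surrogate direction; the returned estimate would then be handed to any first-order optimizer.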
Guided Labeling  Over the last couple of years, deep learning and especially convolutional neural networks have become one of the workhorses of computer vision. One limiting factor for the applicability of supervised deep learning to more areas is the need for large, manually labeled datasets. In this paper we propose an easy-to-implement method we call guided labeling, which automatically determines which samples from an unlabeled dataset should be labeled. We show that using this procedure, the number of samples that need to be labeled is reduced considerably in comparison to labeling images arbitrarily. 
Guided Local Search (GLS) 
Guided Local Search is a metaheuristic search method. A metaheuristic method is a method that sits on top of a local search algorithm to change its behavior. Guided Local Search builds up penalties during a search. It uses penalties to help local search algorithms escape from local minima and plateaus. When the given local search algorithm settles in a local optimum, GLS modifies the objective function using a specific scheme (explained below). Then the local search will operate using an augmented objective function, which is designed to bring the search out of the local optimum. The key is in the way that the objective function is modified. 
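A common way to write the augmented objective is h(s) = g(s) + lambda * sum_i p_i * I_i(s), where I_i(s) indicates whether solution s exhibits feature i and p_i is that feature's penalty; at a local optimum, the penalty of the present feature with the largest utility c_i / (1 + p_i) is incremented. The sketch below follows that commonly cited scheme as an assumption, since the entry itself does not spell it out; the function names, the toy features, and the value of `lam` are all hypothetical.

```python
def augmented_cost(cost, solution, features, penalties, lam):
    """h(s) = g(s) + lam * sum of penalties of the features present in s."""
    penalty_term = sum(
        p for (indicator, _), p in zip(features, penalties) if indicator(solution)
    )
    return cost(solution) + lam * penalty_term


def penalize(solution, features, penalties):
    """At a local optimum, bump the penalty of the present feature(s) with maximal utility."""
    utilities = [
        c / (1 + p) if indicator(solution) else 0.0
        for (indicator, c), p in zip(features, penalties)
    ]
    best = max(utilities)
    return [p + 1 if best > 0 and u == best else p for p, u in zip(penalties, utilities)]


# Toy problem: a solution is a set of items, and each item is a feature with a cost.
item_costs = {"a": 5.0, "b": 2.0, "c": 9.0}
features = [
    (lambda s: "a" in s, item_costs["a"]),
    (lambda s: "b" in s, item_costs["b"]),
    (lambda s: "c" in s, item_costs["c"]),
]


def cost(solution):
    return sum(item_costs[item] for item in solution)


penalties = [0, 0, 0]
local_optimum = {"a", "b"}
before = augmented_cost(cost, local_optimum, features, penalties, lam=0.5)
penalties = penalize(local_optimum, features, penalties)
after = augmented_cost(cost, local_optimum, features, penalties, lam=0.5)
```

After the update the local optimum looks worse under the augmented cost while the original cost is unchanged, which is what pushes the underlying local search to move elsewhere.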
Guided Team-Partitioning (GTP) 
A long line of literature has focused on the problem of selecting a team of individuals from a large pool of candidates, such that certain constraints are respected, and a given objective function is maximized. Even though extant research has successfully considered diverse families of objective functions and constraints, one of the most common limitations is the focus on the single-team paradigm. Despite its well-documented applications in multiple domains, this paradigm is not appropriate when the team-builder needs to partition the entire population into multiple teams. Team-partitioning tasks are very common in an educational setting, in which the teacher has to partition the students in her class into teams for collaborative projects. The task also emerges in the context of organizations, when managers need to partition the workforce into teams with specific properties to tackle relevant projects. In this work, we extend the team formation literature by introducing the Guided Team-Partitioning (GTP) problem, which asks for the partitioning of a population into teams such that the centroid of each team is as close as possible to a given target vector. As we describe in detail in our work, this formulation allows the team-builder to control the composition of the produced teams and has natural applications in practical settings. Algorithms for the GTP need to simultaneously consider the composition of multiple non-overlapping teams that compete for the same population of candidates. This makes the problem considerably more challenging than formulations that focus on the optimization of a single team. In fact, we prove that GTP is NP-hard to solve and even to approximate. The complexity of the problem motivates us to consider efficient algorithmic heuristics, which we evaluate via experiments on both real and synthetic datasets. 
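To make the objective concrete, the sketch below scores a candidate partition by how far each team's centroid lies from the target vector. The entry only says the centroids should be "as close as possible" to the target, so summing Euclidean distances over teams is an assumption of this example, as are the function names and the two-feature toy population.

```python
import math


def centroid(team):
    """Component-wise mean of the members' feature vectors."""
    dim = len(team[0])
    return [sum(member[d] for member in team) / len(team) for d in range(dim)]


def partition_cost(teams, target):
    """Sum over teams of the distance between the team centroid and the target."""
    return sum(math.dist(centroid(team), target) for team in teams)


# Hypothetical population of four people described by two skill scores.
target = [0.5, 0.5]
balanced = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
skewed = [[[0.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 0.0]]]

balanced_cost = partition_cost(balanced, target)
skewed_cost = partition_cost(skewed, target)
```

Both partitions use the same four people, yet only the balanced one places every centroid on the target; because each assignment changes what is left for the other teams, the teams cannot be optimized independently.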
Guided Zoom  We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions. It does so by making sure the model has ‘the right reasons’ for a prediction, being defined as reasons that are coherent with those used to make similar correct decisions at training time. The reason/evidence upon which a deep neural network makes a prediction is defined to be the spatial grounding, in the pixel space, for a specific class conditional probability in the model output. Guided Zoom questions how reasonable the evidence used to make a prediction is. In state-of-the-art deep single-label classification models, the top-k (k = 2, 3, 4, …) accuracy is usually significantly higher than the top-1 accuracy. This is more evident in fine-grained datasets, where differences between classes are quite subtle. We show that Guided Zoom results in the refinement of a model’s classification accuracy on three fine-grained classification datasets. We also explore the complementarity of different grounding techniques, by comparing their ensemble to an adversarial erasing approach that iteratively reveals the next most discriminative evidence. 
GuideR  This article presents GuideR, a user-guided rule induction algorithm, which overcomes the largest limitation of the existing methods: the lack of the possibility to introduce the user’s preferences or domain knowledge into the rule learning process. Automatic selection of attributes and attribute ranges often leads to the situation in which the resulting rules do not contain interesting information. We propose an induction algorithm which takes into account the user’s requirements. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool. 
Guider Network  Sequence generation with reinforcement learning (RL) has received significant attention recently. However, a challenge with such methods is the sparse-reward problem in the RL training process, in which a scalar guiding signal is often only available after an entire sequence has been generated. This type of sparse reward tends to ignore the global structural information of a sequence, causing generation of sequences that are semantically inconsistent. In this paper, we present a model-based RL approach to overcome this issue. Specifically, we propose a novel guider network to model the sequence-generation environment, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments show that the proposed method leads to improved performance for both unconditional and conditional sequence-generation tasks. 
Gumbel Graph Network  In this work, we present Gumbel Graph Network, a model-free deep learning framework for dynamics learning and network reconstruction from observed time series data. Our method requires no prior knowledge about the underlying dynamics and has shown state-of-the-art performance in three typical dynamical systems on complex networks. 
Gumbel Subset Sampling (GSS) 
Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by recent advances in the NLP domain, the self-attention transformer is introduced to consume point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs, and prove its permutation equivariance. Besides, prior work uses heuristics that depend on the input data (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. We therefore propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a ‘soft’ continuous subset in the training phase, and a ‘hard’ discrete subset in the test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets with lower computation cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application, processing event camera streams as point clouds, and achieve state-of-the-art performance on the DVS128 Gesture Dataset. 
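The ‘soft’ versus ‘hard’ behaviour of Gumbel-Softmax mentioned above can be shown with a short standard-library sketch. It covers only the relaxation for a single selection over candidate points, not the full GSS layer from the paper; the function name, the temperature `tau`, and the example scores are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, hard=False, rng=random):
    """Perturb logits with Gumbel noise, then return soft weights or a one-hot choice."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        noisy.append(logit - math.log(-math.log(u)))  # add Gumbel(0, 1) noise
    if hard:
        # Test-time behaviour: a discrete pick (arg max of the noisy logits).
        winner = max(range(len(noisy)), key=noisy.__getitem__)
        return [1.0 if i == winner else 0.0 for i in range(len(noisy))]
    # Training-time behaviour: a temperature-controlled softmax.
    scaled = [v / tau for v in noisy]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical selection scores for four candidate points.
scores = [2.0, 0.5, 1.0, -1.0]
soft_weights = gumbel_softmax(scores, tau=0.5, rng=random.Random(0))
hard_choice = gumbel_softmax(scores, hard=True, rng=random.Random(0))
```

Lowering `tau` pushes the soft weights toward a one-hot vector, which narrows the gap between the continuous subset used during training and the discrete subset used at test time.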