Gabor Convolutional Networks (GCNs,Gabor CNN) 
Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features the capability of dealing with spatial transformations. However, such excellent properties have not been well explored in the popular deep convolutional neural networks (DCNNs). In this paper, we propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of deep learned features to the orientation and scale changes. By only manipulating the basic element of DCNNs based on Gabor filters, i.e., the convolution operator, GCNs can be easily implemented and are compatible with any popular deep learning architecture. Experimental results demonstrate the super capability of our algorithm in recognizing objects, where the scale and rotation changes occur frequently. The proposed GCNs have much fewer learnable network parameters, and thus is easier to train with an endtoend pipeline. To encourage further developments, the source code is released at Github. 
GaleShapley Algorithm  GaleShapley Algorithm is a solution for the Stable Marriage Problem. In 1962, David Gale and Lloyd Shapley proved that, for any equal number of men and women, it is always possible to solve the SMP and make all marriages stable. They presented an algorithm to do so. The GaleShapley algorithm involves a number of ’rounds’ (or ‘iterations’). In the first round, first a) each unengaged man proposes to the woman he prefers most, and then b) each woman replies ‘maybe’ to her suitor she most prefers and ‘no’ to all other suitors. She is then provisionally ‘engaged’ to the suitor she most prefers so far, and that suitor is likewise provisionally engaged to her. In each subsequent round, first a) each unengaged man proposes to the mostpreferred woman to whom he has not yet proposed (regardless of whether the woman is already engaged), and then b) each woman replies ‘maybe’ to her suitor she most prefers (whether her existing provisional partner or someone else) and rejects the rest (again, perhaps including her current provisional partner). The provisional nature of engagements preserves the right of an alreadyengaged woman to ‘trade up’ (and, in the process, to ‘jilt’ her untilthen partner). The runtime complexity of this algorithm is O(n^2) where n is number of men or women. This algorithm guarantees that: · Everyone gets married: At the end, there cannot be a man and a woman both unengaged, as he must have proposed to her at some point (since a man will eventually propose to everyone, if necessary) and, being proposed to, she would necessarily be engaged (to someone) thereafter. · The marriages are stable: Let Alice be a woman and Bob be a man who are both engaged, but not to each other. Upon completion of the algorithm, it is not possible for both Alice and Bob to prefer each other over their current partners. If Bob prefers Alice to his current partner, he must have proposed to Alice before he proposed to his current partner. If Alice accepted his proposal, yet is not married to him at the end, she must have dumped him for someone she likes more, and therefore doesn’t like Bob more than her current partner. If Alice rejected his proposal, she was already with someone she liked more than Bob. 
Game Theory  Game theory is the study of strategic decision making. Specifically, it is ‘the study of mathematical models of conflict and cooperation between intelligent rational decisionmakers.’ An alternative term suggested ‘as a more descriptive name for the discipline’ is interactive decision theory. Game theory is mainly used in economics, political science, and psychology, as well as logic, computer science, and biology. The subject first addressed zerosum games, such that one person’s gains exactly equal net losses of the other participant or participants. Today, however, game theory applies to a wide range of behavioral relations, and has developed into an umbrella term for the logical side of decision science, including both humans and nonhumans (e.g. computers, animals). Modern game theory began with the idea regarding the existence of mixedstrategy equilibria in twoperson zerosum games and its proof by John von Neumann. Von Neumann’s original proof used Brouwer fixedpoint theorem on continuous mappings into compact convex sets, which became a standard method in game theory and mathematical economics. His paper was followed by the 1944 book Theory of Games and Economic Behavior, cowritten with Oskar Morgenstern, which considered cooperative games of several players. The second edition of this book provided an axiomatic theory of expected utility, which allowed mathematical statisticians and economists to treat decisionmaking under uncertainty. This theory was developed extensively in the 1950s by many scholars. Game theory was later explicitly applied to biology in the 1970s, although similar developments go back at least as far as the 1930s. Game theory has been widely recognized as an important tool in many fields. With the Nobel Memorial Prize in Economic Sciences going to game theorist Jean Tirole in 2014, eleven gametheorists have now won the economics Nobel Prize. John Maynard Smith was awarded the Crafoord Prize for his application of game theory to biology. 
Gamification  Gamification is the use of game thinking and game mechanics in nongame contexts to engage users in solving problems. Gamification has been studied and applied in several domains, such as to improve user engagement, physical exercise, return on investment, data quality, timeliness, and learning. A review of research on gamification shows that most studies on gamification find positive effects from gamification 
Gamma Divergence  The gammadivergence is a generalization of the KullbackLeibler divergence with the power index gamma. It employs the power transformation of density functions, instead of the logarithmic transformation employed by the KullbackLeibler divergence. rsggm 
GammaPoisson Shrinker (GPS) 
➘ “MultiItem Gamma Poisson Shrinker” openEBGM 
GAN Qlearning  Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Qlearning, a novel distributional RL method based on generative adversarial networks (GANs) and analyze its performance in simple tabular environments, as well as OpenAI Gym. We empirically show that our algorithm leverages the flexibility and blackbox approach of deep learning models while providing a viable alternative to other stateoftheart methods. 
Gang of GANs  Traditional generative adversarial networks (GAN) and many of its variants are trained by minimizing the KL or JSdivergence loss that measures how close the generated data distribution is from the true data distribution. A recent advance called the WGAN based on Wasserstein distance can improve on the KL and JSdivergence based GANs, and alleviate the gradient vanishing, instability, and mode collapse issues that are common in the GAN training. In this work, we aim at improving on the WGAN by first generalizing its discriminator loss to a marginbased one, which leads to a better discriminator, and in turn a better generator, and then carrying out a progressive training paradigm involving multiple GANs to contribute to the maximum margin ranking loss so that the GAN at later stages will improve upon early stages. We call this method Gang of GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce the gap between the true data distribution and the generated data distribution by at least half in an optimally trained WGAN. We have also proposed a new way of measuring GAN quality which is based on image completion tasks. We have evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR10, and 50KSSFF, and have seen both visual and quantitative improvement over baseline WGAN. 
GANtest  Generative adversarial networks (GANs) are one of the most popular methods for generating images today. While impressive results have been validated by visual inspection, a number of quantitative criteria have emerged only recently. We argue here that the existing ones are insufficient and need to be in adequation with the task at hand. In this paper we introduce two measures based on image classification—GANtrain and GANtest, which approximate the recall (diversity) and precision (quality of the image) of GANs respectively. We evaluate a number of recent GAN approaches based on these two measures and demonstrate a clear difference in performance. Furthermore, we observe that the increasing difficulty of the dataset, from CIFAR10 over CIFAR100 to ImageNet, shows an inverse correlation with the quality of the GANs, as clearly evident from our measures. 
GANtrain  Generative adversarial networks (GANs) are one of the most popular methods for generating images today. While impressive results have been validated by visual inspection, a number of quantitative criteria have emerged only recently. We argue here that the existing ones are insufficient and need to be in adequation with the task at hand. In this paper we introduce two measures based on image classification—GANtrain and GANtest, which approximate the recall (diversity) and precision (quality of the image) of GANs respectively. We evaluate a number of recent GAN approaches based on these two measures and demonstrate a clear difference in performance. Furthermore, we observe that the increasing difficulty of the dataset, from CIFAR10 over CIFAR100 to ImageNet, shows an inverse correlation with the quality of the GANs, as clearly evident from our measures. 
GappedKmer Support Vector Machine  Oligomers of length k, or kmers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, kmers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific kmer becomes very small, and kmer counts approach a binary variable, with most kmers absent and a few present once. Thus, any statistical learning approach using kmers as features becomes susceptible to noisy training set kmer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped kmers, a new classifier, gkmSVM, and a general method for robust estimation of kmer frequencies. To make the method applicable to largescale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmerSVM and alternative approaches, our gkmSVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkmSVM consistently outperforms kmerSVM on human ENCODE ChIPseq datasets, and further demonstrate the general utility of our method using a NaiveBayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. gkmSVM 
Gas Station Problem  In the gas station problem we want to find the cheapest path between two vertices of an $n$vertex graph. Our car has a specific fuel capacity and at each vertex we can fill our car with gas, with the fuel cost depending on the vertex. 
Gated Attention Network (GaAN) 
We propose a new network architecture, Gated Attention Networks (GaAN), for learning on graphs. Unlike the traditional multihead attention mechanism, which equally consumes all attention heads, GaAN uses a convolutional subnetwork to control each attention head’s importance. We demonstrate the effectiveness of GaAN on the inductive node classification problem. Moreover, with GaAN as a building block, we construct the Graph Gated Recurrent Unit (GGRU) to address the traffic speed forecasting problem. Extensive experiments on three realworld datasets show that our GaAN framework achieves stateoftheart results on both tasks. 
Gated Graph Neural Network  GraphtoSequence Learning using Gated Graph Neural Networks 
Gated Linear Network  This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss. Rather than relying on nonlinear transfer functions, our method gains representational power by the use of data conditioning. We state under general conditions a learnable capacity theorem that shows this approach can in principle learn any bounded Borelmeasurable function on a compact subset of euclidean space; the result is stronger than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed. 
Gated Path Planning Network  Value Iteration Networks (VINs) are effective differentiable path planning modules that can be used by agents to perform navigation while still maintaining endtoend differentiability of the entire architecture. Despite their effectiveness, they suffer from several disadvantages including training instability, random seed sensitivity, and other optimization problems. In this work, we reframe VINs as recurrentconvolutional networks which demonstrates that VINs couple recurrent convolutions with an unconventional maxpooling activation. From this perspective, we argue that standard gated recurrent update equations could potentially alleviate the optimization issues plaguing VIN. The resulting architecture, which we call the Gated Path Planning Network, is shown to empirically outperform VIN on a variety of metrics such as learning speed, hyperparameter sensitivity, iteration count, and even generalization. Furthermore, we show that this performance gap is consistent across different maze transition types, maze sizes and even show success on a challenging 3D environment, where the planner is only provided with firstperson RGB images. 
Gated Recurrent Neural Tensor Network  Recurrent Neural Networks (RNNs), which are a powerful scheme for modeling temporal and sequential data need to capture longterm dependencies on datasets and represent them in hidden layers with a powerful model to capture more information from inputs. For modeling longterm dependencies in a dataset, the gating mechanism concept can help RNNs remember and forget previous information. Representing the hidden layers of an RNN with more expressive operations (i.e., tensor products) helps it learn a more complex relationship between the current input and the previous hidden layer information. These ideas can generally improve RNN performances. In this paper, we proposed a novel RNN architecture that combine the concepts of gating mechanism and the tensor product into a single model. By combining these two concepts into a single RNN, our proposed models learn longterm dependencies by modeling with gating units and obtain more expressive and direct interaction between input and hidden layers using a tensor product on 3dimensional array (tensor) weight parameters. We use Long Short Term Memory (LSTM) RNN and Gated Recurrent Unit (GRU) RNN and combine them with a tensor product inside their formulations. Our proposed RNNs, which are called a LongShort Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product. We conducted experiments with our proposed models on wordlevel and characterlevel language modeling tasks and revealed that our proposed models significantly improved their performance compared to our baseline models. 
Gated Recurrent Unit (GRU) 
Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long shortterm memory (LSTM). However, GRUs have been shown to exhibit better performance on smaller datasets. They have fewer parameters than LSTM, as they lack an output gate. 
Gaussian Graphical Model (GGM) 
A Gaussian graphical model is a graph in which all random variables are continuous and jointly Gaussian. This model corresponds to the multivariate normal distribution for N variables. Conditional independence in a Gaussian graphical model is simply reflected in the zero entries of the precision matrix. MGL 
Gaussian image entropy and piecewise stationary time series analysis (SPEV) 
Visionbased methods for visibility estimation can play a critical role in reducing traffic accidents caused by fog and haze. To overcome the disadvantages of current visibility estimation methods, we present a novel datadriven approach based on Gaussian image entropy and piecewise stationary time series analysis (SPEV). This is the first time that Gaussian image entropy is used for estimating atmospheric visibility. To lessen the impact of landscape and sunshine illuminance on visibility estimation, we used region of interest (ROI) analysis and took into account relative ratios of image entropy, to improve estimation accuracy. We assume fog and haze cause blurred images and that fog and haze can be considered as a piecewise stationary signal. We used piecewise stationary time series analysis to construct the piecewise causal relationship between image entropy and visibility. To obtain a realworld visibility measure during fog and haze, a subjective assessment was established through a study with 36 subjects who performed visibility observations. Finally, a total of two million videos were used for training the SPEV model and validate its effectiveness. The videos were collected from the constantly foggy and hazy Tongqi expressway in Jiangsu, China. The contrast model of visibility estimation was used for algorithm performance comparison, and the validation results of the SPEV model were encouraging as 99.14% of the relative errors were less than 10%. 
Gaussian Markov Random Field (GMRF) 
http://…/GMRFbook 
Gaussian Means (GMeans) 
The Gmeans algorithm starts with a small number of kmeans centers, and grows the number of centers. Each iteration of the algorithm splits into two those centers whose data appear not to come from a Gaussian distribution. Between each round of splitting, we run kmeans on the entire dataset and all the centers to refine the current solution. We can initialize with just k = 1, or we can choose some larger value of k if we have some prior knowledge about the range of k. 
Gaussian Mixture GAN (GMGAN) 
Generative Adversarial Networks (GANs) have been shown to produce realistically looking synthetic images with remarkable success, yet their performance seems less impressive when the training set is highly diverse. In order to provide a better fit to the target data distribution when the dataset includes many different classes, we propose a variant of the basic GAN model, called Gaussian Mixture GAN (GMGAN), where the probability distribution over the latent space is a mixture of Gaussians. We also propose a supervised variant which is capable of conditional sample synthesis. In order to evaluate the model’s performance, we propose a new scoring method which separately takes into account two (typically conflicting) measures – diversity vs. quality of the generated data. Through a series of empirical experiments, using both synthetic and realworld datasets, we quantitatively show that GMGANs outperform baselines, both when evaluated using the commonly used Inception Score, and when evaluated using our own alternative scoring method. In addition, we qualitatively demonstrate how the \textit{unsupervised} variant of GMGAN tends to map latent vectors sampled from different Gaussians in the latent space to samples of different classes in the data space. We show how this phenomenon can be exploited for the task of unsupervised clustering, and provide quantitative evaluation showing the superiority of our method for the unsupervised clustering of image datasets. Finally, we demonstrate a feature which further sets our model apart from other GAN models: the option to control the qualitydiversity tradeoff by altering, posttraining, the probability distribution of the latent space. This allows one to sample higher quality and lower diversity samples, or vice versa, according to one’s needs. 
Gaussian Mixture Model (GMM) 
A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocaltract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative ExpectationMaximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a welltrained prior model. AdaptGauss 
Gaussian Multivariance  ➘ “Total Distance Multivariance” 
Gaussian Naive Bayes  When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution. For example, suppose the training data contain a continuous attribute, x. We first segment the data by the class, and then compute the mean and variance of x in each class. 
Gaussian Process (GP) 
In probability theory and statistics, a Gaussian process is a stochastic process whose realizations consist of random values associated with every point in a range of times (or of space) such that each such random variable has a normal distribution. Moreover, every finite collection of those random variables has a multivariate normal distribution. The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of the normal distribution which is often called the Gaussian distribution. In fact, one way of thinking of a Gaussian process is as an infinitedimensional generalization of the multivariate normal distribution. 
Gaussian Process Autoregressive Regression Model (GPAR) 
Multioutput regression models must exploit dependencies between outputs to maximise predictive performance. The application of Gaussian processes (GPs) to this setting typically yields models that are computationally demanding and have limited representational power. We present the Gaussian Process Autoregressive Regression (GPAR) model, a scalable multioutput GP model that is able to capture nonlinear, possibly inputvarying, dependencies between outputs in a simple and tractable way: the product rule is used to decompose the joint distribution over the outputs into a set of conditionals, each of which is modelled by a standard GP. GPAR’s efficacy is demonstrated on a variety of synthetic and realworld problems, outperforming existing GP models and achieving stateoftheart performance on the tasks with existing benchmarks. 
Gaussian Process Latent Variable Alignment Learning  We present a model that can automatically learn alignments between highdimensional data in an unsupervised manner. Learning alignments is an illconstrained problem as there are many different ways of defining a good alignment. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. We derive a probabilistic model built on nonparametric priors that allows for flexible warps while at the same time providing means to specify interpretable constraints. We show results on several datasets, including different motion capture sequences and show that the suggested model outperform the classical algorithmic approaches to the alignment task. 
Gaussian Process Regression (GPR) 
Gaussian process regression (GPR) is an even finer approach than this. Rather than claiming f(x) relates to some specific models (e.g. f(x)=mx+c), a Gaussian process can represent f(x) obliquely, but rigorously, by letting the data ‘speak’ more clearly for themselves. GPR is still a form of supervised learning, but the training data are harnessed in a subtler way. As such, GPR is a less ‘parametric’ tool. However, it’s not completely freeform, and if we’re unwilling to make even basic assumptions about f(x), then more general techniques should be considered, including those underpinned by the principle of maximum entropy; Chapter 6 of Sivia and Skilling (2006) offers an introduction. nsgp 
GaussMarkov Theorem  In statistics, the GaussMarkov theorem, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear regression model in which the errors have expectation zero and are uncorrelated and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator. Here ‘best’ means giving the lowest variance of the estimate, as compared to other unbiased, linear estimators. The errors don’t need to be normal, nor do they need to be independent and identically distributed (only uncorrelated and homoscedastic). The hypothesis that the estimator be unbiased cannot be dropped, since otherwise estimators better than OLS exist. See for examples the JamesStein estimator (which also drops linearity) or ridge regression. 
Gauss–Newton Algorithm (GNA) 
The GaussNewton algorithm is a method used to solve nonlinear least squares problems. It is a modification of Newton’s method for finding a minimum of a function. Unlike Newton’s method, the GaussNewton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required. Nonlinear least squares problems arise for instance in nonlinear regression, where parameters in a model are sought such that the model is in good agreement with available observations. 
gcForest  In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive to deep neural networks. In contrast to deep neural networks which require great effort in hyperparameter tuning, gcForest is much easier to train. Actually, even when gcForest is applied to different data from different domains, excellent performance can be achieved by almost same settings of hyperparameters. The training process of gcForest is efficient and scalable. In our experiments its training time running on a PC is comparable to that of deep neural networks running with GPU facilities, and the efficiency advantage may be more apparent because gcForest is naturally apt to parallel implementation. Furthermore, in contrast to deep neural networks which require largescale training data, gcForest can work well even when there are only smallscale training data. Moreover, as a treebased approach, gcForest should be easier for theoretical analysis than deep neural networks. 
Gear Training  The training of Deep Neural Networks usually needs tremendous computing resources. Therefore many deep models are trained in large cluster instead of single machine or GPU. Though major researchs at present try to run whole model on all machines by using asynchronous asynchronous stochastic gradient descent (ASGD), we present a new approach to train deep model parallely — split the model and then seperately train different parts of it in different speed. 
Gelly  Gelly is a Java Graph API for Flink. It contains a set of methods and utilities which aim to simplify the development of graph analysis applications in Flink. In Gelly, graphs can be transformed and modified using highlevel functions similar to the ones provided by the batch processing API. Gelly provides methods to create, transform and modify graphs, as well as a library of graph algorithms. ➚ “Apache Flink” Research and Development Roadmap for Flink Gelly 
GenAttack  Deep neural networks (DNNs) are vulnerable to adversarial examples, even in the blackbox case, where the attacker is limited to solely query access. Existing blackbox approaches to generating adversarial examples typically require a significant amount of queries, either for training a substitute network or estimating gradients from the output scores. We introduce GenAttack, a gradientfree optimization technique which uses genetic algorithms for synthesizing adversarial examples in the blackbox setting. Our experiments on the MNIST, CIFAR10, and ImageNet datasets show that GenAttack can successfully generate visually imperceptible adversarial examples against stateoftheart image recognition models with orders of magnitude fewer queries than existing approaches. For example, in our CIFAR10 experiments, GenAttack required roughly 2,568 times less queries than the current stateoftheart blackbox attack. Furthermore, we show that GenAttack can successfully attack both the stateoftheart ImageNet defense, ensemble adversarial training, and nondifferentiable, randomized input transformation defenses. GenAttack’s success against ensemble adversarial training demonstrates that its query efficiency enables it to exploit the defense’s weakness to direct blackbox attacks. GenAttack’s success against nondifferentiable input transformations indicates that its gradientfree nature enables it to be applicable against defenses which perform gradient masking/obfuscation to confuse the attacker. Our results suggest that populationbased optimization opens up a promising area of research into effective gradientfree blackbox attacks. 
General Algorithmic Search (GAS) 
In this paper we present a metaheuristic for global optimization called General Algorithmic Search (GAS). Specifically, GAS is a stochastic, singleobjective method that evolves a swarm of agents in search of a global extremum. Numerical simulations with a sample of 31 test functions show that GAS outperforms Basin Hopping, Cuckoo Search, and Differential Evolution, especially in concurrent optimization, i.e., when several runs with different initial settings are executed and the first best wins. Python codes of all algorithms and complementary information are available online. 
General Architecture for Text Engineering (GATE) 
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for all sorts of natural language processing tasks, including information extraction in many languages. GATE has been compared to NLTK, R and RapidMiner. As well as being widely used in its own right, it forms the basis of the KIM semantic platform. GATE community and research has been involved in several European research projects including TAO, SEKT, NeOn, MediaCampaign, Musing, ServiceFinder, LIRICS and KnowledgeWeb, as well as many other projects. As of May 28, 2011, 881 people are on the gateusers mailing list at SourceForge.net, and 111,932 downloads from SourceForge are recorded since the project moved to SourceForge in 2005. The paper “GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications” has received over 800 citations in the seven years since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide, include “Building Search Applications: Lucene, LingPipe, and Gate”, by Manu Konchady, and “Introduction to Linguistic Annotation and Text Analytics”, by Graham Wilcock. 
General Graph Representation Learning Framework (DeepGL) 
This paper presents a general graph representation learning framework called DeepGL for learning deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by deriving a set of base features (e.g., graphlet features) and automatically learns a multilayered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higherorder. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that generalize acrossnetworks and therefore useful for graphbased transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable features, and is spaceefficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of $\mathcal{O}(E)$, and scalable for large networks via an efficient parallel implementation. Compared with the stateoftheart method, DeepGL is (1) effective for acrossnetwork transfer learning tasks and attributed graph representation learning, (2) spaceefficient requiring up to 6x less memory, (3) fast with up to 182x speedup in runtime performance, and (4) accurate with an average improvement of 20% or more on many learning tasks. 
General Language Understanding Evaluation Benchmark (GLUE) 
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is modelagnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a handcrafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multitask and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems. 
General Likelihood Uncertainty Estimation (GLUE) 
The GLUE methodology (Beven and Binley 1992) rejects the idea of one single optimal solution and adopts the concept of equifinality of models, parameters and variables (Beven and Binley 1992; Beven 1993). Equifinality originates from the imperfect knowledge of the system under consideration, and many sets of models, parameters and variables may therefore be considered equal or almost equal simulators of the system. Using the GLUE analysis, the prior set of models, parameters and variables is divided into a set of nonacceptable solutions and a set of acceptable solutions. The GLUE methodology deals with the variable degree of membership of the sets. The degree of membership is determined by assessing the extent to which solutions fit the model, which in turn is determined by subjective likelihood functions. RGLUEANN 
Generalised Method of Codifferential Descent (GMCD) 
This paper is devoted to a detailed convergence analysis of the method of codifferential descent (MCD) developed by professor V.F. Demyanov for solving a large class of nonsmooth nonconvex optimization problems. We propose a generalization of the MCD that is more suitable for applications than the original method, and that utilizes only a part of a codifferential on every iteration, which allows one to reduce the overall complexity of the method. With the use of some general results on uniformly codifferentiable functions obtained in this paper, we prove the global convergence of the generalized MCD in the infinite dimensional case. Also, we propose and analyse a quadratic regularization of the MCD, which is the first general method for minimizing a codifferentiable function over a convex set. Apart from convergence analysis, we also discuss the robustness of the MCD with respect to computational errors, possible step size rules, and a choice of parameters of the algorithm. In the end of the paper we estimate a rate of convergence of the MCD for a class of nonsmooth nonconvex functions that arises, in particular, in cluster analysis. We prove that under some general assumptions the method converges with linear rate, and it convergence quadratically, provided a certain first order sufficient optimality condition holds true. 
Generalization Error  The generalization error of a machine learning model is a function that measures how well a learning machine generalizes to unseen data. It is measured as the distance between the error on the training set and the test set and is averaged over the entire set of possible training data that can be generated after each iteration of the learning process. It has this name because this function indicates the capacity of a machine that learns with the specified algorithm to infer a rule (or generalize) that is used by the teacher machine to generate data based only on a few examples. 
Generalization Error Analysis  Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distributionfree, kernelbased approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy. 
Generalization Tower Network (GTN) 
Deep learning (DL) advances stateoftheart reinforcement learning (RL), by incorporating deep neural networks in learning representations from the input to RL. However, the conventional deep neural network architecture is limited in learning representations for multitask RL (MTRL), as multiple tasks can refer to different kinds of representations. In this paper, we thus propose a novel deep neural network architecture, namely generalization tower network (GTN), which can achieve MTRL within a single learned model. Specifically, the architecture of GTN is composed of both horizontal and vertical streams. In our GTN architecture, horizontal streams are used to learn representation shared in similar tasks. In contrast, the vertical streams are introduced to be more suitable for handling diverse tasks, which encodes hierarchical shared knowledge of these tasks. The effectiveness of the introduced vertical stream is validated by experimental results. Experimental results further verify that our GTN architecture is able to advance the stateoftheart MTRL, via being tested on 51 Atari games. 
Generalized Additive Mixed Model (GAMM) 
gammSlice 
Generalized Additive Models (GAM) 
In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. GAMs were originally developed by Trevor Hastie and Robert Tibshirani to blend properties of generalized linear models with additive models. https://…additivepartibackgroundandrationale GAM: The Predictive Modeling Silver Bullet gamsel 
Generalized Additive Models for Location, Scale and Shape (GAMLSS) 
This paper introduces generalized additive models for location, scale and shape (GAMLSS) as a modeling framework for analyzing treatment effects beyond the mean. By relating each parameter of the response distribution to explanatory variables, GAMLSS model the treatment effect on the whole conditional distribution. Additionally, any nonnormal outcome and nonlinear effects of explanatory variables can be incorporated. We elaborate on the combination of GAMLSS with program evaluation methods in economics and provide a practical guide to the usage of GAMLSS by reanalyzing data from the \textit{Progresa} program. Contrary to expectations, no significant effects of a cash transfer on the conditional inequality level between treatment and control group are found. 
Generalized Autoregressive Conditional Heteroscedasticity (GARCH) 
If an autoregressive moving average model (ARMA model) is assumed for the error variance, the model is a generalized autoregressive conditional heteroskedasticity (GARCH, Bollerslev (1986)) model. mfGARCH 
Generalized Autoregressive Moving Average Models (GARMA) 
A class of generalized autoregressive moving average (GARMA) models is developed that extends the univariate Gaussian ARMA time series model to a flexible observationdriven model for nonGaussian time series data. The dependent variable is assumed to have a conditional exponential family distribution given the past history of the process. The model estimation is carried out using an iteratively reweighted least squares algorithm. Properties of the model, including stationarity and marginal moments, are either derived explicitly or investigated using Monte Carlo simulation. The relationship of the GARMA model to other models is shown, including the autoregressive models of Zeger and Qaqish, the moving average models of Li, and the reparameterized generalized autoregressive conditional heteroscedastic GARCH model (providing the formula for its fourth marginal moment not previously derived). The model is demonstrated by the application of the GARMA model with a negative binomial conditional distribution to a wellknown time series dataset of poliomyelitis counts. VGAM 
Generalized Boosted Regression Models  This R package (gbm) implements extensions to Freund and Schapire’s AdaBoost algorithm and J. Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, logistic, Poisson, Cox proportional hazards partial likelihood, multinomial, tdistribution, AdaBoost exponential loss, Learning to Rank, and Huberized hinge loss. gbm 
Generalized Canonical Polyadic Tensor Decomposition (GCP) 
Tensor decomposition is a fundamental unsupervised machine learning method in data science, with applications including network analysis and sensor data processing. This work develops a generalized canonical polyadic (GCP) lowrank tensor decomposition that allows other loss functions besides squared error. For instance, we can use logistic loss or KullbackLeibler divergence, enabling tensor decomposition for binary or count data. We present a variety statisticallymotivated loss functions for various scenarios. We provide a generalized framework for computing gradients and handling missing data that enables the use of standard optimization methods for fitting the model. We demonstrate the flexibility of GCP on several realworld examples including interactions in a social network, neural activity in a mouse, and monthly rainfall measurements in India. 
Generalized Data Representation (GDR) 
Deep learning (DL)based autoencoder is a potential architecture to implement endtoend communication systems. In this letter, we first give a brief introduction to the autoencoderrepresented communication system. Then, we propose a novel generalized data representation (GDR) aiming to improve the data rate of DLbased communication systems. Finally, simulation results show that the proposed GDR scheme has lower training complexity, comparable block error rate performance and higher channel capacity than the conventional onehot vector scheme. Furthermore, we investigate the effect of signaltonoise ratio (SNR) in DLbased communication systems and prove that training at a high SNR could produce a good training performance for autoencoder. 
Generalized Discrimination Score  The Generalized Discrimination Score is a generic forecast verification framework which can be applied to any of the following verification contexts: dichotomous, polychotomous (ordinal and nominal), continuous, probabilistic, and ensemble. A comprehensive description of the Generalized Discrimination Score, including all equations used in this package, is provided by Mason and Weigel (2009) <doi:10.1175/MWRD1005069.1> afc 
Generalized Dissimilarity Modeling (GDM) 
Generalized dissimilarity modelling (GDM) is a statistical technique for analysing and predicting spatial patterns of turnover in community composition (beta diversity) across large regions. gdm 
Generalized Dynamic Principal Components (GDPC) 
Brillinger defined dynamic principal components (DPC) for time series based on a reconstruction criterion. He gave a very elegant theoretical solution and proposed an estimator which is consistent under stationarity. Here, we propose a new enterally empirical approach to DPC. The main differences with the existing methodsmainly Brillinger procedureare (1) the DPC we propose need not be a linear combination of the observations and (2) it can be based on a variety of loss functions including robust ones. Unlike Brillinger, we do not establish any consistency results; however, contrary to Brillinger’s, which has a very strong stationarity flavor, our concept aims at a better adaptation to possible nonstationary features of the series. We also present a robust version of our procedure that allows to estimate the DPC when the series have outlier contamination. We give iterative algorithms to compute the proposed procedures that can be used with a large number of variables. Our nonrobust and robust procedures are illustrated with real datasets. Supplementary materials for this article are available online. Consistency of Generalized Dynamic Principal Components in Dynamic Factor Models gdpc 
Generalized Entropy Agglomeration (GEA) 
Entropy Agglomeration (EA) is a hierarchical clustering algorithm introduced in 2013. Here, we generalize it to define Generalized Entropy Agglomeration (GEA) that can work with multiset blocks and blocks with rational occurrence numbers. We also introduce a numerical categorization procedure to apply GEA to numerical datasets. The software REBUS 2.0 is published with these capabilities: http://…/rebus2 
Generalized Estimation Equation (GEE) 
In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unknown correlation between outcomes. Parameter estimates from the GEE are consistent even when the covariance structure is misspecified, under mild regularity conditions. The focus of the GEE is on estimating the average response over the population (‘populationaveraged’ effects) rather than the regression parameters that would enable prediction of the effect of changing one or more covariates on a given individual. GEEs are usually used in conjunction with HuberWhite standard error estimates, also known as ‘robust standard error’ or ‘sandwich variance’ estimates. In the case of a linear model with a working independence variance structure, these are known as ‘heteroscedasticity consistent standard error’ estimators. Indeed, the GEE unified several independent formulations of these standard error estimators in a general framework. GEEs belong to a class of semiparametric regression techniques because they rely on specification of only the first two moments. Under correct model specification and mild regularity conditions, parameter estimates from GEEs are consistent. They are a popular alternative to the likelihoodbased generalized linear mixed model which is more sensitive to variance structure specification. They are commonly used in large epidemiological studies, especially multisite cohort studies because they can handle many types of unmeasured dependence between outcomes. mmmgee 
Generalized Gaussian Kernel Adaptive Filtering  The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and datadriven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed leastsquaretype rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to keep their symmetric positivedefiniteness. Different from conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels with all different parameters, which makes such regressor more flexible. The kernel adaptive filtering algorithm is established together with a l1regularized least squares to avoid overfitting and the increase of dimensionality of the dictionary. Experimental results confirm the validity of the proposed method. 
Generalized Graded Unfolding Model (GGUM) 
The generalized graded unfolding model (GGUM) is developed. This model allows for either binary or graded responses and generalizes previous item response models for unfolding in two useful ways. First, it implements a discrimination parameter that varies across items, which allows items to discriminate among respondents in different ways. Second, the GGUM permits response category threshold parameters to vary across items. Amarginal maximum likelihood algorithm is implemented to estimate GGUM item parameters, whereas person parameters are derived from an expected a posteriori technique. The applicability of the GGUM to common attitude testing situations is illustrated with real data on student attitudes toward abortion. http://…/gbm2.pdf ScoreGGUM 
Generalized Hyperbolic Distributions (GH) 
The generalised hyperbolic distribution (GH) is a continuous probability distribution defined as the normal variancemean mixture where the mixing distribution is the generalized inverse Gaussian distribution. Its probability density function is given in terms of modified Bessel function of the second kind. As the name suggests it is of a very general form, being the superclass of, among others, the Student’s tdistribution, the Laplace distribution, the hyperbolic distribution, the normalinverse Gaussian distribution and the variancegamma distribution. It is mainly applied to areas that require sufficient probability of farfield behaviour, which it can model due to its semiheavy tails – a property the normal distribution does not possess. The generalised hyperbolic distribution is often used in economics, with particular application in the fields of modelling financial markets and risk management, due to its semiheavy tails. This class is closed under linear operations. 
Generalized Integration Model  Integrates individuallevel data and summary statistics under a generalized linear model framework. gim 
Generalized Integrative Principal Component Analysis (GIPCA) 
Highdimensional multisource data are encountered in many fields. Despite recent developments on the integrative dimension reduction of such data, most existing methods cannot easily accommodate data of multiple types (e.g., binary or countvalued). Moreover, multisource data often have blockwise missing structure, i.e., data in one or more sources may be completely unobserved for a sample. The heterogeneous data types and presence of blockwise missing data pose significant challenges to the integration of multisource data and further statistical analyses. In this paper, we develop a lowrank method, called Generalized Integrative Principal Component Analysis (GIPCA), for the simultaneous dimension reduction and imputation of multisource blockwise missing data, where different sources may have different data types. We also devise an adapted BIC criterion for rank estimation. Comprehensive simulation studies demonstrate the efficacy of the proposed method in terms of rank estimation, signal recovery, and missing data imputation. We apply GIPCA to a mortality study. We achieve accurate blockwise missing data imputation and identify intriguing latent mortality rate patterns with sociological relevance. 
Generalized Kalman Smoothing  Statespace smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example RauchTungStriebel and MayneFraser algorithms. Such schemes are equivalent to linear algebraic techniques that minimize a convex quadratic objective function with structure induced by the dynamic model. These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs and abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in the last few years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require use of iterated algorithms, and these have received increased attention from control, signal processing, machine learning, and optimization communities. In this survey we show that the optimization viewpoint provides the control and signal processing community great freedom in the development of novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples. 
Generalized Lambda Distribution (GLD) 
Generalized lambda distribution is a generic distribution that can be used for various curve fittings or in general mathematical analysis. It is interesting because of the wide variety of distributional shapes it can take on. There are methods how to use this distribution to approximate various other distributions, or to fit experimental data set to this distribution. GLDEX 
Generalized Least Squares Screening (GLSS) 
Variable selection is a widely studied problem in high dimensional statistics, primarily since estimating the precise relationship between the covariates and the response is of great importance in many scientific disciplines. However, most of theory and methods developed towards this goal for the linear model invoke the assumption of iid subGaussian covariates and errors. This paper analyzes the theoretical properties of Sure Independence Screening (SIS) (Fan and Lv ) for high dimensional linear models with dependent and/or heavy tailed covariates and errors. We also introduce a generalized least squares screening (GLSS) procedure which utilizes the serial correlation present in the data. By utilizing this serial correlation when estimating our marginal effects, GLSS is shown to outperform SIS in many cases. For both procedures we prove sure screening properties, which depend on the moment conditions, and the strength of dependence in the error and covariate processes, amongst other factors. Additionally, combining these screening procedures with the adaptive Lasso is analyzed. Dependence is quantified by functional dependence measures (Wu ), and the results rely on the use of Nagaev type and exponential inequalities for dependent random variables. We also conduct simulations to demonstrate the finite sample performance of these procedures, and include a real data application of forecasting the US inflation rate. 
Generalized Likelihood Ratio Test (GLRT) 

Generalized Linear Mixed Model (GLMM) 
In statistics, a generalized linear mixed model (GLMM) is a particular type of mixed model. It is an extension to the generalized linear model in which the linear predictor contains random effects in addition to the usual fixed effects. These random effects are usually assumed to have a normal distribution. Fitting such models by maximum likelihood involves integrating over these random effects. In general, these integrals cannot be expressed in analytical form. Various approximate methods have been developed, but none has good properties for all possible models and data sets (ungrouped binary data being particularly problematic). For this reason, methods involving numerical quadrature or Markov chain Monte Carlo have increased in use as increasing computing power and advances in methods have made them more practical. 
Generalized Linear Models (GLM) 
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. mdscore,mglmn 
Generalized Logistic Distribution  The term generalized logistic distribution is used as the name for several different families of probability distributions. For example, Johnson et al. list four forms, which are listed below. One family described here has also been called the skewlogistic distribution. For other families of distributions that have also been called generalized logistic distributions, see the shifted loglogistic distribution, which is a generalization of the loglogistic distribution. 
Generalized LpNorm TwoDimensional Linear Discriminant Analysis (G2DLDA) 
Recent advances show that twodimensional linear discriminant analysis (2DLDA) is a successful matrix based dimensionality reduction method. However, 2DLDA may encounter the singularity issue theoretically and the sensitivity to outliers. In this paper, a generalized Lpnorm 2DLDA framework with regularization for an arbitrary $p>0$ is proposed, named G2DLDA. There are mainly two contributions of G2DLDA: one is G2DLDA model uses an arbitrary Lpnorm to measure the betweenclass and withinclass scatter, and hence a proper $p$ can be selected to achieve the robustness. The other one is that by introducing an extra regularization term, G2DLDA achieves better generalization performance, and solves the singularity problem. In addition, G2DLDA can be solved through a series of convex problems with equality constraint, and it has closed solution for each single problem. Its convergence can be guaranteed theoretically when $1\leq p\leq2$. Preliminary experimental results on three contaminated human face databases show the effectiveness of the proposed G2DLDA. 
Generalized Mallows Model Latent Dirichlet Allocation (GMMLDA) 
Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged through one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMMLDA, a topic modeling based Bayesian unsupervised model, to analyze the document intent structure cooperated with order information. Our model is flexible that has the ability to combine the annotations and do supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings, results show the superiority of our model over several stateoftheart baselines. 
Generalized Matrix Chain Algorithm  In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of works. The standard matrix chain problem consists in finding the parenthesization of a matrix product $M := A_1 A_2 \cdots A_n$ that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 9. The motivation for this work comes from the fact that—despite great advances in the development of compilers—the task of mapping linear algebra problems to optimized kernels is still to be done manually. In order to relieve the user from this complex task, new techniques for the compilation of linear algebra expressions have to be developed. 
Generalized Matrix Splitting Algorithm (GMSA) 
Composite function minimization captures a wide spectrum of applications in both computer vision and machine learning. It includes bound constrained optimization, $\ell_1$ norm regularized optimization, and $\ell_0$ norm regularized optimization as special cases. This paper proposes and analyzes a new Generalized Matrix Splitting Algorithm (GMSA) for minimizing composite functions. It can be viewed as a generalization of the classical GaussSeidel method and the Successive OverRelaxation method for solving linear systems in the literature. Our algorithm is derived from a novel triangle operator mapping, which can be computed exactly using a new generalized Gaussian elimination procedure. We establish the global convergence, convergence rate, and iteration complexity of GMSA for convex problems. In addition, we also discuss several important extensions of GMSA. Finally, we validate the performance of our proposed method on three particular applications: nonnegative matrix factorization, $\ell_0$ norm regularized sparse coding, and $\ell_1$ norm regularized Dantzig selector problem. Extensive experiments show that our method achieves stateoftheart performance in term of both efficiency and efficacy. 
Generalized Maximum Entropy Estimation  We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zeroinformation moment closure method. 
Generalized Method of Wavelet Moments (GMWM) 
gmwm 
Generalized MinMax (GMM) 
We develop some theoretical results for a robust similarity measure named ‘generalized minmax’ (GMM). This similarity has direct applications in machine learning as a positive definite kernel and can be efficiently computed via probabilistic hashing. Owing to the discrete nature, the hashed values can also be used for efficient near neighbor search. We prove the theoretical limit of GMM and the consistency result, assuming that the data follow an elliptical distribution, which is a very general family of distributions and includes the multivariate $t$distribution as a special case. The consistency result holds as long as the data have bounded first moment (an assumption which essentially holds for datasets commonly encountered in practice). Furthermore, we establish the asymptotic normality of GMM. Compared to the ‘cosine’ similarity which is routinely adopted in current practice in statistics and machine learning, the consistency of GMM requires much weaker conditions. Interestingly, when the data follow the $t$distribution with $\nu$ degrees of freedom, GMM typically provides a better measure of similarity than ‘cosine’ roughly when $\nu<8$ (which is already very close to normal). These theoretical results will help explain the recent success of GMM in learning tasks. 
Generalized Multistate Simulation Model  GUIgems,gems 
Generalized Network Dismantling  Finding the set of nodes, which removed or (de)activated can stop the spread of (dis)information, contain an epidemic or disrupt the functioning of a corrupt/criminal organization is still one of the key challenges in network science. In this paper, we introduce the generalized network dismantling problem, which aims to find the set of nodes that, when removed from a network, results in a network fragmentation into subcritical network components at minimum cost. For unit costs, our formulation becomes equivalent to the standard network dismantling problem. Our nonunit cost generalization allows for the inclusion of topological cost functions related to node centrality and nontopological features such as the price, protection level or even social value of a node. In order to solve this optimization problem, we propose a method, which is based on the spectral properties of a novel nodeweighted Laplacian operator. The proposed method is applicable to largescale networks with millions of nodes. It outperforms current stateoftheart methods and opens new directions in understanding the vulnerability and robustness of complex systems. 
Generalized Power Generalized Weibull Distribution  This paper introduces a new generalization of the power generalized Weibull distribution called the generalized power generalized Weibull distribution. This distribution can also be considered as a generalization of Weibull distribution. The hazard rate function of the new model has nice and flexible properties and it can take various shapes, including increasing, decreasing, upsidedown bathtub and bathtub shapes. Some of the statistical properties of the new model, including quantile function, moment generating function, reliability function, hazard function and the reverse hazard function are obtained. The moments, incomplete moments, mean deviations and Bonferroni and Lorenz curves and the order statistics densities are also derived. The model parameters are estimated by the maximum likelihood method. The usefulness of the proposed model is illustrated by using two applications of reallife data. 
Generalized Probability Smoothing  In this work we consider a generalized version of Probability Smoothing, the core elementary model for sequential prediction in the state of the art PAQ family of data compression algorithms. Our main contribution is a code length analysis that considers the redundancy of Probability Smoothing with respect to a Piecewise Stationary Source. The analysis holds for a finite alphabet and expresses redundancy in terms of the total variation in probability mass of the stationary distributions of a Piecewise Stationary Source. By choosing parameters appropriately Probability Smoothing has redundancy $O(S\cdot\sqrt{T\log T})$ for sequences of length $T$ with respect to a Piecewise Stationary Source with $S$ segments. 
Generalized Procrustes Analysis (GPA) 
Generalized Procrustes analysis (GPA) is a method of statistical analysis that can be used to compare the shapes of objects, or the results of surveys, interviews, or panels. It was developed for analysing the results of freechoice profiling, a survey technique which allows respondents (such as sensory panelists) to describe a range of products in their own words or language. GPA is one way to make sense of freechoice profiling data; other ways can be multiple factor analysis (MFA), or the STATIS method. The method was first published by J. C. Gower in 1975. 
Generalized Resistant Hyperplane Mechanisms  This paper is part of an emerging line of work at the intersection of machine learning and mechanism design, which aims to avoid noise in training data by correctly aligning the incentives of data sources. Specifically, we focus on the ubiquitous problem of linear regression, where strategyproof mechanisms have previously been identified in two dimensions. In our setting, agents have singlepeaked preferences and can manipulate only their response variables. Our main contribution is the discovery of a family of group strategyproof linear regression mechanisms in any number of dimensions, which we call generalized resistant hyperplane mechanisms. The gametheoretic properties of these mechanisms — and, in fact, their very existence — are established through a connection to a discrete version of the Ham Sandwich Theorem. 
Generalized Structured Component Analysis (GSCA) 
gesca 
Generalized Value Iteration Network (GVIN) 
In this paper, we introduce a generalized value iteration network (GVIN), which is an endtoend neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding based kernel achieves the best performance. We further propose episodic Qlearning, an improvement upon traditional nstep Qlearning that stabilizes training for networks that contain a planning module. Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and realworld street networks, showing that GVIN generalizes well for both arbitrary graphs and unseen graphs of larger scale and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image). 
Generalized Vector Space Model (GVSM) 
The Generalized vector space model is a generalization of the vector space model used in information retrieval. Many classifiers, especially those which are related to document or text classification, use the TFIDF basis of VSM. However, this is where the similarity between the models ends – the generalized model uses the results of the TFIDF dictionary to generate similarity metrics based on distance or angle difference, rather than centroid based classification. Wong et al. presented an analysis of the problems that the pairwise orthogonality assumption of the vector space model (VSM) creates. From here they extended the VSM to the generalized vector space model (GVSM). 
GeneraltoSpecific Model (GETS) 
This paper discusses the econometric methodology of generaltospecific modeling, in which the modeler simplifies an initially general model that adequately characterizes the empirical evidence within his or her theoretical framework. Central aspects of this approach include the theory of reduction, dynamic specification, model selection procedures, model selection criteria, model comparison, encompassing, computer automation, and empirical implementation. This paper thus reviews the theory of reduction, summarizes the approach of generaltospecific modeling, and discusses the econometrics of model selection, noting that generaltospecific modeling is the practical embodiment of reduction. gets 
Generative Adversarial Autoencoder Network (GAAN) 
We introduce an effective model to overcome the problem of mode collapse when training Generative Adversarial Networks (GAN). Firstly, we propose a new generator objective that finds it better to tackle mode collapse. And, we apply an independent Autoencoders (AE) to constrain the generator and consider its reconstructed samples as ‘real’ samples to slow down the convergence of discriminator that enables to reduce the gradient vanishing problem and stabilize the model. Secondly, from mappings between latent and data spaces provided by AE, we further regularize AE by the relative distance between the latent and data samples to explicitly prevent the generator falling into mode collapse setting. This idea comes when we find a new way to visualize the mode collapse on MNIST dataset. To the best of our knowledge, our method is the first to propose and apply successfully the relative distance of latent and data samples for stabilizing GAN. Thirdly, our proposed model, namely Generative Adversarial Autoencoder Networks (GAAN), is stable and has suffered from neither gradient vanishing nor mode collapse issues, as empirically demonstrated on synthetic, MNIST, MNIST1K, CelebA and CIFAR10 datasets. Experimental results show that our method can approximate well multimodal distribution and achieve better results than stateoftheart methods on these benchmark datasets. Our model implementation is published here: https://…/gaan 
Generative Adversarial Capsule Network (CapsuleGAN) 
We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutionalGAN at modeling image data distribution on the MNIST dataset of handwritten digits, evaluated on the generative adversarial metric and at semisupervised image classification. 
Generative Adversarial Imitation Learning (GAIL) 

Generative Adversarial Imputation Net (GAIN) 
We propose a novel method for imputing missing data by adapting the wellknown Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms stateoftheart imputation methods. 
Generative Adversarial Mapping Networks (GAMN) 
Generative Adversarial Networks (GANs) have shown impressive performance in generating photorealistic images. They fit generative models by minimizing certain distance measure between the real image distribution and the generated data distribution. Several distance measures have been used, such as JensenShannon divergence, $f$divergence, and Wasserstein distance, and choosing an appropriate distance measure is very important for training the generative network. In this paper, we choose to use the maximum mean discrepancy (MMD) as the distance metric, which has several nice theoretical guarantees. In fact, generative moment matching network (GMMN) (Li, Swersky, and Zemel 2015) is such a generative model which contains only one generator network $G$ trained by directly minimizing MMD between the real and generated distributions. However, it fails to generate meaningful samples on challenging benchmark datasets, such as CIFAR10 and LSUN. To improve on GMMN, we propose to add an extra network $F$, called mapper. $F$ maps both real data distribution and generated data distribution from the original data space to a feature representation space $\mathcal{R}$, and it is trained to maximize MMD between the two mapped distributions in $\mathcal{R}$, while the generator $G$ tries to minimize the MMD. We call the new model generative adversarial mapping networks (GAMNs). We demonstrate that the adversarial mapper $F$ can help $G$ to better capture the underlying data distribution. We also show that GAMN significantly outperforms GMMN, and is also superior to or comparable with other stateoftheart GAN based methods on MNIST, CIFAR10 and LSUNBedrooms datasets. 
Generative Adversarial Network (GAN) 
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax twoplayer game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. GitXiv 
Generative Adversarial Network Embedding (GANE) 
Network embedding has become a hot research topic recently which can provide lowdimensional feature representations for many machine learning applications. Current work focuses on either (1) whether the embedding is designed as an unsupervised learning task by explicitly preserving the structural connectivity in the network, or (2) whether the embedding is a byproduct during the supervised learning of a specific discriminative task in a deep neural network. In this paper, we focus on bridging the gap of the two lines of the research. We propose to adapt the Generative Adversarial model to perform network embedding, in which the generator is trying to generate vertex pairs, while the discriminator tries to distinguish the generated vertex pairs from real connections (edges) in the network. Wasserstein1 distance is adopted to train the generator to gain better stability. We develop three variations of models, including GANE which applies cosine similarity, GANEO1 which preserves the firstorder proximity, and GANEO2 which tries to preserves the secondorder proximity of the network in the lowdimensional embedded vector space. We later prove that GANEO2 has the same objective function as GANEO1 when negative sampling is applied to simplify the training process in GANEO2. Experiments with realworld network datasets demonstrate that our models constantly outperform stateoftheart solutions with significant improvements on precision in link prediction, as well as on visualizations and accuracy in clustering tasks. 
Generative Adversarial Network Game (GANG) 
Generative Adversarial Networks (GAN) have become one of the most successful frameworks for unsupervised generative modeling. As GANs are difficult to train much research has focused on this. However, very little of this research has directly exploited gametheoretic techniques. We introduce Generative Adversarial Network Games (GANGs), which explicitly model a finite zerosum game between a generator ($G$) and classifier ($C$) that use mixed strategies. The size of these games precludes exact solution methods, therefore we define resourcebounded best responses (RBBRs), and a resourcebounded Nash Equilibrium (RBNE) as a pair of mixed strategies such that neither $G$ or $C$ can find a better RBBR. The RBNE solution concept is richer than the notion of `local Nash equilibria’ in that it captures not only failures of escaping local optima of gradient descent, but applies to any approximate best response computations, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RBNE. We compare our results to standard GAN setups, and demonstrate that our method deals well with typical GAN problems such as mode collapse, partial mode coverage and forgetting. 
Generative Adversarial Networks With DecoderEncoder Output Noise (DEGAN) 
In recent years, research on image generation methods has been developing fast. The autoencoding variational Bayes method (VAEs) was proposed in 2013, which uses variational inference to learn a latent space from the image database and then generates images using the decoder. The generative adversarial networks (GANs) came out as a promising framework, which uses adversarial training to improve the generative ability of the generator. However, the generated pictures by GANs are generally blurry. The deep convolutional generative adversarial networks (DCGANs) were then proposed to leverage the quality of generated images. Since the input noise vectors are randomly sampled from a Gaussian distribution, the generator has to map from a whole normal distribution to the images. This makes DCGANs unable to reflect the inherent structure of the training data. In this paper, we propose a novel deep model, called generative adversarial networks with decoderencoder output noise (DEGANs), which takes advantage of both the adversarial training and the variational Bayesain inference to improve the performance of image generation. DEGANs use a pretrained decoderencoder architecture to map the random Gaussian noise vectors to informative ones and pass them to the generator of the adversarial networks. Since the decoderencoder architecture is trained by the same images as the generators, the output vectors could carry the intrinsic distribution information of the original images. Moreover, the loss function of DEGANs is different from GANs and DCGANs. A hiddenspace loss function is added to the adversarial loss function to enhance the robustness of the model. Extensive empirical results show that DEGANs can accelerate the convergence of the adversarial training process and improve the quality of the generated images. 
Generative Adversarial Privacy (GAP) 
We present a datadriven framework called generative adversarial privacy (GAP). Inspired by recent advancements in generative adversarial networks (GANs), GAP allows the data holder to learn the privatization mechanism directly from the data. Under GAP, finding the optimal privacy mechanism is formulated as a constrained minimax game between a privatizer and an adversary. We show that for appropriately chosen adversarial loss functions, GAP provides privacy guarantees against strong informationtheoretic adversaries. We also evaluate the performance of GAP on multidimensional Gaussian mixture models and the GENKI face database. 
Generative Adversarial Tree Search (GATS) 
We propose Generative Adversarial Tree Search (GATS), a sampleefficient Deep Reinforcement Learning (DRL) algorithm. While Monte Carlo Tree Search (MCTS) is known to be effective for search and planning in RL, it is often sampleinefficient and therefore expensive to apply in practice. In this work, we develop a Generative Adversarial Network (GAN) architecture to model an environment’s dynamics and a predictor model for the reward function. We exploit collected data from interaction with the environment to learn these models, which we then use for modelbased planning. During planning, we deploy a finite depth MCTS, using the learned model for tree search and a learned Qvalue for the leaves, to find the best action. We theoretically show that GATS improves the biasvariance tradeoff in valuebased DRL. Moreover, we show that the generative model learns the model dynamics using orders of magnitude fewer samples than the Qlearner. In nonstationary settings where the environment model changes, we find the generative model adapts significantly faster than the Qlearner to the new environment. 
Generative Autotransporter (GAT) 
In this paper, we aim to introduce the classic Optimal Transport theory to enhance deep generative probabilistic modeling. For this purpose, we design a Generative Autotransporter (GAT) model with explicit distribution optimal transport. Particularly, the GAT model owns a deep distribution transporter to transfer the target distribution to a specific prior probability distribution, which enables a regular decoder to generate target samples from the input data that follows the transported prior distribution. With such a design, the GAT model can be stably trained to generate novel data by merely using a very simple $l_1$ reconstruction loss function with a generalized manifoldbased Adam training algorithm. The experiments on two standard benchmarks demonstrate its strong generation ability. 
Generative Information Lower BOund (GILBO) 
We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is welldefined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs trained on MNIST and discuss the results. 
Generative Learning Algorithms  Algorithms that try to learn p(yx) directly (such as logistic regression), or algorithms that try to learn mappings directly from the space of inputs X to the labels {0, 1}, (such as the perceptron algorithm) are called discrim inative learning algorithms. Here, we’ll talk about algorithms that instead try to model p(xy) (and p(y)). These algorithms are called generative learning algorithms. For instance, if y indicates whether an example is a dog (0) or an elephant (1), then p(xy = 0) models the distribution of dogs’ features, and p(xy = 1) models the distribution of elephants’ features. Naive Bayes Generative Learning Algorithms 
Generative Markov Network (GMN) 
The assumption that data samples are independently identically distributed is the backbone of many learning algorithms. Nevertheless, datasets often exhibit rich structures in practice, and we argue that there exist some unknown orders within the data instances. Aiming to find such orders, we introduce a novel Generative Markov Network (GMN) which we use to extract the order of data instances automatically. Specifically, we assume that the instances are sampled from a Markov chain. Our goal is to learn the transitional operator of the chain as well as the generation order by maximizing the generation probability under all possible data permutations. One of our key ideas is to use neural networks as a soft lookup table for approximating the possibly huge, but discrete transition matrix. This strategy allows us to amortize the space complexity with a single model and make the transitional operator generalizable to unseen instances. To ensure the learned Markov chain is ergodic, we propose a greedy batchwise permutation scheme that allows fast training. Empirically, we evaluate the learned Markov chain by showing that GMNs are able to discover orders among data instances and also perform comparably well to stateoftheart methods on the oneshot recognition benchmark task. 
Generative Mixture of Networks  A generative model based on training deep architectures is proposed. The model consists of K networks that are trained together to learn the underlying distribution of a given data set. The process starts with dividing the input data into K clusters and feeding each of them into a separate network. After few iterations of training networks separately, we use an EMlike algorithm to train the networks together and update the clusters of the data. We call this model Mixture of Networks. The provided model is a platform that can be used for any deep structure and be trained by any conventional objective function for distribution modeling. As the components of the model are neural networks, it has high capability in characterizing complicated data distributions as well as clustering data. We apply the algorithm on MNIST handwritten digits and Yale face datasets. We also demonstrate the clustering ability of the model using some realworld and toy examples. 
Generative Model  In probability and statistics, a generative model is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences. Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes’ rule. Shannon (1948) gives an example in which a table of frequencies of English word pairs is used to generate a sentence beginning with “representing and speedily is an good”; which is not proper English but which will increasingly approximate it as the table is moved from word pairs to word triplets etc. 
Generative Moment Matching Network (GMMN) 
Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a twosample test based on kernel maximum mean discrepancy (MMD). 
Generative Moment Matching Network – Generative Adversarial Network (MMDGAN) 
Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a twosample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during the training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMDGAN. The new distance measure in MMDGAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR 10, CelebA and LSUN, the performance of MMDGAN significantly outperforms GMMN, and is competitive with other representative GAN works. 
Generative Neural Machine Translation (GNMT) 
We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoderdecoder translation model by adding a latent variable as a language agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semisupervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training. 
Generative Reversible Network  ➘ “Reversible Neural Network” 
Generative Topic Embedding  Word embedding maps words into a lowdimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a lowdimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a lowdimensional continuous space. In two d 
Generative Topographic Map (GTM) 
Generative topographic map (GTM) is a machine learning method that is a probabilistic counterpart of the selforganizing map (SOM), is provably convergent and does not require a shrinking neighborhood or a decreasing step size. It is a generative model: the data is assumed to arise by first probabilistically picking a point in a lowdimensional space, mapping the point to the observed highdimensional input space (via a smooth function), then adding noise in that space. The parameters of the lowdimensional probability distribution, the smooth map and the noise are all learned from the training data using the expectationmaximization (EM) algorithm. GTM was introduced in 1996 in a paper by Christopher M. Bishop, Markus Svensen, and Christopher K. I. Williams. 
Generic Diffusion Process (genericDP) 
Image restoration problems are typical illposed problems where the regularization term plays an important role. The regularization term learned via generative approaches is easy to transfer to various image restoration, but offers inferior restoration quality compared with that learned via discriminative approaches. On the contrary, the regularization term learned via discriminative approaches are usually trained for a specific image restoration problem, and fail in the problem for which it is not trained. To address this issue, we propose a generic diffusion process (genericDP) to handle multiple Gaussian denoising problems based on the Trainable Nonlinear Reaction Diffusion (TNRD) models. Instead of one model, which consists of a diffusion and a reaction term, for one Gaussian denoising problem in TNRD, we enforce multiple TNRD models to share one diffusion term. The trained genericDP model can provide both promising denoising performance and high training efficiency compared with the original TNRD models. We also transfer the trained diffusion term to nonblind deconvolution which is unseen in the training phase. Experiment results show that the trained diffusion term for multiple Gaussian denoising can be transferred to image nonblind deconvolution as an image prior and provide competitive performance. 
GENESYS  Modern deep learning systems rely on (a) a handtuned neural network topology, (b) massive amounts of labeled training data, and (c) extensive training over largescale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general purpose intelligent systems which would need to learn autonomously in unknown environments and may not have access to some or any of these three components. Reinforcement learning and evolutionary algorithm (EA) based methods circumvent this problem by continuously interacting with the environment and updating the models based on obtained rewards. However, deploying these algorithms on ubiquitous autonomous agents at the edge (robots/drones) demands extremely high energyefficiency due to (i) tight power and energy budgets, (ii) continuous/lifelong interaction with the environment, (iii) intermittent or no connectivity to the cloud to run heavyweight processing. To address this need, we present GENESYS, an HWSW prototype of an EAbased learning system, that comprises a closed loop learning engine called EvE and an inference engine called ADAM. EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring handoptimization or backpropagation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE. GENESYS identifies and leverages multiple unique avenues of parallelism unique to EAs that we term ‘gene’ level parallelism, and ‘population’level parallelism. We ran GENESYS with a suite of environments from OpenAI gym and observed 25 orders of magnitude higher energyefficiency over stateoftheart embedded and desktop CPU and GPU systems. 
Genetic Algorithm (GA) 
In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural selection. This heuristic (also sometimes called a metaheuristic) is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. Genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics, pharmacometrics and other fields. 
Genetic Evolution Network (GEN) 
In this paper, we introduce an alternative approach, namely GEN (Genetic Evolution Network) Model, to the deep learning models. Instead of building one single deep model, GEN adopts a geneticevolutionary learning strategy to build a group of unit models generations by generations. Significantly different from the wellknown representation learning models with extremely deep structures, the unit models covered in GEN are of a much shallower architecture. In the training process, from each generation, a subset of unit models will be selected based on their performance to evolve and generate the child models in the next generation. GEN has significant advantages compared with existing deep representation learning models in terms of both learning effectiveness, efficiency and interpretability of the learning process and learned results. Extensive experiments have been done on diverse benchmark datasets, and the experimental results have demonstrated the outstanding performance of GEN compared with the stateoftheart baseline methods in both effectiveness of efficiency. 
Genetic Programming for Reinforcement Learning (GPRL) 
The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on modelbased batch reinforcement learning and genetic programming, which autonomously learns policy equations from preexisting default stateaction trajectory samples. GPRL is compared to a straightforward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing wellperforming, but noninterpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cartpole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing wellperforming interpretable reinforcement learning policies from preexisting default trajectory data. 
Genetic Programming Relevance Vector Machine (GPRVM) 
This paper proposes a hybrid basis function construction method (GPRVM) for Symbolic Regression problem, which combines an extended version of Genetic Programming called Kaizen Programming and Relevance Vector Machine to evolve an optimal set of basis functions. Different from traditional evolutionary algorithms where a single individual is a complete solution, our method proposes a solution based on linear combination of basis functions built from individuals during the evolving process. RVM which is a sparse Bayesian kernel method selects suitable functions to constitute the basis. RVM determines the posterior weight of a function by evaluating its quality and sparsity. The solution produced by GPRVM is a sparse Bayesian linear model of the coefficients of many nonlinear functions. Our hybrid approach is focused on nonlinear whitebox models selecting the right combination of functions to build robust predictions without prior knowledge about data. Experimental results show that GPRVM outperforms conventional methods, which suggest that it is an efficient and accurate technique for solving SR. The computational complexity of GPRVM scales in $O( M^{3})$, where $M$ is the number of functions in the basis set and is typically much smaller than the number $N$ of training patterns. 
GeneticEvolutionary Adam (GADAM) 
Deep neural network learning can be formulated as a nonconvex optimization problem. Existing optimization algorithms, e.g., Adam, can learn the models fast, but may get stuck in local optima easily. In this paper, we introduce a novel optimization algorithm, namely GADAM (GeneticEvolutionary Adam). GADAM learns deep neural network models based on a number of unit models generations by generations: it trains the unit models with Adam, and evolves them to the new generations with genetic algorithm. We will show that GADAM can effectively jump out of the local optima in the learning process to obtain better solutions, and prove that GADAM can also achieve a very fast convergence. Extensive experiments have been done on various benchmark datasets, and the learning results will demonstrate the effectiveness and efficiency of the GADAM algorithm. 
Genie  In this paper, we propose an offline counterfactual policy estimation framework called Genie to optimize Sponsored Search Marketplace. Genie employs an open box simulation engine with click calibration model to compute the KPI impact of any modification to the system. From the experimental results on Bing traffic, we showed that Genie performs better than existing observational approaches that employs randomized experiments for traffic slices that have frequent policy updates. We also show that Genie can be used to tune completely new policies efficiently without creating risky randomized experiments due to cold start problem. As time of today, Genie hosts more than 10000 optimization jobs yearly which runs more than 30 Million processing node hours of big data jobs for Bing Ads. For the last 3 years, Genie has been proven to be the one of the major platforms to optimize Bing Ads Marketplace due to its reliability under frequent policy changes and its efficiency to minimize risks in real experiments. 
GenVariScan  Motivation: Advances in nextgeneration sequencing (NGS) methods have enabled researchers and agencies to collect a wide variety of sequencing data across multiple platforms. The motivation behind such an exercise is to analyze these datasets jointly, in order to gain insights into disease prognosis, treatment, and cure. Clustering of such datasets, can provide much needed insight into biological associations. However, the differing scale, and the heterogeneity of the mixed dataset is hurdle for such analyses. Results: The paper proposes a nonparameteric Bayesian approach called GenVariScan for biclustering of highdimensional mixed data. Generalized Linear Models (GLM), and latent variable approaches are utilized to integrate mixed dataset. Sparsity inducing property of Poisson Dirichlet Process (PDP) is used to identify a lower dimensional structure of mixed covariates. We apply our method to Glioblastoma Multiforme (GBM) cancer dataset. We show that cluster detection is aposteriori consistent, as number of covariates and subject grows. As a byproduct, we derive a working value approach to perform beta regression. 
Geographic Information Systems (GIS) 
A geographic information system (GIS) is a computer system designed to capture, store, manipulate, analyze, manage, and present all types of geographical data. The acronym GIS is sometimes used for geographical information science or geospatial information studies to refer to the academic discipline or career of working with geographic information systems and is a large domain within the broader academic discipline of Geoinformatics. 
Geographic Resources Analysis Support System (GRASS) 
GRASS GIS, commonly referred to as GRASS (Geographic Resources Analysis Support System), is a free and open source Geographic Information System (GIS) software suite used for geospatial data management and analysis, image processing, graphics and maps production, spatial modeling, and visualization. GRASS GIS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. It is a founding member of the Open Source Geospatial Foundation (OSGeo). rgrass7 
GeoJSON  GeoJSON is a format for encoding a variety of geographic data structures. A GeoJSON object may represent a geometry, a feature, or a collection of features. GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Features in GeoJSON contain a geometry object and additional properties, and a feature collection represents a list of features. A complete GeoJSON data structure is always an object (in JSON terms). In GeoJSON, an object consists of a collection of name/value pairs — also called members. For each member, the name is always a string. Member values are either a string, number, object, array or one of the literals: true, false, and null. An array consists of elements where each element is a value as described above. 
Geometric Dirichlet Mean  We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the optimization of a geometric loss function, which is a surrogate to the LDA’s likelihood. Our method involves a fast optimization based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving the accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data. 
Geometric Enclosing Network (GEN) 
Training model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometrybased optimization approach to address this problem. Orthogonal to current stateoftheart densitybased approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G\left(\bz\right) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easytocontrol optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and realworld datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multimodal data and quality of generated data. 
Geometric Generalization Based ZeroShot Learning Test  Raven’s Progressive Matrices are one of the widely used tests in evaluating the human test taker’s fluid intelligence. Analogously, this paper introduces geometric generalization based zeroshot learning tests to measure the rapid learning ability and the internal consistency of deep generative models. Our empirical research analysis on stateoftheart generative models discern their ability to generalize concepts across classes. In the process, we introduce Infinit World, an evaluable, scalable, multimodal, lightweight dataset and ZeroShot Intelligence Metric ZSI. The proposed tests condenses humanlevel spatial and numerical reasoning tasks to its simplistic geometric forms. The dataset is scalable to a theoretical limit of infinity, in numerical features of the generated geometric figures, image size and in quantity. We systematically analyze stateoftheart model’s internal consistency, identify their bottlenecks and propose a proactive optimization method for fewshot and zeroshot learning. 
Geometric Generative Adversarial Nets (Geometric GAN) 
Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which has inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results show that the superior performance of geometric GAN. 
Geometric Mean Metric Learning  We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of interpretability; and (iii) computational speed several orders of magnitude faster than the widely used LMNN and ITML methods. Furthermore, on standard benchmark datasets, our closedform solution consistently attains higher classification accuracy. 
Geometric Program (GP) 
A geometric program (GP) is a type of mathematical optimization problem characterized by objective and constraint functions that have a special form. Recently developed solution methods can solve even largescale GPs extremely efficiently and reliably; at the same time a number of practical problems, particularly in circuit design, have been found to be equivalent to (or well approximated by) GPs. Putting these two together, we get effective solutions for the practical problems. The basic approach in GP modeling is to attempt to express a practical problem, such as an engineering analysis or design problem, in GP format. In the best case, this formulation is exact; when this is not possible, we settle for an approximate formulation. 
Geometric Semantic Genetic Programming (GSGP) 
In iterative supervised learning algorithms it is common to reach a point in the search where no further induction seems to be possible with the available data. If the search is continued beyond this point, the risk of overfitting increases significantly. Following the recent developments in inductive semantic stochastic methods, this paper studies the feasibility of using information gathered from the semantic neighborhood to decide when to stop the search. Two semantic stopping criteria are proposed and experimentally assessed in Geometric Semantic Genetic Programming (GSGP) and in the Semantic Learning Machine (SLM) algorithm (the equivalent algorithm for neural networks). The experiments are performed on realworld highdimensional regression datasets. The results show that the proposed semantic stopping criteria are able to detect stopping points that result in a competitive generalization for both GSGP and SLM. This approach also yields computationally efficient algorithms as it allows the evolution of neural networks in less than 3 seconds on average, and of GP trees in at most 10 seconds. The usage of the proposed semantic stopping criteria in conjunction with the computation of optimal mutation/learning steps also results in small trees and neural networks. 
Geometrically Designed Spline Regression  Geometrically Designed Spline (‘GeDS’) Regression is a nonparametric geometrically motivated method for fitting variable knots spline predictor models in one or two independent variables, in the context of generalized (non)linear models. ‘GeDS’ estimates the number and position of the knots and the order of the spline, assuming the response variable has a distribution from the exponential family. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s0018001506217> and Dimitrova et al. (2017) <https://…/18460>. GeDS 
Geometry Score  One of the biggest challenges in the research of generative adversarial networks (GANs) is assessing the quality of generated samples and detecting various levels of mode collapse. In this work, we construct a novel measure of performance of a GAN by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation. Our algorithm can be applied to datasets of an arbitrary nature and is not limited to visual data. We test the obtained metric on various reallife models and datasets and demonstrate that our method provides new insights into properties of GANs. 
GeometryAware Generative Adversarial Network (GAGAN) 
Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures. However, apart from the visual texture, the visual appearance of objects is significantly affected by their shape geometry, information which is not taken into account by existing generative models. This paper introduces the GeometryAware Generative Adversarial Network (GAGAN) for incorporating geometric information into the image generation process. Specifically, in GAGAN the generator samples latent variables from the probability space of a statistical shape model. By mapping the output of the generator to a canonical coordinate frame through a differentiable geometric transformation, we enforce the geometry of the objects and add an implicit connection from the prior to the generated object. Experimental results on face generation indicate that the GAGAN can generate realistic images of faces with arbitrary facial attributes such as facial expression, pose, and morphology, that are of better quality compared to current GANbased methods. Finally, our method can be easily incorporated into and improve the quality of the images generated by any existing GAN architecture. 
Gephi  Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs. Runs on Windows, Linux and Mac OS X. Gephi is opensource and free. 
GEPPG  In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like novelty search, qualitydiversity or goal exploration processes are less sample efficient during exploitation. In this paper, we present the GEPPG approach, taking the best of both worlds by sequentially combining two variants of a goal exploration process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger HalfCheetah benchmark. Among other things, we show that DDPG fails on the former and that GEPPG obtains performance above the stateoftheart on the latter. 
GGQID3  Usually, decision tree induction algorithms are limited to work with non relational data. Given a record, they do not take into account other objects attributes even though they can provide valuable information for the learning task. In this paper we present GGQID3, a multirelational decision tree learning algorithm that uses Generalized Graph Queries (GGQ) as predicates in the decision nodes. GGQs allow to express complex patterns (including cycles) and they can be refined stepbystep. Also, they can evaluate structures (not only single records) and perform Regular Pattern Matching. GGQ are built dynamically (pattern mining) during the GGQID3 tree construction process. We will show how to use GGQID3 to perform multirelational machine learning keeping complexity under control. Finally, some real examples of automatically obtained classification trees and semantic patterns are shown. —– Normalmente, los algoritmos de inducci\’on de \’arboles de decisi\’on trabajan con datos no relacionales. Dado un registro, no tienen en cuenta los atributos de otros objetos a pesar de que \’estos pueden proporcionar informaci\’on \’util para la tarea de aprendizaje. En este art\’iculo presentamos GGQID3, un algoritmo de aprendizaje de \’arboles de decisiones multirelacional que utiliza Generalized Graph Queries (GGQ) como predicados en los nodos de decisi\’on. Los GGQs permiten expresar patrones complejos (incluyendo ciclos) y pueden ser refinados paso a paso. Adem\’as, pueden evaluar estructuras (no solo registros) y llevar a cabo Regular Pattern Matching. En GGQID3, los GGQ son construidos din\’amicamente (pattern mining) durante el proceso de construcci\’on del \’arbol. Adem\’as, se muestran algunos ejemplos reales de \’arboles de clasificaci\’on multirelacionales y patrones sem\’anticos obtenidos autom\’aticamente. 
GGT  Adaptive regularization methods come in diagonal and fullmatrix variants. However, only the former have enjoyed widespread adoption in training largescale deep models. This is due to the computational overhead of manipulating a full matrix in high dimension. In this paper, we show how to make fullmatrix adaptive regularization practical and useful. We present GGT, a truly scalable fullmatrix adaptive optimizer. At the heart of our algorithm is an efficient method for computing the inverse square root of a lowrank matrix. We show that GGT converges to firstorder local minima, providing the first rigorous theoretical analysis of adaptive regularization in nonconvex optimization. In preliminary experiments, GGT trains faster across a variety of synthetic tasks and standard deep learning benchmarks. 
Gibbs Sampling  In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution (i.e. from the joint probability distribution of two or more random variables), when direct sampling is difficult. This sequence can be used to approximate the joint distribution (e.g., to generate a histogram of the distribution); to approximate the marginal distribution of one of the variables, or some subset of the variables (for example, the unknown parameters or latent variables); or to compute an integral (such as the expected value of one of the variables). Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled. 
GibbsNet  Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure such as blocked Gibbssampling that may require many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code which uses an adversarially learned iterative procedure to gradually refine the joint distribution, $p(x, z)$, to better match with the data distribution on each step. GibbsNet is the best of both worlds both in theory and in practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the virtual training criteria global minimum) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and has the ability to do attribute prediction, classconditional generation, and joint imageattribute modeling in a single model which is not trained for any of these specific tasks. We show empirically that GibbsNet is able to learn a more complex $p(z)$ and show that this leads to improved inpainting and iterative refinement of $p(x, z)$ for dozens of steps and stable generation without collapse for thousands of steps, despite being trained on only a few steps. 
GIDropout  Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training. Inspired by dropout, this paper presents GIDropout, a novel dropout method integrating with global information to improve neural networks for text classification. Unlike the traditional dropout method in which the units are dropped randomly according to the same probability, we aim to use explicit instructions based on global information of the dataset to guide the training process. With GIDropout, the model is supposed to pay more attention to inapparent features or patterns. Experiments demonstrate the effectiveness of the dropout with global information on seven text classification tasks, including sentiment analysis and topic classification. 
Gini Impurity  Used by the CART (classification and regression tree) algorithm, Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset. Gini impurity can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category. 
GirvanNewman Algorithm  The GirvanNewman algorithm detects communities by progressively removing edges from the original network. The connected components of the remaining network are the communities. Instead of trying to construct a measure that tells us which edges are the most central to communities, the GirvanNewman algorithm focuses on edges that are most likely “between” communities. 
GitHub Gist  Tom PrestonWerner presented the new Gist feature at a punk rock Ruby conference in 2008. Gist builds upon that idea by adding version control for code snippets, easy forking, and SSL encryption for private pastes. Because each “gist” is its own Git repository, multiple code snippets can be contained in a single paste and they can be pushed and pulled using Git. Further, forked code can be pushed back to the original author in the form of a patch, so pastes can become more like miniprojects. The main benefit of forking is that it allows you to freely experiment with changes without affecting the original project. Gist is a simple way to share snippets and pastes with others. All gists are Git repositories, so they are automatically versioned, forkable and usable from Git. gistr 
GitHub Hosted R Repository (ghrr) 
This ghrr (for ‘GitHub Hosted R Repository’) uses drat for both insertion of packages, and usage from R. http://…/#introducing_ghrr drat 
GitXiv  arXiv + Github + Links + Discussion: GitXiv is a space to share links to open computer science projects. Countless Github and arXiv links are floating around the web. Its hard to keep track of these gems. GitXiv attempts to solve this problem by offering a collaboratively curated feed of projects. Each project is conveniently presented as arXiv + Github + Links + Discussion. Members can submit their findings and let the community rank and discuss it. A regular newsletter makes it easy to stay uptodate on recent advancements. It´s free and open. 
GLearning  Modelfree reinforcement learning algorithms such as Qlearning perform poorly in the early stages of learning in noisy environments, because much effort is spent on unlearning biased estimates of the stateaction function. The bias comes from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose Glearning, a new offpolicy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning. Moreover, it enables naturally incorporating prior distributions over optimal actions when available. The stochastic nature of Glearning also makes it more costeffective than Qlearning in noiseless but explorationrisky domains. We illustrate these ideas in several examples where Glearning results in significant improvements of the learning rate and the learning cost. 
Global Interpreter Lock (GIL) 
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not threadsafe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) CPython extensions must be GILaware in order to avoid defeating threads. For an explanation, see Global interpreter lock. The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or longrunning operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck. However the GIL degrades performance even when it is not a bottleneck. Summarizing those slides: The system call overhead is significant, especially on multicore hardware. Two threads calling a function may take twice as much time as a single thread calling the function twice. The GIL can cause I/Obound threads to be scheduled ahead of CPUbound threads. And it prevents signals from being delivered. 
Global Sensitivity Analysis (GSA) 
This presentation aims to introduce global sensitivity analysis (SA), targeting an audience unfamiliar with the topic, and to give practical hints about the associated advantages and the effort needed. To this effect, we shall review some techniques for sensitivity analysis, including those that are not global, by applying them to a simple example. This will give the audience a chance to contrast each method’s result against the audience’s own expectation of what the sensitivity pattern for the simple model should be. We shall also try to relate the discourse on the relative importance of model input factors to specific questions, such as ‘Which of the uncertain input factor(s) is so noninfluential that we can safely fix it/them?’ or ‘If we could eliminate the uncertainty in one of the input factors, which factor should we choose to reduce the most the variance of the output?’ In this way, the selection of the method for sensitivity analysis will be put in relation to the framing of the analysis and to the interpretation and presentation of the results. The choice of the output of interest will be discussed in relation to the purpose of the model based analysis. The main methods that we present in this lecture are all related with one another, and are the method of Morris for factors’ screening and the variancebased measures. All are modelfree, in the sense that their application does not rely on special assumptions on the behaviour of the model (such as linearity, monotonicity and additivity of the relationship between input factor and model output). Monte Carlo filtering will be also be discussed to demonstrate the usefulness of global sensitivity analysis in relation to estimation. Global sensitivity analysis: An introduction (PDF Download Available) Global sensitivity analysis for statistical model parameters 
Global Style Token (GST) 
In this work, we propose ‘global style tokens’ (GSTs), a bank of embeddings that are jointly trained within Tacotron, a stateoftheart endtoend speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable ‘labels’ they generate can be used to control synthesis in novel ways, such as varying speed and speaking style – independently of the text content. They can also be used for style transfer, replicating the speaking style of a single audio clip across an entire longform text corpus. When trained on noisy, unlabeled found data, GSTs learn to factorize noise and speaker identity, providing a path towards highly scalable but robust speech synthesis. 
Global Vectors for Word Representation (GloVe) 
Recent methods for learning vector space representations of words have succeeded in capturing finegrained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a wordword cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition. 
Globally Improved ANT (GIANT) 
For distributed computing environments, we consider the canonical machine learning problem of empirical risk minimization (ERM) with quadratic regularization, and we propose a distributed and communicationefficient Newtontype optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, and then it sends this direction to the main driver. The driver, then, averages all the ANT directions received from workers to form a Globally Improved ANT (GIANT) direction. GIANT naturally exploits the tradeoffs between local computations and global communications in that more local computations result in fewer overall rounds of communications. GIANT is highly communication efficient in that, for $d$dimensional data uniformly distributed across $m$ workers, it has $4$ or $6$ rounds of communication and $O (d \log m)$ communication complexity per iteration. Theoretically, we show that GIANT’s convergence rate is faster than firstorder methods and existing distributed Newtontype methods. From a practical pointofview, a highly beneficial feature of GIANT is that it has only one tuning parameter—the iterations of the local solver for computing an ANT direction. This is indeed in sharp contrast with many existing distributed Newtontype methods, as well as popular first order methods, which have several tuning parameters, and whose performance can be greatly affected by the specific choices of such parameters. In this light, we empirically demonstrate the superior performance of GIANT compared with other competing methods. 
Glue Code  The term glue code is sometimes used to describe implementations of the adapter pattern. It does not serve any use in calculation or computation. Rather it serves as a proxy between otherwise incompatible parts of software, to make them compatible. The standard practice is to keep logic out of the glue code and leave that to the code blocks it connects to. 
Gnowee  This paper introduces Gnowee, a modular, Pythonbased, opensource hybrid metaheuristic optimization algorithm (Available from https://…/Gnowee ). Gnowee is designed for rapid convergence to nearly globally optimum solutions for complex, constrained nuclear engineering problems with mixedinteger and combinatorial design vectors and highcost, noisy, discontinuous, black box objective function evaluations. Gnowee’s hybrid metaheuristic framework is a new combination of a set of diverse, robust heuristics that appropriately balance diversification and intensification strategies across a wide range of optimization problems. This novel algorithm was specifically developed to optimize complex nuclear design problems; the motivating research problem was the design of material stackups to modify neutron energy spectra to specific targeted spectra for applications in nuclear medicine, technical nuclear forensics, nuclear physics, etc. However, there are a wider range of potential applications for this algorithm both within the nuclear community and beyond. To demonstrate Gnowee’s behavior for a variety of problem types, comparisons between Gnowee and several wellestablished metaheuristic algorithms are made for a set of eighteen continuous, mixedinteger, and combinatorial benchmarks. These results demonstrate Gnoweee to have superior flexibility and convergence characteristics over a wide range of design spaces. We anticipate this wide range of applicability will make this algorithm desirable for many complex engineering applications. 
Gnu Regression Econometrics and TimeSeries Library (gretl) 
Is a crossplatform software package for econometric analysis, written in the C programming language. It is free, opensource software. You may redistribute it and/or modify it under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation. 
GNU Scientific Library (GSL) 
The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License. The library provides a wide range of mathematical routines such as random number generators, special functions and leastsquares fitting. There are over 1000 functions in total with an extensive test suite. RcppGSL 
Goal Oriented Optimal Design of Experiments (GOODE) 
We develop a framework for goal oriented optimal design of experiments (GOODE) for largescale Bayesian linear inverse problems governed by PDEs. This framework differs from classical Bayesian optimal design of experiments (ODE) in the following sense: we seek experimental designs that minimize the posterior uncertainty in a predicted quantity of interest (QoI) rather than the estimated parameter itself. This is suitable for scenarios in which the solution of an inverse problem is an intermediate step and the estimated parameter is then used to compute a prediction QoI. In such problems, a GOODE approach has two benefits: the designs can avoid wastage of experimental resources by a targeted collection of data, and the resulting design criteria are computationally easier to evaluate due to the often low dimensionality of prediction QoIs. We present two modified design criteria, AGOODE and DGOODE, which are natural analogues of classical Bayesian A and Doptimal criteria. We analyze the connections to other ODE criteria, and provide interpretations for the GOODE criteria by using tools from information theory. Then, we develop an efficient gradientbased optimization framework for solving the GOODE optimization problems. Additionally, we present comprehensive numerical experiments testing the various aspects of the presented approach. The driving application is the optimal placement of sensors to identify the source of contaminants in a diffusion and transport problem. We enforce sparsity of the sensor placements using an $\ell_1$norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter. 
Goedel Machine  Can machines design Can they come up with creative solutions to problems and build tools and artifacts across a wide range of domains Recent advances in the field of computational creativity and formal Artificial General Intelligence (AGI) provide frameworks for machines with the general ability to design. In this paper we propose to integrate a formal computational creativity framework into the G\’odel machine framework. We call this machine a design G\’odel machine. Such a machine could solve a variety of design problems by generating novel concepts. In addition, it could change the way these concepts are generated by modifying itself. The design G\’odel machine is able to improve its initial design program, once it has proven that a modification would increase its return on the utility function. Finally, we sketch out a specific version of the design G\’odel machine which specifically aims at the design of complex software and hardware systems. Future work could be the development of a more formal version of the Design G\’odel machine and a potential implementation. 
Google AI  At Google AI, we’re conducting research that advances the stateoftheart in the field, applying AI to products and to new domains, and developing tools to ensure that everyone can access AI. Google’s mission is to organize the world’s information and make it universally accessible and useful. AI is helping us do that in exciting new ways, solving problems for our users, our customers, and the world. AI is making it easier for people to do things every day, whether it’s searching for photos of loved ones, breaking down language barriers in Google Translate, typing emails on the go, or getting things done with the Google Assistant. AI also provides new ways of looking at existing problems, from rethinking healthcare to advancing scientific discovery. 
Google Brain Project  Google Brain is an unofficial name for a deep learning research project at Google. 
Google Cloud Dataflow  Simplified stream and batch data processing, with equal reliability and expressiveness. Cloud Dataflow is a fullymanaged service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness — no more complex workarounds or compromises needed. And with its serverless approach to resource provisioning and management, you have access to virtually limitless capacity to solve your biggest data processing challenges, while paying only for what you use. Cloud Dataflow unlocks transformational use cases across industries, including: · check Clickstream, PointofSale, and segmentation analysis in retail · check Fraud detection in financial services · check Personalized user experience in gaming · check IoT analytics in manufacturing, healthcare, and logistics 
Google Prediction API  Google’s cloudbased machine learning tools. Google’s machine learning algorithms to analyze data and predict future outcomes using a familiar RESTful interface. 
GossipGraD  In this paper, we present GossipGraD – a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on largescale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from {\Theta}(log(p)) for p compute nodes in wellstudied SGD to O(1), 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent overfitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights Landing (KNL) connected with Aries network. We evaluate GossipGraD using wellstudied dataset ImageNet1K (~250GB), and widely studied neural network topologies such as GoogLeNet and ResNet50 (current winner of ImageNet Large Scale Visualization Research Challenge (ILSVRC)). Our performance evaluation using both KNL and Pascal GPUs indicates that GossipGraD can achieve perfect efficiency for these datasets and their associated neural network topologies. Specifically, for ResNet50, GossipGraD is able to achieve ~100% compute efficiency using 128 NVIDIA Pascal P100 GPUs – while matching the top1 classification accuracy published in literature. 
GouldenJackson Cluster Method  Finding the generating function for the number of words avoiding, as factors, the members of a prescribed set of ‘dirty words’. 
Gower’s Distance  Idea: Use distance measure between 0 and 1 for each variable and aggregate. gower 
GPflowOpt  A novel Python framework for Bayesian optimization known as GPflowOpt is introduced. The package is based on the popular GPflow library for Gaussian processes, leveraging the benefits of TensorFlow including automatic differentiation, parallelization and GPU computations for Bayesian optimization. Design goals focus on a framework that is easy to extend with custom acquisition functions and models. The framework is thoroughly tested and well documented, and provides scalability. The current released version of GPflowOpt includes some standard singleobjective acquisition functions, the stateoftheart maxvalue entropy search, as well as a Bayesian multiobjective approach. Finally, it permits easy use of custom modeling strategies implemented in GPflow. 
GPU Open Analytics Initiative (GOAI) 
Recently, Continuum Analytics, H2O.ai, and MapD announced the formation of the GPU Open Analytics Initiative (GOAI). GOAIalso joined by BlazingDB, Graphistry and the Gunrock project from the University of California, Davisaims to create open frameworks that allow developers and data scientists to build applications using standard data formats and APIs on GPUs. Bringing standard analytics data formats to GPUs will allow data analytics to be even more efficient, and to take advantage of the high throughput of GPUs. NVIDIA believes this initiative is a key contributor to the continued growth of GPU computing in accelerated analytics. 
Grabit Model (Grabit) 
We introduce a novel model which is obtained by applying gradient tree boosting to the Tobit model. The so called Grabit model allows for modeling data that consist of a mixture of a continuous part and discrete point masses at the borders. Examples of this include censored data, fractional response data, corner solution response data, rainfall data, and binary classification data where additional information, that is related to the underlying classification mechanism, is available. In contrast to the Tobit model, the Grabit model can account for general forms of nonlinearities and interactions, it is robust against outliers in covariates and scale invariant to monotonic transformations for the covariates, and its predictive performance is not impaired by multicollinearity. We apply the Grabit model for predicting defaults on loans made to Swiss small and mediumsized enterprises (SME), and we obtain a large improvement in predictive performance compared to other stateoftheart approaches. 
Gradient Acceleration in Activation Function (GAAF) 
Dropout has been one of standard approaches to train deep neural networks, and it is known to regularize large models to avoid overfitting. The effect of dropout has been explained by avoiding coadaptation. In this paper, however, we propose a new explanation of why dropout works and propose a new technique to design better activation functions. First, we show that dropout is an optimization technique to push the input towards the saturation area of nonlinear activation function by accelerating gradient information flowing even in the saturation area in backpropagation. Based on this explanation, we propose a new technique for activation functions, gradient acceleration in activation function (GAAF), that accelerates gradients to flow even in the saturation area. Then, input to the activation function can climb onto the saturation area which makes the network more robust because the model converges on a flat region. Experiment results support our explanation of dropout and confirm that the proposed GAAF technique improves performances with expected properties. 
Gradient Adversarial Training  We propose gradient adversarial training, an auxiliary deep learning framework applicable to different machine learning problems. In gradient adversarial training, we leverage a prior belief that in many contexts, simultaneous gradient updates should be statistically indistinguishable from each other. We enforce this consistency using an auxiliary network that classifies the origin of the gradient tensor, and the main network serves as an adversary to the auxiliary network in addition to performing standard taskbased training. We demonstrate gradient adversarial training for three different scenarios: (1) as a defense to adversarial examples we classify gradient tensors and tune them to be agnostic to the class of their corresponding example, (2) for knowledge distillation, we do binary classification of gradient tensors derived from the student or teacher network and tune the student gradient tensor to mimic the teacher’s gradient tensor; and (3) for multitask learning we classify the gradient tensors derived from different task loss functions and tune them to be statistically indistinguishable. For each of the three scenarios we show the potential of gradient adversarial training procedure. Specifically, gradient adversarial training increases the robustness of a network to adversarial attacks, is able to better distill the knowledge from a teacher network to a student network compared to soft targets, and boosts multitask learning by aligning the gradient tensors derived from the task specific loss functions. Overall, our experiments demonstrate that gradient tensors contain latent information about whatever tasks are being trained, and can support diverse machine learning problems when intelligently guided through adversarialization using a auxiliary network. 
Gradient Boost Convolutional Autoencoder with Neural Decision Forest (GrCAN) 
Random forest and deep neural network are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high dimensional data. In view that a differentiable neural decision forest can be added to the neural network to fully exploit the benefits of both models, in our work, we further combine convolutional autoencoder with neural decision forest, where autoencoder has its advantages in finding the hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve the performance. The idea of gradient boost is to learn and use the residual in the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and gradient boost module at contiguous steps. The extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods. 
Gradient Boosted Regression Trees (GBRT) 

Gradient Boosting (GBDT,MART,TreeNet,BTE) 
Gradient boosting is a machine learning technique for regression problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stagewise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. The gradient boosting method can also be used for classification problems by reducing them to regression with a suitable loss function. 
Gradient Boosting Machine  Gradient boosting machines are a family of powerful machinelearning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, like being learned with respect to different loss functions. This article gives a tutorial introduction into the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling. A theoretical information is complemented with descriptive examples and illustrations which cover all the stages of the gradient boosting model design. Considerations on handling the model complexity are discussed. Three practical examples of gradient boosting applications are presented and comprehensively analyzed. gbm 
Gradient Normalization  Deep multitask networks, in which one neural network produces multiple predictive outputs, are more scalable and often better regularized than their singletask counterparts. Such advantages can potentially lead to gains in both speed and performance, but multitask networks are also difficult to train without finding the right balance between tasks. We present a novel gradient normalization (GradNorm) technique which automatically balances the multitask loss function by directly tuning the gradients to equalize task training rates. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting over single networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process which incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we hope to demonstrate that direct gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning. 
Gradient Projection Classical Sketch (GPCS) 

Gradient Projection Iterative Sketch (GPIS) 
We propose a randomized first order optimization algorithm Gradient Projection Iterative Sketch (GPIS) and an accelerated variant for efficiently solving large scale constrained Least Squares (LS). We provide theoretical convergence analysis for both proposed algorithms and demonstrate our methods’ computational efficiency compared to classical accelerated gradient method, and the state of the art variancereduced stochastic gradient methods through numerical experiments in various large synthetic/real data sets. 
Gradient Similarity  Deep neural networks are susceptible to smallbutspecific adversarial perturbations capable of deceiving the network. This vulnerability can lead to potentially harmful consequences in securitycritical applications. To address this vulnerability, we propose a novel metric called \emph{Gradient Similarity} that allows us to capture the influence of training data on test inputs. We show that \emph{Gradient Similarity} behaves differently for normal and adversarial inputs, and enables us to detect a variety of adversarial attacks with a near perfect ROCAUC of 95100\%. Even whitebox adversaries equipped with perfect knowledge of the system cannot bypass our detector easily. On the MNIST dataset, whitebox attacks are either detected with a high ROCAUC of 8796\%, or require very high distortion to bypass our detector. 
Gradual Tuning  In this paper we present an alternative strategy for finetuning the parameters of a network. We named the technique Gradual Tuning. Once trained on a first task, the network is finetuned on a second task by modifying a progressively larger set of the network’s parameters. We test Gradual Tuning on different transfer learning tasks, using networks of different sizes trained with different regularization techniques. The result shows that compared to the usual fine tuning, our approach significantly reduces catastrophic forgetting of the initial task, while still retaining comparable if not better performance on the new task. 
Graduated Symbol Map  A map with symbols that change in size according to the value of the attribute they represent. For example, denser populations might be represented by larger dots, or larger rivers by thicker lines. 
Granger Causality  The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect ‘mere’ correlations, but Clive Granger argued that causality in economics could be reflected by measuring the ability of predicting the future values of a time series using past values of another time series. Since the question of ‘true causality’ is deeply philosophical, econometricians assert that the Granger test finds only ‘predictive causality’. A time series X is said to Grangercause Y if it can be shown, usually through a series of ttests and Ftests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y. Granger also stressed that some studies using ‘Granger causality’ testing in areas outside economics reached ‘ridiculous’ conclusions. ‘Of course, many ridiculous papers appeared’, he said in his Nobel Lecture, December 8, 2003. However, it remains a popular method for causality analysis in time series due to its computational simplicity. The original definition of Granger causality does not account for latent confounding effects and does not capture instantaneous and nonlinear causal relationships, though several extensions have been proposed to address these issues. https://…/grangercausalitytest Cointegration & Granger Causality 
Granger Causality Network  We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, nonidentifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to highdimensional multivariate time series. As a baseline, we also formulate a multioutput logistic autoregressive model (mLTD), which while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorial time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model. 
GRap  Finding the best neural network configuration for a given goal can be challenging, especially when it is not possible to assess the output quality of a network automatically. We present GRap, an interactive interface based on Visual Analytics principles for comparing outputs of multiple RNNs for the same training data. GRap enables an iterative result generation process that allows a user to evaluate the outputs with contextual statistics. 
Graph Attention Network (GAT) 
We present graph attention networks (GATs), novel neural network architectures that operate on graphstructured data, leveraging masked selfattentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectralbased graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved stateoftheart results across three established transductive and inductive graph benchmarks: the Cora and Citeseer citation network datasets, as well as a proteinprotein interaction dataset (wherein test graphs are entirely unseen during training). 
Graph Based SemiSupervised Learning (GSSL) 

Graph Bayesian Optimization  Network structure optimization is a fundamental task in complex network analysis. However, almost all the research on Bayesian optimization is aimed at optimizing the objective functions with vectorial inputs. In this work, we first present a flexible framework, denoted graph Bayesian optimization, to handle arbitrary graphs in the Bayesian optimization community. By combining the proposed framework with graph kernels, it can take full advantage of implicit graph structural features to supplement explicit features guessed according to the experience, such as tags of nodes and any attributes of graphs. The proposed framework can identify which features are more important during the optimization process. We apply the framework to solve four problems including two evaluations and two applications to demonstrate its efficacy and potential applications. 
Graph Branch Distance (GBD) 
Graph similarity search is a common and fundamental operation in graph databases. One of the most popular graph similarity measures is the Graph Edit Distance (GED) mainly because of its broad applicability and high interpretability. Despite its prevalence, exact GED computation is proved to be NPhard, which could result in unsatisfactory computational efficiency on large graphs. However, exactly accurate search results are usually unnecessary for realworld applications especially when the responsiveness is far more important than the accuracy. Thus, in this paper, we propose a novel probabilistic approach to efficiently estimate GED, which is further leveraged for the graph similarity search. Specifically, we first take branches as elementary structures in graphs, and introduce a novel graph similarity measure by comparing branches between graphs, i.e., Graph Branch Distance (GBD), which can be efficiently calculated in polynomial time. Then, we formulate the relationship between GED and GBD by considering branch variations as the result ascribed to graph edit operations, and model this process by probabilistic approaches. By applying our model, the GED between any two graphs can be efficiently estimated by their GBD, and these estimations are finally utilized in the graph similarity search. Extensive experiments show that our approach has better accuracy, efficiency and scalability than other comparable methods in the graph similarity search over real and synthetic data sets. 
Graph Capsule Network (GCAPSCNN) 
Graph Convolutional Neural Networks (GCNNs) are the most recent exciting advancement in deep learning field and their applications are quickly spreading in multicrossdomains including bioinformatics, chemoinformatics, social networks, natural language processing and computer vision. In this paper, we expose and tackle some of the basic weaknesses of a GCNN model with a capsule idea presented in~\cite{hinton2011transforming} and propose our Graph Capsule Network (GCAPSCNN) model. In addition, we design our GCAPSCNN model to solve especially graph classification problem which current GCNN models find challenging. Through extensive experiments, we show that our proposed Graph Capsule Network can significantly outperforms both the existing stateofart deep learning methods and graph kernels on graph classification benchmark datasets. 
Graph Convolutional Network  This paper explores the recently proposed Graph Convolutional Network architecture proposed in (Kipf & Welling, 2016) The key points of their work is summarized and their results are reproduced. Graph regularization and alternative graph convolution approaches are explored. I find that explicit graph regularization was correctly rejected by (Kipf & Welling, 2016). I attempt to improve the performance of GCN by approximating a kstep transition matrix in place of the normalized graph laplacian, but I fail to find positive results. Nonetheless, the performance of several configurations of this GCN variation is shown for the Cora, Citeseer, and Pubmed datasets. 
Graph Convolutional Neural Network (Graph CNN) 
Due to the fact much of today’s data can be represented as graphs, there has been a demand for generalizing neural network models for graph data. One recent direction that has shown fruitful results, and therefore growing interest, is the usage of graph convolutional neural networks (GCNs). They have been shown to provide a significant improvement on a wide range of tasks in network analysis, one of which being node representation learning. The task of learning lowdimensional node representations has shown to increase performance on a plethora of other tasks from link prediction and node classification, to community detection and visualization. Simultaneously, signed networks (or graphs having both positive and negative links) have become ubiquitous with the growing popularity of social media. However, since previous GCN models have primarily focused on unsigned networks (or graphs consisting of only positive links), it is unclear how they could be applied to signed networks due to the challenges presented by negative links. The primary challenges are based on negative links having not only a different semantic meaning as compared to positive links, but their principles are inherently different and they form complex relations with positive links. Therefore we propose a dedicated and principled effort that utilizes balance theory to correctly aggregate and propagate the information across layers of a signed GCN model. We perform empirical experiments comparing our proposed signed GCN against stateoftheart baselines for learning node representations in signed networks. More specifically, our experiments are performed on four realworld datasets for the classical link sign prediction problem that is commonly used as the benchmark for signed network embeddings algorithms. 
Graph Cube  In a paper from the University of Illinois at UrbanaChampaign this time in collaboration with Microsoft and Google, a novel data warehousing model called Graph Cube is introduced. Based on a restricted graph model (e.g., no attributes on edges) introduced as multidimensional network (with the dimensions being the vertex attributes), they define the notion of an aggregate network (called cuboid). A graph cube constitutes then the set of all possible aggregations of the original network. 
Graph Database  In computing, a graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides indexfree adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases. 
Graph Function Library  A graph abstraction layer with an objectoriented programming interface has been introduced, which enables the implementation of custom graph algorithms for example within a stored procedure. A set of parameterizable implementations of frequentlyused algorithms will be provided in the form of a Graph Function Library for application developers to choose from. 
Graph Information Criterion (GIC) 
statGraph 
Graph Information Ratio  We introduce the notion of information ratio $\text{Ir}(H/G)$ between two (simple, undirected) graphs $G$ and $H$, defined as the supremum of ratios $k/n$ such that there exists a mapping between the strong products $G^k$ to $H^n$ that preserves nonadjacency. Operationally speaking, the information ratio is the maximal number of source symbols per channel use that can be reliably sent over a channel with a confusion graph $H$, where reliability is measured w.r.t. a source confusion graph $G$. Various results are provided, including in particular lower and upper bounds on $\text{Ir}(H/G)$ in terms of different graph properties, inequalities and identities for behavior under strong product and disjoint union, relations to graph cores, and notions of graph criticality. Informally speaking, $\text{Ir}(H/G)$ can be interpreted as a measure of similarity between $G$ and $H$. We make this notion precise by introducing the concept of information equivalence between graphs, a more quantitative version of homomorphic equivalence. We then describe a natural partial ordering over the space of information equivalence classes, and endow it with a suitable metric structure that is contractive under the strong product. Various examples and open problems are discussed. 
Graph Kernel Library (GraKeL) 
The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and is build on top of scikitlearn. It is simple to use and can be naturally combined with scikitlearn’s modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available at: https://…/GraKeL. 
Graph Learning  The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the datasets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP graph inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data. 
Graph Neural Network (GNN) 
Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function tau(G,n) isin IRm that maps a graph G and one of its nodes n into an mdimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities. 
Graph Partition Neural Network  We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally propagating information between the subgraphs. To efficiently partition graphs, we experiment with several partitioning algorithms and also propose a novel variant for fast processing of large scale graphs. We extensively test our model on a variety of semisupervised node classification tasks. Experimental results indicate that GPNNs are either superior or comparable to stateoftheart methods on a wide variety of datasets for graphbased semisupervised classification. We also show that GPNNs can achieve similar performance as standard GNNs with fewer propagation steps. 
Graph Processing Framework for Large Dynamic Graphs (BLADYG) 
Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, GraphLab, and Trinity. These systems can be divided into two categories: (1) vertexcentric and (2) blockcentric approaches. In vertexcentric approaches, each vertex corresponds to a process, and message are exchanged among vertices. In blockcentric approaches, the unit of computation is a block, a connected subgraph of the graph, and message exchanges occur among blocks. In this paper, we are considering the issues of scale and dynamism in the case of blockcentric approaches. We present bladyg, a blockcentric framework that addresses the issue of dynamism in largescale graphs. We present an implementation of BLADYG on top of akka framework. We experimentally evaluate the performance of the proposed framework. 
Graph Processing over Partitions (GPOP) 
The past decade has seen development of many sharedmemory graph processing frameworks intended to reduce the effort for creating high performance parallel applications. However, their programming models, based on Vertexcentric or Edgecentric paradigms suffer from several issues, such as poor cache utilization, irregular memory accesses, heavy use of synchronization primitives or theoretical inefficiency, that deteriorate the performance and scalability of applications. Recently, a cacheefficient partitioncentric paradigm was proposed for computing PageRank. In this paper, we generalize this approach to develop a novel Partitioncentric Programming Model(PPM) that is cacheefficient, scalable and workefficient. We implement PPM as part of Graph Processing over Partitions(GPOP) framework that can efficiently execute a variety of algorithms. GPOP dramatically improves the cache performance by exploiting locality of partitioning. It achieves high scalability by enabling completely lock and atomic free computation. Its builtin analytical performance models enable it to use a hybrid of source and destination centric communication modes in a way that ensures workefficiency of each iteration and simultaneously boosts high bandwidth sequential memory accesses. GPOP framework completely abstracts away underlying parallelism and programming model details from the user. It provides an easy to program set of APIs with the ability to selectively continue the active vertex set across iterations, which is not intrinsically supported by the current frameworks. We extensively evaluate the performance of GPOP for a variety of graph algorithms, using several large datasets. We observe that GPOP incurs upto 8.6x and 5.2x less L2 cache misses compared to Ligra and GraphMat, respectively. In terms of execution time, GPOP is upto 19x and 6.1x faster than Ligra and GraphMat, respectively. 
Graph sketchingbased Massive Data Clustering (DBMSTClu) 
In this paper, we address the problem of recovering arbitraryshaped data clusters from massive datasets. We present DBMSTClu a new densitybased nonparametric method working on a limited number of linear measurements i.e. a sketched version of the similarity graph $G$ between the $N$ objects to cluster. Unlike $k$means, $k$medians or $k$medoids algorithms, it does not fail at distinguishing clusters with particular structures. No input parameter is needed contrarily to DBSCAN or the Spectral Clustering method. DBMSTClu as a graphbased technique relies on the similarity graph $G$ which costs theoretically $O(N^2)$ in memory. However, our algorithm follows the dynamic semistreaming model by handling $G$ as a stream of edge weight updates and sketches it in one pass over the data into a compact structure requiring $O(\operatorname{poly} \operatorname{log} (N))$ space. Thanks to the property of the Minimum Spanning Tree (MST) for expressing the underlying structure of a graph, our algorithm successfully detects the right number of nonconvex clusters by recovering an approximate MST from the graph sketch of $G$. We provide theoretical guarantees on the quality of the clustering partition and also demonstrate its advantage over the existing stateoftheart on several datasets. 
Graph Structured Recurrent Neural Network (GSRNN) 
We present a generic framework for spatiotemporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multiscaled framework is a seamless coupling of two major components: a selfexciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting. 
Graph Weighted Model (GWM) 
Graph Weighted Models (GWMs) have recently been proposed as a natural generalization of weighted automata over strings and trees to arbitrary families of labeled graphs (and hypergraphs). A GWM generically associates a labeled graph with a tensor network and computes a value by successive contractions directed by its edges. 
Graph2Seq  Celebrated \emph{Sequence to Sequence learning (Seq2Seq)} and its fruitful variants are powerful models to achieve excellent performance on the tasks that map sequences to sequences. However, these are many machine learning tasks with inputs naturally represented in a form of graphs, which imposes significant challenges to existing Seq2Seq models for lossless conversion from its graph form to the sequence. In this work, we present a general endtoend approach to map the input graph to a sequence of vectors, and then another attentionbased LSTM to decode the target sequence from these vectors. Specifically, to address inevitable information loss for data conversion, we introduce a novel graphtosequence neural network model that follows the encoderdecoder architecture. Our method first uses an improved graphbased neural network to generate the node and graph embeddings by a novel aggregation strategy to incorporate the edge direction information into the node embeddings. We also propose an attention based mechanism that aligns node embeddings and decoding sequence to better cope with large graphs. Experimental results on bAbI task, Shortest Path Task, and Natural Language Generation Task demonstrate that our model achieves the stateoftheart performance and significantly outperforms other baselines. We also show that with the proposed aggregation strategy, our proposed model is able to quickly converge to good performance. 
graph2vec  Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain as the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn datadriven distributed representations of arbitrary sized graphs. graph2vec’s embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large realworld datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and are competitive with stateoftheart graph kernels. 
Graphbased Activity Regularization (GAR) 
In this paper, we propose a novel graphbased approach for semisupervised learning problems, which considers an adaptive adjacency of the examples throughout the unsupervised portion of the training. Adjacency of the examples is inferred using the predictions of a neural network model which is first initialized by a supervised pretraining. These predictions are then updated according to a novel unsupervised objective which regularizes another adjacency, now linking the output nodes. Regularizing the adjacency of the output nodes, inferred from the predictions of the network, creates an easier optimization problem and ultimately provides that the predictions of the network turn into the optimal embedding. Ultimately, the proposed framework provides an effective and scalable graphbased solution which is natural to the operational mechanism of deep neural networks. Our results show stateoftheart performance within semisupervised learning with the highest accuracies reported to date in the literature for SVHN and NORB datasets. 
GraphBased Collaborative Filtering (GCF) 
Introducing consumed items as users’ implicit feedback in matrix factorization (MF) method, SVD++ is one of the most effective collaborative filtering methods for personalized recommender systems. Though powerful, SVD++ has two limitations: (i). only userside implicit feedback is utilized, whereas itemside implicit feedback, which can also enrich item representations, is not leveraged;(ii). in SVD++, the interacted items are equally weighted when combining the implicit feedback, which can not reflect user’s true preferences accurately. To tackle the above limitations, in this paper we propose Graphbased collaborative filtering (GCF) model, Weighted Graphbased collaborative filtering (WGCF) model and Attentive Graphbased collaborative filtering (AGCF) model, which (i). generalize the implicit feedback to item side based on the useritem bipartite graph; (ii). flexibly learn the weights of individuals in the implicit feedback hence improve the model’s capacity. Comprehensive experiments show that our proposed models outperform stateoftheart models.For sparse implicit feedback scenarios, additional improvement is further achieved by leveraging the steptwo implicit feedback information. 
GraphBLAS  An effort to define standard building blocks for Graph Algorithms in the language of Linear Algebra. 
GraphConnect  Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, Dropout and DropConnect are dataindependent. This paper proposes a new method, GraphConnect, that is datadependent, and is motivated by the observation that data of interest lie close to a manifold. The new method encourages the relationships between the learned decisions to resemble a graph representing the manifold structure. Essentially GraphConnect is designed to learn attributes that are present in data samples in contrast to weight decay, Dropout and DropConnect which are simply designed to make it more difficult to fit to random error or noise. Empirical Rademacher complexity is used to connect the generalization error of the neural network to spectral properties of the graph learned from the input data. This framework is used to show that GraphConnect is superior to weight decay. Experimental results on several benchmark datasets validate the theoretical analysis, and show that when the number of training samples is small, GraphConnect is able to significantly improve performance over weight decay. 
Graphene  We introduce Graphene, an Open IE system whose goal is to generate accurate, meaningful and complete propositions that may facilitate a variety of downstream semantic applications. For this purpose, we transform syntactically complex input sentences into clean, compact structures in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them in order to maintain their semantic relationship. In that way, we preserve the context of the relational tuples extracted from a source sentence, generating a novel lightweight semantic representation for Open IE that enhances the expressiveness of the extracted propositions. 
GraphGAN  The goal of graph representation learning is to embed each vertex in a graph into a lowdimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying above two classes of methods, in which the generative model and discriminative model play a gametheoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces ‘fake’ samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from ground truth or generated by the generative model. With the competition between these two models, both of them can alternately and iteratively boost their performance. Moreover, when considering the implementation of generative model, we propose a novel graph softmax to overcome the limitations of traditional softmax function, which can be proven satisfying desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on realworld datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over stateoftheart baselines. 
GraphGuided Fused LASSO (GFLASSO) 
Let X be a matrix of size n × p , with n observations and p predictors and Y a matrix of size n × k, with the same n observations and k responses, say, 1390 distinct electronics purchase records in 73 countries, to predict the ratings of 50 Netflix productions over all 73 countries. Models well poised for modeling pairs of highdimensional datasets include orthogonal twoway Partial Least Squares (O2PLS), Canonical Correlation Analysis (CCA) and CoInertia Analysis (CIA), all of which involving matrix decomposition. Additionally, since these models are based on latent variables (that is, projections based on the original predictors), the computational efficiency comes at a cost of interpretability. However, this tradeoff does not always pay off, and can be reverted with the direct prediction of k individual responses from selected features in X, in a unified regression framework that takes into account the relationships among the responses. Mathematically, the GFLASSO borrows the regularization of the LASSO discussed above and builds the model on the graph dependency structure underlying Y, as quantified by the k × k correlation matrix (that is the ‘strength of association’ that you read about earlier). As a result, similar (or dissimilar) responses will be explained by a similar (or dissimilar) subset of selected predictors. 
GraphH  It is common for realworld applications to analyze big graphs using distributed graph processing systems. Popular inmemory systems require an enormous amount of resources to handle big graphs. While several outofcore systems have been proposed recently for processing big graphs using secondary storage, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high performance big graph analytics in small clusters. Specifically, we design a twostage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (GatherApply Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster compared to popular inmemory systems, such as Pregel+ and PowerGraph when processing generic graphs, and more than 100x faster than recently proposed outofcore systems, such as GraphD and Chaos when processing big graphs. 
Graphical Causal Models  A species of the broader genus of graphical models, especially intended to help with problems of causal inference. 
Graphical Generative Adversarial Network (GraphicalGAN) 
We propose Graphical Generative Adversarial Networks (GraphicalGAN) to model structured data. GraphicalGAN conjoins the power of Bayesian networks on compactly representing the dependency structures among random variables and that of generative adversarial networks on learning expressive dependency functions. We introduce a structured recognition model to infer the posterior distribution of latent variables given observations. We propose two alternative divergence minimization approaches to learn the generative model and recognition model jointly. The first one treats all variables as a whole, while the second one utilizes the structural information by checking the individual local factors defined by the generative model and works better in practice. Finally, we present two important instances of GraphicalGAN, i.e. Gaussian Mixture GAN (GMGAN) and State Space GAN (SSGAN), which can successfully learn the discrete and temporal structures on visual datasets, respectively. 
Graphical Markov Models (GMM) 
A central aspect of statistical science is the assessment of dependence among stochastic variables. The familiar concepts of correlation, regression, and prediction are special cases, and identification of causal relationships ultimately rests on representations of multivariate dependence. Graphical Markov models (GMM) use graphs, either undirected, directed, or mixed, to represent multivariate dependences in a visual and computationally efficient manner. A GMM is usually constructed by specifying local dependences for each variable, equivalently, node of the graph in terms of its immediate neighbors and/or parents by means of undirected and/or directed edges. This simple local specification can represent a highly varied and complex system of multivariate dependences by means of the global structure of the graph, thereby obtaining efficiency in modeling, inference, and probabilistic calculations. For a fixed graph, equivalently model, the classical methods of statistical inference may be utilized. In many applied domains, however, such as expert systems for medical diagnosis or weather forecasting, or the analysis of geneexpression data, the graph is unknown and is itself the first goal of the analysis. This poses numerous challenges, including the following: · The numbers of possible graphs and models grow superexponentially in the number of variables. · Distinct graphs G may be Markov equivalent = statistically indistinguishable. · Conversely, the same graph may possess different Markov interpretations. ggm 
Graphical Model  A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables. They are commonly used in probability theory, statistics – particularly Bayesian statistics – and machine learning. Generally, probabilistic graphical models use a graphbased representation as the foundation for encoding a complete distribution over a multidimensional space and a graph that is a compact or factorized representation of a set of independences that hold in the specific distribution. Two branches of graphical representations of distributions are commonly used, namely, Bayesian networks and Markov networks. Both families encompass the properties of factorization and independences, but they differ in the set of independences they can encode and the factorization of the distribution that they induce. 
Graphite  Graphs are a fundamental abstraction for modeling relational data. However, graphs are discrete and combinatorial in nature, and learning representations suitable for machine learning tasks poses statistical and computational challenges. In this work, we propose Graphite an algorithmic framework for unsupervised learning of representations over nodes in a graph using deep latent variable generative models. Our model is based on variational autoencoders (VAE), and differs from existing VAE frameworks for data modalities such as images, speech, and text in the use of graph neural networks for parameterizing both the generative model (i.e., decoder) and inference model (i.e., encoder). The use of graph neural networks directly incorporates inductive biases due to the spatial, local structure of graphs directly in the generative model. Moreover, we draw novel connections between graph neural networks and approximate inference via kernel embeddings of distributions. We demonstrate empirically that Graphite outperforms stateoftheart approaches for the tasks of density estimation, link prediction, and node classification on synthetic and benchmark datasets. 
GraphQL  1. A query language for your API: GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools. 2. Ask for what you need, get exactly that: Send a GraphQL query to your API and get exactly what you need, nothing more and nothing less. GraphQL queries always return predictable results. Apps using GraphQL are fast and stable because they control the data they get, not the server. 3. Get many resources in a single request: GraphQL queries access not just the properties of one resource but also smoothly follow references between them. While typical REST APIs require loading from multiple URLs, GraphQL APIs get all the data your app needs in a single request. Apps using GraphQL can be quick even on slow mobile network connections. 4. Describe what´s possible with a type system: GraphQL APIs are organized in terms of types and fields, not endpoints. Access the full capabilities of your data from a single endpoint. GraphQL uses types to ensure Apps only ask for what´s possible and provide clear and helpful errors. Apps can use types to avoid writing manual parsing code. 5. Evolve your API without versions: Add new fields and types to your GraphQL API without impacting existing queries. Aging fields can be deprecated and hidden from tools. By using a single evolving version, GraphQL APIs give apps continuous access to new features and encourage cleaner, more maintainable server code. 6. Bring your own data and code: GraphQL creates a uniform API across your entire application without being limited by a specific storage engine. Write GraphQL APIs that leverage your existing data and code with GraphQL engines available in many languages. You provide functions for each field in the type system, and GraphQL calls them with optimal concurrency. 
GraphRNN  Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the nonunique, highdimensional nature of graphs and the complex, nonlocal dependencies that exist between edges in a given graph. Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far. In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models. 
GraphSAGE  Lowdimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Our algorithm outperforms strong baselines on three inductive nodeclassification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multigraph dataset of proteinprotein interactions. 
GraphSparse Logistic Regression  We introduce GraphSparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We val idate this algorithm against synthetic data and benchmark it against L1regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package. 
Graphviz  Graphviz (short for Graph Visualization Software) is a package of opensource tools initiated by AT&T Labs Research for drawing graphs specified in DOT language scripts. It also provides libraries for software applications to use the tools. Graphviz is free software licensed under the Eclipse Public License. https://…/viz.js 
GraphX  GraphX is Apache Spark’s API for graphs and graphparallel computation. 
Gravitational Clustering  The downfall of many supervised learning algorithms, such as neural networks, is the inherent need for a large amount of training data. Although there is a lot of buzz about big data, there is still the problem of doing classification from a small dataset. Other methods such as support vector machines, although capable of dealing with few samples, are inherently binary classifiers, and are in need of learning strategies such as One vs All in the case of multiclassification. In the presence of a large number of classes this can become problematic. In this paper we present, a novel approach to supervised learning through the method of clustering. Unlike traditional methods such as KMeans, Gravitational Clustering does not require the initial number of clusters, and automatically builds the clusters, individual samples can be arbitrarily weighted and it requires only few samples while staying resilient to overfitting. 
Graybox Adversarial Training  Adversarial samples are perturbed inputs crafted to mislead the machine learning systems. A training mechanism, called adversarial training, which presents adversarial samples along with clean samples has been introduced to learn robust models. In order to scale adversarial training for large datasets, these perturbations can only be crafted using fast and simple methods (e.g., gradient ascent). However, it is shown that adversarial training converges to a degenerate minimum, where the model appears to be robust by generating weaker adversaries. As a result, the models are vulnerable to simple blackbox attacks. In this paper we, (i) demonstrate the shortcomings of existing evaluation policy, (ii) introduce novel variants of whitebox and blackbox attacks, dubbed graybox adversarial attacks’ based on which we propose novel evaluation method to assess the robustness of the learned models, and (iii) propose a novel variant of adversarial training, named Graybox Adversarial Training’ that uses intermediate versions of the models to seed the adversaries. Experimental evaluation demonstrates that the models trained using our method exhibit better robustness compared to both undefended and adversarially trained model 
Greedy Algorithm  A greedy algorithm is an algorithm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time. For example, a greedy strategy for the traveling salesman problem (which is of a high computational complexity) is the following heuristic: ‘At each stage visit an unvisited city nearest to the current city’. This heuristic need not find a best solution, but terminates in a reasonable number of steps; finding an optimal solution typically requires unreasonably many steps. In mathematical optimization, greedy algorithms solve combinatorial problems having the properties of s. 
Greedy Neural Architecture Search (GNAS) 
A key problem in deep multiattribute learning is to effectively discover the interattribute correlation structures. Typically, the conventional deep multiattribute learning approaches follow the pipeline of manually designing the network architectures based on taskspecific expertise prior knowledge and careful network tunings, leading to the inflexibility for various complicated scenarios in practice. Motivated by addressing this problem, we propose an efficient greedy neural architecture search approach (GNAS) to automatically discover the optimal treelike deep architecture for multiattribute learning. In a greedy manner, GNAS divides the optimization of global architecture into the optimizations of individual connections step by step. By iteratively updating the local architectures, the global treelike architecture gets converged where the bottom layers are shared across relevant attributes and the branches in top layers more encode attributespecific features. Experiments on three benchmark multiattribute datasets show the effectiveness and compactness of neural architectures derived by GNAS, and also demonstrate the efficiency of GNAS in searching neural architectures. 
Greedy Randomized Adaptive Search Procedures (GRASP) 
The greedy randomized adaptive search procedure (also known as GRASP) is a metaheuristic algorithm commonly applied to combinatorial optimization problems. GRASP typically consists of iterations made up from successive constructions of a greedy randomized solution and subsequent iterative improvements of it through a local search. The greedy randomized solutions are generated by adding elements to the problem’s solution set from a list of elements ranked by a greedy function according to the quality of the solution they will achieve. To obtain variability in the candidate set of greedy solutions, wellranked candidate elements are often placed in a restricted candidate list (also known as RCL), and chosen at random when building up the solution. This kind of greedy randomized construction method is also known as a semigreedy heuristic, first described in Hart and Shogan (1987). GRASP was first introduced in Feo and Resende (1989). Survey papers on GRASP include Feo and Resende (1995), Pitsoulis and Resende (2002), and Resende and Ribeiro (2003). An annotated bibliography of GRASP can be found in Festa, G. C Resende (2002). 
Greenhouse  Greenhouse – a zeropositive machine learning system for timeseries anomaly detection. 
Greenplum Database (GPDB) 
The Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced costbased query optimizer delivering high analytical query performance on large data volumes. The Greenplum project is released under the Apache 2 license. We want to thank all our current community contributors and are really interested in all new potential contributions. For the Greenplum Database community no contribution is too small, we encourage all types of contributions. 
Greenwald and Khanna Algorithm  Copulas for Streaming Data 
greta  greta lets us build statistical models interactively in R, and then sample from them by MCMC. We build greta models with greta array objects, which behave much like R’s array, matrix and vector objects for numeric data. Like those numeric data objects, greta arrays can be manipulated with functions and mathematical operators to create new greta arrays. The key difference between greta arrays and numeric data objects is that when you do something to a greta array, greta doesn’t calculate the values of the new greta array. Instead, it just remembers what operation to do, and works out the size and shape of the result. 
Grey Box Model  In mathematics, statistics, and computational modelling, a grey box model combines a partial theoretical structure with data to complete the model. The theoretical structure may vary from information on the smoothness of results, to models that need only parameter values from data or existing literature. Thus, almost all models are grey box models as opposed to black box where no model form is assumed or white box models that are purely theoretical. Some models assume a special form such as a linear regression or neural network. These have special analysis methods. In particular linear regression techniques are much more efficient than most nonlinear techniques. The model can be deterministic or stochastic (i.e. containing random components) depending on its planned use. 
Grey Machine Learning  A brief introduction to the Grey Machine Learning 
Grey Machine Learning Model based Variable Separable (VSGML) 
The Grey Machine Learning Model based Variable Separable (VSGML) is presented in this paper. The VSGML’s function set is composed of the variable separable function. The DivideandConquer architecture based Radial Basis Function (DCRBF) Network is constructed to implement VSGML. This DCRBF is composed of several subRBF networks which takes each subspace as its input. The output of DCRBF is the sum of each subRBF networks’ output. The algorithm of DCRBF is given and its approximation ability also is discussed in this paper. The experimental results have shown that the DCRBF is outperforms the conventional RBF. A Grey Machine Learning Model with application to time series. Available from: https://…ing_Model_with_application_to_time_series [accessed May 07 2018]. 
Grid Search  The de facto standard way of performing hyperparameter optimization is grid search, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by crossvalidation on the training set or evaluation on a heldout validation set. Since the parameter space of a machine learner may include realvalued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search. 
GridNet  This paper presents GridNet, a new Convolutional Neural Network (CNN) architecture for semantic image segmentation (full scene labelling). Classical neural networks are implemented as one stream from the input to the output with subsampling operators applied in the stream in order to reduce the feature maps size and to increase the receptive field for the final prediction. However, for semantic image segmentation, where the task consists in providing a semantic class to each pixel of an image, feature maps reduction is harmful because it leads to a resolution loss in the output prediction. To tackle this problem, our GridNet follows a grid pattern allowing multiple interconnected streams to work at different resolutions. We show that our network generalizes many well known networks such as convdeconv, residual or UNet networks. GridNet is trained from scratch and achieves competitive results on the Cityscapes dataset. 
Gridster.js  Gridster is a jQuery plugin that allows building intuitive draggable layouts from elements spanning multiple columns. You can even dynamically add and remove elements from the grid. It is on par with sliced bread, or possibly better. MIT licensed. Drag and Drop Visuals in your Interactive Dashboard 
Grounded Recurrent Neural Network (GRNN) 
In this work, we present the Grounded Recurrent Neural Network (GRNN), a recurrent neural network architecture for multilabel prediction which explicitly ties labels to specific dimensions of the recurrent hidden state (we call this process ‘grounding’). The approach is particularly wellsuited for extracting large numbers of concepts from text. We apply the new model to address an important problem in healthcare of understanding what medical concepts are discussed in clinical text. Using a publicly available dataset derived from Intensive Care Units, we learn to label a patient’s diagnoses and procedures from their discharge summary. Our evaluation shows a clear advantage to using our proposed architecture over a variety of strong baselines. 
Group Equivariant Capsule Network  We present group equivariant capsule networks, a framework to introduce guaranteed equivariance and invariance properties to the capsule network idea. We restrict pose vectors and learned transformations to be elements of a group, which allows us to prove equivariance of pose vectors and invariance of activations under application of the group law. Requirements are a modified spatial aggregation method for capsules and a generic routing by agreement algorithm with abstract rules, which we both present in this work. Further, we connect our equivariant capsule networks with work from the field of group convolutional networks, which consist of convolutions that are equivariant under applications of the group law. Through this connection, we are able to provide intuitions of how both methods relate and are able to combine both approaches in one deep neural network architecture, combining the strengths from both fields. The resulting framework allows sparse evaluation of feature maps defined over groups, provides control over specific equivariance and invariance properties and can use routing by agreement instead of pooling operations. It provides interpretable and equivariant representation vectors as output capsules, which disentangle evidence of object existence from its pose. 
Group Fused Multinomial Regression  gfmR 
Group Method of Data Handling (GMDH) 
Group method of data handling (GMDH) is a family of inductive algorithms for computerbased mathematical modeling of multiparametric datasets that features fully automatic structural and parametric optimization of models. GMDH is used in such fields as data mining, knowledge discovery, prediction, complex systems modeling, optimization and pattern recognition. GMDH algorithms are characterized by inductive procedure that performs sortingout of gradually complicated polynomial models and selecting the best solution by means of the socalled external criterion. GMDH,GMDH2 
Group Normalization (GN) 
Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems — BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pretraining to finetuning. GN can outperform or compete with its BNbased counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries. 
Grouped Merging Net (GMNet) 
Deep Convolutional Neural Networks (CNNs) are capable of learning unprecedentedly effective features from images. Some researchers have struggled to enhance the parameters’ efficiency using grouped convolution. However, the relation between the optimal number of convolutional groups and the recognition performance remains an open problem. In this paper, we propose a series of Basic Units (BUs) and a twolevel merging strategy to construct deep CNNs, referred to as a joint Grouped Merging Net (GMNet), which can produce joint grouped and reused deep features while maintaining the feature discriminability for classification tasks. Our GMNet architectures with the proposed BU_A (dense connection) and BU_B (straight mapping) lead to significant reduction in the number of network parameters and obtain performance improvement in image classification tasks. Extensive experiments are conducted to validate the superior performance of the GMNet than the stateofthearts on the benchmark datasets, e.g., MNIST, CIFAR10, CIFAR100 and SVHN. 
GroupFused Graphical Lasso (GFGL) 
We consider the consistency properties of a regularised estimator for the simultaneous identification of both changepoints and graphical dependency structure in multivariate timeseries. Traditionally, estimation of Gaussian Graphical Models (GGM) is performed in an i.i.d setting. More recently, such models have been extended to allow for changes in the distribution, but only where changepoints are known apriori. In this work, we study the GroupFused Graphical Lasso (GFGL) which penalises partialcorrelations with an L1 penalty while simultaneously inducing blockwise smoothness over time to detect multiple changepoints. We present a proof of consistency for the estimator, both in terms of changepoints, and the structure of the graphical models in each segment. 
GroupRemMap Penalty  Expression quantitative trait loci (eQTLs) are genomic loci that regulate expression levels of mRNAs or proteins. Understanding these regulatory provides important clues to biological pathways that underlie diseases. In this paper, we propose a new statistical method, GroupRemMap, for identifying eQTLs. We model the relationship between gene expression and single nucleotide variants (SNVs) through multivariate linear regression models, in which gene expression levels are responses and SNV genotypes are predictors. To handle the highdimensionality as well as to incorporate the intrinsic group structure of SNVs, we introduce a new regularization scheme to (1) control the overall sparsity of the model; (2) encourage the group selection of SNVs from the same gene; and (3) facilitate the detection of transhubeQTLs. We apply the proposed method to the colorectal and breast cancer data sets from The Cancer Genome Atlas (TCGA), and identify several biologically interesting eQTLs. These ndings may provide insight into biological processes associated with cancers and generate hypotheses for future studies. groupRemMap 
Growth Curve Analysis (GCA) 
Growth curve analysis (GCA) is a multilevel regression technique designed for analysis of time course or longitudinal data. A major advantage of this approach is that it can be used to simultaneously analyze both grouplevel effects (e.g., experimental manipulations) and individuallevel effects (i.e., individual differences). 
Growth Hacking  Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure. It can be seen as part of the online marketing ecosystem, as in many cases growth hackers are simply good at using techniques such as search engine optimization, website analytics, content marketing and A/B testing which are already mainstream. Growth hackers focus on lowcost and innovative alternatives to traditional marketing, e.g. utilizing social media and viral marketing instead of buying advertising through more traditional media such as radio, newspaper, and television. Growth hacking is particularly important for startups, as it allows for a ‘lean’ launch that focuses on ‘growth first, budgets second.’ Facebook, Twitter, LinkedIn, AirBnB and Dropbox are all companies that use growth hacking techniques. 
Grubbs Test  Grubbs’ test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normed residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population. outliers 
Guided Evolutionary Strategies  Many applications in machine learning require optimizing a function whose true gradient is unknown, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in metalearning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications, or when using synthetic gradients). We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search. We define a search distribution for evolutionary strategies that is elongated along a guiding subspace spanned by the surrogate gradients. This allows us to estimate a descent direction which can then be passed to a firstorder optimizer. We analytically and numerically characterize the tradeoffs that result from tuning how strongly the search distribution is stretched along the guiding subspace, and we use this to derive a setting of the hyperparameters that works well across problems. Finally, we apply our method to example problems including truncated unrolled optimization and a synthetic gradient problem, demonstrating improvement over both standard evolutionary strategies and firstorder methods that directly follow the surrogate gradient. We provide a demo of Guided ES at: https://…/guidedevolutionarystrategies. 
Guided Labeling  Over the last couple of years, deep learning and especially convolutional neural networks have become one of the work horses of computer vision. One limiting factor for the applicability of supervised deep learning to more areas is the need for large, manually labeled datasets. In this paper we propose an easy to implement method we call guided labeling, which automatically determines which samples from an unlabeled dataset should be labeled. We show that using this procedure, the amount of samples that need to be labeled is reduced considerably in comparison to labeling images arbitrarily. 
Guided Local Search (GLS) 
Guided Local Search is a metaheuristic search method. A metaheuristic method is a method that sits on top of a local search algorithm to change its behavior. Guided Local Search builds up penalties during a search. It uses penalties to help local search algorithms escape from local minimal and plateaus. When the given local search algorithm settles in a local optimum, GLS modifies the objective function using a specific scheme (explained below). Then the local search will operate using an augmented objective function, which is designed to bring the search out of the local optimum. The key is in the way that the objective function is modified. 
GuideR  This article presents GuideR, a userguided rule induction algorithm, which overcomes the largest limitation of the existing methodsthe lack of the possibility to introduce user’s preferences or domain knowledge to the rule learning process. Automatic selection of attributes and attribute ranges often leads to the situation in which resulting rules do not contain interesting information. We propose an induction algorithm which takes into account user’s requirements. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool. 
Advertisements