H2O  The Open Source In-Memory Prediction Engine for Big Data Science. H2O is a machine learning framework aimed at data scientists and business analysts ‘who need scalable and fast machine learning’. H2O is completely open source, and a key strength is that it works right out of the box, which makes it one of the easiest ways to get started with scalable machine learning. It supports R, Python, Scala and Java, and also has a REST API and its own web UI, so it can be used for research as well as in production environments. H2O integrates with Apache Hadoop and Apache Spark, which gives it enormous power through in-memory parallel processing. Predict Social Network Influence with R and H2O Ensemble Learning
Half-Life of Data  Radioactive substances have a half-life: the amount of time it takes for the substance to lose half of its radioactivity. Half-life is used more generally in physics as a way to estimate the rate of decay. We can apply exactly the same principle, the rate of decay, to business information. Like natural materials, data is subject to deterioration over time. In science, the half-life of a given substance could be milliseconds or many thousands of years. The half-life of data has been measured, and it may be shorter than you were expecting. http://…/infographicsthehalflifeofdata
Hamiltonian Flow Monte Carlo (HFMC) 

Hamiltonian Monte Carlo (HMC) 
The random-walk behavior of many Markov Chain Monte Carlo (MCMC) algorithms makes Markov chain convergence to a target stationary distribution p(x) inefficient, resulting in slow mixing. Hamiltonian/Hybrid Monte Carlo (HMC) is an MCMC method that adopts physical system dynamics rather than a probability distribution to propose future states in the Markov chain. This allows the Markov chain to explore the target distribution much more efficiently, resulting in faster convergence. Here we introduce basic analytic and numerical concepts for simulation of Hamiltonian dynamics. We then show how Hamiltonian dynamics can be used as the Markov chain proposal function for an MCMC sampling algorithm (HMC). ➘ “Hybrid Monte Carlo” MCMC using Hamiltonian Dynamics
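As a minimal sketch of the idea above, the following one-dimensional sampler simulates Hamiltonian dynamics with the leapfrog integrator and then applies a Metropolis accept/reject step. The function name `hmc_sample`, its parameters and the default step size are illustrative assumptions, not taken from the referenced tutorial.

```python
import math
import random


def hmc_sample(log_prob, grad_log_prob, x0, n_samples=1000,
               step_size=0.1, n_leapfrog=20, seed=0):
    """Draw samples from a 1-D target density using Hamiltonian Monte Carlo.

    log_prob and grad_log_prob are the log density (up to a constant)
    and its derivative. All names here are hypothetical.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # resample an auxiliary momentum
        x_new, p_new = x, p
        # Leapfrog integration: half momentum step, alternating full
        # position and momentum steps, then a final half momentum step.
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for step in range(n_leapfrog):
            x_new += step_size * p_new
            if step < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Hamiltonian = potential energy (-log density) + kinetic energy.
        h_old = -log_prob(x) + 0.5 * p * p
        h_new = -log_prob(x_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's discretization error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Example target: a standard normal, log p(x) = -x^2 / 2 + const.
draws = hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, x0=0.0)
print(sum(draws) / len(draws))
```

Because the proposal follows the gradient of the log density over many small steps, consecutive samples can be far apart while still being accepted with high probability, which is the source of the faster mixing described above.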
Hamiltonian Variational Auto-Encoder (HVAE) 
Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochastic variational inference, this provides a methodology scaling to large datasets. However, for this methodology to be practically efficient, it is necessary to obtain low-variance unbiased estimators of the ELBO and its gradients with respect to the parameters of interest. While the use of Markov chain Monte Carlo (MCMC) techniques such as Hamiltonian Monte Carlo (HMC) has been previously suggested to achieve this [23, 26], the proposed methods require specifying reverse kernels which have a large impact on performance. Additionally, the resulting unbiased estimator of the ELBO for most MCMC kernels is typically not amenable to the reparameterization trick. We show here how to optimally select reverse kernels in this setting and, by building upon Hamiltonian Importance Sampling (HIS) [17], we obtain a scheme that provides low-variance unbiased estimators of the ELBO and its gradients using the reparameterization trick. This allows us to develop a Hamiltonian Variational Auto-Encoder (HVAE). This method can be reinterpreted as a target-informed normalizing flow [20] which, within our context, only requires a few evaluations of the gradient of the sampled likelihood and trivial Jacobian calculations at each iteration.
Hamilton-Jacobi Reachability Analysis (HJRA) 
Hamilton-Jacobi (HJ) reachability analysis is an important formal verification method for guaranteeing performance and safety properties of dynamical systems; it has been applied to many small-scale systems in the past decade. Its advantages include compatibility with general nonlinear system dynamics, formal treatment of bounded disturbances, and the availability of well-developed numerical tools. The main challenge is addressing its exponential computational complexity with respect to the number of state variables. In this tutorial, we present an overview of basic HJ reachability theory and provide instructions for using the most recent numerical tools, including an efficient GPU-parallelized implementation of a Level Set Toolbox for computing reachable sets. In addition, we review some of the current work in high-dimensional HJ reachability to show how the dimensionality challenge can be alleviated via various general theoretical and application-specific insights.
Hamming Distance  In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Put another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.
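The definition above translates directly into a position-by-position comparison. This is a small illustrative sketch; the function name `hamming_distance` is an assumption.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("karolin", "kathrin"))  # 3
```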
HANA  
HANA Data Scientist Tool  The Application Function Modeler 2.0 (AFM 2) is a graphical editor for complex data analysis pipelines in the HANA Studio. This tool is based on the HANA Data Scientist prototype developed at the HANA Platform Innovation Center in Potsdam, Germany. It is planned to be the next generation of the existing HANA Studio Application Function Modeler which was developed at the TIP CE&SP Algorithm Labs in Shanghai, China. The AFM 2 team consists of original and new developers from both locations. 
HANA Graph Engine  The HANA Graph Engine implements graph data processing capabilities directly inside the Column Store Engine of the SAP HANA Database. 
HANA Sizing  Check the HANA sizing overview to find the appropriate sizing method. 
HANA Social Media Integration (HANASMI) 
HANASMI is a reusable component on HANA XS that enables XS application developers to integrate social media providers (with an initial focus on SAP Jam) into their business applications.
Handsontable  Handsontable is a data grid component with an Excel-like appearance. Built in JavaScript, it integrates with any data source with peak efficiency. It comes with powerful features like data validation, sorting, grouping, data binding, formula support and column ordering. It is built and actively supported by the Handsoncode team and the GitHub community, and distributed free under the MIT license. rhandsontable
HardELiSH  Deep Neural Networks have been shown to be beneficial for a variety of tasks, in particular allowing for end-to-end learning and reducing the requirement for manual design decisions. However, many parameters still have to be chosen in advance, which raises the need to optimize them. One important but often ignored system parameter is the choice of a proper activation function. Thus, in this paper we aim to demonstrate the importance of activation functions in general and show that for different tasks different activation functions might be meaningful. To avoid the manual design or selection of activation functions, we build on the idea of genetic algorithms to learn the best activation function for a given task. In addition, we introduce two new activation functions, ELiSH and HardELiSH, which can easily be incorporated in our framework. In this way, we demonstrate for three different image classification benchmarks that different activation functions are learned, also showing improved results compared to typically used baselines.
Hard-to-Find Data (HTFD) 
Well, really more of a 4-letter acronym, but a powerful advantage of DaaS is the ability to source hard-to-find data that has been aggregated from hundreds of Big Data sources. These data sets are highly targeted and go well beyond third party lists.
Harmonic Adversarial Attack Method (HAAM) 
Adversarial attacks find perturbations that can fool models into misclassifying images. Previous works have succeeded in generating noisy/edge-rich adversarial perturbations, at the cost of degraded image quality. Such perturbations, even when they are small in scale, are usually easily spotted by human vision. In contrast, we propose the Harmonic Adversarial Attack Method (HAAM), which generates edge-free perturbations by using harmonic functions. The edge-free property guarantees that the generated adversarial images still preserve visual quality, even when perturbations are of large magnitude. Experiments also show that adversaries generated by HAAM often have higher rates of success when transferring between models. In addition, we find harmonic perturbations can simulate natural phenomena like natural lighting and shadows. It would then be possible to help find corner cases for given models, as a first step to improving them.
Harmony Search Algorithm (HSA) 
In computer science and operations research, harmony search (HS) is a phenomenon-mimicking algorithm (also known as a metaheuristic algorithm, soft computing algorithm or evolutionary algorithm) inspired by the improvisation process of musicians and proposed by Zong Woo Geem in 2001. In the HS algorithm, each musician (= decision variable) plays (= generates) a note (= a value) so that together they find the best harmony (= global optimum). Proponents claim the following merits: · HS does not require differential gradients, thus it can consider discontinuous functions as well as continuous functions. · HS can handle discrete variables as well as continuous variables. · HS does not require initial value setting for the variables. · HS is free from divergence. · HS may escape local optima. · HS may overcome the drawback of GA’s building block theory, which works well only if the relationship among variables in a chromosome is carefully considered. If neighboring variables in a chromosome have a weaker relationship than remote variables, building block theory may not work well because of the crossover operation. However, HS explicitly considers the relationship using the ensemble operation. · HS has a novel stochastic derivative applied to discrete variables, which uses musicians’ experiences as a search direction. · Certain HS variants do not require algorithm parameters such as HMCR and PAR, thus novice users can easily use the algorithm. Harmony Search Algorithm
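The improvisation loop described above can be sketched as follows, assuming a basic continuous variant that minimizes an objective. HMCR (harmony memory considering rate) and PAR (pitch adjusting rate) are the parameters named in the entry; the function name, the `bandwidth` parameter and all default values are illustrative assumptions.

```python
import random


def harmony_search(objective, bounds, memory_size=20, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimize objective over box bounds with a basic harmony search.

    bounds is a list of (low, high) pairs, one per decision variable.
    """
    rng = random.Random(seed)
    # Harmony memory: a pool of candidate solutions and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a stored harmony.
                value = rng.choice(memory)[j]
                if rng.random() < par:
                    # Pitch adjustment: nudge the reused value slightly.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection: play a completely new note.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        new_score = objective(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


solution, value = harmony_search(lambda v: sum(x * x for x in v),
                                 [(-5.0, 5.0), (-5.0, 5.0)])
print(solution, value)
```

Note that no gradient of the objective is ever evaluated, which is why the method can be applied to discontinuous functions.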
HARVEST Algorithm  Feature selection with high-dimensional data and a very small proportion of relevant features poses a severe challenge to standard statistical methods. We have developed a new approach (HARVEST) that is straightforward to apply, albeit somewhat computer-intensive. This algorithm can be used to pre-screen a large number of features to identify those that are potentially useful. The basic idea is to evaluate each feature in the context of many random subsets of other features. HARVEST is predicated on the assumption that an irrelevant feature can add no real predictive value, regardless of which other features are included in the subset. Motivated by this idea, we have derived a simple statistical test for feature relevance. Empirical analyses and simulations produced so far indicate that the HARVEST algorithm is highly effective in predictive analytics, both in science and business.
Harvest Classification Algorithm  A tree model will often provide good prediction relative to other methods. It is also relatively interpretable, which is key, since it is of interest to identify diverse chemical classes amongst the active compounds, to serve as leads for drug optimization. Interpretability of a tree is often reduced, however, by the sheer size and number of variables involved. We develop a ‘tree harvesting’ algorithm to reduce the complexity of the tree. Harvest.Tree 
HASBRAIN  Mobile video consumption is increasing and sophisticated video quality adaptation strategies are required to deal with mobile throughput fluctuations. These adaptation strategies have to keep the switching frequency low, the average quality high and prevent stalling occurrences to ensure customer satisfaction. This paper proposes a novel methodology for the design of machine learning-based adaptation logics named HASBRAIN. Furthermore, the performance of a trained neural network against two algorithms from the literature is evaluated. We first use a modified existing optimization formulation to calculate optimal adaptation paths with a minimum number of quality switches for a wide range of videos and for challenging mobile throughput patterns. Afterwards we use the resulting optimal adaptation paths to train and compare different machine learning models. The evaluation shows that an artificial neural network-based model can reach a high average quality with a low number of switches in the mobile scenario. The proposed methodology is general enough to be extended for further designs of machine learning-based algorithms and the provided model can be deployed in on-demand streaming scenarios or be further refined using reward-based mechanisms such as reinforcement learning. All tools, models and datasets created during the work are provided as open-source software.
Hash2Vec  In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which does not need training, is able to capture the semantic meaning of words. We compare the results against GloVe, showing that they are similar. As far as we know this is the first application of feature hashing to the word embeddings problem, and the results indicate this is a scalable technique with practical results for NLP applications.
HashNet  Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval, due to its computation efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by end-to-end representation learning and hash encoding, has received increasing attention recently. Subject to the vanishing gradient difficulty in the optimization with binary activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step, which suffers from substantial loss of retrieval quality. This paper presents HashNet, a novel deep architecture for deep learning to hash by continuation method, which learns exactly binary hash codes from imbalanced similarity data where the number of similar pairs is much smaller than the number of dissimilar pairs. The key idea is to attack the vanishing gradient problem in optimizing deep networks with non-smooth binary activations by continuation method, in which we begin from learning an easier network with smoothed activation function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield state-of-the-art multimedia retrieval performance on standard benchmarks.
Haversine Distance  The haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes. Important in navigation, it is a special case of a more general formula in spherical trigonometry, the law of haversines, that relates the sides and angles of spherical triangles. The first table of haversines in English was published by James Andrew in 1805, but Florian Cajori credits an earlier use by José de Mendoza y Ríos in 1801. The term haversine was coined in 1835 by James Inman. These names follow from the fact that the formulas are customarily written in terms of the haversine function, given by haversin(theta) = sin^2(theta/2). The formulas could equally be written in terms of any multiple of the haversine, such as the older versine function (twice the haversine). Prior to the advent of computers, the elimination of division and multiplication by factors of two proved convenient enough that tables of haversine values and logarithms were included in 19th and early 20th century navigation and trigonometric texts. These days, the haversine form is also convenient in that it has no coefficient in front of the sin^2 function.
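A short sketch of the haversine formula in code, assuming latitudes and longitudes in degrees and a spherical Earth with a mean radius of 6371 km. The function name and the default radius are illustrative assumptions.

```python
import math


def haversine_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance between two points given in degrees.

    The result is in the same unit as radius (kilometres by default).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # haversin(central angle) from the haversines of the coordinate gaps.
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


# A quarter of the equator on a unit sphere has length pi / 2.
print(haversine_distance(0.0, 0.0, 0.0, 90.0, radius=1.0))
```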
Hawkes Graph  This paper introduces the Hawkes skeleton and the Hawkes graph. These notions summarize the branching structure of a multivariate Hawkes point process in a compact and fertile way. In particular, we explain how the graph view is useful for the specification and estimation of Hawkes models from large, multi-type event streams. Based on earlier work, we give a nonparametric statistical procedure to estimate the Hawkes skeleton and the Hawkes graph from data. We show how the graph estimation may then be used for choosing and fitting parametric Hawkes models. Our method avoids the a priori assumptions on the model that a straightforward MLE approach requires, and it is numerically more flexible than the latter. A simulation study confirms that the presented procedure works as desired. We give special attention to computational issues in the implementation. This makes our results applicable to high-dimensional event-stream data, such as dozens of event streams and thousands of events per component.
Hazard Function  The hazard function (also known as the failure rate, hazard rate, or force of mortality) h(x) is the ratio of the probability density function P(x) to the survival function S(x), given by h(x) = P(x)/S(x) = P(x)/(1 – D(x)), where D(x) is the distribution function. 
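To make the ratio concrete, the sketch below evaluates h(x) = P(x)/(1 - D(x)) for an exponential distribution, whose hazard is known to be constant and equal to its rate. The helper name `hazard` and the chosen rate of 0.5 are illustrative assumptions.

```python
import math


def hazard(pdf, cdf, x):
    """Hazard h(x) = P(x) / (1 - D(x)) for a density pdf and distribution cdf."""
    return pdf(x) / (1.0 - cdf(x))


rate = 0.5  # hypothetical rate parameter of an exponential distribution


def exp_pdf(x):
    return rate * math.exp(-rate * x)


def exp_cdf(x):
    return 1.0 - math.exp(-rate * x)


# The exponential hazard does not depend on x: each value is about 0.5.
for x in (0.1, 1.0, 5.0):
    print(x, hazard(exp_pdf, exp_cdf, x))
```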
Hazard Ratio  In survival analysis, the hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions described by two levels of an explanatory variable. For example, in a drug study, the treated population may die at twice the rate per unit time as the control population. The hazard ratio would be 2, indicating higher hazard of death from the treatment. Or in another study, men receiving the same treatment may suffer a certain complication ten times more frequently per unit time than women, giving a hazard ratio of 10. Hazard ratios differ from relative risks in that the latter are cumulative over an entire study, using a defined endpoint, while the former represent instantaneous risk over the study time period, or some subset thereof. Hazard ratios suffer somewhat less from selection bias with respect to the endpoints chosen and can indicate risks that happen before the endpoint. 
Hazelcast  Hazelcast, a leading open source in-memory data grid (IMDG) with hundreds of thousands of installed clusters and over 17 million server starts per month, launched Hazelcast Jet, a distributed processing engine for big data streams. With Hazelcast’s IMDG providing storage functionality, Hazelcast Jet is a new Apache 2 licensed open source project that performs parallel execution to enable data-intensive applications to operate in near real-time. Using directed acyclic graphs (DAG) to model relationships between individual steps in the data processing pipeline, Hazelcast Jet is simple to deploy and can execute both batch and stream-based data processing applications. Hazelcast Jet is appropriate for applications that require a near real-time experience such as sensor updates in IoT architectures (house thermostats, lighting systems), in-store e-commerce systems and social media platforms.
HDIdx  Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia contents which are often of high dimensionality. Instead of using exact NN search, extensive research efforts have been focusing on approximate NN search algorithms. In this work, we present ‘HDIdx’, an efficient high-dimensional indexing library for fast approximate NN search, which is open-source and written in Python. It offers a family of state-of-the-art algorithms that convert input high-dimensional vectors into compact binary codes, making them very efficient and scalable for NN search with very low space complexity.
Heaped Data  
Heckman Correction  The Heckman correction (also known as the two-stage method, Heckman’s lambda, the Heckit method or the Heckman model) is any of a number of related statistical methods developed by James Heckman at the University of Chicago between 1976 and 1979 which allow the researcher to correct for selection bias. Selection bias problems are endemic to applied econometrics, which makes Heckman’s original technique, and subsequent refinements by both himself and others, indispensable to applied econometricians. Heckman received the Economics Nobel Prize in 2000 for this achievement. http://…/HeckmanSelectionModel.html
Hedonic Regression  In economics, hedonic regression or hedonic demand theory is a revealed preference method of estimating demand or value. It decomposes the item being researched into its constituent characteristics, and obtains estimates of the contributory value of each characteristic. This requires that the composite good being valued can be reduced to its constituent parts and that the market values those constituent parts. Hedonic models are most commonly estimated using regression analysis, although more generalized models, such as sales adjustment grids, are special cases of hedonic models. An attribute vector, which may be a dummy or panel variable, is assigned to each characteristic or group of characteristics. Hedonic models can accommodate nonlinearity, variable interaction, or other complex valuation situations. Hedonic models are commonly used in real estate appraisal, real estate economics, and Consumer Price Index (CPI) calculations. In CPI calculations hedonic regression is used to control the effect of changes in product quality. Price changes that are due to substitution effects are subject to hedonic quality adjustments. 
Helix  Data application developers and data scientists spend an inordinate amount of time iterating on machine learning (ML) workflows, modifying the data preprocessing, model training, and postprocessing steps via trial-and-error to achieve the desired model performance. Existing work on accelerating machine learning focuses on speeding up one-shot execution of workflows, failing to address the incremental and dynamic nature of typical ML development. We propose Helix, a declarative machine learning system that accelerates iterative development by optimizing workflow execution end-to-end and across iterations. Helix minimizes the runtime per iteration via program analysis and intelligent reuse of previous results, which are selectively materialized (trading off the cost of materialization for potential future benefits) to speed up future iterations. Additionally, Helix offers a graphical interface to visualize workflow DAGs and compare versions to facilitate iterative development. Through two ML applications, in classification and in structured prediction, attendees will experience the succinctness of the Helix programming interface and the speed and ease of iterative development using Helix. In our evaluations, Helix achieved up to an order of magnitude reduction in cumulative run time compared to state-of-the-art machine learning tools.
Hellinger Distance  In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance introduced by Anil Kumar Bhattacharya) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.[1][2]
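For two discrete distributions over the same outcomes, the Hellinger distance is commonly written as H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2) / sqrt(2), which lies between 0 and 1. The sketch below assumes this discrete form; the function name is illustrative.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    p and q are sequences of probabilities over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


print(hellinger_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(hellinger_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Identical distributions give 0, and distributions with no overlapping support give the maximum value of 1.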
Henge  We present Henge, a system to support intent-based multi-tenancy in modern stream processing applications. Henge supports multi-tenancy as a first-class citizen: everyone inside an organization can now submit their stream processing jobs to a single, shared, consolidated cluster. Additionally, Henge allows each tenant (job) to specify its own intents (i.e., requirements) as a Service Level Objective (SLO) that captures latency and/or throughput. In a multi-tenant cluster, the Henge scheduler adapts continually to meet jobs’ SLOs in spite of limited cluster resources, and under dynamic input workloads. SLOs are soft and are based on utility functions. Henge continually tracks SLO satisfaction, and when jobs miss their SLOs, it wisely navigates the state space to perform resource allocations in real time, maximizing total system utility achieved by all jobs in the system. Henge is integrated in Apache Storm and we present experimental results using both production topologies and real datasets.
Herfindahl-Hirschman Index  Based on the aggregated shares retained by individual firms or actors within a market or space, the Herfindahl-Hirschman Index (HHI) measures the level of concentration in the market or space. It is computed as the sum of the squared market shares, with shares expressed in percent. It is often used as a measure of competition, where values near 0 indicate perfect competition amongst firms or actors and 10,000 indicates a perfect monopoly. hhi
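A minimal sketch of the calculation, assuming market shares are given in percent so that a single firm holding 100% yields 10,000. The function name is an illustrative assumption and is not the API of the hhi package.

```python
def herfindahl_hirschman_index(shares_percent):
    """Sum of squared market shares, with shares expressed in percent."""
    return sum(share ** 2 for share in shares_percent)


print(herfindahl_hirschman_index([100]))          # 10000
print(herfindahl_hirschman_index([25, 25, 25, 25]))  # 2500
print(herfindahl_hirschman_index([50, 30, 20]))   # 3800
```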
Hessian Approximated Multiple Subsets Iteration (HAMSI) 
We propose HAMSI, a provably convergent incremental algorithm for solving large-scale partially separable optimization problems that frequently emerge in machine learning and inferential statistics. The algorithm is based on a local quadratic approximation and hence allows incorporating second-order curvature information to speed up convergence. Furthermore, HAMSI needs almost no tuning, and it is scalable as well as easily parallelizable. In large-scale simulation studies with the MovieLens datasets, we illustrate that the method is superior to a state-of-the-art distributed stochastic gradient descent method in terms of convergence behavior. This performance gain comes at the expense of using memory that scales only linearly with the total size of the optimization variables. We conclude that HAMSI may be considered a viable alternative in many scenarios where first-order methods based on variants of stochastic gradient descent are applicable.
Heterogeneous Deep Discriminative Model (HDDM) 
This paper presents a new deep learning approach for video-based scene classification. We design a Heterogeneous Deep Discriminative Model (HDDM) whose parameters are initialized by performing an unsupervised pre-training in a layer-wise fashion using Gaussian Restricted Boltzmann Machines (GRBM). In order to avoid the redundancy of adjacent frames, we extract spatio-temporal variation patterns within frames and represent them sparsely using Sparse Cubic Symmetrical Pattern (SCSP). Then, a pre-initialized HDDM is separately trained using the videos of each class to learn class-specific models. According to the minimum reconstruction error from the learnt class-specific models, a weighted voting strategy is employed for the classification. The performance of the proposed method is extensively evaluated on two action recognition datasets (UCF101 and Hollywood II) and three dynamic texture and dynamic scene datasets (DynTex, YUPENN, and Maryland). The experimental results and comparisons against state-of-the-art methods demonstrate that the proposed method consistently achieves superior performance on all datasets.
Heterogeneous Incremental Nearest Class Mean Random Forest (hiRF) 
In recent years, dynamically growing data and an incrementally growing number of classes have posed new challenges to large-scale data classification research. Most traditional methods struggle to balance precision and computational burden as the data and its number of classes increase: some methods have weak precision, while others are time-consuming. In this paper, we propose an incremental learning method, namely heterogeneous incremental Nearest Class Mean Random Forest (hiRF), to handle this issue. It is a heterogeneous method that adaptively either replaces trees or updates tree leaves in the random forest, to reduce computational time at comparable performance when data of new classes arrive. Specifically, to maintain accuracy, a proportion of the trees are replaced by new NCM decision trees; to reduce the computational load, the remaining trees only have their leaf probabilities updated. Above all, out-of-bag estimation and out-of-bag boosting are proposed to balance accuracy and computational efficiency. Fair experiments were conducted and demonstrated comparable precision with much less computational time.
Heterogeneous Simultaneous Multiscale Change Point Estimator (HSMUCE) 
We propose a heterogeneous simultaneous multiscale change point estimator called ‘HSMUCE’ for the detection of multiple change points of the signal in a heterogeneous Gaussian regression model. A piecewise constant function is estimated by minimizing the number of change points over the acceptance region of a multiscale test which locally adapts to changes in the variance. The multiscale test is a combination of local likelihood ratio tests which are properly calibrated by scale-dependent critical values to keep a global nominal level α, even for finite samples. We show that HSMUCE controls the error of overestimation and underestimation of the number of change points. For this, new deviation bounds for F-type statistics are derived. Moreover, we obtain confidence sets for the whole signal. All results are non-asymptotic and uniform over a large class of heterogeneous change point models. HSMUCE is fast to compute, achieves the optimal detection rate and estimates the number of change points at almost optimal accuracy for vanishing signals, while still being robust. We compare HSMUCE with several state-of-the-art methods in simulations and analyse current recordings of a transmembrane protein in the bacterial outer membrane with pronounced heterogeneity for its states. An R package is available online.
Heterogeneous Ultra-Dense Network (HUDN) 
Machine Learning for Heterogeneous Ultra-Dense Networks with Graphical Representations
Heteroscedasticity  In statistics, a collection of random variables is heteroscedastic if there are subpopulations that have different variabilities from others. Here “variability” could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity. 
Hidden Factor Graph Models (HFM) 
Hidden Factor graph models generalise Hidden Markov Models to tree structured data. The distinctive feature of ‘treeHFM’ is that it learns a transition matrix for first order (sequential) and for second order (splitting) events. It can be applied to all discrete and continuous data that is structured as a binary tree. In the case of continuous observations, ‘treeHFM’ has Gaussian distributions as emissions. treeHFM 
Hidden Markov Model (HMM) 
Hidden Markov Models (HMMs) are powerful, flexible methods for representing and classifying data with trends over time. A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E. Baum and coworkers. It is closely related to earlier work on the optimal nonlinear filtering problem (stochastic processes) by Ruslan L. Stratonovich, who was the first to describe the forward-backward procedure. In simpler Markov models (like a Markov chain), the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective ‘hidden’ refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a ‘hidden’ Markov model even if these parameters are known exactly. Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.
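As an illustration of how the hidden states are summed out, the sketch below implements the forward pass of the forward-backward procedure for a discrete HMM, returning the probability of an observed token sequence. The function name and the two-state example parameters are hypothetical.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Probability of an observation sequence under a discrete HMM.

    initial[s]       : probability of starting in state s
    transition[r][s] : probability of moving from state r to state s
    emission[s][o]   : probability that state s emits token o
    """
    if not observations:
        raise ValueError("need at least one observation")
    n_states = len(initial)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [initial[s] * emission[s][observations[0]]
             for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[s][obs]
            * sum(alpha[r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model with two possible output tokens (0 and 1).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood([0, 1, 0], initial, transition, emission))
```

The recursion costs time proportional to the sequence length times the square of the number of states, instead of enumerating every possible state path.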
Hidden Tree Markov Network (HTN) 
The paper introduces the Hidden Tree Markov Network (HTN), a neuro-probabilistic hybrid fusing the representation power of generative models for trees with the incremental and discriminative learning capabilities of neural networks. We put forward a modular architecture in which multiple generative models of limited complexity are trained to learn structural feature detectors whose outputs are then combined and integrated by neural layers at a later stage. In this respect, the model is both deep, thanks to the unfolding of the generative models on the input structures, as well as wide, given the potentially large number of generative modules that can be trained in parallel. Experimental results show that the proposed approach can outperform state-of-the-art syntactic kernels as well as generative kernels built on the same probabilistic model as the HTN. 
HiddenLayer LSTM (HLSTM) 
Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases runtime delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (HLSTM) that adds hidden layers to LSTM’s original one-level nonlinear control gates. HLSTM increases accuracy while employing fewer external stacked layers, thus reducing the number of parameters and runtime latency significantly. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of HLSTM control gates. We have GP-trained HLSTMs for image captioning and speech recognition applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7x [floating-point operations (FLOPs) by 45.5x], runtime latency by 4.5x, and improve the CIDEr score by 2.6. For the DeepSpeech2 architecture on the AN4 dataset, our two models reduce the number of parameters by 19.4x (FLOPs by 23.5x), runtime latency by 15.7%, and the word error rate from 12.9% to 8.7%. Thus, GP-trained HLSTMs can be seen to be compact, fast, and accurate. 
Hierarchical Attention Mechanism (Ham) 
Attention mechanisms in sequence-to-sequence models have shown strong performance in various natural language processing (NLP) tasks, such as sentence embedding, text generation, machine translation and machine reading comprehension. Unfortunately, existing attention mechanisms learn only either high-level or low-level features. In this paper, we argue that the lack of hierarchical mechanisms is a bottleneck in improving the performance of attention mechanisms, and propose a novel Hierarchical Attention Mechanism (Ham) based on the weighted sum of different layers of a multi-level attention. Ham achieves a state-of-the-art BLEU score of 0.26 on the Chinese poem generation task and a nearly 6.5% averaged improvement compared with existing machine reading comprehension models such as BIDAF and MatchLSTM. Furthermore, our experiments and theorems reveal that Ham has greater generalization and representation ability than existing attention mechanisms. 
Hierarchical Attention Network (HAN) 
We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms applied at the word and sentence level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences. 
Hierarchical AttentionBased Recurrent Highway Network (HRHN) 
Time series prediction has been studied in a variety of domains. However, it is still challenging to predict future series given historical observations and past exogenous data. Existing methods either fail to consider the interactions among different components of exogenous variables, which may affect the prediction accuracy, or cannot model the correlations between exogenous data and target data. Besides, the inherent temporal dynamics of exogenous data are also related to the target series prediction, and thus should be considered as well. To address these issues, we propose an end-to-end deep learning model, i.e., Hierarchical attention-based Recurrent Highway Network (HRHN), which incorporates spatio-temporal feature extraction of exogenous variables and temporal dynamics modeling of target variables into a single framework. Moreover, by introducing the hierarchical attention mechanism, HRHN can adaptively select the relevant exogenous features at different semantic levels. We carry out comprehensive empirical evaluations with various methods over several datasets, and show that HRHN outperforms the state of the art in time series prediction, especially in capturing sudden changes and sudden oscillations of time series. 
Hierarchical Block Sparse Neural Network (HBsNN) 
Sparse deep neural networks (DNNs) are efficient in both memory and compute when compared to dense DNNs. But due to irregularity in the computation of sparse DNNs, their efficiency on general-purpose hardware is much lower than that of dense DNNs. This leads to poor or no performance benefits for sparse DNNs. The performance issue for sparse DNNs can be alleviated by bringing structure to the sparsity and leveraging it to improve runtime efficiency. But such structural constraints often lead to sparse models with suboptimal accuracy. In this work, we jointly address both accuracy and performance of sparse DNNs using our proposed class of neural networks called HBsNN (Hierarchical Block Sparse Neural Networks). 
Hierarchical Clustering  In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types: 1. Agglomerative: This is a “bottom up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. 2. Divisive: This is a “top down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram. 
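The agglomerative ("bottom up") strategy can be sketched in a few lines. This is an illustrative toy, assuming one-dimensional points and single linkage (cluster distance is the smallest pairwise distance); the function name `agglomerative` and the sample data are made up for the example.

```python
def agglomerative(points, n_clusters):
    """Greedy single-linkage agglomerative clustering of 1-D points."""
    clusters = [[p] for p in points]  # every observation starts in its own cluster
    merges = []                       # merge history, i.e. the dendrogram bottom-up
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]  # greedily merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters], merges


clusters, merges = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], n_clusters=3)
```

Running the loop down to a single cluster and plotting `merges` by merge distance gives the dendrogram mentioned above.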
Hierarchical Clustering and Topic Modeling based on Fast Rank2 NMF (HierNMF2) 
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than any other existing methods. We describe highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets. Substantial experimental tests are presented that illustrate significant improvements both in computational time as well as quality of solutions. We compare our methods to other clustering methods including K-means, standard NMF, and CLUTO, and also topic modeling methods including latent Dirichlet allocation (LDA) and recently proposed algorithms for NMF with separability constraints. Overall, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains. 
Hierarchical Compartmental Model  A variety of triangle-based stochastic reserving techniques have been proposed for estimating future general insurance claims payments, ranging from generalized linear models (England and Verrall, 2002) to nonlinear hierarchical models (Guszcza, 2008). Methods incorporating both paid and incurred information have been explored (Martínez-Miranda, Nielsen and Verrall, 2012; Quarg and Mack, 2004), which provide richer inference and improved interpretability. Furthermore, Bayesian methods (Zhang, Dukic and Guszcza, 2012; Meyers, 2007; England and Verrall, 2005; Verrall, 2004) have become increasingly ubiquitous, providing flexibility and the ability to robustly incorporate judgment into uncertainty projections. This paper explores a new triangle-based (and optionally Bayesian) stochastic reserving framework which considers the relationship between exposure, case reserves and paid claims. By doing so, it enables practitioners to build communicable models that are consistent with their understanding of the insurance claims process. Furthermore, it supports the identification and quantification of claims process characteristics to provide tangible business insights. Hierarchical compartmental reserving models 
Hierarchical Compositional Network (HCN) 
We introduce the hierarchical compositional network (HCN), a directed generative model able to discover and disentangle, without supervision, the building blocks of a set of binary images. The building blocks are binary features defined hierarchically as a composition of some of the features in the layer immediately below, arranged in a particular manner. At a high level, HCN is similar to a sigmoid belief network with pooling. Inference and learning in HCN are very challenging and existing variational approximations do not work satisfactorily. A main contribution of this work is to show that both can be addressed using max-product message passing (MPMP) with a particular schedule (no EM required). Also, using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest. When used for classification, fast inference with HCN has exactly the same functional form as a convolutional neural network (CNN) with linear activations and binary weights. However, HCN’s features are qualitatively very different. 
Hierarchical Configuration Model  We introduce a class of random graphs with a hierarchical community structure, which we call the hierarchical configuration model. On the intercommunity level, the graph is a configuration model, and on the intracommunity level, every vertex in the configuration model is replaced by a community: a small graph. These communities may have any shape, as long as they are connected. For these hierarchical graphs, we find the size of the largest component, the degree distribution and the clustering coefficient. Furthermore, we determine the conditions under which a giant percolation cluster exists, and find its size. 
Hierarchical Data Format (HDF) 
Hierarchical Data Format (HDF, HDF4, or HDF5) is a set of file formats and libraries designed to store and organize large amounts of numerical data. Originally developed at the National Center for Supercomputing Applications, it is supported by the nonprofit HDF Group, whose mission is to ensure continued development of HDF5 technologies, and the continued accessibility of data stored in HDF. 
Hierarchical Deep Learning for Text Classification (HDLTex) 
The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multiclass classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy. 
Hierarchical Incremental GRAdient Descent (HiGrad) 
The Hierarchical Incremental GRAdient Descent (HiGrad) algorithm is a first-order algorithm for finding the minimizer of a function in online learning, just like stochastic gradient descent (SGD). See Su and Zhu (2018) <arXiv:1802.04876> for details. higrad 
Hierarchical Inference Testing (HIT) 
hit 
Hierarchical Kernel Learning (HKL) 
http://…/jawanpuria15a.pdf 
Hierarchical Latent Dirichlet Allocation (hLDA, HLDA) 
An extension to LDA is the hierarchical LDA (hLDA), where topics are joined together in a hierarchy by using the nested Chinese restaurant process. http://…/automatictopicmodellingwithlda 
Hierarchical Latent Space Network Model (HLSM) 
HLSM 
Hierarchical Latent Tree Analysis (HLTA) 
In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, high frequency words in one topic may be also used with high frequency in other topics. Thus they may not be the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word cooccurrence and cooccurrences of those patterns using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents and they are interpreted as topics. The words that best distinguish a cluster from other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than the alternative approaches. In this work, we introduce semantically higher level latent variables to model cooccurrence of those patterns, resulting in hierarchical latent tree models (HLTMs). The latent variables at higher levels of the hierarchy correspond to more general topics, while the latent variables at lower levels correspond to more specific topics. The proposed method for topic detection is therefore called hierarchical latent tree analysis (HLTA). 
Hierarchical Latent Tree Model (HLTM) 

Hierarchical Mode Association Clustering / Mode Association Clustering (HMAC, MAC) 
Mode association clustering (MAC) can be conducted either hierarchically or at one level. MAC is similar to mixture-model-based clustering in the sense of characterizing clusters by smooth densities. However, MAC requires no model fitting and uses a nonparametric kernel density estimation. The density of a cluster is not restricted to be parametric, for instance, Gaussian, but ensures unimodality. The algorithm seems to combine the complementary merits of bottom-up clustering such as linkage and top-down clustering such as mixture modeling and k-means. It also tends to be robust against non-Gaussian shaped clusters. 
Hierarchical Model  There isn’t a single authoritative definition of a hierarchical model. 
Hierarchical Multinomial Marginal Models (HMM) 
In the log-linear parametrization all the interactions are contrasts of logarithms of joint probabilities, and this is the main reason why this parametrization is not convenient for expressing hypotheses on marginal distributions or for modeling ordered categorical data. By contrast, Hierarchical Multinomial Marginal models (HMM) (Bartolucci et al. 2007) are based on parameters, called generalized marginal interactions, which are contrasts of logarithms of sums of probabilities. HMM models allow great flexibility in choosing the marginal distributions within which the interactions are defined, and they are a useful tool for modeling marginal distributions and for taking proper account of ordinal categorical variables. hmmm 
Hierarchical Multiscale LSTM  Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. 
Hierarchical Navigation Reinforcement Network (HNRN) 
This paper proposes a navigation algorithm for multi-agent dynamic environments. The algorithm is expressed as a hierarchical framework that combines a Hidden Markov Model (HMM) with Deep Reinforcement Learning (DRL). For simplicity, we term our method Hierarchical Navigation Reinforcement Network (HNRN). In the high-level architecture, we train an HMM to evaluate the agent’s environment and obtain a score, according to which an adaptive control action is chosen. In the low-level architecture, two sub-systems are introduced: a differential target-driven system, which aims at heading to the target, and a collision-avoidance DRL system, which is used for avoiding obstacles in the dynamic environment. The advantage of this hierarchical system is that it decouples the target-driven and collision-avoidance tasks, leading to a model that is faster and easier to train. As the experiments show, our algorithm learns faster and achieves a higher success rate than traditional Velocity Obstacle (VO) algorithms and a hybrid DRL method. 
Hierarchical Nearest Neighbor Descent (HNND) 

Hierarchical Network  A hierarchical network is a network topology in which a central “root” node (the top level of the hierarchy) is connected by point-to-point links to one or more nodes that are one level lower in the hierarchy (i.e., the second level). Each second-level node in turn has one or more nodes that are one level lower in the hierarchy (i.e., the third level) connected to it, also by point-to-point links. The top-level “root” node is the only node that has no other node above it in the hierarchy. 
Hierarchical Network Model (HNM) 
Hierarchical network models are iterative algorithms for creating networks which are able to reproduce the unique properties of the scale-free topology and the high clustering of the nodes at the same time. These characteristics are widely observed in nature, from biology to language to some social networks. 
Hierarchical Reinforcement Learning (HRL) 

Hierarchical Semantic Embedding (HSE) 
Object categories inherently form a hierarchy with different levels of concept abstraction, especially for fine-grained categories. For example, birds (Aves) can be categorized according to a four-level hierarchy of order, family, genus, and species. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make prediction less ambiguous. However, previous studies of fine-grained image recognition primarily focus on categories of one certain level and usually overlook this correlation information. In this work, we investigate simultaneously predicting categories of different levels in the hierarchy and integrating this structured correlation information into the deep neural network by developing a novel Hierarchical Semantic Embedding (HSE) framework. Specifically, the HSE framework sequentially predicts the category score vector of each level in the hierarchy, from highest to lowest. At each level, it incorporates the predicted score vector of the higher level as prior knowledge to learn finer-grained feature representation. During training, the predicted score vector of the higher level is also employed to regularize label prediction by using it as soft targets of corresponding sub-categories. To evaluate the proposed framework, we organize the 200 bird species of the Caltech-UCSD birds dataset with the four-level category hierarchy and construct a large-scale butterfly dataset that also covers four-level categories. Extensive experiments on these two and the newly released VegFru datasets demonstrate the superiority of our HSE framework over the baseline methods and existing competitors. 
Hierarchical Spectral Merger (HSM) 
We present a new method for time series clustering which we call the Hierarchical Spectral Merger (HSM) method. This procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms. The extent of similarity between a pair of time series is measured using the total variation distance between their estimated spectral densities. At each step of the algorithm, every time two clusters merge, a new spectral density is estimated using all the information present in both clusters, which is representative of all the series in the new cluster. The method is implemented in an R package HSMClust. We present two applications of the HSM method, one to data coming from wave-height measurements in oceanography and the other to electroencephalogram (EEG) data. 
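The similarity measure described above can be illustrated with a short sketch. It assumes two series of equal length and uses the raw periodogram, normalized to sum to one, as the spectral density estimate; this is a simplification chosen for the example and not necessarily the estimator used by the HSM authors or by HSMClust. The function names are hypothetical.

```python
import cmath
import math


def normalized_periodogram(x):
    """Raw periodogram at Fourier frequencies k = 1..n/2, scaled to sum to 1."""
    n = len(x)
    mean = sum(x) / n
    power = []
    for k in range(1, n // 2 + 1):
        coef = sum(
            (x[t] - mean) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power.append(abs(coef) ** 2)
    total = sum(power)
    return [p / total for p in power]


def tv_distance(x, y):
    """Total variation distance between the normalized spectra of x and y."""
    f, g = normalized_periodogram(x), normalized_periodogram(y)
    return 0.5 * sum(abs(a - b) for a, b in zip(f, g))


n = 64
slow = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
fast = [math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
same = tv_distance(slow, slow)       # identical oscillations
different = tv_distance(slow, fast)  # power concentrated at different frequencies
```

The distance lies between 0 (identical normalized spectra) and 1 (spectra with no overlap), so series with similar oscillations end up close and are merged early.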
Hierarchical Stochastic Clustering (HSC) 
Hierarchical clustering is one of the most powerful solutions to the problem of clustering, on the grounds that it performs a multi-scale organization of the data. In recent years, research on hierarchical clustering methods has attracted considerable interest due to demanding modern application domains. We present a novel divisive hierarchical clustering framework called Hierarchical Stochastic Clustering (HSC) that acts in two stages. In the first stage, it finds a primary hierarchy of clustering partitions in a dataset. In the second stage, it feeds a clustering algorithm with each one of the clusters of the very detailed partition, in order to settle the final result. The output is a hierarchy of clusters. Our method is based on the previous research of Meyer and Weissel on Stochastic Data Clustering and the theory of Simon and Ando on Variable Aggregation. Our experiments show that our framework builds a meaningful hierarchy of clusters and consistently benefits the clustering algorithm that acts in the second stage, not only computationally but also in terms of cluster quality. This result suggests that the HSC framework is well suited to obtaining hierarchical solutions for large volumes of data. 
Hierarchical Temporal Memory (HTM) 
Hierarchical temporal memory (HTM) is a biologically constrained theory of machine intelligence originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee. HTM is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the human brain. The technology has been tested and implemented in software through example applications from Numenta and commercial applications from Numenta’s partners. At the core of HTM are learning algorithms that can store, learn, infer and recall high-order sequences. Unlike most other machine learning methods, HTM learns time-based patterns in unlabeled data on a continuous basis. HTM is robust to noise and has high capacity, meaning that it can learn multiple patterns simultaneously. When applied to computers, HTM is well suited for prediction, anomaly detection, classification and ultimately sensorimotor applications. 
Hierarchical Time Series / Grouped Time Series (HTS) 
Time series can often be naturally disaggregated in a hierarchical structure using attributes such as geographical location, product type, etc. For example, the total number of bicycles sold by a cycling warehouse can be disaggregated into a hierarchy of bicycle types. Such a warehouse will sell road bikes, mountain bikes, children’s bikes or hybrids. Each of these can be disaggregated into finer categories. Children’s bikes can be divided into balance bikes for children under 4 years old, single-speed bikes for children between 4 and 6 and bikes for children over the age of 6. Hybrid bikes can be divided into city, commuting, comfort, and trekking bikes; and so on. Such disaggregation imposes a hierarchical structure. We refer to these as hierarchical time series. hts, gtop 
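The bicycle example can be written down as a small sketch in which every aggregate series is the sum of its children. The node names and sales figures are invented, and the helper `aggregate` is a hypothetical illustration of bottom-up aggregation, not the API of the hts or gtop packages.

```python
def aggregate(hierarchy, bottom_series):
    """Sum bottom-level series up the hierarchy; return a series for every node."""
    series = dict(bottom_series)

    def total(node):
        if node not in series:
            children = [total(child) for child in hierarchy[node]]
            # element-wise sum across the children, one value per time period
            series[node] = [sum(values) for values in zip(*children)]
        return series[node]

    for node in hierarchy:
        total(node)
    return series


# Hypothetical two-period sales figures for the bottom-level series.
hierarchy = {
    "Total": ["Road", "Mountain", "Children", "Hybrid"],
    "Children": ["Balance", "SingleSpeed", "Over6"],
}
bottom = {
    "Road": [10, 12],
    "Mountain": [7, 9],
    "Hybrid": [4, 5],
    "Balance": [1, 2],
    "SingleSpeed": [2, 2],
    "Over6": [3, 1],
}
all_series = aggregate(hierarchy, bottom)
```

Forecasts made independently at each level generally do not add up in this way, which is why hierarchical forecasting methods reconcile them afterwards.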
Hierarchical Topic Models  
Hierarchically Supervised Latent Dirichlet Allocation (HSLDA) 
We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not. 
High Dimensional Data Clustering (HDDC) 
Clustering in high-dimensional spaces is a recurrent problem in many domains, for example in object recognition. High-dimensional data usually live in different low-dimensional subspaces hidden in the original space. HDDC is a clustering approach which estimates the specific subspace and the intrinsic dimension of each class. The approach adapts the Gaussian mixture model framework to high-dimensional data and estimates the parameters which best fit the data. This results in a robust clustering method called High Dimensional Data Clustering (HDDC). HDDC is applied to locate objects in natural images in a probabilistic framework. Experiments on a recently proposed database demonstrate the effectiveness of our clustering method for category localization. 
High Frequency Trading (HFT) 
High-frequency trading (HFT) is a primary form of algorithmic trading in finance. Specifically, it is the use of sophisticated technological tools and computer algorithms to rapidly trade securities. HFT uses proprietary trading strategies carried out by computers to move in and out of positions in seconds or fractions of a second. It is estimated that as of 2009, HFT accounted for 60-73% of all US equity trading volume, with that number falling to approximately 50% in 2012. High-frequency traders move in and out of short-term positions at high volumes, aiming to capture sometimes a fraction of a cent in profit on every trade. HFT firms do not consume significant amounts of capital, accumulate positions or hold their portfolios overnight. As a result, HFT has a potential Sharpe ratio (a measure of risk and reward) tens of times higher than traditional buy-and-hold strategies. High-frequency traders typically compete against other HFTs, rather than long-term investors. HFT firms make up for the low margins with incredibly high volumes of trades, frequently numbering in the millions. It has been argued that a core incentive in much of the technological development behind high-frequency trading is essentially front running, in which the varying delays in the propagation of orders are taken advantage of by those who have earlier access to information. A substantial body of research argues that HFT and electronic trading pose new types of challenges to the financial system. Algorithmic and high-frequency traders were both found to have contributed to volatility in the May 6, 2010 Flash Crash, when high-frequency liquidity providers rapidly withdrew from the market. Several European countries have proposed curtailing or banning HFT due to concerns about volatility. Other complaints against HFT include the argument that some HFT firms scrape profits from investors when index funds rebalance their portfolios. 
Other financial analysts point to evidence of benefits that HFT has brought to the modern markets. Researchers have stated that HFT and automated markets improve market liquidity, reduce trading costs, and make stock prices more efficient. 
High Performance Analytics Toolkit (HPAT) 
Big data analytics requires high programmer productivity and high performance simultaneously on large-scale clusters. However, current big data analytics frameworks (e.g. Apache Spark) have high runtime overheads since they are library-based. Given the characteristics of the data analytics domain, we introduce the High Performance Analytics Toolkit (HPAT), which is a big data analytics framework that performs static compilation of high-level scripting programs into high performance parallel code using novel domain-specific compilation techniques. HPAT provides scripting abstractions in the Julia language for analytics tasks, automatically parallelizes them, generates efficient MPI/C++ code, and provides resiliency. Since HPAT is compiler-based, it avoids overheads of library-based systems such as dynamic task scheduling and master-executor coordination. In addition, it provides automatic optimizations for scripting programs, such as fusion of array operations. Therefore, HPAT is 14x to 400x faster than Spark on the Cori supercomputer at LBL/NERSC. Furthermore, HPAT is much more flexible in distributed data structures, which enables the use of existing libraries such as HDF5, ScaLAPACK, and Intel DAAL. 
High Performance Computing (HPC) 
High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business. A supercomputer is a computer with a very high level of computational capacity. As of 2015, there are supercomputers which can perform up to quadrillions of floating point operations per second. http://…/Supercomputer 
High Quality Bidirectional Generative Adversarial Network  Generative adversarial networks (GANs) have achieved outstanding success in generating high-quality data. Focusing on the generation process, existing GANs investigate unidirectional mapping from the latent vector to the data. Later, various studies pointed out that the latent space of GANs is semantically meaningful and can be utilized in advanced data analysis and manipulation. In order to analyze real data in the latent space of GANs, it is necessary to investigate the inverse generation mapping from the data to the latent vector. To tackle this problem, bidirectional generative models introduce an encoder to enable the inverse path of the generation process. Unfortunately, this effort leads to a degradation of generation quality because the imperfect generator interferes with the encoder training and vice versa. In this paper, we propose a new inference model that estimates the latent vector from the features of the GAN discriminator. While existing bidirectional models learn the image-to-latent translation, our algorithm formulates this inference mapping as a feature-to-latent translation. It is important to note that the training of our model is independent of the GAN training. Owing to this independence, the proposed algorithm can generate high-quality samples identical to those of unidirectional GANs and also reconstruct the original data faithfully. Moreover, our algorithm can be applied to any unidirectional GAN, even pre-trained GANs. 
Highcharts  Highcharts is a charting library written in pure JavaScript, offering an easy way of adding interactive charts to your web site or web application. Highcharts currently supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange, bubble, box plot, error bars, funnel, waterfall and polar chart types. 
Higher Order Propagation Framework (HOPF) 
Given a graph wherein every node has certain attributes associated with it and some nodes have labels associated with them, Collective Classification (CC) is the task of assigning labels to every unlabeled node using information from the node as well as its neighbors. It is often the case that a node is not only influenced by its immediate neighbors but also by its higher-order neighbors, multiple hops away. Recent state-of-the-art models for CC use differentiable variations of Weisfeiler-Lehman kernels to aggregate multi-hop neighborhood information. However, in this work, we show that these models suffer from the problem of Node Information Morphing, wherein the information of the node is morphed or overwhelmed by the information of its neighbors when considering multiple hops. Further, existing models are not scalable as the memory and computation needs grow exponentially with the number of hops considered. To circumvent these problems, we propose a generic Higher Order Propagation Framework (HOPF) which includes (i) a differentiable Node Information Preserving (NIP) kernel and (ii) a scalable iterative learning and inferencing mechanism to aggregate information over larger hops. We do an extensive evaluation using 11 datasets from different domains and show that, unlike existing CC models, our NIP model with iterative inference is robust across all the datasets and can handle much larger neighborhoods in a scalable manner. 
Higher-Order Generalized Singular Value Decomposition  hogsvdR 
Highest Density Regions (HDR) 

Highest Posterior Density (HPD) 
Highest Posterior Density – The x% highest posterior density interval is the shortest interval in parameter space that contains x% of the posterior probability. 
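The definition above can be turned into a small computation on posterior draws. The following is a minimal sketch, assuming the posterior is unimodal (so the highest density region is a single interval) and that it is represented by Monte Carlo samples; the function name `hpd_interval` and the parameter `mass` are illustrative, not taken from any particular library.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains a fraction `mass` of the draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))              # number of draws the interval must cover
    if k < 1 or k > n:
        raise ValueError("mass must lie in (0, 1]")
    # Width of every window of k consecutive sorted draws.
    widths = x[k - 1:] - x[:n - k + 1]
    i = int(np.argmin(widths))              # the narrowest window is the HPD estimate
    return x[i], x[i + k - 1]

# Usage: for a skewed posterior the HPD interval is shorter than the
# equal-tailed interval with the same coverage.
draws = np.random.default_rng(0).exponential(size=5000)
print(hpd_interval(draws, mass=0.9))
print(tuple(np.quantile(draws, [0.05, 0.95])))
```

For a multimodal posterior the shortest single interval is no longer the highest density region, so this sketch would not apply as written.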
Highly Efficient Network (HENet) 
In order to enhance the real-time performance of convolutional neural networks (CNNs), more and more researchers are focusing on improving the efficiency of CNNs. Based on the analysis of some CNN architectures, such as ResNet, DenseNet, ShuffleNet and so on, we combined their advantages and proposed a very efficient model called Highly Efficient Networks (HENet). The new architecture uses an unusual way to combine group convolution and channel shuffle, which was mentioned in ShuffleNet. Inspired by ResNet and DenseNet, we also proposed a new way to use element-wise addition and concatenation connections with each block. In order to make greater use of feature maps, pooling operations are removed from HENet. The experiments show that our model's efficiency is more than 1 times higher than ShuffleNet on many open source datasets, such as CIFAR-10/100 and SVHN. 
High-Resolution Deep Convolutional Generative Adversarial Network (HR-DCGAN) 
Convergence of Generative Adversarial Networks (GANs) in a high-resolution setting, under the computational constraint of GPU memory capacity (from 12 GB to 24 GB), has been beset with difficulty due to the known lack of convergence rate stability. In order to boost network convergence of DCGAN (Deep Convolutional Generative Adversarial Networks) and achieve good-looking high-resolution results, we propose a new layered network structure, HR-DCGAN, that incorporates current state-of-the-art techniques for this effect. 
Hilbert-Schmidt Independence Criterion (HSIC) 
‘Dependency Bottleneck’ in Autoencoding Architectures: an Empirical Study 
Hill Climbing  In computer science, hill climbing is a mathematical optimization technique which belongs to the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by incrementally changing a single element of the solution. If the change produces a better solution, an incremental change is made to the new solution, repeating until no further improvements can be found. For example, hill climbing can be applied to the travelling salesman problem. It is easy to find an initial solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much shorter route is likely to be obtained. Hill climbing is good for finding a local optimum (a solution that cannot be improved by considering a neighbouring configuration) but it is not necessarily guaranteed to find the best possible solution (the global optimum) out of all possible solutions (the search space). In convex problems, hill climbing is optimal. Examples of algorithms that solve convex problems by hill climbing include the simplex algorithm for linear programming and binary search. The characteristic that only local optima are guaranteed can be cured by using restarts (repeated local search), or more complex schemes based on iterations, like iterated local search, on memory, like reactive search optimization and tabu search, or memory-less stochastic modifications, like simulated annealing. The relative simplicity of the algorithm makes it a popular first choice amongst optimizing algorithms. It is used widely in artificial intelligence, for reaching a goal state from a starting node. Choice of next node and starting node can be varied to give a list of related algorithms. 
Although more advanced algorithms such as simulated annealing or tabu search may give better results, in some situations hill climbing works just as well. Hill climbing can often produce a better result than other algorithms when the amount of time available to perform a search is limited, such as with real-time systems. It is an anytime algorithm: it can return a valid solution even if it's interrupted at any time before it ends. 
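The travelling salesman example in this entry can be sketched in a few lines. This is an illustrative first-improvement variant, assuming cities are 2-D points and that a "small change" means swapping two positions in the tour; the names `tour_length` and `hill_climb` are made up for the example.

```python
import itertools
import math
import random

def tour_length(tour, cities):
    """Length of the closed tour that visits `cities` in the order given by `tour`."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def hill_climb(cities, seed=0):
    """Swap pairs of cities until no single swap shortens the tour."""
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)                       # arbitrary starting solution
    best = tour_length(tour, cities)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(tour)), 2):
            tour[i], tour[j] = tour[j], tour[i]
            length = tour_length(tour, cities)
            if length < best - 1e-12:       # keep the first improving swap
                best = length
                improved = True
                break
            tour[i], tour[j] = tour[j], tour[i]   # undo a non-improving swap
    return tour, best

# Usage: the result is a local optimum with respect to pairwise swaps,
# not necessarily the shortest possible tour.
rng = random.Random(42)
cities = [(rng.random(), rng.random()) for _ in range(12)]
print(hill_climb(cities))
```

Because the loop can be stopped after any accepted swap and still hold a valid tour, the sketch also illustrates the anytime property mentioned above. Restarting from several seeds and keeping the best result is the simplest of the remedies for local optima listed in the entry.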
Hindcasting  In oceanography and meteorology, backtesting is also known as hindcasting: a hindcast is a way of testing a mathematical model; known or closely estimated inputs for past events are entered into the model to see how well the output matches the known results. Hindcasting usually refers to a numerical model integration of a historical period where no observations have been assimilated. This distinguishes a hindcast run from a reanalysis. Oceanographic observations of salinity and temperature, as well as observations of surface wave parameters such as the significant wave height, are much scarcer than meteorological observations, making hindcasting more common in oceanography than in meteorology. Also, since surface waves represent a forced system where the wind is the only generating force, wave hindcasting is often considered adequate for generating a reasonable representation of the wave climate with little need for a full reanalysis. Hindcasting is also used in hydrology to model stream flows. 
HIRO  Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher- and lower-level training. This poses a considerable challenge, since changes to the lower-level behaviors change the action space for the higher-level policy, and we introduce an off-policy correction to remedy this challenge. This allows us to take advantage of recent advances in off-policy model-free RL to learn both higher- and lower-level policies using substantially fewer environment interactions than on-policy algorithms. We term the resulting HRL agent HIRO and find that it is generally applicable and highly sample-efficient. Our experiments show that HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, we find that our approach substantially outperforms previous state-of-the-art techniques. 
Histogram of Oriented Gradients (HOG) 
Histograms of Oriented Gradients (HOG) are feature descriptors used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. Navneet Dalal and Bill Triggs, researchers for the French National Institute for Research in Computer Science and Control (INRIA), first described Histogram of Oriented Gradients descriptors in their June 2005 CVPR paper. In this work they focused their algorithm on the problem of pedestrian detection in static images, although since then they have expanded their tests to include human detection in film and video, as well as a variety of common animals and vehicles in static imagery. 
History PCA  In this paper we propose a new algorithm for streaming principal component analysis. With limited memory, small devices cannot store all the samples in the high-dimensional regime. Streaming principal component analysis aims to find the $k$-dimensional subspace which can explain the most variation of the $d$-dimensional data points that come into memory sequentially. In order to deal with large $d$ and large $N$ (number of samples), most streaming PCA algorithms update the current model using only the incoming sample and then dump the information right away to save memory. However, the information contained in previously streamed data could be useful. Motivated by this idea, we develop a new streaming PCA algorithm called History PCA that achieves this goal. By using $O(Bd)$ memory with $B\approx 10$ being the block size, our algorithm converges much faster than existing streaming PCA algorithms. By changing the number of inner iterations, the memory usage can be further reduced to $O(d)$ while maintaining a comparable convergence speed. We provide theoretical guarantees for the convergence of our algorithm along with the rate of convergence. We also demonstrate on synthetic and real world data sets that our algorithm compares favorably with other state-of-the-art streaming PCA methods in terms of the convergence speed and performance. 
HiTM-VAE  This work focuses on combining nonparametric topic models with Auto-Encoding Variational Bayes (AEVB). Specifically, we first propose iTM-VAE, where the topics are treated as trainable parameters and the document-specific topic proportions are obtained by a stick-breaking construction. The inference of iTM-VAE is modeled by neural networks such that it can be computed in a simple feed-forward manner. We also describe how to introduce a hyper-prior into iTM-VAE so as to model the uncertainty of the prior parameter. Actually, the hyper-prior technique is quite general and we show that it can be applied to other AEVB based models to alleviate the collapse-to-prior problem elegantly. Moreover, we also propose HiTM-VAE, where the document-specific topic distributions are generated in a hierarchical manner. HiTM-VAE is even more flexible and can generate topic distributions with better variability. Experimental results on 20News and Reuters RCV1-V2 datasets show that the proposed models outperform the state-of-the-art baselines significantly. The advantages of the hyper-prior technique and the hierarchical model construction are also confirmed by experiments. 
HitNet  Neural networks designed for the task of classification have become a commodity in recent years. Many works target the development of better networks, which results in a complexification of their architectures with more layers, multiple sub-networks, or even the combination of multiple classifiers. In this paper, we show how to redesign a simple network to reach excellent performances, which are better than the results reproduced with CapsNet on several datasets, by replacing a layer with a Hit-or-Miss layer. This layer contains activated vectors, called capsules, that we train to hit or miss a central capsule by tailoring a specific centripetal loss function. We also show how our network, named HitNet, is capable of synthesizing a representative sample of the images of a given class by including a reconstruction network. This possibility makes it possible to develop a data augmentation step combining information from the data space and the feature space, resulting in a hybrid data augmentation process. In addition, we introduce the possibility for HitNet to adopt an alternative to the true target when needed by using the new concept of ghost capsules, which is used here to detect potentially mislabeled images in the training data. 
Hitting Time  In the study of stochastic processes in mathematics, a hitting time (or first hit time) is the first time at which a given process “hits” a given subset of the state space. Exit times and return times are also examples of hitting times. 
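A hitting time is easy to estimate by simulation. The sketch below assumes a discrete-time process given by a one-step transition function and uses the convention that time 0 counts (so a process that starts inside the target set has hitting time 0); the helper name `hitting_time` and the `max_steps` cut-off are illustrative choices.

```python
import random

def hitting_time(start, target, step, max_steps=10**6):
    """First n >= 0 with X_n in `target`, or None if not reached within max_steps."""
    x = start
    for n in range(max_steps + 1):
        if x in target:
            return n
        x = step(x)                      # advance the process by one step
    return None

# Usage: exit time of a simple symmetric random walk from the interval (-3, 5).
rng = random.Random(0)
walk = lambda x: x + rng.choice((-1, 1))
times = [hitting_time(0, {-3, 5}, walk) for _ in range(2000)]
print(sum(times) / len(times))           # sample mean of the exit time
```

For this walk the expected exit time from 0 with barriers at -a and b is a·b, so the sample mean should be near 15.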
HI-VAE  Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate at capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal and count data, and allows missing data to be estimated (and potentially imputed) accurately. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data. 
Hive Plot  The hive plot is a rational visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes – this mapping is based on network structural properties. Edges are drawn as curved links. Simple and interpretable. The purpose of the hive plot is to establish a new baseline for visualization of large networks – a method that is both general and tunable and useful as a starting point in visually exploring network structure. 
Hodrick-Prescott Filter (HP Filter) 
The Hodrick-Prescott filter (also known as Hodrick-Prescott decomposition) is a mathematical tool used in macroeconomics, especially in real business cycle theory, to remove the cyclical component of a time series from raw data. It is used to obtain a smoothed-curve representation of a time series, one that is more sensitive to long-term than to short-term fluctuations. The adjustment of the sensitivity of the trend to short-term fluctuations is achieved by modifying a multiplier $\lambda$. The filter was popularized in the field of economics in the 1990s by economists Robert J. Hodrick and Nobel Memorial Prize winner Edward C. Prescott. However, it was first proposed much earlier by E. T. Whittaker in 1923. The HP Filter and Unit Roots 
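The role of the multiplier can be made concrete with a small dense implementation. This sketch assumes the usual penalized least squares formulation, in which the trend minimizes the squared deviations from the data plus the multiplier times the squared second differences of the trend; the closed-form solution is then a linear system. The function name `hp_filter` is illustrative, and the default of 1600 is the value conventionally used for quarterly data.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Return (trend, cycle), where trend solves (I + lam * D'D) trend = y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least 3 observations")
    # D is the (n-2) x n second-difference operator with rows (1, -2, 1).
    D = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    D[rows, rows] = 1.0
    D[rows, rows + 1] = -2.0
    D[rows, rows + 2] = 1.0
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return trend, y - trend

# Usage: a larger multiplier gives a smoother trend.
t = np.arange(40.0)
series = 0.1 * t + np.sin(t / 3.0)
trend, cycle = hp_filter(series, lam=1600.0)
print(trend[:3], cycle[:3])
```

A dense solve is fine for short series; for long series a banded or sparse solver would be the more natural choice, since the system matrix is pentadiagonal.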
Hoeffding Anytime Tree  We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree ('Extremely Fast Decision Tree', a minor modification to the MOA implementation of Hoeffding Tree) obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost. 
Hoeffding Tree (VFDT) 
A Hoeffding tree (VFDT) is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute). A theoretically appealing feature of Hoeffding trees not shared by other incremental decision tree learners is that they have sound guarantees of performance. Using the Hoeffding bound one can show that their output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. For more information see: Geoff Hulten, Laurie Spencer, Pedro Domingos: Mining time-changing data streams. In: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 97-106, 2001. 
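The split decision described above can be sketched directly from the Hoeffding bound. With probability at least 1 − δ, the true mean of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the mean of n independent observations. The function names below are illustrative, and the sketch leaves out the tie-breaking threshold that practical implementations usually add.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the sample mean is within epsilon of the true mean
    with probability at least 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the observed advantage of the best attribute exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# Usage: the same observed gain difference justifies a split only after
# enough examples have been seen at the leaf.
for n in (100, 1000, 10000):
    print(n, hoeffding_bound(1.0, 1e-7, n), should_split(0.30, 0.10, 1.0, 1e-7, n))
```

Here `value_range` is the range of the split criterion (for information gain with c classes, log2 of c is the usual choice).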
Hogwild!  Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking. We present an update scheme called Hogwild! which allows processors access to shared memory with the possibility of overwriting each other's work. 
Hollow Heap  We introduce the hollow heap, a very simple data structure with the same amortized efficiency as the classical Fibonacci heap. All heap operations except delete and delete-min take $O(1)$ time, worst case as well as amortized; delete and delete-min take $O(\log n)$ amortized time on a heap of $n$ items. Hollow heaps are by far the simplest structure to achieve this. Hollow heaps combine two novel ideas: the use of lazy deletion and re-insertion to do decrease-key operations, and the use of a dag (directed acyclic graph) instead of a tree or set of trees to represent a heap. Lazy deletion produces hollow nodes (nodes without items), giving the data structure its name. 
Holographic Neural Architecture (HNA) 
Representation learning is at the heart of what makes deep learning effective. In this work, we introduce a new framework for representation learning that we call 'Holographic Neural Architectures' (HNAs). In the same way that an observer can experience the 3D structure of a holographed object by looking at its hologram from several angles, HNAs derive Holographic Representations from the training set. These representations can then be explored by moving along a continuous bounded single dimension. We show that HNAs can be used to make generative networks and state-of-the-art regression models, and that they are inherently highly resistant to noise. Finally, we argue that because of their denoising abilities and their capacity to generalize well from very few examples, models based upon HNAs are particularly well suited for biological applications where training examples are rare or noisy. 
Holonomic Gradient Method (HGM) 
The holonomic gradient method introduced by Nakayama et al. (2011) presents a new methodology for evaluating normalizing constants of probability distributions and for obtaining the maximum likelihood estimate of a statistical model. The method utilizes partial differential equations satisfied by the normalizing constant and is based on the Gröbner basis theory for the ring of differential operators. In this talk we give an introduction to this new methodology. The method has already proved to be useful for problems in directional statistics and in classical multivariate distribution theory involving hypergeometric functions of matrix arguments. hgm 
Holt-Winters double exponential smoothing  This method is used when the data shows a trend. Exponential smoothing with a trend works much like simple smoothing except that two components must be updated each period – level and trend. The level is a smoothed estimate of the value of the data at the end of each period. The trend is a smoothed estimate of average growth at the end of each period. http://…–theholtwintersforecastingmethod.pdf 
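The two updates per period can be written out as follows. This is a minimal sketch: the smoothing weights `alpha` (level) and `beta` (trend) and the initialization from the first two observations are common conventions assumed here, not something the entry specifies.

```python
def holt_linear(y, alpha, beta, horizon=1):
    """Double exponential smoothing; returns forecasts for 1..horizon steps ahead."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]        # assumed initialization
    for obs in y[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Usage: forecasts extend the last smoothed level along the smoothed trend.
print(holt_linear([12.0, 14.5, 15.1, 17.8, 19.2, 21.0], alpha=0.5, beta=0.3, horizon=3))
```

On an exactly linear series both components stay fixed, so the forecasts simply continue the line.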
Holt-Winters Method (HW) 
Holt (1957) and Winters (1960) extended Holt's method to capture seasonality. The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations: one for the level $\ell_t$, one for the trend $b_t$, and one for the seasonal component denoted by $s_t$, with smoothing parameters $\alpha$, $\beta^*$ and $\gamma$. We use $m$ to denote the period of the seasonality, i.e., the number of seasons in a year. For example, for quarterly data $m=4$, and for monthly data $m=12$. There are two variations to this method that differ in the nature of the seasonal component. The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the additive method, the seasonal component is expressed in absolute terms in the scale of the observed series, and in the level equation the series is seasonally adjusted by subtracting the seasonal component. Within each year the seasonal component will add up to approximately zero. With the multiplicative method, the seasonal component is expressed in relative terms (percentages) and the series is seasonally adjusted by dividing through by the seasonal component. Within each year, the seasonal component will sum up to approximately $m$. 
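The additive variant can be sketched as below. The three updates follow the standard additive form (level from the seasonally adjusted observation, trend from the change in level, seasonal component from the detrended observation); the initialization from the first two seasons is one simple convention assumed for illustration, and `beta` plays the role of the trend parameter written as β* in the entry.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns forecasts for 1..horizon steps ahead."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    # Assumed initialization: first-season mean, average per-period growth
    # between the first two seasons, and first-season deviations.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]   # season[t % m] holds s_{t-m}
    for t in range(m, len(y)):
        prev_level, prev_trend, prev_season = level, trend, season[t % m]
        # Level: seasonally adjust by subtracting the seasonal component.
        level = alpha * (y[t] - prev_season) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[t % m] = gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * prev_season
    n = len(y)
    # Forecast: level plus h trend steps plus the latest seasonal value for that phase.
    return [level + h * trend + season[(n - 1 + h) % m] for h in range(1, horizon + 1)]

# Usage on quarterly data (m = 4).
sales = [30, 45, 60, 40, 33, 49, 66, 43, 36, 52, 71, 47]
print(holt_winters_additive(sales, m=4, alpha=0.4, beta=0.1, gamma=0.3, horizon=4))
```

The multiplicative variant replaces the subtractions of the seasonal component with divisions and the final addition with a multiplication.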
Homebrew  Homebrew has made extensive use of GitHub to expand the support of several packages through user contributions. In 2010, Homebrew was the third-most-forked repository on GitHub. In 2012, Homebrew had the largest number of new contributors on GitHub. In 2013, Homebrew had both the largest number of contributors and issues closed of any project on GitHub. Homebrew has spawned several sub-projects such as Linuxbrew, which is a Linux port, Homebrew Cask, which builds upon Homebrew and focuses on the installation of GUI applications, and 'taps' dedicated to specific areas or programming languages like PHP. How to Install and Use Homebrew 
Homographic Adaptation  This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection accuracy and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to strong interest point repeatability on the HPatches dataset and outperforms traditional descriptors such as ORB and SIFT on point matching accuracy and on the task of homography estimation. 
Homoscedasticity  In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used. 
Hopfield Network  A Hopfield network is a form of recurrent artificial neural network invented by John Hopfield in 1982. Hopfield nets serve as content-addressable memory systems with binary threshold nodes. They are guaranteed to converge to a local minimum, but convergence to a false pattern (wrong local minimum) rather than the stored pattern (expected local minimum) can occur. Hopfield networks also provide a model for understanding human memory. 
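A content-addressable memory of this kind fits in a few lines of NumPy. The sketch assumes bipolar states (+1/-1), Hebbian outer-product weights with a zero diagonal, asynchronous threshold updates, and a tie rule that maps a zero input to +1; these are common textbook choices, and the function names are illustrative.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for an array of +1/-1 patterns, shape (num_patterns, n)."""
    p = np.asarray(patterns, dtype=float)
    w = p.T @ p / p.shape[1]
    np.fill_diagonal(w, 0.0)                 # no self-connections
    return w

def recall(w, state, max_sweeps=100):
    """Update one node at a time until a full sweep changes nothing."""
    s = np.asarray(state, dtype=float).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(s.size):
            new = 1.0 if w[i] @ s >= 0 else -1.0   # binary threshold node
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:                      # reached a local minimum of the energy
            break
    return s

# Usage: store two patterns, then recall the first from a corrupted copy.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([p1, p2])
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]        # p1 with its first bit flipped
print(recall(w, noisy))
```

With many stored or strongly correlated patterns, the same procedure can settle in a spurious state, which is the false-pattern convergence the entry warns about.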
HopsFS  Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces the HDFS single-node in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS improves capacity and throughput compared to HDFS. HopsFS can store 24 times more metadata than HDFS. We also provide public, fully reproducible experiments based on a workload trace from Spotify that show HopsFS has 2.6 times the throughput of Apache HDFS, lower latency for greater than 400 concurrent clients, and no downtime during failover. Finally, and most significantly, HopsFS allows metadata to be exported to external systems, analyzed or searched online, and easily extended. 
Horn  I introduce a new distributed system for effective training and regularizing of large-scale neural networks on distributed computing architectures. The experiments demonstrate the effectiveness of flexible model partitioning and parallelization strategies based on a neuron-centric computation model, with an implementation of collective and parallel dropout neural network training. Experiments, with results, are reported on MNIST handwritten digit classification. 
Horn Implication Counterexamples (Horn-ICE) 
Horn-ICE Learning for Synthesizing Invariants and Contracts 
HornConcerto  Graph representations of large knowledge bases may comprise billions of edges. Usually built upon human-generated ontologies, several knowledge bases do not feature declared ontological rules and are far from being complete. Current rule mining approaches rely on schemata or store the graph in-memory, which can be unfeasible for large graphs. In this paper, we introduce HornConcerto, an algorithm to discover Horn clauses in large graphs without the need of a schema. Using a standard fact-based confidence score, we can mine closed Horn rules having an arbitrary body size. We show that our method can outperform existing approaches in terms of runtime and memory consumption and mine high-quality rules for the link prediction task, achieving state-of-the-art results on a widely-used benchmark. Moreover, we find that rules alone can perform inference significantly faster than embedding-based methods and achieve accuracies on link prediction comparable to resource-demanding approaches such as Markov Logic Networks. 
Horovod  Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library's API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://…/horovod. 
HorseRule  The HorseRule model is a flexible tree based Bayesian regression method for linear and nonlinear regression and classification described in Nalenz & Villani (2017) <arXiv:1702.05008>. horserule 
Horseshoe Estimator  This paper proposes a new approach to sparse-signal detection called the horseshoe estimator. We show that the horseshoe is a close cousin of the lasso in that it arises from the same class of multivariate scale mixtures of normals, but that it is almost universally superior to the double-exponential prior at handling sparsity. A theoretical framework is proposed for understanding why the horseshoe is a better default 'sparsity' estimator than those that arise from powered-exponential priors. Comprehensive numerical evidence is presented to show that the difference in performance can often be large. Most importantly, we show that the horseshoe estimator corresponds quite closely to the answers one would get if one pursued a full Bayesian model-averaging approach using a 'two-groups' model: a point mass at zero for noise, and a continuous density for signals. Surprisingly, this correspondence holds both for the estimator itself and for the classification rule induced by a simple threshold applied to the estimator. We show how the resulting thresholded horseshoe can also be viewed as a novel Bayes multiple-testing procedure. horseshoe 
Horseshoe Regularization  Feature subset selection arises in many high-dimensional applications in machine learning and statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being that it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables an efficient expectation-maximization algorithm for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithm provides better statistical performance, and the computation requires a fraction of the time of state-of-the-art non-convex solvers. 
Hospital Residents Problem  ➘ “Stable Marriage Problem” 
Hot Deck Imputation  This method sorts respondents and non-respondents into a number of imputation subsets according to a user-specified set of covariates. An imputation subset comprises cases with the same values as those of the user-specified covariates. Missing values are then replaced with values taken from matching respondents (i.e. respondents that are similar with respect to the covariates). If there is more than one matching respondent for any particular non-respondent, the user has two choices: 1. The first respondent's value as counted from the missing entry downwards within the imputation subset is used to impute. The reason for this is that the first respondent's value may be closer in time to the case that has the missing value. For example, if cases are entered according to the order in which they occur, there may possibly be some type of time effect in some studies. 2. A respondent's value is randomly selected from within the imputation subset. If a matching respondent does not exist in the initial imputation class, the subset will be collapsed by one level starting with the last variable that was selected as a sort variable, or until a match can be found. Note that if no matching respondent is found, even after all of the sort variables have been collapsed, three options are available: 1. Re-specify new sort variables: The user can specify up to five sort variables. 2. Perform random overall imputation: Where the missing value will be replaced with a value randomly selected from the observed values in that variable. 3. Do not impute the missing value: SOLAS will not impute any missing values for which no matching respondent is found. HotDeckImputation, hot.deck 
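The matching and collapsing logic described above can be sketched as follows. This is an illustrative implementation, not the SOLAS procedure or the API of the R packages named in the entry: it assumes rows are dictionaries, missing values are `None`, donors are drawn at random within the imputation subset (the second choice above), and a value that finds no donor even on the first sort variable alone is left missing (the third fallback option).

```python
import random

def hot_deck_impute(rows, target, sort_vars, seed=0):
    """Return copies of `rows` with missing `target` values filled from matching donors."""
    rng = random.Random(seed)
    donors = [r for r in rows if r[target] is not None]
    result = []
    for row in rows:
        row = dict(row)                      # do not modify the caller's data
        if row[target] is None:
            keys = list(sort_vars)
            pool = []
            while keys and not pool:
                # Donors that agree with this case on all remaining sort variables.
                pool = [d[target] for d in donors if all(d[k] == row[k] for k in keys)]
                if not pool:
                    keys.pop()               # collapse: drop the last sort variable
            if pool:
                row[target] = rng.choice(pool)
        result.append(row)
    return result

# Usage: the hypothetical columns "sex" and "region" act as sort variables.
data = [
    {"sex": "F", "region": "N", "income": 50},
    {"sex": "F", "region": "N", "income": None},
    {"sex": "M", "region": "S", "income": 70},
    {"sex": "M", "region": "N", "income": None},
]
print(hot_deck_impute(data, "income", ["sex", "region"]))
```

Replacing `rng.choice(pool)` with the first donor that follows the missing case in file order would give the first of the two choices instead.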
Hot Spot Analysis  Also known as Getis-Ord Gi* – The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features. A feature with a high value is interesting but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and that difference is too large to be the result of random chance, a statistically significant z-score results. The Gi* statistic returned for each feature in the dataset is a z-score. For statistically significant positive z-scores, the larger the z-score is, the more intense the clustering of high values (hot spot). For statistically significant negative z-scores, the smaller the z-score is, the more intense the clustering of low values (cold spot). When to use: Results aren’t reliable with fewer than 30 features. Applications can be found in crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic incident analysis, and demographics. Examples: Where is the disease outbreak concentrated? – Where are kitchen fires a larger than expected proportion of all residential fires? – Where should the evacuation sites be located? – Where/When do peak intensities occur? How Hot Spot Analysis works 
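The comparison of the local sum with its expected value can be made concrete with a small sketch. It assumes binary spatial weights (1 for the feature itself and each listed neighbor, 0 otherwise), non-constant values, and neighborhoods smaller than the whole dataset; the function name and the toy data are hypothetical, and the ten features are far fewer than the 30 recommended above.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Gi* z-score per feature, using binary weights that include the feature itself."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        idx = set(neighbors[i]) | {i}      # Gi* counts the feature itself
        w_sum = len(idx)                   # sum of binary weights (= sum of squares)
        local_sum = sum(values[j] for j in idx)
        expected = mean * w_sum            # expected local sum under no clustering
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z_scores.append((local_sum - expected) / denom)
    return z_scores


# Ten features on a line; each feature neighbors the features directly beside it.
values = [9, 8, 9, 1, 2, 1, 1, 2, 1, 1]
n = len(values)
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
z = getis_ord_gi_star(values, neighbors)
```

The features at the high-valued end of the line receive large positive z-scores (candidate hot spots), while those in the low-valued stretch receive negative ones.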
Houdini  Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, be it combinatorial and non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, the attacks based on Houdini achieve a higher success rate than those based on the traditional surrogates used to train the models, while using a less perceptible adversarial perturbation. 
Huber Loss  In statistics, the Huber loss is a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used. hqreg 
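As a short illustration of why the loss is less sensitive to outliers, the standard definition is quadratic for small residuals and linear for large ones. The threshold parameter `delta` and the function name are choices made for this sketch.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


small = huber_loss(0.5)    # inside the quadratic region
large = huber_loss(3.0)    # inside the linear region
squared = 0.5 * 3.0 ** 2   # squared error loss for the same residual
```

For the residual of 3.0 the Huber loss grows only linearly, so a single outlier contributes less than it would under the squared error loss.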
Hubs and Authorities  ➘ “Hyperlink-Induced Topic Search” 
HuFu  Recently, Deep Learning (DL), especially the Convolutional Neural Network (CNN), has developed rapidly and is applied to many tasks, such as image classification, face recognition, image segmentation, and human detection. Due to their superior performance, DL-based models have a wide range of applications in many areas, some of which are extremely safety-critical, e.g. intelligent surveillance and autonomous driving. Due to the latency and privacy problems of cloud computing, embedded accelerators are popular in these safety-critical areas. However, the robustness of the embedded DL system might be harmed by inserting hardware/software Trojans into the accelerator and the neural network model, since the accelerator and deployment tool (or neural network model) are usually provided by third-party companies. Fortunately, inserting hardware Trojans can only achieve inflexible attacks, which means that hardware Trojans can easily break down the whole system or exchange two outputs, but can’t make a CNN recognize unknown pictures as targets. Though inserting software Trojans gives more freedom of attack, it often requires tampering with input images, which is not easy for attackers. So, in this paper, we propose a hardware-software collaborative attack framework to inject hidden neural network Trojans, which works as a backdoor without requiring manipulation of input images and is flexible for different scenarios. We test our attack framework on image classification and face recognition tasks, and get attack success rates of 92.6% and 100% on CIFAR-10 and YouTube Faces, respectively, while keeping almost the same accuracy as the unattacked model in the normal mode. In addition, we show a specific attack scenario in which a face recognition system is attacked and gives a specific wrong answer. 
Human And Machine co-LEarning Technique (HAMLET) 
Efficient label acquisition processes are key to obtaining robust classifiers. However, data labeling is often challenging and subject to high levels of label noise. This can arise even when classification targets are well defined, if instances to be labeled are more difficult than the prototypes used to define the class, leading to disagreements among the expert community. Here, we enable efficient training of deep neural networks. From low-confidence labels, we iteratively improve their quality by simultaneous learning of machines and experts. We call it Human And Machine co-LEarning Technique (HAMLET). Throughout the process, experts become more consistent, while the algorithm provides them with explainable feedback for confirmation. HAMLET uses a neural embedding function and a memory module filled with diverse reference embeddings from different classes. Its output includes classification labels and highly relevant reference embeddings as explanation. We took the study of brain monitoring in the intensive care unit (ICU) as an application of HAMLET on continuous electroencephalography (cEEG) data. Although cEEG monitoring yields large volumes of data, labeling costs and difficulty make it hard to build a classifier. Additionally, while experts agree on the labels of clear-cut examples of cEEG patterns, labeling much real-world cEEG data can be extremely challenging. Thus, a large minority of sequences might be mislabeled. HAMLET has shown a significant performance gain against deep learning and other baselines, increasing accuracy from 7.03% to 68.75% on challenging inputs. Besides improved performance, clinical experts confirmed the interpretability of those reference embeddings in helping explain the classification results by HAMLET. 
Human Group Optimizer (HGO) 
A large number of optimization algorithms have been developed by researchers to solve a variety of complex problems in the operations management area. We present a novel optimization algorithm belonging to the class of swarm intelligence optimization methods. The algorithm mimics the decision-making process of human groups and exploits the dynamics of this process as an optimization tool for combinatorial problems. In order to achieve this aim, a continuous-time Markov process is proposed to describe the behavior of a population of socially interacting agents, modelling how humans in a group modify their opinions driven by self-interest and consensus seeking. As in the case of a collection of spins, the dynamics of such a system is characterized by a phase transition from low to high values of the overall consensus (magnetization). We recognize this phase transition as being associated with the emergence of a collective superior intelligence of the population. While this state is active, a cooling schedule is applied to bring agents closer and closer to the optimal solution while they perform their random walk on the fitness landscape. A comparison with simulated annealing, as well as with a multi-agent version of simulated annealing, is presented in terms of efficacy in finding good solutions on an NK-Kauffman landscape. In all cases our method outperforms the others, particularly in the presence of limited knowledge of the agent. 
Human-in-the-loop Artificial Intelligence (HITAI) 
Little by little, newspapers are revealing the bright future that Artificial Intelligence (AI) is building. Intelligent machines will help everywhere. However, this bright future has a dark side: a dramatic job market contraction before its unpredictable transformation. Hence, in the near future, large numbers of job seekers will need financial support while catching up with these novel unpredictable jobs. This possible job market crisis has an antidote inside. In fact, the rise of AI is sustained by the biggest knowledge theft of recent years. Learning AI machines are extracting knowledge from unaware skilled or unskilled workers by analyzing their interactions. By passionately doing their jobs, these workers are digging their own graves. In this paper, we propose Human-in-the-loop Artificial Intelligence (HITAI) as a fairer paradigm for Artificial Intelligence systems. HITAI will reward aware and unaware knowledge producers with a different scheme: decisions of AI systems generating revenues will repay the legitimate owners of the knowledge used for taking those decisions. As modern Robin Hoods, HITAI researchers should fight for a fairer Artificial Intelligence that gives back what it steals. 
Human-Machine Inference Network (HuMaIN) 
The emerging paradigm of Human-Machine Inference Networks (HuMaINs) combines complementary cognitive strengths of humans and machines in an intelligent manner to tackle various inference tasks and achieves higher performance than either humans or machines by themselves. While inference performance optimization techniques for human-only or sensor-only networks are quite mature, HuMaINs require novel signal processing and machine learning solutions. In this paper, we present an overview of the HuMaINs architecture with a focus on three main issues that include architecture design, inference algorithms including security/privacy challenges, and application areas/use cases. 
Hurst Coefficient  ➘ “Hurst Exponent” 
Hurst Exponent  The Hurst exponent is used as a measure of long-term memory of time series. It relates to the autocorrelations of the time series, and the rate at which these decrease as the lag between pairs of values increases. Studies involving the Hurst exponent were originally developed in hydrology for the practical matter of determining optimum dam sizing for the Nile river’s volatile rain and drought conditions that had been observed over a long period of time. The name ‘Hurst exponent’, or ‘Hurst coefficient’, derives from Harold Edwin Hurst (1880–1978), who was the lead researcher in these studies; the use of the standard notation H for the coefficient relates to his name also. In fractal geometry, the generalized Hurst exponent has been denoted by H or Hq in honor of both Harold Edwin Hurst and Ludwig Otto Hölder (1859–1937) by Benoît Mandelbrot (1924–2010). H is directly related to fractal dimension, D, and is a measure of a data series’ ‘mild’ or ‘wild’ randomness. The Hurst exponent is referred to as the ‘index of dependence’ or ‘index of long-range dependence’. It quantifies the relative tendency of a time series either to regress strongly to the mean or to cluster in a direction. A value H in the range 0.5 – 1 indicates a time series with long-term positive autocorrelation, meaning both that a high value in the series will probably be followed by another high value and that the values a long time into the future will also tend to be high. A value in the range 0 – 0.5 indicates a time series with long-term switching between high and low values in adjacent pairs, meaning that a single high value will probably be followed by a low value and that the value after that will tend to be high, with this tendency to switch between high and low values lasting a long time into the future. 
A value of H=0.5 can indicate a completely uncorrelated series, but in fact it is the value applicable to series for which the autocorrelations at small time lags can be positive or negative but where the absolute values of the autocorrelations decay exponentially quickly to zero. This is in contrast to the typical power-law decay for the 0.5 < H < 1 and 0 < H < 0.5 cases. 
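One common way to estimate H is rescaled range (R/S) analysis, the approach associated with Hurst's hydrology studies: the average rescaled range over windows of length w is assumed to grow roughly like w^H, so H is read off as the slope of a log-log fit. The sketch below is a rough illustration under that assumption; the function names, the window sizes and the synthetic test series are choices made for the example, and R/S estimates are known to be biased for short windows.

```python
import math
import random


def rescaled_range(xs):
    """Range of the cumulative mean-adjusted series divided by the standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    cum = lo = hi = 0.0
    for x in xs:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return (hi - lo) / std if std > 0 else None


def hurst_rs(series, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    points = []
    for w in window_sizes:
        chunks = [series[k:k + w] for k in range(0, len(series) - w + 1, w)]
        rs = [r for r in map(rescaled_range, chunks) if r]
        if rs:
            points.append((math.log(w), math.log(sum(rs) / len(rs))))
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]   # uncorrelated values
walk, total = [], 0.0
for step in noise:                                   # strongly persistent values
    total += step
    walk.append(total)
h_noise = hurst_rs(noise)
h_walk = hurst_rs(walk)
```

The uncorrelated series should give an estimate in the neighborhood of 0.5, while the cumulative series, whose high values tend to be followed by high values, should give an estimate much closer to 1.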
HVARX  The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Although VAR models are intensively investigated by many researchers, practitioners often show more interest in analyzing VARX models that incorporate the impact of unmodeled exogenous variables (X) into the VAR. However, since the parameter space grows quadratically with the number of time series, estimation quickly becomes challenging. While several proposals have been made to sparsely estimate large VAR models, the estimation of large VARX models is under-explored. Moreover, typically these sparse proposals involve a lasso-type penalty and do not incorporate lag selection into the estimation procedure. As a consequence, the resulting models may be difficult to interpret. In this paper, we propose a lag-based hierarchically sparse estimator, called ‘HVARX’, for large VARX models. We illustrate the usefulness of HVARX on a cross-category management marketing application. Our results show how it provides a highly interpretable model, and improves out-of-sample forecast accuracy compared to a lasso-type approach. 
Hy  Hy is a Lisp dialect that converts its structure into Python’s abstract syntax tree. It is to Python what LFE is to Erlang. This provides developers from many backgrounds with the following: · A Lisp that feels very Pythonic · A great way to use Lisp’s crazy powers but in the wide world of Python’s libraries · A great way to start exploring Lisp, from the comfort of Python · A pleasant language that has a lot of neat ideas 🙂 
Hybrid  We study the problem of personalized, interactive tag recommendation for Flickr: While a user enters/selects new tags for a particular picture, the system suggests related tags to her, based on the tags that she or other people have used in the past along with (some of) the tags already entered. The suggested tags are dynamically updated with every additional tag entered/selected. We describe a new algorithm, called Hybrid, which can be applied to this problem, and show that it outperforms previous algorithms. It has only a single tunable parameter, which we found to be very robust. 
Hybrid Ant Colony Optimization Algorithm (HACO) 
In this paper, we propose a Hybrid Ant Colony Optimization algorithm (HACO) for the Next Release Problem (NRP). NRP, an NP-hard problem in requirements engineering, is to balance customer requests, resource constraints, and requirement dependencies by requirement selection. Inspired by the successes of Ant Colony Optimization algorithms (ACO) for solving NP-hard problems, we design our HACO to approximately solve NRP. Similar to traditional ACO algorithms, multiple artificial ants are employed to construct new solutions. During the solution construction phase, both pheromone trails and neighborhood information are used to determine the choices of every ant. In addition, a local search (first found hill climbing) is incorporated into HACO to improve the solution quality. Extensive experiments on typical NRP test instances show that HACO outperforms the existing algorithms (GRASP and simulated annealing) in terms of both solution quality and running time. 
Hybrid Artificial Intelligence  ➘ “Hybrid Intelligent System” 
Hybrid Artificial Intelligence Optimization  Book: Applications of Artificial Intelligence Techniques in Industry 4.0 
Hybrid Consensus Alternating Direction Method of Multipliers (H-CADMM) 
The present work introduces the hybrid consensus alternating direction method of multipliers (H-CADMM), a novel framework for optimization over networks which unifies existing distributed optimization approaches, including the centralized and the decentralized consensus ADMM. H-CADMM provides a flexible tool that leverages the underlying graph topology in order to achieve a desirable sweet spot between node-to-node communication overhead and rate of convergence, thereby alleviating known limitations of both C-CADMM and D-CADMM. A rigorous analysis of the novel method establishes a linear convergence rate, and also guides the choice of parameters to optimize this rate. The novel hybrid update rules of H-CADMM lend themselves to ‘in-network acceleration’ that is shown to effect a considerable, and essentially ‘free-of-charge’, performance boost over the fully decentralized ADMM. Comprehensive numerical tests validate the analysis and showcase the potential of the method in efficiently tackling widely useful learning tasks. 
Hybrid Contextualized Sentiment Classifier (HCSC) 
The use of user/product information in sentiment analysis is important, especially for cold-start users/products, whose number of reviews is very limited. However, current models do not deal with the cold-start problem which is typical in review websites. In this paper, we present the Hybrid Contextualized Sentiment Classifier (HCSC), which contains two modules: (1) a fast word encoder that returns word vectors embedded with short and long range dependency features; and (2) Cold-Start Aware Attention (CSAA), an attention mechanism that considers the existence of the cold-start problem when attentively pooling the encoded word vectors. HCSC introduces shared vectors that are constructed from similar users/products, and are used when the original distinct vectors do not have sufficient information (i.e. cold-start). This is decided by a frequency-guided selective gate vector. Our experiments show that in terms of RMSE, HCSC performs significantly better than previous models on famous datasets, despite having less complexity, and thus can be trained much faster. More importantly, our model performs significantly better than previous models when the training data is sparse and has cold-start problems. 
Hybrid Filter-Wrapper Feature Selection Method  HybridFS 
Hybrid Intelligent System  Hybrid intelligent system denotes a software system which employs, in parallel, a combination of methods and techniques from artificial intelligence subfields such as: · Neuro-fuzzy systems · Hybrid connectionist-symbolic models · Fuzzy expert systems · Connectionist expert systems · Evolutionary neural networks · Genetic fuzzy systems · Rough fuzzy hybridization · Reinforcement learning with fuzzy, neural, or evolutionary methods as well as symbolic reasoning methods. From the cognitive science perspective, every natural intelligent system is hybrid because it performs mental operations on both the symbolic and subsymbolic levels. For the past few years there has been an increasing discussion of the importance of A.I. Systems Integration. It is based on the notion that simple and specific AI systems have already been created (such as systems for computer vision, speech synthesis, etc., or software that employs some of the models mentioned above) and that now is the time for integration to create broad AI systems. Proponents of this approach are researchers such as Marvin Minsky, Ron Sun, Aaron Sloman, and Michael A. Arbib. An example hybrid is a hierarchical control system in which the lowest, reactive layers are subsymbolic. The higher layers, having relaxed time constraints, are capable of reasoning from an abstract world model and performing planning. Intelligent systems usually rely on hybrid reasoning systems, which include induction, deduction, abduction and reasoning by analogy. 
Hybrid Monte Carlo  In mathematics and physics, the hybrid Monte Carlo algorithm, also known as Hamiltonian Monte Carlo, is a Markov chain Monte Carlo method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram), or to compute an integral (such as an expected value). It differs from the Metropolis-Hastings algorithm by reducing the correlation between successive sampled states by using a Hamiltonian evolution between states and additionally by targeting states with a higher acceptance criteria than the observed probability distribution. This causes it to converge more quickly to the absolute probability distribution. It was devised by Simon Duane, A.D. Kennedy, Brian Pendleton and Duncan Roweth in 1987. ➚ “Hamiltonian Monte Carlo” 
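The Hamiltonian evolution between states can be sketched for a one-dimensional target. The example below is a minimal illustration, not a production sampler: it assumes a unit-mass Gaussian momentum, a leapfrog integrator with hand-picked step size and trajectory length, and a standard normal target; all function and parameter names are hypothetical.

```python
import math
import random


def hmc_sample(log_prob, grad_log_prob, x0, n_samples,
               step=0.2, n_leapfrog=10, seed=0):
    """Draw samples from a 1-D density using leapfrog proposals and an accept/reject step."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                 # fresh momentum each iteration
        x_new, p_new = x, p
        # Leapfrog integration; the potential energy is U(x) = -log_prob(x).
        p_new += 0.5 * step * grad_log_prob(x_new)
        for k in range(n_leapfrog):
            x_new += step * p_new
            if k < n_leapfrog - 1:
                p_new += step * grad_log_prob(x_new)
        p_new += 0.5 * step * grad_log_prob(x_new)
        # Accept with probability min(1, exp(H_old - H_new)).
        h_old = -log_prob(x) + 0.5 * p * p
        h_new = -log_prob(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Standard normal target: log density (up to a constant) and its gradient.
samples = hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, x0=3.0, n_samples=5000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

Because each proposal follows the simulated dynamics for several steps, successive samples can lie far apart while still being accepted with high probability, which is the source of the reduced correlation mentioned above.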
Hybrid Transactional / Analytical Processing (HTAP) 
Hybrid Transactional/Analytical Processing (HTAP) is a term used to describe the capability of a single database that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) for the purpose of real-time operational intelligence processing. The term was created by Gartner, Inc., a technology research firm. 
HybridNet  In this paper, we introduce a new model for leveraging unlabeled data to improve the generalization performance of image classifiers: a two-branch encoder-decoder architecture called HybridNet. The first branch receives supervision signal and is dedicated to the extraction of invariant class-related representations. The second branch is fully unsupervised and dedicated to modeling information discarded by the first branch to reconstruct input data. To further support the expected behavior of our model, we propose an original training objective. It favors stability in the discriminative branch and complementarity between the learned representations in the two branches. HybridNet is able to outperform state-of-the-art results on CIFAR-10, SVHN and STL-10 in various semi-supervised settings. In addition, visualizations and ablation studies validate our contributions and the behavior of the model on both CIFAR-10 and STL-10 datasets. 
HybridSVD  We propose a hybrid algorithm for the top-$n$ recommendation task that allows incorporating both user and item side information within the standard collaborative filtering approach. The algorithm extends PureSVD, one of the state-of-the-art latent factor models, by exploiting a generalized formulation of the singular value decomposition. This allows it to inherit key advantages of the classical algorithm, such as a highly efficient Lanczos-based optimization procedure, minimal parameter tuning during a model selection phase and a quick folding-in computation to generate recommendations instantly even in a highly dynamic online environment. Within the generalized formulation itself we provide an efficient scheme for side information fusion which avoids undesirable computational overhead and addresses the scalability question. Evaluation of the model is performed in both standard and cold-start scenarios using datasets with different sparsity levels. We demonstrate in which cases our approach outperforms conventional methods and also provide some intuition on when it may give no significant improvement. 
Hydranet  Despite recent efforts, deep learning techniques often remain heavily dependent on a large quantity of labeled data. This problem is even more challenging in medical image analysis, where annotator expertise is often scarce. In this paper we propose a novel data-augmentation method to regularize neural network regressors, learning from a single global label per image. The principle of the method is to create new samples by recombining existing ones. We demonstrate the performance of our algorithm on two tasks: the regression of the number of enlarged perivascular spaces in the basal ganglia; and the regression of white matter hyperintensities volume. We show that the proposed method improves the performance even when more basic data augmentation is used. Furthermore, we reached an intraclass correlation coefficient between ground truth and network predictions of 0.73 on the first task and 0.86 on the second task, only using between 25 and 30 scans with a single global label per scan for training. To achieve a similar correlation on the first task, state-of-the-art methods needed more than 1000 training scans. 
Hyperbolic Attention Network  We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact. 
Hyperbolic Neural Network  Hyperbolic spaces have recently gained momentum in the context of machine learning due to their high capacity and tree-likeness properties. However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, mostly because of the absence of corresponding hyperbolic neural network layers. This makes it hard to use hyperbolic embeddings in downstream tasks. Here, we bridge this gap in a principled manner by combining the formalism of Möbius gyrovector spaces with the Riemannian geometry of the Poincaré model of hyperbolic spaces. As a result, we derive hyperbolic versions of important deep learning tools: multinomial logistic regression, feed-forward and recurrent neural networks such as gated recurrent units. This allows embedding sequential data and performing classification in the hyperbolic space. Empirically, we show that, even if hyperbolic optimization tools are limited, hyperbolic sentence embeddings either outperform or are on par with their Euclidean variants on textual entailment and noisy-prefix recognition tasks. 
Hyperdata  Hyperdata indicates data objects linked to other data objects in other places, as hypertext indicates text linked to other text in other places. Hyperdata enables formation of a web of data, evolving from the “data on the Web” that is not inter-related (or at least, not linked). In the same way that hypertext usually refers to the World Wide Web but is a broader term, hyperdata usually refers to the Semantic Web, but may also be applied more broadly to other data-linking technologies such as Microformats – including XHTML Friends Network. 
HyperDenseNet  Recently, dense connections have attracted substantial attention in computer vision because they facilitate gradient flow and implicit deep supervision during training. Particularly, DenseNet, which connects each layer to every other layer in a feed-forward fashion, has shown impressive performance in natural image classification tasks. We propose HyperDenseNet, a 3D fully convolutional neural network that extends the definition of dense connectivity to multi-modal segmentation problems. Each imaging modality has a path, and dense connections occur not only between the pairs of layers within the same path, but also between those across different paths. This contrasts with the existing multi-modal CNN approaches, in which modeling several modalities relies entirely on a single joint layer (or level of abstraction) for fusion, typically either at the input or at the output of the network. Therefore, the proposed network has total freedom to learn more complex combinations between the modalities, within and in-between all the levels of abstraction, which significantly increases the learning representation. We report extensive evaluations over two different and highly competitive multi-modal brain tissue segmentation challenges, iSEG 2017 and MRBrainS 2013, with the former focusing on 6-month infant data and the latter on adult images. HyperDenseNet yielded significant improvements over many state-of-the-art segmentation networks, ranking at the top on both benchmarks. We further provide a comprehensive experimental analysis of feature reuse, which confirms the importance of hyper-dense connections in multi-modal representation learning. Our code is publicly available at https://…/HyperDenseNet. 
HyperFusionNet  Salient object detection (SOD), which aims to find the most important region of interest and segment the relevant object/item in that area, is an important yet challenging vision task. This problem is inspired by the fact that humans seem to perceive main scene elements with high priority. Thus, accurate detection of salient objects in complex scenes is critical for human-computer interaction. In this paper, we present a novel feature learning framework for SOD, in which we cast SOD as a pixel-wise classification problem. The proposed framework utilizes a densely hierarchical feature fusion network, named HyperFusionNet, which automatically predicts the most important area and segments the associated objects in an end-to-end manner. Specifically, inspired by the human perception system and image reflection separation, we first decompose input images into reflective image pairs by content-preserving transforms. Then, the complementary information of reflective image pairs is jointly extracted by an interweaved convolutional neural network (ICNN) and hierarchically combined with a hyper-dense fusion mechanism. Based on the fused multi-scale features, our method finally achieves a promising way of predicting SOD. As shown in our extensive experiments, the proposed method consistently outperforms other state-of-the-art methods on seven public datasets by a large margin. 
Hypergraph-based Outlier Test for Categorical Data (HOT) 
As a widely used data mining technique, outlier detection is a process which aims to find anomalies while providing good explanations. Most existing detection methods are designed for numeric data; however, real-life data such as web pages, business transactions and bioinformatics records always contain categorical data. This makes it difficult to find reasonable exceptions in real-world applications. In this paper, we introduce a novel outlier mining method based on a hypergraph model for categorical data. Since hypergraphs precisely capture the distribution characteristics in data subspaces, this method is effective in identifying anomalies in dense subspaces and presents good interpretations for the local outlierness. By selecting the most relevant subspaces, the problem of the ‘curse of dimensionality’ in very large databases can also be ameliorated. Furthermore, the connectivity property is used to replace the distance metrics, so that distance-based computation is no longer needed, which enhances the robustness for handling missing-value data. The fact that connectivity computation facilitates the aggregation operations supported by most SQL-compatible database systems makes the mining process much more efficient. Finally, we give experiments and analysis which show that our method can find outliers in categorical data with good performance and quality. 
Hyper-Heuristics  A hyper-heuristic is a heuristic search method that seeks to automate, often by the incorporation of machine learning techniques, the process of selecting, combining, generating or adapting several simpler heuristics (or components of such heuristics) to efficiently solve computational search problems. One of the motivations for studying hyper-heuristics is to build systems which can handle classes of problems rather than solving just one problem. There might be multiple heuristics from which one can choose for solving a problem, and each heuristic has its own strength and weakness. The idea is to automatically devise algorithms by combining the strength and compensating for the weakness of known heuristics. In a typical hyper-heuristic framework there is a high-level methodology and a set of low-level heuristics (either constructive or perturbative heuristics). Given a problem instance, the high-level method selects which low-level heuristic should be applied at any given time, depending upon the current problem state, or search stage. 
Hyperlink-Induced Topic Search (HITS) 
Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that they held, but were used as compilations of a broad catalog of information that led users directly to other authoritative pages. In other words, a good hub represented a page that pointed to many other pages, and a good authority represented a page that was linked by many different hubs. The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages. Network Analysis for Wikipedia HITS Algorithm – Hubs and Authorities on the Internet 
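The mutual reinforcement between the two scores can be sketched as a simple iteration: a page's authority is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both vectors normalized after each step. The function name, the fixed iteration count and the toy link graph below are assumptions made for illustration.

```python
import math


def hits(graph, iterations=50):
    """Return (hub, authority) score dicts for a graph given as {page: [linked pages]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages that link to each page.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of the pages each page links to.
        hub = {n: sum(auth[t] for t in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Toy web: three directory-like pages linking to two content pages.
graph = {"hub1": ["a", "b"], "hub2": ["a", "b"], "hub3": ["a"]}
hub, auth = hits(graph)
```

In this toy graph page "a" is linked by all three directories and ends up with the highest authority, while the directories that link to both content pages score higher as hubs than the one that links only to "a".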
HyperLogLog  HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset (the cardinality). Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality. The HyperLogLog algorithm is able to estimate cardinalities beyond 10^9 with a typical accuracy of 2%, using 1.5 kB of memory. HyperLogLog is an extension of the earlier LogLog algorithm. 
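A minimal sketch of the estimator shows where the memory saving comes from: only one small counter per register is stored, never the items themselves. The class name, the use of SHA-1 as the hash function and the choice of 2^11 = 2048 registers (standard error of roughly 1.04/sqrt(2048), about 2.3%) are assumptions for illustration; production implementations add further bias corrections.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p=11):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash value taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Harmonic mean of 2^register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate

hll = HyperLogLog()
for i in range(50000):
    hll.add(f"user-{i}")
estimate = hll.count()
```

Adding the same item twice leaves every register unchanged, which is why duplicates in the multiset do not affect the estimate.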
Hyperparameter  In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish it from the parameters of the model for the underlying system under analysis. For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then p is a parameter of the underlying system (Bernoulli distribution), while alpha and beta are parameters of the prior distribution (beta distribution), hence hyperparameters. One may take a single value for a given hyperparameter, or one can iterate and take a probability distribution on the hyperparameter itself, called a hyperprior. State of Hyperparameter Selection 
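The beta/Bernoulli example can be made concrete with the standard conjugate update, in which each observed success adds one to alpha and each failure adds one to beta. The function names and the sample data below are assumptions for illustration.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Update the Beta(alpha, beta) prior on p after seeing 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, i.e. the expected value of p."""
    return alpha / (alpha + beta)

# Hyperparameters of the prior: alpha = 2, beta = 2 (prior mean of p is 0.5).
prior_alpha, prior_beta = 2, 2
# Observed Bernoulli draws: three successes and one failure.
post_alpha, post_beta = beta_bernoulli_update(prior_alpha, prior_beta, [1, 1, 1, 0])
posterior_mean = beta_mean(post_alpha, post_beta)  # 5 / (5 + 3) = 0.625
```

Here p is never set directly; only the hyperparameters alpha and beta are chosen, and the data then moves the distribution over p.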
Hyperparameter Optimization  In the context of machine learning, hyperparameter optimization or model selection is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of obtaining good generalization. Hyperparameter optimization contrasts with actual learning problems, which are also often cast as optimization problems, but optimize a loss function on the training set alone. In effect, learning algorithms learn parameters that model/reconstruct their inputs well, while hyperparameter optimization is to ensure the model does not overfit its data by tuning, e.g., regularization. 
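The difference between learning parameters and tuning hyperparameters can be shown with a small grid search. In this sketch the model is a one-weight ridge regression y = w*x, the weight w is fitted on the training set, and the regularization strength lam is picked by error on a separate validation set. The data, the grid and all function names are assumptions made for this example.

```python
def fit_ridge(xs, ys, lam):
    """Closed-form ridge solution for the single weight w in y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, valid, grid):
    """Return (best_lam, validation_error, fitted_w) over the grid."""
    train_x, train_y = train
    valid_x, valid_y = valid
    best = None
    for lam in grid:
        w = fit_ridge(train_x, train_y, lam)      # learn the parameter
        err = mse(w, valid_x, valid_y)            # judge the hyperparameter
        if best is None or err < best[1]:
            best = (lam, err, w)
    return best

train = ([1.0, 2.0, 3.0], [2.2, 3.9, 6.3])   # noisy samples of y = 2x
valid = ([4.0, 5.0], [8.0, 10.0])            # held-out samples of y = 2x
best_lam, best_err, best_w = grid_search(train, valid, [0.0, 0.5, 1.0, 10.0])
```

On the training set alone, lam = 0 always gives the lowest error, so the training loss cannot be used to choose lam; the held-out set is what makes the selection meaningful.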
Hyperspherical Convolution (SphereConv) 
Convolution as inner product has been the founding basis of convolutional neural networks (CNNs) and the key to end-to-end visual representation learning. Benefiting from deeper architectures, recent CNNs have demonstrated increasingly strong representation abilities. Despite such improvement, the increased depth and larger parameter space have also led to challenges in properly training a network. In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres. We introduce SphereNet, deep hyperspherical convolution networks that are distinct from conventional inner product based convolutional networks. In particular, SphereNet adopts SphereConv as its basic convolution operator and is supervised by generalized angular softmax loss – a natural loss formulation under SphereConv. We show that SphereNet can effectively encode discriminative representation and alleviate training difficulty, leading to easier optimization, faster convergence and comparable (even better) classification accuracy over convolutional counterparts. We also provide some theoretical insights for the advantages of learning on hyperspheres. In addition, we introduce the learnable SphereConv, i.e., a natural improvement over prefixed SphereConv, and SphereNorm, i.e., hyperspherical learning as a normalization method. Experiments have verified our conclusions. 
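The contrast between an inner-product response and an angular response can be sketched for a single filter and a single input patch. This is only an illustration of the general idea: the two response functions below (cosine of the angle, and a response that is linear in the angle) and all function names are assumptions for this example, not formulas quoted from the abstract above.

```python
import math

def angle(w, x):
    """Angle in radians between weight vector w and input patch x."""
    dot = sum(a * b for a, b in zip(w, x))
    norm_w = math.sqrt(sum(a * a for a in w))
    norm_x = math.sqrt(sum(b * b for b in x))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_w * norm_x)))
    return math.acos(cos_theta)

def inner_product_response(w, x):
    """Conventional convolution response at one location: depends on norms."""
    return sum(a * b for a, b in zip(w, x))

def cosine_response(w, x):
    """Angular response cos(theta): depends only on direction, in [-1, 1]."""
    return math.cos(angle(w, x))

def linear_response(w, x):
    """Angular response that falls linearly from 1 (theta = 0) to -1 (theta = pi)."""
    return 1.0 - 2.0 * angle(w, x) / math.pi

w = [1.0, 0.0]
same_direction = cosine_response(w, [5.0, 0.0])
orthogonal = linear_response(w, [0.0, 3.0])
```

Scaling the input patch changes the inner-product response but leaves both angular responses unchanged, which is the sense in which the representation lives on a hypersphere.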
Hyperspherical Variational Auto-Encoder  The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. 
HyperTools  A Python toolbox for gaining geometric insights into high-dimensional data. 
Hypervariate Data  Hypervariate data is data with four or more dimensions in the dataset. Dartmouth College researchers have published a free Python software package called HyperTools that allows users to turn complex data into 3D shapes or animations. The tool allows users to visualize patterns in their data and compare the characteristics of different datasets, which in turn could inform researchers on how to train their machine learning algorithms by illuminating differences between groups of data. Additionally, the Dartmouth researchers have published tutorials for HyperTools and a gallery of examples, such as how to plot the text of State of the Union addresses, to help users create visualizations. 
Hypervolume Under Manifold (HUM) 
Paper: Jialiang Li (2008) <doi:10.1093/biostatistics/kxm050>. Jialiang Li (2014) <doi:10.3109/1354750X.2013.868516>. mcca 
Hypothesis-testing-based Adaptive Spline Filtering (HASF) 
Trend Analysis of Fragmented Time Series for mHealth Apps: Hypothesis Testing Based Adaptive Spline Filtering Method with Importance Weighting 