Eager Execution  Eager execution is an imperative, define-by-run interface where operations are executed immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive. The benefits of eager execution include:
· Fast debugging with immediate runtime errors and integration with Python tools
· Support for dynamic models using easy-to-use Python control flow
· Strong support for custom and higher-order gradients
· Almost all of the available TensorFlow operations
Eager execution is available now as an experimental feature, so we’re looking for feedback from the community to guide our direction. ➘ “Tensorflow” 
Eager Learning  In artificial intelligence, eager learning is a learning method in which the system tries to construct a general, input-independent target function during training of the system, as opposed to lazy learning, where generalization beyond the training data is delayed until a query is made to the system. The main advantage gained in employing an eager learning method, such as an artificial neural network, is that the target function will be approximated globally during training, thus requiring much less space than a lazy learning system. Eager learning systems also deal much better with noise in the training data. Eager learning is an example of offline learning, in which post-training queries to the system have no effect on the system itself, and thus the same query to the system will always produce the same result. The main disadvantage with eager learning is that it is generally unable to provide good local approximations in the target function. 
EagleEye  Deep neural networks (DNNs) are inherently vulnerable to adversarial inputs: such maliciously crafted samples trigger DNNs to misbehave, leading to detrimental consequences for DNN-powered systems. The fundamental challenges of mitigating adversarial inputs stem from their adaptive and variable nature. Existing solutions attempt to improve DNN resilience against specific attacks; yet, such static defenses can often be circumvented by adaptively engineered inputs or by new attack variants. Here, we present EagleEye, an attack-agnostic adversarial tampering analysis engine for DNN-powered systems. Our design exploits the minimality principle underlying many attacks: to maximize the attack’s evasiveness, the adversary often seeks the minimum possible distortion to convert genuine inputs to adversarial ones. We show that this practice entails the distinct distributional properties of adversarial inputs in the input space. By leveraging such properties in a principled manner, EagleEye effectively discriminates adversarial inputs and even uncovers their correct classification outputs. Through extensive empirical evaluation using a range of benchmark datasets and DNN models, we validate EagleEye’s efficacy. We further investigate the adversary’s possible countermeasures, which imply a difficult dilemma for her: to evade EagleEye’s detection, excessive distortion is necessary, thereby significantly reducing the attack’s evasiveness regarding other detection mechanisms. 
EAP  A good clustering algorithm should not only be able to discover clusters of arbitrary shapes (global view) but also provide additional information, which can be used to gain more meaningful insights into the internal structure of the clusters (local view). In this work we use the mathematical framework of factor graphs and message passing algorithms to optimize a pairwise similarity based cost function, in the same spirit as was done in Affinity Propagation. Using this framework we develop two variants of a new clustering algorithm, EAP and SHAPE. EAP/SHAPE can not only discover clusters of arbitrary shapes but also provide a rich local view in the form of meaningful local representatives (exemplars) and connections between these local exemplars. We discuss how this local information can be used to gain various insights about the clusters, including varying relative cluster densities and indications of local strength in different regions of a cluster. We also discuss how this can help an analyst in discovering and resolving potential inconsistencies in the results. The efficacy of EAP/SHAPE is shown by applying it to various synthetic and real-world benchmark datasets. 
EARL  In order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking have been performed either as dependent sequential tasks or as independent parallel tasks. In this paper, we propose a framework, called EARL, which performs entity linking and relation linking as a joint single task. EARL is modeled on an optimised variation of the Generalised Travelling Salesperson Problem. The system determines the best semantic connection between all keywords of the question by referring to the knowledge graph. This is achieved by exploiting the connection density between entity candidates and relation candidates. We have empirically evaluated the framework on a dataset with 3000 complex questions. Our system surpasses state-of-the-art scores for the entity linking task by reporting an accuracy of 0.67 against 0.40 from the next best entity linker. 
Early Stopping  In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner’s performance on data outside of the training set. Past that point, however, improving the learner’s fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to overfit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation. 
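One common early stopping rule can be sketched as follows. This is a minimal illustration, not a rule prescribed by the entry above: the `patience` parameter, the function name `train_with_early_stopping`, and the list of validation losses (standing in for a real training loop) are all assumptions made for the example.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss) under a patience-based stopping rule.

    val_losses stands in for the validation loss measured after each
    training iteration; a real loop would compute it on held-out data.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Performance on unseen data improved: remember this point.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # further fitting is assumed to be overfitting
    return best_epoch, best_loss


# Hypothetical validation curve: improves, then starts to degrade.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.40]
best_epoch, best_loss = train_with_early_stopping(losses, patience=2)
```

With `patience=2` the loop stops after two consecutive iterations without improvement, so the later value 0.40 is never reached; that is the usual trade-off of a small patience.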
Earnings Before Interest, Taxes, Depreciation and Amortization (EBITDA) 
A company’s earnings before interest, taxes, depreciation, and amortization (EBITDA) is an accounting metric computed by considering a company’s earnings before interest payments, tax, depreciation, and amortization are subtracted for any final accounting of its income and expenses. The EBITDA of a business gives an indication of its current operational profitability, i.e., how much profit it makes with its present assets and its operations on the products it produces and sells. 
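As a small worked illustration, EBITDA is commonly computed by adding interest, taxes, depreciation, and amortization back to net income. The function name and the figures below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def ebitda(net_income, interest, taxes, depreciation, amortization):
    """Add back the four items that were subtracted to reach net income."""
    return net_income + interest + taxes + depreciation + amortization


# Hypothetical figures, e.g. in thousands of a currency unit.
example = ebitda(net_income=120, interest=15, taxes=40,
                 depreciation=25, amortization=10)
```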
Earth Mover’s Distance (EMD) 
In computer science, the earth mover’s distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved. The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions. ➘ “Wasserstein Metric” 
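For one-dimensional histograms the dirt-moving picture has a simple closed form: the EMD equals the sum of absolute differences between the two cumulative sums. The sketch below assumes equally spaced bins with unit distance between neighbours and equal total mass; the function name `emd_1d` is an illustrative choice.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms on the same bins.

    Assumes unit distance between adjacent bins and equal total mass.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # The running surplus at each bin is the amount of dirt that must be
    # carried across the boundary to the next bin.
    surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in surplus)


far = emd_1d([1, 0, 0], [0, 0, 1])            # one unit moved two bins
near = emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```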
Easy Convolution and Random Pooling (ECP) 
Convolution operations dominate the overall execution time of Convolutional Neural Networks (CNNs). This paper proposes an easy yet efficient technique for both Convolutional Neural Network training and testing. The conventional convolution and pooling operations are replaced by Easy Convolution and Random Pooling (ECP). In ECP, we randomly select one pixel out of four and only conduct convolution operations of the selected pixel. As a result, only a quarter of the conventional convolution computations are needed. Experiments demonstrate that the proposed EasyConvPooling can achieve 1.45x speedup on training time and 1.64x on testing time. What’s more, a speedup of 5.09x on pure Easy Convolution operations is obtained compared to conventional convolution operations. 
EasyConvPooling  ➚ “Easy Convolution and Random Pooling” 
EC3  Classification and clustering algorithms have proved to be successful individually in different contexts. Both of them have their own advantages and limitations. For instance, although classification algorithms are more powerful than clustering methods in predicting class labels of objects, they do not perform well when there is a lack of sufficient manually labeled reliable data. On the other hand, although clustering algorithms do not produce label information for objects, they provide supplementary constraints (e.g., if two objects are clustered together, it is more likely that the same label is assigned to both of them) that one can leverage for label prediction of a set of unknown objects. Therefore, systematic utilization of both these types of algorithms together can lead to better prediction performance. In this paper, we propose a novel algorithm, called EC3, that merges classification and clustering together in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using an optimization function. We theoretically show the convexity and optimality of the problem and solve it by the block coordinate descent method. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known standalone classifiers, 5 ensemble classifiers, and 2 existing methods that merge classification and clustering) on 13 standard benchmark datasets. We show that our methods outperform other baselines for every single dataset, achieving at most 10% higher AUC. Moreover, our methods are faster (1.21 times faster than the best baseline) and more resilient to noise and class imbalance than the best baseline method. 
Echo State Network (ESN) 
The echo state network (ESN) is a recurrent neural network with a sparsely connected hidden layer (with typically 1% connectivity). The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can (re)produce specific temporal patterns. The main interest of this network is that although its behaviour is nonlinear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system. Alternatively, one may consider a nonparametric Bayesian formulation of the output layer, under which: (i) a prior distribution is imposed over the output weights; and (ii) the output weights are marginalized out in the context of prediction generation, given the training data. This idea has been demonstrated using Gaussian priors, whereby a Gaussian process model with an ESN-driven kernel function is obtained. Such a solution was shown to outperform ESNs with trainable (finite) sets of weights in several benchmarks. Deep Echo State Networks for Diagnosis of Parkinson’s Disease 
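The training scheme described above (fixed random reservoir, learned linear readout) can be sketched with NumPy. Everything specific here is an assumption made for illustration: the reservoir size, the 10% connectivity (denser than the typical 1% so that a small reservoir stays well connected), the spectral radius of 0.9, the ridge penalty, and the toy task of predicting the next sample of a sine wave.

```python
import numpy as np

rng = np.random.default_rng(0)
n_reservoir, connectivity, ridge = 100, 0.1, 1e-4

# Input and hidden weights are random and stay fixed; only W_out is learned.
W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
W[rng.random((n_reservoir, n_reservoir)) > connectivity] = 0.0  # sparsify
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius


def run_reservoir(inputs):
    """Drive the reservoir with a scalar input sequence and collect states."""
    x = np.zeros(n_reservoir)
    states = []
    for u_t in inputs:
        x = np.tanh(W_in * u_t + W @ x)  # nonlinear hidden-state update
        states.append(x.copy())
    return np.array(states)


signal = np.sin(np.arange(600) * 0.1)
washout = 100  # discard the initial transient
X = run_reservoir(signal[:-1])[washout:]
y = signal[1:][washout:]  # target: the next sample

# The error is quadratic in W_out, so setting its gradient to zero gives a
# linear system (ridge-regularised normal equations).
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ y)
mse = float(np.mean((X @ W_out - y) ** 2))
```

Because only `W_out` is fitted, training reduces to a single call to `np.linalg.solve` rather than backpropagation through time.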
Eclat Algorithm  The Eclat algorithm is used to perform itemset mining. Itemset mining lets us find frequent patterns in data, such as: if a consumer buys milk, he also buys bread. This type of pattern is called an association rule and is used in many application domains. The basic idea of the Eclat algorithm is to use tidset intersections to compute the support of a candidate itemset, avoiding the generation of subsets that do not exist in the prefix tree. 
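The tidset idea can be sketched in a few lines: each item is mapped to the set of transaction ids that contain it, and the support of a larger itemset is the size of the intersection of its members' tidsets. The function name, the depth-first layout, and the toy transactions are illustrative assumptions, not a reference implementation.

```python
def eclat(transactions, min_support):
    """Return {itemset (sorted tuple): support} for all frequent itemsets."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Support of each extension = size of the tidset intersection.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent = eclat(baskets, min_support=2)
```

Only itemsets whose prefix is already frequent are ever extended, which is how the search avoids generating candidates that cannot occur in the prefix tree.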
Ecological Regression  Ecological regression is a statistical technique used especially in political science and history to estimate group voting behavior from aggregate data. For example, if counties have a known Democratic vote (in percentage) D, and a known percentage of Catholics, C, then run the linear regression of dependent variable D against independent variable C. This gives D = a + bC. When C = 1 (100% Catholic) this gives the estimated Democratic vote as a+b. When C = 0 (0% Catholic), this gives the estimated non-Catholic vote as a. For example, if the regression gives D = .22 + .45C, then the estimated Catholic vote is 67% Democratic and the non-Catholic vote is 22% Democratic. The technique has often been used in litigation brought under the Voting Rights Act of 1965 to see how blacks and whites voted. 
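The worked example above can be reproduced with an ordinary least-squares fit. The county data below are synthetic and constructed to lie exactly on D = .22 + .45C, so the fit recovers the coefficients quoted in the entry; the function name `fit_line` is an illustrative choice.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic county-level aggregates (shares between 0 and 1).
catholic_share = [0.10, 0.30, 0.50, 0.80]
democratic_share = [0.22 + 0.45 * c for c in catholic_share]

a, b = fit_line(catholic_share, democratic_share)
non_catholic_vote = a      # prediction at C = 0
catholic_vote = a + b      # prediction at C = 1
```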
Ecologically-Inspired GENetic Approach for Neural Network Structure Searching (EIGEN) 
Designing the structure of neural networks is considered one of the most challenging tasks in deep learning. Recently, a few approaches have been proposed to automatically search for the optimal structure of neural networks; however, they suffer from either prohibitive computation cost (e.g., 256 hours on 250 GPUs in [1]) or unsatisfactory performance compared to that of hand-crafted neural networks. In this paper, we propose an Ecologically-Inspired GENetic approach for neural network structure search (EIGEN) that includes succession, mimicry and gene duplication. Specifically, we first use primary succession to rapidly evolve a community of poorly initialized neural network structures into a more diverse community, followed by a secondary succession stage for fine-grained searching based on the networks from the primary succession. Extinction is applied in both stages to reduce computational cost. Mimicry is employed during the entire evolution process to help the inferior networks imitate the behavior of a superior network, and gene duplication is utilized to duplicate the learned blocks of novel structures, both of which help to find better network structures. Extensive experimental results show that our proposed approach can achieve similar or better performance compared to the existing genetic approaches with dramatically reduced computation cost. For example, the network discovered by our approach on the CIFAR-100 dataset achieves 78.1% test accuracy under 120 GPU hours, compared to 77.0% test accuracy in more than 65,536 GPU hours in [1]. 
Econometrics  Econometrics is the application of mathematics, statistical methods, and, more recently, computer science, to economic data and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is “the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.” An introductory economics textbook describes econometrics as allowing economists “to sift through mountains of data to extract simple relationships.” The first known use of the term “econometrics” (in cognate form) was by Polish economist Pawel Ciompa in 1910. Ragnar Frisch is credited with coining the term in the sense in which it is used today. Econometrics is the intersection of economics, mathematics, and statistics. Econometrics adds empirical content to economic theory allowing theories to be tested and used for forecasting and policy evaluation. 
Edgent  As the backbone technology of machine learning, deep neural networks (DNNs) have quickly ascended to the spotlight. Running DNNs on resource-constrained mobile devices is, however, by no means trivial, since it incurs high performance and energy overhead. Offloading DNNs to the cloud for execution, meanwhile, suffers from unpredictable performance due to the uncontrolled, long wide-area network latency. To address these challenges, in this paper, we propose Edgent, a collaborative and on-demand DNN co-inference framework with device-edge synergy. Edgent pursues two design knobs: (1) DNN partitioning, which adaptively partitions DNN computation between device and edge in order to leverage hybrid computation resources in proximity for real-time DNN inference; and (2) DNN right-sizing, which accelerates DNN inference through early exit at a proper intermediate DNN layer to further reduce the computation latency. The prototype implementation and extensive evaluations based on Raspberry Pi demonstrate Edgent’s effectiveness in enabling on-demand low-latency edge intelligence. 
Edgeworth Series  The Gram-Charlier A series (named in honor of Jørgen Pedersen Gram and Carl Charlier) and the Edgeworth series (named in honor of Francis Ysidro Edgeworth) are series that approximate a probability distribution in terms of its cumulants. The series are the same, but the arrangement of terms (and thus the accuracy of truncating the series) differs. http://…/EdgeworthSeries.html EW 
eDiscovery  Electronic discovery (or ‘eDiscovery’) is the process of identifying, preserving, collecting, analyzing, reviewing, and producing electronically stored information (ESI). Structured and unstructured data analysis is at the core of eDiscovery. Even routine matters regularly involve hundreds of gigabytes of data that must be analyzed for relevancy and privilege. Most often undertaken for litigation and regulatory compliance, eDiscovery processes and the underlying data mining technology are also deployed for internal investigations, due diligence, financial contract analysis, privacy impact assessments (including GDPR), and data breach responses. Undoubtedly, eDiscovery efforts are crucial to ongoing success in today’s modern corporation. For effective eDiscovery, enterprises need to be able to search through information across their entire enterprise, including both structured (e.g. databases) and unstructured data (e.g. emails, images), and effectively analyze content. The best eDiscovery software will integrate with existing systems and litigation-ready policies. It enables targeted data collections, sophisticated culling and deduplication. In addition, the capabilities of the best eDiscovery software include AI-enhanced analysis, full review and tagging, automated redactions, and DIY productions. 
EDISON Data Science Framework (EDSF) 
The EDISON Data Science Framework is a collection of documents that define the Data Science profession. Freely available, these documents have been developed to guide educators and trainers, employers and managers, and Data Scientists themselves. Collectively, these documents break down the complexity of the skills and competences needed to define Data Science as a professional practice. 
E-Divisive with Medians (EDM) 
E-Divisive with Medians (EDM) employs energy statistics to detect divergence in mean. Note that EDM can also be used to detect a change in distribution in a given time series. EDM uses robust statistical metrics, viz., the median, and estimates the statistical significance of a breakout through a permutation test. In addition, EDM is non-parametric. This is important since the distribution of production data seldom (if at all) follows the commonly assumed normal distribution or any other widely accepted model. 
Educational Data Mining (EDM) 
Educational Data Mining (EDM) describes a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems). At a high level, the field seeks to develop and improve methods for exploring this data, which often has multiple levels of meaningful hierarchy, in order to discover new insights about how people learn in the context of such settings. In doing so, EDM has contributed to theories of learning investigated by researchers in educational psychology and the learning sciences. The field is closely tied to that of learning analytics, and the two have been compared and contrasted. 
Edward  Probabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edward’s design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inferences about the model given data, and criticize the model’s fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale. 
Eesen Framework (Eesen) 
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework, which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that compared with the standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly. 
Effect Size  In statistics, an effect size is a quantitative measure of the strength of a phenomenon. Examples of effect sizes are the correlation between two variables, the regression coefficient, the mean difference, or even the risk with which something happens, such as how many people survive after a heart attack for every one person that does not survive. For each type of effect size, a larger absolute value always indicates a stronger effect. Effect sizes complement statistical hypothesis testing, and play an important role in statistical power analyses, sample size planning, and in meta-analyses. Especially in meta-analysis, where the purpose is to combine multiple effect sizes, the standard error of the effect size is of critical importance. The S.E. of the effect size is used to weight effect sizes when combining studies, so that large studies are considered more important than small studies in the analysis. The S.E. of the effect size is calculated differently for each type of effect size, but generally only requires knowing the study’s sample size (N), or the number of observations in each group (n’s). Reporting effect sizes is considered good practice when presenting empirical research findings in many fields. The reporting of effect sizes facilitates the interpretation of the substantive, as opposed to the statistical, significance of a research result. Effect sizes are particularly prominent in social and medical research. Relative and absolute measures of effect size convey different information, and can be used complementarily. 
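As one concrete instance of a mean-difference effect size, the sketch below computes Cohen's d, the difference between two group means divided by the pooled standard deviation. Cohen's d is not named in the entry above; it is used here only as a familiar standardized example, and the two samples are made up.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical measurements for two groups.
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]
d = cohens_d(treatment, control)  # mean difference 2, pooled variance 2.5
```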
Effective Applications of the R Language (EARL) 
EARL is a Conference for users and developers of the open source R programming language. The primary focus of the Conference will be the commercial usage of R across a range of industry sectors with the aim of sharing knowledge and applications of the language. The EARL Conference Team is located at Mango Solutions, a data analysis company headquartered in the UK. 
Efficient Convolutional Network for Online Video Understanding (ECO) 
The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video; therefore, it misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, the processing of the whole video is not efficient and hampers fast video retrieval or online classification of long-term activities. In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy, which exploits that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10x to 80x faster than state-of-the-art methods. 
Efficient Neural Architecture Search (ENAS) 
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile, the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performance using far fewer GPU-hours than all existing automatic model design approaches, and notably, is 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%. 
Efficient Unitary Neural Network (EUNN) 
Using unitary (instead of general) matrices in artificial neural networks (ANNs) is a promising way to solve the gradient explosion/vanishing problem, as well as to enable ANNs to learn long-term correlations in the data. This approach appears particularly promising for Recurrent Neural Networks (RNNs). In this work, we present a new architecture for implementing an Efficient Unitary Neural Network (EUNN); its main advantages can be summarized as follows. Firstly, the representation capacity of the unitary space in an EUNN is fully tunable, ranging from a subspace of SU(N) to the entire unitary space. Secondly, the computational complexity for training an EUNN is merely O(1) per parameter. Finally, we test the performance of EUNNs on the standard copying task, the pixel-permuted MNIST digit recognition benchmark as well as the Speech Prediction Test (TIMIT). We find that our architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture, in terms of the final performance and/or the wall-clock training speed. EUNNs are thus promising alternatives to RNNs and LSTMs for a wide variety of applications. 
EffNet  With the ever-increasing application of Convolutional Neural Networks to customer products, the need emerges for models which can efficiently run on embedded, mobile hardware. Slimmer models have therefore become a hot research topic, with multiple different approaches which vary from binary networks to revised convolution layers. We offer our contribution to the latter and propose a novel convolution block which significantly reduces the computational burden while surpassing the current state-of-the-art. Our model, dubbed EffNet, is optimised for models which are slim to begin with and is created to tackle issues in existing models such as MobileNet and ShuffleNet. 
Egocentric Spatial Memory (ESM) 
Egocentric Spatial Memory 
EgoCoder  Programming has been an important skill for researchers and practitioners in computer science and other related areas. To learn basic programming skills, long-time systematic training is usually required for beginners. According to a recent market report, the computer software market is expected to continue expanding at an accelerating speed, but the market supply of qualified software developers can hardly meet such a huge demand. In recent years, the surge of text generation research has provided opportunities to address such a dilemma through automatic program synthesis. In this paper, we attempt to solve the program synthesis problem from a data mining perspective. To address the problem, a novel generative model, namely EgoCoder, will be introduced in this paper. EgoCoder effectively parses program code into abstract syntax trees (ASTs), where the tree nodes will contain the program code/comment content and the tree structure can capture the program logic flows. Based on a new unit model called Hsu, EgoCoder can effectively capture both the hierarchical and sequential patterns in the program ASTs. Extensive experiments will be done to compare EgoCoder with the state-of-the-art text generation methods, and the experimental results have demonstrated the effectiveness of EgoCoder in addressing the program synthesis problem. 
EgoNet  EgoNet (Egocentric Network Study Software) is a program for the collection and analysis of egocentric social network data. It helps the user to collect and analyse egocentric network data (all social network data of a website on the Internet), and provides general global network measures and data matrices that can be used for further analysis by other software. An egonet is formed by the links that a given address on the Internet gives and receives; EgoNet is dedicated to collecting information about these links and presenting it in a way that is useful to users. EgoNet is written in Java, so the computer on which it is used must have the JRE installed. EgoNet is open source software, licensed under the GPL. Anomaly detection in static networks using egonets 
Ehlers’s Autocorrelation Periodogram  The point of the Ehlers Autocorrelation Periodogram is to dynamically set a period between a minimum and a maximum period length. While I leave the exact explanation of the mechanics to Dr. Ehlers’s book, for all practical intents and purposes, in my opinion, the punchline of this method is to attempt to remove a massive source of overfitting from trading system creation, namely specifying a lookback period. 
Eigen  Eigen is a high-level C++ library of template headers for linear algebra, matrix and vector operations, numerical solvers and related algorithms. Eigen is an open source library licensed under MPL2 starting from version 3.1.1. Earlier versions were licensed under LGPL3+. Eigen is often noted for its elegant API, versatile fixed and dynamic matrix capabilities and a range of dense and sparse solvers. To achieve high performance, Eigen utilizes explicit vectorization for the SSE 2/3/4, ARM NEON, and AltiVec instruction sets. RcppEigen 
Eigenface  Eigenfaces is the name given to a set of eigenvectors when they are used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby (1987) and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. 
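The pipeline described above (eigenvectors of the covariance matrix, projection onto a small basis, classification by comparing representations) can be sketched with NumPy. The random 8x8 "images" are a stand-in for real face data, the choice of k = 5 basis vectors is arbitrary, and nearest-neighbour matching in the projected space is one simple way of comparing representations, assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((20, 64))  # 20 synthetic 8x8 images, flattened to rows

mean_face = faces.mean(axis=0)
centered = faces - mean_face

# The right singular vectors of the centered data are the eigenvectors of
# its covariance matrix, ordered by decreasing eigenvalue.
_, _, Vt = np.linalg.svd(centered, full_matrices=False)

k = 5
eigenfaces = Vt[:k]                   # reduced basis, shape (k, 64)
weights = centered @ eigenfaces.T     # each training image as k coefficients


def classify(image):
    """Return the index of the training image closest in eigenface space."""
    w = (image - mean_face) @ eigenfaces.T
    return int(np.argmin(np.linalg.norm(weights - w, axis=1)))


nearest = classify(faces[3])
```

Each 64-pixel image is represented by only k coefficients, which is the dimension reduction the entry refers to.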
EigenNetwork  In many applications, the interdependencies among a set of $N$ time series $\{ x_{nk}, k>0 \}_{n=1}^{N}$ are well captured by a graph or network $G$. The network itself may change over time as well (i.e., as $G_k$). We expect the network changes to be at a much slower rate than that of the time series. This paper introduces eigennetworks, networks that are building blocks to compose the actual networks $G_k$ capturing the dependencies among the time series. These eigennetworks can be estimated by first learning the time series of graphs $G_k$ from the data, followed by a Principal Network Analysis procedure. Algorithms for learning both the original time series of graphs and the eigennetworks are presented and discussed. Experiments on simulated and real time series data demonstrate the performance of the learning and the interpretation of the eigennetworks. 
EigenPro 2.0  In recent years machine learning methods that nearly interpolate the data have achieved remarkable success. In many settings achieving nearzero training error leads to excellent test results. In this work we show how the mathematical and conceptual simplicity of interpolation can be harnessed to construct a framework for very efficient, scalable and accurate kernel machines. Our main innovation is in constructing kernel machines that output solutions mathematically equivalent to those obtained using standard kernels, yet capable of fully utilizing the available computing power of a parallel computational resource, such as GPU. Such utilization is key to strong performance since much of the computational resource capability is wasted by the standard iterative methods. The computational resource and data adaptivity of our learned kernels is based on theoretical convergence bounds. The resulting algorithm, which we call EigenPro 2.0, is accurate, principled and very fast. For example, using a single GPU, training on ImageNet with $1.3\times 10^6$ data points and $1000$ labels takes under an hour, while smaller datasets, such as MNIST, take seconds. Moreover, as the parameters are chosen analytically, based on the theory, little tuning beyond selecting the kernel and kernel parameter is needed, further facilitating the practical use of these methods. 
EigenRec  Sparsity presents one of the major challenges of Collaborative Filtering. Graphbased methods are known to alleviate its effects, however their use is often computationally prohibitive; LatentFactor methods, on the other hand, present a reasonable and viable alternative. In this paper, we introduce EigenRec; a versatile and efficient LatentFactor framework for TopN Recommendations, that generalizes the wellknown PureSVD algorithm (a) providing intuition about its inner structure, (b) paving the path towards improving its efficacy and, at the same time, (c) reducing its complexity. One of our central goals in this work is to ensure the applicability of our method in realistic bigdata scenarios. To this end, we propose building our model using a computationally efficient Lanczosbased procedure, we discuss its Parallel Implementation in distributed computing environments, and we verify its favourable performance using realworld datasets. Furthermore, from a qualitative point of view, a comprehensive set of experiments on the MovieLens and the Yahoo!R2Music datasets based on widely applied performance metrics, indicate that EigenRec outperforms several stateoftheart algorithms, in terms of Standard and LongTail recommendation accuracy, exhibiting low susceptibility to sparsity, even in its most extreme manifestations the ColdStart problems. 
Eigenvalues, Eigenvectors  An eigenvector of a square matrix A is a nonzero vector v that, when the matrix is multiplied by v, yields a constant multiple of v, the multiplier being commonly denoted by d. That is, Av = dv. The number d is called the eigenvalue of A corresponding to v.
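The defining relation Av = dv can be checked numerically. The sketch below uses NumPy's `numpy.linalg.eig` on a small example matrix chosen only for illustration.

```python
import numpy as np

# Example matrix (an arbitrary choice for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Verify A v = d v for every eigenpair.
residuals = [np.linalg.norm(A @ v - d * v)
             for d, v in zip(eigenvalues, eigenvectors.T)]
```

For this matrix the eigenvalues are 3 and 1, with eigenvectors along (1, 1) and (1, -1); the order in which `eig` returns them is not guaranteed.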
Eikosogram  Eikosograms provide a nice visual representation of statistical correlation, because when the two variables are independent, then the value of one, say X, doesn’t affect the probability of the second, Y. This visually translates into a horizontal pattern, which easily contrasts with a staircase shape that occurs when the variables are dependent or correlated. http://…/eikosograms 
Elapsed Time based Dynamic Passes Combinedcounting (ETDPC) 
➘ “Variable Size based Fixed Passes Combinedcounting” 
Elastic Functional Principal Component Regression  We study regression using functional predictors in situations where these functions contain both phase and amplitude variability. In other words, the functions are misaligned due to errors in time measurements, and these errors can significantly degrade both model estimation and prediction performance. The current techniques either ignore the phase variability or handle it via preprocessing, i.e., use an off-the-shelf technique for functional alignment and phase removal. We develop a functional principal component regression model that takes a comprehensive approach to handling phase and amplitude variability. The model utilizes a mathematical representation of the data known as the square-root slope function. These functions preserve the $\mathbf{L}^2$ norm under warping and are ideally suited for simultaneous estimation of regression and warping parameters. Using both simulated and real-world data sets, we demonstrate our approach and evaluate its prediction performance relative to current models. In addition, we propose an extension to functional logistic and multinomial logistic regression.
Elastic Net Regularization  In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. 
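The linear combination of the L1 and L2 penalties can be written out directly. The sketch below assumes one common parametrization, with an overall strength `lam` and a mixing weight `alpha` between the lasso and ridge terms; other references scale the two terms differently, and the function name is hypothetical.

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).

    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty
    (assumed parametrization; conventions vary between libraries).
    """
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))
    l2_squared = np.sum(beta ** 2)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)

coefficients = [1.0, -2.0]
lasso_only = elastic_net_penalty(coefficients, lam=1.0, alpha=1.0)
ridge_only = elastic_net_penalty(coefficients, lam=1.0, alpha=0.0)
mixed = elastic_net_penalty(coefficients, lam=1.0, alpha=0.5)
```

The penalty is added to the regression loss during fitting; the L1 part encourages exact zeros in the coefficients while the L2 part shrinks correlated coefficients together.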
Elastic Neural Network  We propose a new framework for image classification with deep neural networks. The framework introduces intermediate outputs to the computational graph of a network. This enables flexible control of the computational load and balances the tradeoff between accuracy and execution time. Moreover, we present an interesting finding that the intermediate outputs can act as a regularizer at training time, improving the prediction accuracy. In the experimental section we demonstrate the performance of our proposed framework with various commonly used pretrained deep networks in the use case of apparent age estimation. 
Elasticsearch  Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine.
Elasticsearch, Logstash and Kibana (ELK Stack) 
ELK stands for Elasticsearch, Logstash and Kibana. Brief definitions: Logstash: A tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for example, for searching). Speaking of searching, Logstash comes with a web interface for searching and drilling into all of your logs. It is fully free and fully open source. Elasticsearch: A search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Kibana: A nifty tool to visualize logs and timestamped data.
ElementwiseAttention Gate (EleAttG) 
Recurrent neural networks (RNNs) are capable of modeling the temporal dynamics of complex sequential information. However, the structures of existing RNN neurons mainly focus on controlling the contributions of current and historical information but do not explore the different importance levels of different elements in an input vector of a time slot. We propose adding a simple yet effective ElementwiseAttention Gate (EleAttG) to an RNN block (e.g., all RNN neurons in a network layer) that empowers the RNN neurons to have the attentiveness capability. For an RNN block, an EleAttG is added to adaptively modulate the input by assigning different levels of importance, i.e., attention, to each element/dimension of the input. We refer to an RNN block equipped with an EleAttG as an EleAttRNN block. Specifically, the modulation of the input is content adaptive and is performed at fine granularity, being elementwise rather than inputwise. The proposed EleAttG, as an additional fundamental unit, is general and can be applied to any RNN structures, e.g., standard RNN, Long ShortTerm Memory (LSTM), or Gated Recurrent Unit (GRU). We demonstrate the effectiveness of the proposed EleAttRNN by applying it to the action recognition tasks on both 3D human skeleton data and RGB videos. Experiments show that adding attentiveness through EleAttGs to RNN blocks significantly boosts the power of RNNs. 
Eligibility Traces  Eligibility traces are one of the basic mechanisms of reinforcement learning. For example, in the popular TD(lambda) algorithm, the lambda refers to the use of an eligibility trace. Almost any temporal-difference (TD) method, such as Q-learning or Sarsa, can be combined with eligibility traces to obtain a more general method that may learn more efficiently. There are two ways to view eligibility traces. The more theoretical view, which we emphasize here, is that they are a bridge from TD to Monte Carlo methods. When TD methods are augmented with eligibility traces, they produce a family of methods spanning a spectrum that has Monte Carlo methods at one end and one-step TD methods at the other. In between are intermediate methods that are often better than either extreme method. In this sense eligibility traces unify TD and Monte Carlo methods in a valuable and revealing way. The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment.
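The mechanistic view can be made concrete with a tabular TD(lambda) update using accumulating traces. This is a minimal sketch under stated assumptions: the function name, the episode format (a list of `(state, reward, next_state, done)` tuples) and the step sizes are all hypothetical choices for illustration.

```python
import numpy as np

def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=0.9):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    V: float array of state values, updated in place and returned.
    episode: list of (state, reward, next_state, done) transitions.
    """
    trace = np.zeros_like(V)
    for state, reward, next_state, done in episode:
        target = reward + (0.0 if done else gamma * V[next_state])
        td_error = target - V[state]
        trace[state] += 1.0             # mark the visited state as eligible
        V += alpha * td_error * trace   # credit every eligible state
        trace *= gamma * lam            # eligibility decays over time
    return V

# Two-step chain: state 0 -> state 1 -> terminal, reward 1 on the last step.
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
values_td0 = td_lambda_episode(np.zeros(3), episode, lam=0.0)
values_mc = td_lambda_episode(np.zeros(3), episode, lam=1.0)
```

With `lam=0.0` only the state immediately before the reward is updated (one-step TD), while with `lam=1.0` and `gamma=1.0` the earlier state receives the same credit, which matches the Monte Carlo end of the spectrum.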
ELimination Et Choix Traduisant la REalité (ELECTRE) 
ELECTRE is a family of multicriteria decision analysis methods that originated in Europe in the mid1960s. The acronym ELECTRE stands for: ELimination Et Choix Traduisant la REalité (ELimination and Choice Expressing REality). The method was first proposed by Bernard Roy and his colleagues at SEMA consultancy company. A team at SEMA was working on the concrete, multiple criteria, realworld problem of how firms could decide on new activities and had encountered problems using a weighted sum technique. Bernard Roy was called in as a consultant and the group devised the ELECTRE method. As it was first applied in 1965, the ELECTRE method was to choose the best action(s) from a given set of actions, but it was soon applied to three main problems: choosing, ranking and sorting. The method became more widely known when a paper by B. Roy appeared in a French operations research journal. It evolved into ELECTRE I (electre one) and the evolutions have continued with ELECTRE II, ELECTRE III, ELECTRE IV, ELECTRE IS and ELECTRE TRI (electre tree), to mention a few. Bernard Roy is widely recognized as the father of the ELECTRE method, which was one of the earliest approaches in what is sometimes known as the French School of decision making. It is usually classified as an “outranking method” of decision making. There are two main parts to an ELECTRE application: first, the construction of one or several outranking relations, which aims at comparing in a comprehensive way each pair of actions; second, an exploitation procedure that elaborates on the recommendations obtained in the first phase. The nature of the recommendation depends on the problem being addressed: choosing, ranking or sorting. Usually the Electre Methods are used to discard some alternatives to the problem, which are unacceptable. After that we can use another MCDA to select the best one. 
The advantage of applying the ELECTRE methods first is that the other MCDA method then runs on a restricted set of alternatives, saving much time. Criteria in ELECTRE methods have two distinct sets of parameters: the importance coefficients and the veto thresholds. OutrankingTools
ELiSH  Deep Neural Networks have been shown to be beneficial for a variety of tasks, in particular allowing for endtoend learning and reducing the requirement for manual design decisions. However, still many parameters have to be chosen in advance, also raising the need to optimize them. One important, but often ignored system parameter is the selection of a proper activation function. Thus, in this paper we target to demonstrate the importance of activation functions in general and show that for different tasks different activation functions might be meaningful. To avoid the manual design or selection of activation functions, we build on the idea of genetic algorithms to learn the best activation function for a given task. In addition, we introduce two new activation functions, ELiSH and HardELiSH, which can easily be incorporated in our framework. In this way, we demonstrate for three different image classification benchmarks that different activation functions are learned, also showing improved results compared to typically used baselines. 
Elite Biased Guided Local Search (EBGLS)
Local search is a basic building block in memetic algorithms. Guided Local Search (GLS) can improve the efficiency of local search. By changing the guide function, GLS guides a local search to escape from locally optimal solutions and find better solutions. The key component of GLS is its penalizing mechanism which determines which feature is selected to penalize when the search is trapped in a locally optimal solution. The original GLS penalizing mechanism only makes use of the cost and the current penalty value of each feature. It is well known that many combinatorial optimization problems have a big valley structure, i.e., the better a solution is, the more the chance it is closer to a globally optimal solution. This paper proposes to use big valley structure assumption to improve the GLS penalizing mechanism. An improved GLS algorithm called Elite Biased GLS (EBGLS) is proposed. EBGLS records and maintains an elite solution as an estimate of the globally optimal solutions, and reduces the chance of penalizing the features in this solution. We have systematically tested the proposed algorithm on the symmetric traveling salesman problem. Experimental results show that EBGLS is significantly better than GLS. ➘ “Guided Local Search” 
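The original GLS penalizing mechanism mentioned above can be sketched as follows. The utility formula cost / (1 + penalty) is the standard one from the GLS literature and is not spelled out in this entry; the function name and the edge labels are hypothetical. The elite bias that EBGLS adds is not specified here, so it is deliberately left out.

```python
def features_to_penalize(features_present, cost, penalty):
    """Pick the features of a local optimum that original GLS would penalize.

    features_present: features exhibited by the current locally optimal solution.
    cost, penalty: dicts mapping each feature to its cost and current penalty.
    Uses the standard GLS utility cost / (1 + penalty) (assumption: this entry
    only says that cost and current penalty are used).
    """
    utility = {f: cost[f] / (1.0 + penalty[f]) for f in features_present}
    best = max(utility.values())
    return [f for f in features_present if utility[f] == best]

# Toy TSP-style example: features are tour edges, cost is edge length.
edge_cost = {"ab": 5.0, "bc": 3.0, "ca": 5.0}
edge_penalty = {"ab": 0, "bc": 0, "ca": 1}
chosen = features_to_penalize(["ab", "bc", "ca"], edge_cost, edge_penalty)
for edge in chosen:
    edge_penalty[edge] += 1  # penalized features make the guide function worse
```

Edge "ca" is as costly as "ab" but has already been penalized once, so only "ab" is selected in this round.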
Ellipsoid Method for Linear Programming  In this paper, ellipsoid method for linear programming is derived using only minimal knowledge of algebra and matrices. Unfortunately, most authors first describe the algorithm, then later prove its correctness, which requires a good knowledge of linear algebra. 
ELM with Local Connections (ELM-LC)
This paper is concerned with the sparsification of the input-hidden weights of ELM (Extreme Learning Machine). For ordinary feedforward neural networks, the sparsification is usually done by introducing a regularization technique into the learning process of the network. But this strategy cannot be applied to ELM, since the input-hidden weights of ELM are supposed to be randomly chosen rather than learned. To this end, we propose a modified ELM, called ELM-LC (ELM with local connections), which is designed for the sparsification of the input-hidden weights as follows: The hidden nodes and the input nodes are divided respectively into several corresponding groups, and an input node group is fully connected with its corresponding hidden node group, but is not connected with any other hidden node group. As in the usual ELM, the input-hidden weights are randomly given, and the hidden-output weights are obtained through least squares learning. In the numerical simulations on some benchmark problems, the new ELM-LC behaves better than the traditional ELM.
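The grouping scheme amounts to a block-structured random weight matrix followed by a least squares fit. The sketch below is an illustration under assumptions, not the authors' code: the function names, the `tanh` activation, the random biases and the equal number of hidden nodes per group are all hypothetical choices.

```python
import numpy as np

def elm_lc_fit(X, y, groups, hidden_per_group, rng):
    """Fit an ELM whose input-hidden weights are zero outside matching groups.

    groups: list of lists of input feature indices, one list per group.
    Returns the random input-hidden weights W, biases b and the learned
    hidden-output weights beta.
    """
    n_hidden = hidden_per_group * len(groups)
    W = np.zeros((X.shape[1], n_hidden))
    for g, idx in enumerate(groups):
        cols = slice(g * hidden_per_group, (g + 1) * hidden_per_group)
        # Each input group connects only to its own hidden group.
        W[idx, cols] = rng.standard_normal((len(idx), hidden_per_group))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                        # hidden layer outputs
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # least squares output weights
    return W, b, beta

def elm_lc_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = X[:, 0] - 2.0 * X[:, 2]
W, b, beta = elm_lc_fit(X, y, groups=[[0, 1], [2, 3]], hidden_per_group=6, rng=rng)
predictions = elm_lc_predict(X, W, b, beta)
```

Only the output weights `beta` are learned; the sparsity comes entirely from the fixed zero blocks in `W`.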
Elo Rating System  The Elo rating system is a method for calculating the relative skill levels of players in competitor-versus-competitor games such as chess. It is named after its creator Arpad Elo, a Hungarian-born American physics professor. The Elo system was invented as an improved chess rating system and is also used in many other games. It has also been adapted for use as a rating system for multiplayer competition in a number of video games, and has been adapted to team sports including soccer (association football), American college football, basketball, Major League Baseball, competitive programming, and eSports. The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other multiple times are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent’s is expected to win 64% of the time; if the difference is 200 points, then the expected win proportion for the stronger player is 76%.
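The 64% and 76% figures follow from the logistic expected-score formula commonly used with Elo ratings, with a scale of 400 points. The sketch below assumes that formula; the function names and the K-factor of 32 are hypothetical choices for illustration, since rating bodies use different K values.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def updated_rating(rating_a, rating_b, score_a, k=32.0):
    """New rating for A after scoring score_a (1 win, 0.5 draw, 0 loss).

    k is the K-factor (assumed value; it varies by organization).
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

advantage_100 = expected_score(1600, 1500)   # about 0.64
advantage_200 = expected_score(1700, 1500)   # about 0.76
after_win = updated_rating(1500, 1500, 1.0)  # equal ratings, A wins
```

A player gains more points for beating a higher-rated opponent than a lower-rated one, because the update is proportional to the gap between the actual and the expected score.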
EMAML  ➘ “Krazy World” 
EmbeddedGraph  In this paper, we propose a new type of graph, denoted as ’embeddedgraph’, and its theory, which employs a distributed representation to describe the relations on the graph edges. Embeddedgraphs can express linguistic and complicated relations, which cannot be expressed by the existing edgegraphs or weightedgraphs. We introduce the mathematical definition of embeddedgraph, translation, edge distance, and graph similarity. We can transform an embeddedgraph into a weightedgraph and a weightedgraph into an edgegraph by the translation method and by threshold calculation, respectively. The edge distance of an embeddedgraph is a distance based on the components of a target vector, and it is calculated through cosine similarity with the target vector. The graph similarity is obtained considering the relations with linguistic complexity. In addition, we provide some examples and data structures for embeddedgraphs in this paper. 
EmbNum  Among the fundamental questions in computer science, at least two have a deep impact on mathematics. What can computation compute? How many steps does a computation require to solve an instance of the 3-SAT problem? Our work addresses the first question, by introducing a new model called the x-machine. The x-machine executes Turing machine instructions and two special types of instructions. Quantum random instructions are physically realizable with a quantum random number generator. Meta instructions can add new states and add new instructions to the x-machine. A countable set of x-machines is constructed, each with a finite number of states and instructions; each x-machine can compute a Turing incomputable language, whenever the quantum randomness measurements behave like unbiased Bernoulli trials. In 1936, Alan Turing posed the halting problem for Turing machines and proved that this problem is unsolvable for Turing machines. Consider an enumeration E_a(i) = (M_i, T_i) of all Turing machines M_i and initial tapes T_i. Does there exist an x-machine X that has at least one evolutionary path X –> X_1 –> X_2 –> . . . –> X_m, so at the m-th stage x-machine X_m can correctly determine for 0 <= i <= m whether M_i’s execution on tape T_i eventually halts? We demonstrate an x-machine Q(x) that has one such evolutionary path. The existence of this evolutionary path suggests that David Hilbert was not misguided to propose in 1900 that mathematicians search for finite processes to help construct mathematical proofs. Our refinement is that we cannot use a fixed computer program that behaves according to a fixed set of mechanical rules. We must pursue methods that exploit randomness and self-modification so that the complexity of the program can increase as it computes.
EML-NET  In this work, we apply state-of-the-art Convolutional Neural Network (CNN) architectures for saliency prediction. Our results show that better saliency features can be delivered by a deeper CNN model. However, it is very space-consuming to apply a complex model due to the large size of input images. The space complexity becomes even more problematic when we extract features from multiple convolutional layers or different models. In this paper, we propose a modular saliency system which aims at splitting the whole network into small modules. The main difference in our approach is that the encoder and decoder can be trained separately for scalability. Furthermore, the encoder can contain more than one CNN model to extract features, and the models can have different architectures or be pre-trained on different datasets. This parallel design allows us to better utilize the computational space in order to apply a more powerful encoder. More importantly, our network can be easily expanded almost without extra space; other pre-trained CNN models can be combined for a wider range of visual knowledge. We denote our expandable multi-layer network as EML-NET in this paper. Our method is evaluated on three public saliency benchmarks, SALICON, MIT300 and CAT2000. The proposed EML-NET achieves state-of-the-art results on the metric of Normalized Scanpath Saliency using a modified loss function.
Emotional Chatting Machine (ECM)
Emotional intelligence is one of the key factors to the success of dialogue systems or conversational agents. In this paper, we propose Emotional Chatting Machine (ECM) which generates responses that are appropriate not only at the content level (relevant and grammatical) but also at the emotion level (consistent emotional expression). To the best of our knowledge, this is the first work that addresses the emotion factor in largescale conversation generation. ECM addresses the factor in three ways: modeling highlevel abstraction of emotion expression by embedding emotion categories, changing of implicit internal emotion states, and using explicit emotion expressions with an external emotion vocabulary. Experiments show that our model can generate responses appropriate not only in content but also in emotion. 
Emotional Word Vector (EVEC) 
It is important for machines to interpret human emotions properly for better humanmachine communications, as emotion is an essential part of humantohuman communications. One aspect of emotion is reflected in the language we use. How to represent emotions in texts is a challenge in natural language processing (NLP). Although continuous vector representations like word2vec have become the new norm for NLP problems, their limitations are that they do not take emotions into consideration and can unintentionally contain bias toward certain identities like different genders. This thesis focuses on improving existing representations in both word and sentence levels by explicitly taking emotions inside text and model bias into account in their training process. Our improved representations can help to build more robust machine learning models for affectrelated text classification like sentiment/emotion analysis and abusive language detection. We first propose representations called emotional word vectors (EVEC), which is learned from a convolutional neural network model with an emotionlabeled corpus, which is constructed using hashtags. Secondly, we extend to learning sentencelevel representations with a huge corpus of texts with the pseudo task of recognizing emojis. Our results show that, with the representations trained from millions of tweets with weakly supervised labels such as hashtags and emojis, we can solve sentiment/emotion analysis tasks more effectively. Lastly, as examples of model bias in representations of existing approaches, we explore a specific problem of automatic detection of abusive language. We address the issue of gender bias in various neural network models by conducting experiments to measure and reduce those biases in the representations in order to build more robust classification models. 
Emphatic Temporal-Difference Learning Algorithm (ETD)
In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under off-policy training (Sutton, Mahmood & White 2016), but it is also a new algorithm for the on-policy case. In both our on-policy and off-policy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of ‘bounce’. In the off-policy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower.
Empirical Bayes Geometric Mean (EBGM) 
Adjusted estimate for the relative reporting ratio. Example: if EBGM=3.9 for acetaminophen-hepatic failure, then this drug-event combination occurred in the data 3.9 times more frequently than expected under the assumption of no association between the drug and the event. openEBGM
Empirical Bayes Matrix Factorization (EBMF) 
Matrix factorization methods – including Factor analysis (FA), and Principal Components Analysis (PCA) – are widely used for inferring and summarizing structure in multivariate data. Many matrix factorization methods exist, corresponding to different assumptions on the elements of the underlying matrix factors. For example, many recent methods use a penalty or prior distribution to achieve sparse representations (‘Sparse FA/PCA’). Here we introduce a general Empirical Bayes approach to matrix factorization (EBMF), whose key feature is that it uses the observed data to estimate prior distributions on matrix elements. We derive a correspondinglygeneral variational fitting algorithm, which reduces fitting EBMF to solving a simpler problem – the socalled ‘normal means’ problem. We implement this general algorithm, but focus particular attention on the use of sparsityinducing priors that are unimodal at 0. This yields a sparse EBMF approach – essentially a version of sparse FA/PCA – that automatically adapts the amount of sparsity to the data. We demonstrate the benefits of our approach through both numerical comparisons with competing methods and through analysis of data from the GTEx (Genotype Tissue Expression) project on genetic associations across 44 human tissues. In numerical comparisons EBMF often provides more accurate inferences than other methods. In the GTEx data, EBMF identifies interpretable structure that concords with known relationships among human tissues. Software implementing our approach is available at https://…/flashr. 
Empirical Equilibrium  We introduce empirical equilibrium, the prediction in a game that selects the Nash equilibria that can be approximated by a sequence of payoffmonotone distributions, a welldocumented proxy for empirically plausible behavior. Then, we reevaluate implementation theory based on this equilibrium concept. We show that in a partnership dissolution environment with complete information, two popular auctions that are essentially equivalent for the Nash equilibrium prediction, can be expected to differ in fundamental ways when they are operated. Besides the direct policy implications, two general consequences follow. First, a mechanism designer may not be constrained by typical invariance properties. Second, a mechanism designer who does not account for the empirical plausibility of equilibria may inadvertently design implicitly biased mechanisms. 
Empirical Likelihood (EL) 
Empirical likelihood (EL) is an estimation method in statistics. Empirical likelihood estimates require few assumptions about the error distribution compared to similar methods like maximum likelihood. EL can handle data well as long as it is independent and identically distributed (iid). EL performs well even when the distribution is asymmetric or censored. EL methods are also useful since they can easily incorporate constraints and prior information. Art Owen pioneered work in this area with his 1988 paper. 
Empirical Orthogonal Function Analysis (EOF) 
In statistics, EOF analysis is known as Principal Component Analysis (PCA). As such, EOF analysis is sometimes classified as a multivariate statistical technique. 
Empirical Orthogonal Teleconnections (EOT) 
Calculating functions empirically and orthogonally from a given spacetime dataset. The method is rooted in multiple linear regression and yields solutions that are orthogonal in one direction, either space or time. remote 
Empirical Requirements Research Classifier (ERRC) 
Research must be reproducible in order to make an impact on science and to contribute to the body of knowledge in our field. Yet studies have shown that 70% of research from academic labs cannot be reproduced. In software engineering, and more specifically requirements engineering (RE), reproducible research is rare, with datasets not always available or methods not fully described. This lack of reproducible research hinders progress, with researchers having to replicate an experiment from scratch. A researcher starting out in RE has to sift through conference papers, finding ones that are empirical, then must look through the data available from the empirical paper (if any) to make a preliminary determination if the paper can be reproduced. This paper addresses two parts of that problem, identifying RE papers and identifying empirical papers within the RE papers. Recent RE and empirical conference papers were used to learn features and to build an automatic classifier to identify RE and empirical papers. We introduce the Empirical Requirements Research Classifier (ERRC) method, which uses natural language processing and machine learning to perform supervised classification of conference papers. We compare our method to a baseline keywordbased approach. To evaluate our approach, we examine sets of papers from the IEEE Requirements Engineering conference and the IEEE International Symposium on Software Testing and Analysis. We found that the ERRC method performed better than the baseline method in all but a few cases. 
Enclosure Diagram  The enclosure diagram is also space filling, using containment rather than adjacency to represent the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into rectangles. As with adjacency diagrams, the size of any node in the tree is quickly revealed. 
Encoder Based Lifelong Learning  This paper introduces a new lifelong learning solution where a single model is trained for a sequence of tasks. The main challenge that vision systems face in this context is catastrophic forgetting: as they tend to adapt to the most recently seen task, they lose performance on the tasks that were learned previously. Our method aims at preserving the knowledge of the previous tasks while learning a new one by using autoencoders. For each task, an undercomplete autoencoder is learned, capturing the features that are crucial for its achievement. When a new task is presented to the system, we prevent the reconstructions of the features with these autoencoders from changing, which has the effect of preserving the information on which the previous tasks are mainly relying. At the same time, the features are given space to adjust to the most recent environment as only their projection into a low-dimensional submanifold is controlled. The proposed system is evaluated on image classification tasks and shows a reduction of forgetting over the state-of-the-art.
Encoder CFGDecoder  Semantic parsing can be defined as the process of mapping natural language sentences into a machine-interpretable, formal representation of their meaning. Semantic parsing using LSTM encoder-decoder neural networks has become a promising approach. However, automated translation of natural language does not provide grammaticality guarantees for the generated sentences; such a guarantee is particularly important in practical cases where a database query can cause critical errors if the sentence is ungrammatical. In this work, we propose a neural architecture called Encoder CFGDecoder, whose output conforms to a given context-free grammar. Results are shown for an implementation of this architecture, displaying its correctness and providing benchmark accuracy levels better than the literature.
Encog  Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. Most Encog training algorithms are multithreaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008. Encog: Library of Interchangeable Machine Learning Models for Java and C#
End of Potential Line (EOPL) 
We introduce the problem EndOfPotentialLine and the corresponding complexity class EOPL of all problems that can be reduced to it in polynomial time. This class captures problems that admit a single combinatorial proof of their joint membership in the complexity classes PPAD of fixpoint problems and PLS of local search problems. EOPL is a combinatorially-defined alternative to the class CLS (for Continuous Local Search), which was introduced with the goal of capturing the complexity of some well-known problems in PPAD $\cap$ PLS that have resisted, in some cases for decades, attempts to put them in polynomial time. Two of these are Contraction, the problem of finding a fixpoint of a contraction map, and P-LCP, the problem of solving a P-matrix Linear Complementarity Problem. We show that EndOfPotentialLine is in CLS via a two-way reduction to EndOfMeteredLine. The latter was defined to show query and cryptographic lower bounds for CLS. Our two main results are to show that both PL-Contraction (Piecewise-Linear Contraction) and P-LCP are in EOPL. Our reductions imply that the promise versions of PL-Contraction and P-LCP are in the promise class UniqueEOPL, which corresponds to the case of a single potential line. This also shows that simple-stochastic, discounted, mean-payoff, and parity games are in EOPL. Using the insights from our reduction for PL-Contraction, we obtain the first polynomial-time algorithms for finding fixed points of contraction maps in fixed dimension for any $\ell_p$ norm, where previously such algorithms were only known for the $\ell_2$ and $\ell_\infty$ norms. Our reduction from P-LCP to EndOfPotentialLine allows a technique of Aldous to be applied, which in turn gives the fastest-known randomized algorithm for the P-LCP.
Endogenous Variable  In a statistical model, a parameter or variable is said to be endogenous when there is a correlation between the parameter or variable and the error term. Endogeneity can arise as a result of measurement error, autoregression with autocorrelated errors, simultaneity and omitted variables. Broadly, a loop of causality between the independent and dependent variables of a model leads to endogeneity. For example, in a simple supply and demand model, when predicting the quantity demanded in equilibrium, the price is endogenous because producers change their price in response to demand and consumers change their demand in response to price. In this case, the price variable is said to have total endogeneity once the demand and supply curves are known. In contrast, a change in consumer tastes or preferences would be an exogenous change on the demand curve. 
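The omitted-variable case can be sketched with a small simulation. The data-generating process below (a true slope of 2, an omitted variable `u` with coefficient 3, standard normal noise) is an assumption chosen only for illustration. When `x` depends on `u`, the ordinary least squares slope drifts away from the true value; when `x` is drawn independently of `u`, it does not.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs: cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n=20000, seed=0):
    """Return (slope with endogenous x, slope with exogenous x)."""
    rng = random.Random(seed)
    true_slope = 2.0
    endo_x, endo_y, exo_x, exo_y = [], [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)      # omitted variable; it ends up in the error term
        noise = rng.gauss(0, 1)
        # Endogenous case: x depends on u, so x is correlated with the
        # error term 3*u + noise.
        x1 = u + rng.gauss(0, 1)
        endo_x.append(x1)
        endo_y.append(true_slope * x1 + 3 * u + noise)
        # Exogenous case: x has the same variance but is independent of u.
        x2 = rng.gauss(0, 2 ** 0.5)
        exo_x.append(x2)
        exo_y.append(true_slope * x2 + 3 * u + noise)
    return ols_slope(endo_x, endo_y), ols_slope(exo_x, exo_y)
```

Under these assumptions the endogenous estimate concentrates around 2 + 3 * cov(x, u) / var(x) = 3.5 rather than the true slope of 2, while the exogenous estimate stays close to 2.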
Energy-based Exploration of Random Features (EERF) 
The randomized-feature approach has been successfully employed in large-scale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing data-dependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomized-feature approach in supervised learning for good generalizability. We propose the Energy-based Exploration of Random Features (EERF) algorithm based on a data-dependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function with high probability recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires a smaller number of random features to achieve a certain generalization error compared to the state-of-the-art while introducing negligible pre-processing overhead. EERF can be implemented in a few lines of code and requires no additional tuning parameters. 
EnergyNet  We present ENERGYNET, a new framework for analyzing and building artificial neural network architectures. Our approach adaptively learns the structure of the networks in an unsupervised manner. The methodology is based upon the theoretical guarantees of the energy function of restricted Boltzmann machines (RBM) with an infinite number of nodes. We present experimental results to show that the final network adapts to the complexity of a given problem. 
ENet  The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18× faster, requires 75× fewer FLOPs, has 79× fewer parameters, and provides similar or better accuracy to existing models. We have tested it on CamVid, Cityscapes and SUN datasets and report on comparisons with existing state-of-the-art methods, and the trade-offs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster. 
Enhanced Least Absolute Shrinkage Operator (ELASSO) 
elasso 
Ensemble Bayesian Optimization (EBO) 
Bayesian Optimization (BO) has been shown to be a very effective paradigm for tackling hard black-box and non-convex optimization problems encountered in Machine Learning. Despite these successes, the computational complexity of the underlying function approximation has restricted the use of BO to problems that can be handled with less than a few thousand function evaluations. Harder problems like those involving functions operating in very high dimensional spaces may require hundreds of thousands or millions of evaluations or more and become computationally intractable to handle using standard Bayesian Optimization methods. In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Further, we represent each GP model using tile coding random features and an additive function structure. Our approach generates speedups by parallelizing the time-consuming hyper-parameter posterior inference and functional evaluations on hundreds of cores and aggregating the models in every iteration of BO. Our extensive experimental evaluation shows that EBO can speed up the posterior inference by 2 to 3 orders of magnitude (400 times in one experiment) compared to the state-of-the-art by putting data into Mondrian bins without sacrificing the sample quality. We demonstrate the ability of EBO to handle sample-intensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations. 
Ensemble Empirical Mode Decomposition (EEMD) 
This approach consists of sifting an ensemble of white noise-added signal (data) and treats the mean as the final true result. Finite, not infinitesimal, amplitude white noise is necessary to force the ensemble to exhaust all possible solutions in the sifting process, thus making the different scale signals collate in the proper intrinsic mode functions (IMF) dictated by the dyadic filter banks. As EEMD is a time-space analysis method, the added white noise is averaged out with a sufficient number of trials; the only persistent part that survives the averaging process is the component of the signal (original data), which is then treated as the true and more physically meaningful answer. The effect of the added white noise is to provide a uniform reference frame in the time-frequency space; therefore, the added noise collates the portion of the signal of comparable scale in one IMF. With this ensemble mean, one can separate scales naturally without any a priori subjective criterion selection as in the intermittence test for the original EMD algorithm. This new approach utilizes the full advantage of the statistical characteristics of white noise to perturb the signal in its true solution neighborhood, and to cancel itself out after serving its purpose; therefore, it represents a substantial improvement over the original EMD and is a truly noise-assisted data analysis (NADA) method. Rlibeemd 
Ensemble Methods  In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble refers only to a concrete finite set of alternative models, but typically allows for much more flexible structure to exist between those alternatives. 
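One of the simplest ways to combine a finite set of models is majority voting. The sketch below is only an illustration: the function name `majority_vote` and the representation of models as plain callables are assumptions, and voting is just one of many possible combination rules.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the most models for input x.

    `models` is any finite sequence of callables mapping an input to a
    hashable label. Ties go to the label that was voted for first.
    """
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three deliberately weak threshold classifiers used as ensemble members.
members = [lambda x: x > 1, lambda x: x > 2, lambda x: x > 3]
label = majority_vote(members, 2.5)  # two of the three members vote True
```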
Ensemble Partial Least Squares Regression (EnPLS) 
enpls 
EnsembleDAgger  While imitation learning is often used in robotics, this approach often suffers from data mismatch and compounding errors. DAgger is an iterative algorithm that addresses these issues by aggregating training data from both the expert and novice policies, but does not consider the impact of safety. We present a probabilistic extension to DAgger, which attempts to quantify the confidence of the novice policy as a proxy for safety. Our method, EnsembleDAgger, approximates a GP using an ensemble of neural networks. Using the variance as a measure of confidence, we compute a decision rule that captures how much we doubt the novice, thus determining when it is safe to allow the novice to act. With this approach, we aim to maximize the novice’s share of actions, while constraining the probability of failure. We demonstrate improved safety and learning performance compared to other DAgger variants and classic imitation learning on an inverted pendulum and in the MuJoCo HalfCheetah environment. 
Enterprise Control Language / DataCentric Programming Language (ECL) 
ECL is a declarative, data centric programming language designed in 2000 to allow a team of programmers to process big data across a high performance computing cluster without the programmer being involved in many of the lower level, imperative decisions. 
Enterprise Data Hub (EDH) 
Organizations everywhere are grappling with how to manage their growing big data sets from ERP and e-commerce systems, log files, sensor data, social media and more. Apache Hadoop provides a cost-effective enterprise data hub (EDH) to store, transform, cleanse, filter, analyze and gain new value from all kinds of data. 
Enterprise Information Flow (EIF) 
What is Enterprise Information Flow? The concept is closely connected to its neighboring disciplines: Information Flow, Data Lineage Analysis and Metadata Management. But it’s not the same. This new buzzword is only beginning to be recognized, so let’s get a head start. Information Flow focuses on information processing when it comes to security, throughput optimization and transporters; Data Lineage Analysis studies the way data is transferred between systems; and Metadata Management is all about metadata structure and purpose. Why do we need a new concept then? · Big data is getting really big. From internal company systems, social networks and external data from partners to automatically collected data – a huge amount of information needs to be properly dealt with. The high volume of data is of course connected to the high volume of contributing sources: dozens of online channels, the previously mentioned social networks, portable devices, blogs, news and video content. And every source needs to be correctly described, attributed and integrated into the company’s Enterprise Information Flow. · Systems are getting more and more complicated. EIF needs to be ready not only for big data coming from a wide variety of channels, but also for the many different ways data is transformed and processed inside the system. Old school transformation methods like ETL and SQL scripts are easy and usually well accounted for, but cracks start to show when it comes to the semantic analysis of unstructured data, Google’s search algorithms, Facebook’s preferential algorithms, automated quality assurance scripts or artificial intelligence methods used for predictive analysis. When it comes to transformations, it’s critical to know how security and other specific attributes change. Another key point is deciding if the information is created or just transformed. · New routes between systems. The number of different ways to transfer data between systems is rapidly growing. Classic ETL and extract transfers are joined by more complicated systems based on SOA, PBM and ESB. It’s also necessary to be ready for new approaches like Data Federation and Logical Data Warehouse, where data saving is not persistent. · Different data types. It’s not about relational data or text anymore. You need to be ready for NoSQL databases, hyperlinks, video, graphics, XML, semi-structured data and other types of information. In complicated environments like these, current solutions fail. New approaches need to be more complex, as the new systems are. It’s necessary to follow data not only on a physical level, but also through more layers of logical abstraction. Let’s sum it up into two main angles of Enterprise Information Flow: 1) New information necessary for decision making appears. Where does it come from? When was it created? Who’s responsible for its quality? 2) Who uses my information and how? Those two sets of questions are vital to Enterprise Information Flow, which is a standard part of Enterprise Information Management. Any organization that takes its data seriously is searching for answers anyway, but EIF can provide a more comprehensive overview and merge existing solutions from currently separated fields into one complex policy. A complex solution is precisely what you need when you’re dealing with complex systems. 
Entity Neighbors  Knowledge Graph Embedding (KGE) aims to represent entities and relations of a knowledge graph in a low-dimensional continuous vector space. Recent works focus on incorporating structural knowledge with additional information, such as entity descriptions, relation paths and so on. However, commonly used additional information usually contains plenty of noise, which makes it hard to learn valuable representations. In this paper, we propose a new kind of additional information, called entity neighbors, which contain both semantic and topological features about a given entity. We then develop a deep memory network model to encode information from neighbors. Employing a gating mechanism, representations of structure and neighbors are integrated into a joint representation. The experimental results show that our model outperforms existing KGE methods utilizing entity descriptions and achieves state-of-the-art metrics on 4 datasets. 
Entity Resolution (ER) 
Entity Resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a longstanding challenge in database management, information retrieval, machine learning, natural language processing and statistics. Ironically, different subdisciplines refer to it by a variety of names, including record linkage, deduplication, coreference resolution, reference reconciliation, object consolidation, identity uncertainty and database hardening. Accurate and fast ER has huge practical implications in a wide variety of commercial, scientific and security domains. Despite the long history of work on ER there is still a surprising diversity of approaches – including rule based methods, pairwise classification, clustering approaches, and richer forms of probabilistic inference – and a lack of guiding theory. Meanwhile, in the age of big data, the need for high quality entity resolution is only growing. We are inundated with more and more data that needs to be integrated, aligned and matched before further utility can be extracted. 
Entity2Topic (E2T) 
A major proportion of a text summary includes important entities found in the original text. These entities build up the topic of the summary. Moreover, they hold commonsense information once they are linked to a knowledge base. Based on these observations, this paper investigates the usage of linked entities to guide the decoder of a neural text summarizer to generate concise and better summaries. To this end, we leverage an off-the-shelf entity linking system (ELS) to extract linked entities and propose Entity2Topic (E2T), a module easily attachable to a sequence-to-sequence model that transforms a list of entities into a vector representation of the topic of the summary. Currently available ELSs are still not sufficiently effective, possibly introducing unresolved ambiguities and irrelevant entities. We resolve the imperfections of the ELS by (a) encoding entities with selective disambiguation, and (b) pooling entity vectors using firm attention. By applying E2T to a simple sequence-to-sequence model with attention mechanism as base model, we see significant improvements of the performance in the Gigaword (sentence to title) and CNN (long document to multi-sentence highlights) summarization datasets by at least 2 ROUGE points. 
Entity-Duet Neural Ranking Model (EDRM) 
This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. EDRM represents queries and documents by their words and entity annotations. The semantics from knowledge graphs are integrated in the distributed representations of their entities, while the ranking is conducted by interaction-based neural ranking networks. The two components are learned end-to-end, making EDRM a natural combination of entity-oriented search and neural information retrieval. Our experiments on a commercial search log demonstrate the effectiveness of EDRM. Our analyses reveal that knowledge graph semantics significantly improve the generalization ability of neural ranking models. 
Entropic Spectral Learning  We present a novel algorithm for learning the spectral density of large scale networks using stochastic trace estimation and the method of maximum entropy. The complexity of the algorithm is linear in the number of non-zero elements of the matrix, offering a computational advantage over other algorithms. We apply our algorithm to the problem of community detection in large networks. We show state-of-the-art performance on both synthetic and real datasets. 
Entropy  In information theory, entropy is a measure of the uncertainty in a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message. Entropy is typically measured in bits, nats, or bans. Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content. Shannon entropy provides an absolute limit on the best possible lossless encoding or compression of any communication, assuming that the communication may be represented as a sequence of independent and identically distributed random variables. 
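For a discrete random variable with probabilities $p_i$, the Shannon entropy is $H = -\sum_i p_i \log p_i$, where the base of the logarithm sets the unit (2 for bits, $e$ for nats, 10 for bans). The following minimal sketch estimates it from observed symbol frequencies; the function name `shannon_entropy` is chosen for this example only.

```python
import math
from collections import Counter


def shannon_entropy(symbols, base=2):
    """Empirical Shannon entropy of a non-empty sequence of hashable symbols.

    base=2 gives bits, base=math.e gives nats, base=10 gives bans.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one symbol")
    entropy = 0.0
    for count in counts.values():
        p = count / total          # relative frequency of this symbol
        entropy -= p * math.log(p, base)
    return entropy
```

A sequence in which two symbols are equally frequent carries one bit per symbol, while a constant sequence carries none, which matches the reading of entropy as average unpredictability.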
Entropy Agglomeration (EA) 
➘ “Entropy Agglomeration” 
Entropy c-Means (ECM) 
Fuzzy clustering methods identify naturally occurring clusters in a dataset, where the extent to which different clusters are overlapped can differ. Most methods have a parameter to fix the level of fuzziness. However, the appropriate level of fuzziness depends on the application at hand. This paper presents Entropy $c$-Means (ECM), a method of fuzzy clustering that simultaneously optimizes two contradictory objective functions, resulting in the creation of fuzzy clusters with different levels of fuzziness. This allows ECM to identify clusters with different degrees of overlap. ECM optimizes the two objective functions using two multi-objective optimization methods, Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Multiobjective Evolutionary Algorithm based on Decomposition (MOEA/D). We also propose a method to select a suitable trade-off clustering from the Pareto front. Experiments on challenging synthetic datasets as well as real-world datasets show that ECM leads to better cluster detection compared to the conventional fuzzy clustering methods as well as previously used multi-objective methods for fuzzy clustering. 
Episodic Memory Deep Q-Network (EMDQN) 
Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environments to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method can lead to better sample efficiency and is more likely to find good policies. It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms. 
Epsilon-Greedy Algorithm  To get you started thinking algorithmically about the Explore-Exploit dilemma, we’re going to teach you how to code up one of the simplest possible algorithms for trading off exploration and exploitation. This algorithm is called the epsilon-Greedy algorithm. In computer science, a greedy algorithm is an algorithm that always takes whatever action seems best at the present moment, even when that decision might lead to bad long term consequences. The epsilon-Greedy algorithm is almost a greedy algorithm because it generally exploits the best available option, but every once in a while the epsilon-Greedy algorithm explores the other available options. As we’ll see, the term epsilon in the algorithm’s name refers to the odds that the algorithm explores instead of exploiting. Let’s be more specific. The epsilon-Greedy algorithm works by randomly oscillating between Cynthia’s vision of purely randomized experimentation and Bob’s instinct to maximize profits. The epsilon-Greedy algorithm is one of the easiest bandit algorithms to understand because it tries to be fair to the two opposite goals of exploration and exploitation by using a mechanism that even a little kid could understand: it just flips a coin. While there are a few details we’ll have to iron out to make that statement precise, the big idea behind the epsilon-Greedy algorithm really is that simple: if you flip a coin and it comes up heads, you … 
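The coin-flip idea can be sketched in a few lines. This is a minimal illustration and not the code from the source text: the class name `EpsilonGreedy`, its method names, and the use of a running-mean reward estimate per arm are assumptions made for the example. With probability `epsilon` the agent explores by picking an arm uniformly at random (which may happen to be the current best one); otherwise it exploits the arm with the highest estimated reward.

```python
import random


class EpsilonGreedy:
    """Minimal epsilon-greedy bandit agent (illustrative sketch)."""

    def __init__(self, n_arms, epsilon, rng=None):
        self.epsilon = epsilon            # probability of exploring
        self.counts = [0] * n_arms        # number of pulls per arm
        self.values = [0.0] * n_arms      # running mean reward per arm
        self.rng = rng or random.Random()

    def select_arm(self):
        # Biased coin flip: explore with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Otherwise exploit the arm with the best current estimate.
        return max(range(len(self.values)), key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Usage: two hypothetical Bernoulli arms with success rates 0.2 and 0.8.
rates = [0.2, 0.8]
env = random.Random(0)
agent = EpsilonGreedy(n_arms=2, epsilon=0.1, rng=random.Random(1))
for _ in range(1000):
    arm = agent.select_arm()
    agent.update(arm, 1.0 if env.random() < rates[arm] else 0.0)
```

Setting `epsilon` to 0 gives a purely greedy agent and setting it to 1 gives purely random experimentation; values in between trade one off against the other.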
epsilon-ResNet  A family of super deep networks, referred to as residual networks or ResNet, achieved record-beating performance in various visual tasks such as image recognition, object detection, and semantic segmentation. The ability to train very deep networks naturally pushed the researchers to use enormous resources to achieve the best performance. Consequently, in many applications super deep residual networks were employed for just a marginal improvement in performance. In this paper, we propose $\epsilon$-ResNet that allows us to automatically discard redundant layers, which produce responses that are smaller than a threshold $\epsilon$, without any loss in performance. The $\epsilon$-ResNet architecture can be achieved using a few additional rectified linear units in the original ResNet. Our method does not use any additional variables nor numerous trials like other hyper-parameter optimization techniques. The layer selection is achieved using a single training process and the evaluation is performed on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. In some instances, we achieve about 80\% reduction in the number of parameters. 
Epsilon-Subgradient Descent  Minimax optimization plays a key role in adversarial training of machine learning algorithms, such as learning generative models, domain adaptation, privacy preservation, and robust learning. In this paper, we demonstrate the failure of alternating gradient descent in minimax optimization problems due to the discontinuity of solutions of the inner maximization. To address this, we propose a new epsilon-subgradient descent algorithm that simultaneously tracks K candidate solutions. Practically, the algorithm can find solutions that previous saddle-point algorithms cannot find, with only a sublinear increase of complexity in K. We analyze the conditions under which the algorithm converges to the true solution in detail. A significant improvement in stability and convergence speed of the algorithm is observed in simple representative problems, GAN training, and domain-adaptation problems. 
Equivalent Class Optimization (ECOpt) 
It has been widely observed that many activation functions and pooling methods of neural network models have a (positive) rescaling-invariant property, including ReLU, PReLU, max-pooling, and average pooling, which makes fully-connected neural networks (FNNs) and convolutional neural networks (CNNs) invariant to (positive) rescaling operations across layers. This may cause non-negligible problems with their optimization: (1) different NN models could be equivalent, but their gradients can be very different from each other; (2) it can be proven that the loss functions may have many spurious critical points in the redundant weight space. To tackle these problems, in this paper, we first characterize the rescaling-invariant properties of NN models using equivalent classes and prove that the dimension of the equivalent class space is significantly smaller than the dimension of the original weight space. Then we represent the loss function in the compact equivalent class space and develop novel algorithms that conduct optimization of the NN models directly in the equivalent class space. We call these algorithms Equivalent Class Optimization (abbreviated as ECOpt) algorithms. Moreover, we design efficient tricks to compute the gradients in the equivalent class, which have almost no extra computational complexity as compared to standard back-propagation (BP). We conduct an experimental study to demonstrate the effectiveness of our proposed new optimization algorithms. In particular, we show that by using the idea of ECOpt, we can significantly improve the accuracy of the learned model (for both FNN and CNN), as compared to using conventional stochastic gradient descent algorithms. 
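The positive rescaling invariance of ReLU can be checked numerically. The sketch below illustrates only that property, not the ECOpt algorithm itself; the tiny two-layer network without biases and all function names are assumptions made for the example. Multiplying the incoming weights of one hidden unit by $c > 0$ and dividing its outgoing weights by $c$ gives a different weight vector that computes the same function, because $\max(0, c\,a) = c \max(0, a)$ for positive $c$.

```python
def relu(vector):
    return [max(0.0, a) for a in vector]


def matvec(matrix, vector):
    return [sum(w * a for w, a in zip(row, vector)) for row in matrix]


def two_layer(w1, w2, x):
    """Output of a bias-free two-layer ReLU network: w2 @ relu(w1 @ x)."""
    return matvec(w2, relu(matvec(w1, x)))


def rescale_hidden_unit(w1, w2, j, c):
    """Scale the incoming weights of hidden unit j by c > 0 and its
    outgoing weights by 1 / c, leaving every other weight untouched."""
    if c <= 0:
        raise ValueError("the invariance only holds for positive c")
    w1_scaled = [[w * c for w in row] if i == j else list(row)
                 for i, row in enumerate(w1)]
    w2_scaled = [[w / c if k == j else w for k, w in enumerate(row)]
                 for row in w2]
    return w1_scaled, w2_scaled


w1 = [[1.0, -2.0], [0.5, 3.0]]   # 2 inputs -> 2 hidden units
w2 = [[2.0, -1.0]]               # 2 hidden units -> 1 output
x = [1.0, 0.5]
w1_s, w2_s = rescale_hidden_unit(w1, w2, j=1, c=4.0)
same = two_layer(w1, w2, x), two_layer(w1_s, w2_s, x)  # both outputs agree
```

The two weight settings are distinct points in weight space, yet they belong to the same equivalence class, and their gradients differ by factors of $c$.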
Erase Rectified Linear Unit (EraseReLU) 
For most state-of-the-art architectures, Rectified Linear Unit (ReLU) has become a standard component accompanying each layer. Although ReLU can ease the network training to an extent, its blocking of negative values may suppress the propagation of useful information and make very deep Convolutional Neural Networks (CNNs) difficult to optimize. Moreover, a stack of layers with nonlinear activations has difficulty approximating the intrinsic linear transformations between feature representations. In this paper, we investigate the effect of erasing ReLUs of certain layers and apply it to various representative architectures. We name our approach ‘EraseReLU’. It can ease the optimization and improve the generalization performance for very deep CNN models. In experiments, this method successfully improves the performance of various representative architectures, and we report the improved results on SVHN, CIFAR-10/100, and ImageNet-1k. By using EraseReLU, we achieve state-of-the-art single-model performance on CIFAR-100 with 83.47% accuracy. Codes will be released soon. 
ERL^2  ➘ “Krazy World” 
Error Correction Model (ECM) 
An error correction model belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run stochastic trend, also known as cointegration. ECMs are a theoretically-driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that the last period's deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables. ecm 
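As an illustration, a common single-equation form for two series $y_t$ and $x_t$ is shown below. The symbols are assumptions introduced for this example and do not come from the entry itself.

```latex
\Delta y_t = \alpha + \beta\,\Delta x_t + \gamma\,\bigl(y_{t-1} - \theta\,x_{t-1}\bigr) + \varepsilon_t
```

Here $\beta$ captures the short-run effect of a change in $x$, $\theta$ describes the long-run relationship $y = \theta x$, and $y_{t-1} - \theta x_{t-1}$ is the last period's deviation from that equilibrium. The coefficient $\gamma$, expected to be negative, is the speed at which $y$ returns to equilibrium.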
Error Matrix  
Error-Robust Multi-View Clustering (EMVC) 
In the era of big data, data may come from multiple sources, known as multi-view data. Multi-view clustering aims at generating better clusters by exploiting complementary and consistent information from multiple views rather than relying on the individual view. Due to inevitable system errors caused by data-capturing sensors or other sources, the data in each view may be erroneous. Various types of errors behave differently and inconsistently in each view. More precisely, errors can appear as noise and corruptions in practice. Unfortunately, none of the existing multi-view clustering approaches handle all of these error types. Consequently, their clustering performance is dramatically degraded. In this paper, we propose a novel Markov chain method for Error-Robust Multi-View Clustering (EMVC). By decomposing each view into a shared transition probability matrix and error matrix and imposing structured sparsity-inducing norms on error matrices, we characterize and handle typical types of errors explicitly. To solve the challenging optimization problem, we propose a new efficient algorithm based on Augmented Lagrangian Multipliers and prove its convergence rigorously. Experimental results on various synthetic and real-world datasets show the superiority of the proposed EMVC method over the baseline methods and its robustness against different types of errors. 
Escort  Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it brings sparsity in the weight matrix, and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we propose Escort, an efficient approach to sparse convolutional neural networks on GPUs. Instead of using the lowering method, we choose to compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluation on NVIDIA GPUs shows that Escort can improve sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.38x and 1.60x, compared to CUBLAS and CUSPARSE respectively. 
ESN Recurrent Autoencoder (ESNRAE) 
It is widely accepted that data representations noticeably affect machine learning tools. The better defined they are, the better the performance. Feature-extraction-based methods such as autoencoders are conceived for finding more accurate data representations from the original ones. They perform efficiently on a specific task in terms of 1) high accuracy, 2) large short-term memory and 3) low execution time. The Echo State Network (ESN) is a recent, specific kind of Recurrent Neural Network which presents very rich dynamics thanks to its reservoir-based hidden layer. It is widely used in dealing with complex nonlinear problems and it has outperformed classical approaches in a number of tasks including regression, classification, etc. In this paper, the noticeable dynamism and the large memory provided by ESN and the strength of Autoencoders in feature extraction are gathered within an ESN Recurrent Autoencoder (ESNRAE). In order to provide a sturdier alternative to conventional reservoir-based networks, not only is a single-layer basic ESN used as an autoencoder, but also a MultiLayer ESN (MLESNRAE). The new features, once extracted from the ESN's hidden layer, are applied to classification tasks. The classification rates rise considerably compared to those obtained when applying the original data features. An accuracy-based comparison is performed between the proposed recurrent AEs and two variants of ELM feedforward AEs (Basic and ML) in both noise-free and noisy environments. The empirical study reveals the main contribution of recurrent connections in improving the classification performance results. 
ESPnet  This paper introduces a new open source platform for endtoend speech processing named ESPnet. ESPnet mainly focuses on endtoend automatic speech recognition (ASR), and adopts widelyused dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains a major architecture of this software platform, several important functionalities, which differentiate ESPnet from other open source ASR toolkits, and experimental results with major ASR benchmarks. 
Espresso  There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bitwise computations for efficient execution. These techniques provide a speedup of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks (approximately 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://…/espresso. 
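The bit-packing idea mentioned above can be illustrated with a small sketch. This is not Espresso's C/CUDA code; it is a plain Python illustration of the general technique, assuming values in {+1, -1} are encoded as bits (1 for +1, 0 for -1). The helper names `pack_bits` and `binary_dot` are hypothetical. Under that encoding, the dot product of two length-n vectors equals n minus twice the number of positions where the bits differ.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one integer (bit i is set when values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n using XOR and a bit count."""
    disagreements = bin(a_bits ^ b_bits).count("1")  # positions where the signs differ
    return n - 2 * disagreements  # agreements contribute +1, disagreements -1


# Hypothetical example vectors, chosen only for illustration.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
packed = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
print(packed, reference)  # both values agree
```

One machine word then holds many weights at once, which is where the memory saving and the speedup of the multiply-accumulate loop come from.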
Esri  Esri's GIS (geographic information systems) mapping software helps you understand and visualize data to make decisions based on the best information and analysis. 
Essence Vector Model (EV) 
In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively less work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, those stop or function words that occur frequently may mislead the embedding learning process to produce a misty paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative lowdimensional vector representation for the paragraph. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (DEV) model, is proposed. The DEV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition. 
Essential Histogram  The histogram is widely used as a simple, exploratory display of data, but it is usually not clear how to choose the number and size of bins for this purpose. We construct a confidence set of distribution functions that optimally address the two main tasks of the histogram: estimating probabilities and detecting features such as increases and (anti)modes in the distribution. We define the essential histogram as the histogram in the confidence set with the fewest bins. Thus the essential histogram is the simplest visualization of the data that optimally achieves the main tasks of the histogram. We provide a fast algorithm for computing a slightly relaxed version of the essential histogram, which still possesses most of its beneficial theoretical properties, and we illustrate our methodology with examples. An Rpackage is available online. 
Estimability  http://…/lenth.pdf http://…/ch06.pdf estimability 
Estimation of Distribution Algorithm (EDA) 
Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding the uniform distribution over admissible solutions and ending with the model that generates only the global optima. EDAs belong to the class of evolutionary algorithms. The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an implicit distribution defined by one or more variation operators, whereas EDAs use an explicit probability distribution encoded by a Bayesian network, a multivariate normal distribution, or another model class. Like other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a number of representations, from vectors to LISP-style S-expressions, and the quality of candidate solutions is often evaluated using one or more objective functions. Level-Based Analysis of the Univariate Marginal Distribution Algorithm 
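A minimal sketch of the idea, using the Univariate Marginal Distribution Algorithm (one of the simplest EDAs) on the OneMax problem (maximize the number of 1 bits). The function name `umda_onemax`, the population sizes and the probability clamping are illustrative assumptions, not taken from any particular paper. The model is a vector of independent bit probabilities that starts uniform and is re-estimated from the best samples in each generation.

```python
import random


def umda_onemax(n_bits=20, pop_size=100, n_selected=50, generations=60, seed=0):
    """Univariate EDA on OneMax; returns (best bit string found, final model)."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # explicit model: starts as the uniform distribution
    low, high = 1.0 / n_bits, 1.0 - 1.0 / n_bits  # keep probabilities away from 0 and 1
    best = None
    for _ in range(generations):
        # Sample a population from the current probabilistic model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=sum, reverse=True)  # OneMax fitness is the number of ones
        if best is None or sum(population[0]) > sum(best):
            best = population[0]
        # Re-estimate each marginal from the most promising candidates.
        selected = population[:n_selected]
        for i in range(n_bits):
            freq = sum(ind[i] for ind in selected) / n_selected
            probs[i] = min(high, max(low, freq))
    return best, probs


best, probs = umda_onemax()
print(sum(best), [round(p, 2) for p in probs])
```

Note that no crossover or mutation operator appears: new candidates come only from sampling the explicit model, which is the distinction from conventional evolutionary algorithms drawn above.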
Euclidean Distance  In mathematics, the Euclidean distance or Euclidean metric is the ‘ordinary’ distance between two points that one would measure with a ruler, and is given by the Pythagorean formula. By using this formula as distance, Euclidean space (or even any inner product space) becomes a metric space. The associated norm is called the Euclidean norm. 
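As a short illustration, the Pythagorean formula generalizes to n dimensions as the square root of the sum of squared coordinate differences. The function name below is an illustrative choice.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))  # 3-4-5 right triangle: prints 5.0
```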
EvE  ➘ “GENESYS” 
Even Initialization  In this paper, we propose a new weight initialization method called even initialization for wide and deep nonlinear neural networks with the ReLU activation function. We prove that no poor local minimum exists in the initial loss landscape in the wide and deep nonlinear neural network initialized by the even initialization method that we propose. Specifically, in the initial loss landscape of such a wide and deep ReLU neural network model, the following four statements hold true: 1) the loss function is nonconvex and nonconcave; 2) every local minimum is a global minimum; 3) every critical point that is not a global minimum is a saddle point; and 4) bad saddle points exist. We also show that the weight values initialized by the even initialization method are contained in those initialized by both of the (often used) standard initialization and He initialization methods. 
Event Driven Architecture (EDA) 
Event-driven architecture (EDA) is a software architecture pattern promoting the production, detection, consumption of, and reaction to events. An event can be defined as “a significant change in state”. For example, when a consumer purchases a car, the car’s state changes from “for sale” to “sold”. A car dealer’s system architecture may treat this state change as an event whose occurrence can be made known to other applications within the architecture. From a formal perspective, what is produced, published, propagated, detected or consumed is a (typically asynchronous) message called the event notification, and not the event itself, which is the state change that triggered the message emission. 
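The car-dealer example can be sketched as a tiny in-process publish/subscribe bus. Everything here (the `EventBus` class, the "car.sold" event type, the handlers and the VIN) is hypothetical and only for illustration; real systems typically deliver notifications asynchronously through a message broker, whereas this sketch calls handlers synchronously to stay short.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous publish/subscribe dispatcher."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, notification):
        # Deliver the notification to every consumer registered for this event type.
        for handler in self._handlers[event_type]:
            handler(notification)


bus = EventBus()
log = []
bus.subscribe("car.sold", lambda n: log.append(f"billing: invoice for {n['vin']}"))
bus.subscribe("car.sold", lambda n: log.append(f"inventory: remove {n['vin']}"))

# The event is the state change; the dict below is the event notification.
bus.publish("car.sold", {"vin": "VIN123", "old_state": "for sale", "new_state": "sold"})
print(log)
```

The producer does not know which applications react to the sale; it only emits the notification, which is the decoupling the pattern aims for.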
Event History Analysis  Event history analysis deals with data obtained by observing individuals over time, focusing on events occurring for the individuals under observation. Important applications are to life events of humans in demography, life insurance mathematics, epidemiology, and sociology. The basic data are the times of occurrence of the events and the types of events that occur. The standard approach to the analysis of such data is to use multistate models; a basic example is finitestate Markov processes in continuous time. Censoring and truncation are defining features of the area. This review comments specifically on three areas that are current subjects of active development, all motivated by demands from applications: sampling patterns, the possibility of causal interpretation of the analyses, and the levels and interpretation of variability. eha 
Event Schema Induction (ESI) 

Event Stream Processing (ESP) 
Event stream processing, or ESP, is a set of technologies designed to assist the construction of eventdriven information systems. ESP technologies include event visualization, event databases, eventdriven middleware, and event processing languages, or complex event processing (CEP). In practice, the terms ESP and CEP are often used interchangeably. ESP deals with the task of processing streams of event data with the goal of identifying the meaningful pattern within those streams, employing techniques such as detection of relationships between multiple events, event correlation, event hierarchies, and other aspects such as causality, membership and timing. ESP enables many different applications such as algorithmic trading in financial services, RFID event processing applications, fraud detection, process monitoring, and locationbased services in telecommunications. 
EventCentric Temporal Knowledge Graph (EventKG) 
One of the key requirements to facilitate semantic analytics of information regarding contemporary and historical events on the Web, in the news and in social media is the availability of reference knowledge repositories containing comprehensive representations of events and temporal relations. Existing knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata, focus mostly on entitycentric information and are insufficient in terms of their coverage and completeness with respect to events and temporal relations. EventKG presented in this paper is a multilingual eventcentric temporal knowledge graph that addresses this gap. EventKG incorporates over 690 thousand contemporary and historical events and over 2.3 million temporal relations extracted from several largescale knowledge graphs and semistructured sources and makes them available through a canonical representation. 
EventKG+TL  The provision of multilingual event-centric temporal knowledge graphs such as EventKG enables structured access to representations of a large number of historical and contemporary events in a variety of language contexts. Timelines provide an intuitive way to facilitate an overview of events related to a query entity (i.e. an entity or an event of user interest) over a certain period of time. In this paper, we present EventKG+TL, a novel system that generates cross-lingual event timelines using EventKG and facilitates an overview of the language-specific event relevance and popularity along with the cross-lingual differences. 
Evidence based Data Analysis  What I think the statistical community needs to invest time and energy into is what I call “evidence-based data analysis”. What do I mean by this? Most data analyses are not the simple classroom exercises that we’ve all done involving linear regression or two-sample t-tests. Most of the time, you have to obtain the data, clean that data, remove outliers, impute missing values, transform variables and on and on, even before you fit any sort of model. Then there’s model selection, model fitting, diagnostics, sensitivity analysis, and more. So a data analysis is really a pipeline of operations where the output of one stage becomes the input of another. The basic idea behind evidence-based data analysis is that for each stage of that pipeline, we should be using the best method, justified by appropriate statistical research that provides evidence favoring one method over another. If we cannot reasonably agree on a best method for a given stage in the pipeline, then we have a gap that needs to be filled. So we fill it! 
Evidence Lower Bound (ELBO) 
(Section: 2.2 The Evidence Lower Bound) Filtering Variational Objectives ➘ “Filtering Variational Objectives” 
EvidenceDriven StateMerging (EDSM) 
Human in the Loop: Interactive Passive Automata Learning via EvidenceDriven StateMerging Algorithms 
Evidential CMedoids (ECMdd) 
In this work, a new prototypebased clustering method named Evidential CMedoids (ECMdd), which belongs to the family of medoidbased clustering for proximity data, is proposed as an extension of Fuzzy CMedoids (FCMdd) on the theoretical framework of belief functions. In the application of FCMdd and original ECMdd, a single medoid (prototype), which is supposed to belong to the object set, is utilized to represent one class. For the sake of clarity, this kind of ECMdd using a single medoid is denoted by sECMdd. In real clustering applications, using only one pattern to capture or interpret a class may not adequately model different types of group structure and hence limits the clustering performance. In order to address this problem, a variation of ECMdd using multiple weighted medoids, denoted by wECMdd, is presented. Unlike sECMdd, in wECMdd objects in each cluster carry various weights describing their degree of representativeness for that class. This mechanism enables each class to be represented by more than one object. Experimental results in synthetic and real data sets clearly demonstrate the superiority of sECMdd and wECMdd. Moreover, the clustering results by wECMdd can provide richer information for the inner structure of the detected classes with the help of prototype weights. 
Evolution Strategy (ES) 
In computer science, an evolution strategy (ES) is an optimization technique based on ideas of adaptation and evolution. It belongs to the general class of evolutionary computation or artificial evolution methodologies. 
Evolutionary Algorithm (EA) 
In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic populationbased metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators. Artificial evolution (AE) describes a process involving individual evolutionary algorithms; EAs are individual components that participate in an AE. 
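A minimal sketch of the loop described above, assuming a real-valued representation, truncation selection, uniform recombination and Gaussian mutation. The function names, parameter values and test problem are illustrative assumptions; many other operator choices are possible.

```python
import random


def evolve(fitness, n_genes=5, pop_size=30, generations=100, sigma=0.3, seed=0):
    """Simple evolutionary algorithm that maximizes `fitness` over real vectors."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half of the population survives as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Recombination: each gene is copied from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: small Gaussian perturbation of every gene.
            child = [gene + rng.gauss(0.0, sigma) for gene in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


def neg_sphere(x):
    """Test problem: the maximum (zero) is at the origin."""
    return -sum(gene * gene for gene in x)


best = evolve(neg_sphere)
print(best, neg_sphere(best))
```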
Evolutionary CostSensitive Deep Belief Network  Imbalanced data with a skewed class distribution are common in many realworld applications. Deep Belief Network (DBN) is a machine learning technique that is effective in classification tasks. However, conventional DBN does not work well for imbalanced data classification because it assumes equal costs for each class. To deal with this problem, costsensitive approaches assign different misclassification costs for different classes without disrupting the true data sample distributions. However, due to lack of prior knowledge, the misclassification costs are usually unknown and hard to choose in practice. Moreover, it has not been well studied as to how costsensitive learning could improve DBN performance on imbalanced data problems. This paper proposes an evolutionary costsensitive deep belief network (ECSDBN) for imbalanced classification. ECSDBN uses adaptive differential evolution to optimize the misclassification costs based on training data, that presents an effective approach to incorporating the evaluation measure (i.e. Gmean) into the objective function. We first optimize the misclassification costs, then apply them to deep belief network. Adaptive differential evolution optimization is implemented as the optimization algorithm that automatically updates its corresponding parameters without the need of prior domain knowledge. The experiments have shown that the proposed approach consistently outperforms the stateoftheart on both benchmark datasets and realworld dataset for fault diagnosis in tool condition monitoring. 
Evolutionary DEep Networks (EDEN) 
Deep neural networks continue to show improved performance with increasing depth, an encouraging trend that implies an explosion in the possible permutations of network architectures and hyperparameters for which there is little intuitive guidance. To address this increasing complexity, we propose Evolutionary DEep Networks (EDEN), a computationally efficient neuro-evolutionary algorithm which interfaces to any deep neural network platform, such as TensorFlow. We show that EDEN evolves simple yet successful architectures built from embedding, 1D and 2D convolutional, max pooling and fully connected layers along with their hyperparameters. Evaluation of EDEN across seven image and sentiment classification datasets shows that it reliably finds good networks (and in three cases achieves state-of-the-art results) even on a single GPU, in just 6-24 hours. Our study provides a first attempt at applying neuro-evolution to the creation of 1D convolutional networks for sentiment analysis including the optimisation of the embedding layer. 
Evolutionary Generative Adversarial Network (EGAN) 
Generative adversarial networks (GAN) have been effective for learning generative models for realworld data. However, existing GANs (GAN and its variants) tend to suffer from training problems such as instability and mode collapse. In this paper, we propose a novel GAN framework called evolutionary generative adversarial networks (EGAN) for stable GAN training and improved generative performance. Unlike existing GANs, which employ a predefined adversarial objective function alternately training a generator and a discriminator, we utilize different adversarial training objectives as mutation operations and evolve a population of generators to adapt to the environment (i.e., the discriminator). We also utilize an evaluation mechanism to measure the quality and diversity of generated samples, such that only wellperforming generator(s) are preserved and used for further training. In this way, EGAN overcomes the limitations of an individual adversarial training objective and always preserves the best offspring, contributing to progress in and the success of GANs. Experiments on several datasets demonstrate that EGAN achieves convincing generative performance and reduces the training problems inherent in existing GANs. 
Evolutionary Reinforcement Learning (ERL) 
Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real world problems. Evolutionary Algorithms (EAs), a class of black box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer with high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA. ERL inherits EA’s ability of temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and stability of a populationbased approach and complements it with offpolicy DRL’s ability to leverage gradients for higher sample efficiency and faster learning. Experiments in a range of challenging continuous control benchmark tasks demonstrate that ERL significantly outperforms prior DRL and EA methods, achieving stateoftheart performances. 
Evolving Fuzzy System (EFS) 
Evolving fuzzy systems (EFS) can be defined as self-developing, self-learning fuzzy rule-based or neuro-fuzzy systems whose parameters and, more importantly, structure self-adapt online. They are usually associated with streaming data and online (often real-time) modes of operation. In a narrower sense they can be seen as adaptive fuzzy systems. The difference is that evolving fuzzy systems assume online adaptation of system structure in addition to the parameter adaptation which is usually associated with the term adaptive. They also allow for adaptation of the learning mechanism. Therefore, evolving assumes a higher level of adaptation. In this definition the English word evolving is used with its core meaning as described in the Oxford dictionary (Hornby, 1974; p.294), namely unfolding; developing; being developed, naturally and gradually. Often evolving is used in relation to so-called evolutionary and genetic algorithms. The meaning of the term evolutionary is defined in the Oxford dictionary as development of more complicated forms of life (plants, animals) from earlier and simpler forms. EFS consider a gradual development of the underlying (fuzzy or neuro-fuzzy) system structure and do not deal with phenomena specific to evolutionary and genetic algorithms, such as chromosome crossover, mutation, selection and reproduction, parents and offspring. ➘ “Evolving Intelligent System” 
Evolving Intelligent System (EIS) 
The term Evolving was first used to describe an intelligent system in 1996 by B. Carse, T. Fogarty and A. Munro for a fuzzy rule-based controller whose parameters and structure were learnt simultaneously using a Genetic Algorithm. Years later, alternative methods to learn an evolving intelligent system (EIS) via Incremental learning were suggested as a neuro-fuzzy algorithm by N. Kasabov in 1998 and a rule-based model by P. Angelov in 1999. EIS are usually associated with streaming data and online (often real-time) modes of operation. They can be seen as adaptive intelligent systems. EIS assume online adaptation of system structure in addition to the parameter adaptation which is usually associated with the term ‘incremental’ from Incremental learning. They have been studied as a methodological solution to learn from streaming data exhibiting non-stationary behaviours by M. Sayed-Mouchaweh and E. Lughofer. An important sub-area of EIS is represented by Evolving Fuzzy Systems (EFS) (E. Lughofer has written a comprehensive survey that includes real-world applications), which rely on fuzzy systems architecture and incrementally update, evolve and prune fuzzy sets and fuzzy rules on demand and on-the-fly. One of the major strengths of EFS, compared to other forms of evolving system models, is that they are able to support some sort of interpretability and understandability for experts and users. This opens possibilities for enriched human-machine interaction scenarios, where the users may ‘communicate’ with an online evolving system in the form of knowledge exchange (active learning (machine learning) and teaching). This concept is currently motivated and discussed in the evolving systems community under the term Human-Inspired Evolving Machines and respected as ‘one future’ generation of ‘EIS’. ➘ “PANFIS++” 
Exact Soft ConfidenceWeighted Learning  In this paper, we propose a new Soft ConfidenceWeighted (SCW) online learning scheme, which enables the conventional confidenceweighted learning method to handle nonseparable cases. Unlike the previous confidenceweighted learning algorithms, the proposed soft confidenceweighted learning method enjoys all the four salient properties: (i) large margin training, (ii) confidence weighting, (iii) capability to handle nonseparable data, and (iv) adaptive margin. Our experimental results show that the proposed SCW algorithms significantly outperform the original CW algorithm. When comparing with a variety of stateoftheart algorithms (including AROW, NAROW and NHERD), we found that SCW generally achieves better or at least comparable predictive accuracy, but enjoys significant advantage of computational efficiency (i.e., smaller number of updates and lower time cost). 
Excess Relative Risk Model  rERR 
Excess Risk  In statistics, excess risk is a measure of the relationship between a specified risk factor and a specified outcome (such as contracting a disease). It is the difference between two proportions. In epidemiology it is typically defined to be the difference between the proportion of subjects in a population with a particular disease who were exposed to a specified risk factor and the proportion of subjects with that same disease who were not exposed. 
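As a worked illustration of "the difference between two proportions", the sketch below assumes the common reading in which each proportion is the fraction of a group (exposed or unexposed) that has the outcome. The function name and the counts are hypothetical.

```python
def excess_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Difference between the outcome proportion in the exposed and unexposed groups."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed - risk_unexposed


# Hypothetical counts: 50 of 200 exposed and 25 of 200 unexposed subjects have the disease.
print(excess_risk(50, 200, 25, 200))  # 0.25 - 0.125: prints 0.125
```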
ExGUtils  The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are usually modeled through the exGaussian distribution, because it provides a good fit to multiple empirical data. The complexity of this distribution makes the use of computational tools an essential element in the field. Therefore, there is a strong need for efficient and versatile computational tools for the research in this area. In this manuscript we discuss some mathematical details of the exGaussian distribution and apply the ExGUtils package, a set of functions and numerical tools, programmed for python, developed for numerical analysis of data involving the exGaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages and differences between the least squares and maximum likelihood methods and quantitatively evaluate the goodness of the obtained fits (which is usually an overlooked point in most literature in the area). The analysis done allows one to identify outliers in the empirical datasets and criteriously determine if there is a need for data trimming and at which points it should be done. 
Exogenous Variable  Exogenous variables in causal modeling are the variables with no causal links (arrows) leading to them from other variables in the model. In other words, exogenous variables have no explicit causes within the model. The concept of exogenous variable is fundamental in path analysis and structural equation modeling. The complementary concept is endogenous variable. 
Expectation Maximization (EM) 
In statistics, an expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step. 
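A minimal sketch of the E and M steps for one standard case, a two-component one-dimensional Gaussian mixture, where the latent variable is the component each observation came from. For this model the E step reduces to computing each component's posterior probability ("responsibility") for every point, and the M step has closed-form weighted averages. The function names, the initialization and the synthetic data are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    means = [min(data), max(data)]  # crude initial guess (an assumption of this sketch)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    n = len(data)
    for _ in range(n_iter):
        # E step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = joint[0] + joint[1]
            resp.append((joint[0] / total, joint[1] / total))
        # M step: parameters that maximize the expected log-likelihood.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Synthetic data: two clusters centred at 0 and 5.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(300)]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```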
Expectation Propagation (EP) 
Expectation Propagation (EP) is a technique in Bayesian machine learning. EP finds approximations to a probability distribution. It uses an iterative approach that leverages the factorization structure of the target distribution. It differs from other Bayesian approximation approaches such as Variational Bayesian methods. 
ExpectationBiasing  State-of-the-art forecasting methods using Recurrent Neural Networks (RNN) based on Long Short-Term Memory (LSTM) cells have shown exceptional performance targeting short-horizon forecasts, e.g., given a set of predictor features, forecast a target value for the next few time steps in the future. However, in many applications, the performance of these methods decays as the forecasting horizon extends beyond these few time steps. This paper aims to explore the challenges of long-horizon forecasting using LSTM networks. Here, we illustrate the long-horizon forecasting problem in datasets from neuroscience and energy supply management. We then propose expectation-biasing, an approach motivated by the literature of Dynamic Belief Networks, as a solution to improve long-horizon forecasting using LSTMs. We propose two LSTM architectures along with two methods for expectation biasing that significantly outperform standard practice. 
Expected EnergyBased Restricted Boltzmann Machine (EERBM) 

Expected Improvement (EI) 
Improving the Expected Improvement Algorithm 
Expected Policy Gradient (EPG) 
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadric critics and then extend it to an analytical method for the universal case, covering a broad class of actors and critics, including Gaussian, exponential families, and reparameterised policies with bounded support. For Gaussian policies, we show that it is optimal to explore using covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. EPG also provides a general framework for reasoning about policy gradient methods, which we use to establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we show that EPG outperforms existing approaches on six challenging domains involving the simulated control of physical systems. 
EXPected Similarity Estimation (EXPoSE) 
We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being significantly faster than techniques with the same discriminant power. 
Expected Utility Hypothesis (EUH) 
In economics, game theory, and decision theory the expected utility hypothesis is a hypothesis concerning people’s preferences with regard to choices that have uncertain outcomes (gambles). This hypothesis states that if specific axioms are satisfied, the subjective value associated with an individual’s gamble is the statistical expectation of that individual’s valuations of the outcomes of that gamble. This hypothesis has proved useful to explain some popular choices that seem to contradict the expected value criterion (which takes into account only the sizes of the payouts and the probabilities of occurrence), such as occur in the contexts of gambling and insurance. Daniel Bernoulli initiated this hypothesis in 1738. Until the mid-twentieth century, the standard term for the expected utility was the moral expectation, contrasted with ‘mathematical expectation’ for the expected value. The von Neumann-Morgenstern utility theorem provides necessary and sufficient conditions under which the expected utility hypothesis holds. From relatively early on, it was accepted that some of these conditions would be violated by real decision-makers in practice but that the conditions could be interpreted nonetheless as ‘axioms’ of rational choice. Work by Anand (1993) argues against this normative interpretation and shows that ‘rationality’ does not require transitivity, independence or completeness. This view is now referred to as the ‘modern view’ and Anand argues that despite the normative and evidential difficulties the general theory of decision-making based on expected utility is an insightful first-order approximation that highlights some important fundamental principles of choice, even if it imposes conceptual and technical limits on analysis which need to be relaxed in real-world settings where knowledge is less certain or preferences are more sophisticated. 
Expected Value of Partial Perfect Information (EVPPI) 
http://…/WP130003.pdf http://…/seqposterSMDMfina.pdf BCEA 
Expected Value of Perfect Information (EVPI) 
In decision theory, the expected value of perfect information (EVPI) is the price that one would be willing to pay in order to gain access to perfect information. http://…/seqposterSMDMfina.pdf 
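As a small worked illustration of the definition above, EVPI can be computed as the expected payoff when the best action is chosen separately in each state, minus the best expected payoff available without that knowledge. The payoff table, action names and probabilities below are hypothetical and chosen only for the arithmetic.

```python
# Hypothetical payoff table: payoffs[action][state]. All numbers are illustrative.
payoffs = {
    "invest": [100.0, -20.0],
    "hold": [10.0, 10.0],
}
state_probs = [0.4, 0.6]  # assumed prior over the two states


def expected_value(row, probs):
    """Probability-weighted average of one row of payoffs."""
    return sum(p * v for p, v in zip(probs, row))


def evpi(payoffs, probs):
    # Best achievable expected payoff when acting under uncertainty.
    best_without_info = max(expected_value(row, probs) for row in payoffs.values())
    # With perfect information the best action is chosen separately in each state.
    best_per_state = [max(column) for column in zip(*payoffs.values())]
    best_with_info = expected_value(best_per_state, probs)
    return best_with_info - best_without_info


value_of_information = evpi(payoffs, state_probs)
```

With these made-up numbers the decision maker should pay at most about 18 units for perfect information: 46 with it, against 28 for the best action without it.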
Experimental Design Problem  Experimental design is a classical problem in statistics and has also found new applications in machine learning. In the experimental design problem, the aim is to estimate an unknown vector x in m dimensions from linear measurements, where Gaussian noise is introduced in each measurement. The goal is to pick k out of the given n experiments so as to make the most accurate estimate of the unknown parameter x. Given a set S of chosen experiments, the maximum likelihood estimate x’ can be obtained by a least squares computation. ➚ “Design of Experiments” 
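The least squares step for a given set S might be sketched as follows. This covers only the estimate for an already chosen subset, not the selection of the k experiments. The function name, the synthetic data and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def estimate_from_subset(A, y, chosen):
    """Least-squares estimate of x using only the chosen experiments (rows of A)."""
    idx = list(chosen)
    x_hat, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
    return x_hat


# Synthetic setup (illustrative only): n candidate experiments in m dimensions.
rng = np.random.default_rng(0)
n, m, k = 20, 3, 6
A = rng.normal(size=(n, m))            # one measurement vector per experiment
x_true = np.array([1.0, -2.0, 0.5])    # the unknown vector, known here for checking
y = A @ x_true + 0.01 * rng.normal(size=n)  # noisy linear measurements

x_hat = estimate_from_subset(A, y, range(k))  # S = first k experiments, an arbitrary choice
```

Different choices of S give estimates of different accuracy, which is what makes the selection of the k experiments the hard part of the problem.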
Expert Iteration  Solving sequential decision making problems, such as text parsing, robotic control, and game playing, requires a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration, a novel algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. In contrast, standard Deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that our method substantially outperforms Policy Gradients in the board game Hex, winning 84.4% of games against it when trained for equal time. 
Explainability  ➚ “Collaborative Blackbox and RUle Set Hybrid” 
Explainable Artificial Intelligence (XAI) 
To generate trust with their users, Explainable Artificial Intelligence (XAI) systems need to include an explanation model that can communicate the internal decisions, behaviours and actions to the interacting humans. Successful explanation involves both cognitive and social processes. In this paper we focus on the challenge of meaningful interaction between an explainer and an explainee and investigate the structural aspects of an explanation in order to propose a human explanation dialog model. We follow a bottom-up approach to derive the model by analysing transcripts of 398 different explanation dialog types. We use grounded theory to code and identify key components of which an explanation dialog consists. We carry out further analysis to identify the relationships between components and sequences and cycles that occur in a dialog. We present a generalized state model obtained by the analysis and compare it with an existing conceptual dialog model of explanation. 
Explainable Security (XSec) 
The Defense Advanced Research Projects Agency (DARPA) recently launched the Explainable Artificial Intelligence (XAI) program that aims to create a suite of new AI techniques that enable end users to understand, appropriately trust, and effectively manage the emerging generation of AI systems. In this paper, inspired by DARPA’s XAI program, we propose a new paradigm in security research: Explainable Security (XSec). We discuss the “Six Ws” of XSec (Who, What, Where, When, Why, and How) and argue that XSec has unique and complex characteristics: XSec involves several different stakeholders (i.e., the system’s developers, analysts, users and attackers) and is multifaceted by nature (as it requires reasoning about system model, threat model and properties of security, privacy and trust as well as about concrete attacks, vulnerabilities and countermeasures). We define a roadmap for XSec that identifies several possible research directions. 
Explanatory Artificial Intelligence (XAI) 
➚ “Explainable Artificial Intelligence” 
Explicit Semantic Analysis (ESA) 
In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf-idf matrix of the text corpus and a document (string of words) is represented as the centroid of the vectors representing its words. Typically, the text corpus is Wikipedia, though other corpora including the Open Directory Project have been used. ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization and has been used by this pair of researchers to compute what they refer to as ‘semantic relatedness’ by means of cosine similarity between the aforementioned vectors, collectively interpreted as a space of ‘concepts explicitly defined and described by humans’, where Wikipedia articles (or ODP entries, or otherwise titles of documents in the knowledge base corpus) are equated with concepts. The name ‘explicit semantic analysis’ contrasts with latent semantic analysis (LSA), because the use of a knowledge base makes it possible to assign human-readable labels to the concepts that make up the vector space. ESA, as originally posited by Gabrilovich and Markovitch, operates under the assumption that the knowledge base contains topically orthogonal concepts. However, it was later shown by Anderka and Stein that ESA also improves the performance of information retrieval systems when it is based not on Wikipedia, but on the Reuters corpus of newswire articles, which does not satisfy the orthogonality property; in their experiments, Anderka and Stein used newswire stories as ‘concepts’. To explain this observation, links have been shown between ESA and the generalized vector space model. 
Gabrilovich and Markovitch replied to Anderka and Stein by pointing out that their experimental result was achieved using ‘a single application of ESA (text similarity)’ and ‘just a single, extremely small and homogenous test collection of 50 news documents’. Cross-language explicit semantic analysis (CL-ESA) is a multilingual generalization of ESA. CL-ESA exploits a document-aligned multilingual reference collection (e.g., again, Wikipedia) to represent a document as a language-independent concept vector. The relatedness of two documents in different languages is assessed by the cosine similarity between the corresponding vector representations. http://…explicitsemanticanalysisesaexplained 
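The representation described in this entry (a word as a tf-idf vector over concepts, a text as the centroid of its word vectors, relatedness as cosine similarity) can be sketched on a toy knowledge base. The three "concept" documents below are invented stand-ins for Wikipedia articles, and the smoothed idf formula is one common variant rather than the one used by the original authors.

```python
import math
from collections import Counter

# Toy knowledge base: each concept is a short document (hypothetical data).
concepts = {
    "Cat": "cat feline pet purr whiskers",
    "Dog": "dog canine pet bark leash",
    "Car": "car engine wheel road driver",
}


def build_word_vectors(concepts):
    """Map each word to its tf-idf weights across all concepts."""
    names = list(concepts)
    counts = [Counter(concepts[name].split()) for name in names]
    vocabulary = {word for counter in counts for word in counter}
    vectors = {}
    for word in vocabulary:
        doc_freq = sum(1 for counter in counts if word in counter)
        idf = math.log(len(names) / doc_freq) + 1.0  # smoothed idf (an assumption)
        vectors[word] = [counter[word] * idf for counter in counts]
    return names, vectors


def text_vector(text, vectors, dim):
    """Centroid of the concept vectors of the known words in the text."""
    known = [vectors[word] for word in text.split() if word in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


names, vectors = build_word_vectors(concepts)
related = cosine(text_vector("cat purr", vectors, len(names)),
                 text_vector("feline whiskers", vectors, len(names)))
unrelated = cosine(text_vector("cat purr", vectors, len(names)),
                   text_vector("engine road", vectors, len(names)))
```

The two cat-related texts share no words, yet they score as related because both load on the same concept; this is the effect ESA relies on.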
Exploration Potential  We introduce exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem’s reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multi-armed bandits use exploration potential to illustrate how different algorithms make the tradeoff between exploration and exploitation. 
Explorative Landscape Analysis (ELA) 
Exploratory Landscape Analysis subsumes a number of techniques employed to obtain knowledge about the properties of an unknown optimization problem, especially insofar as these properties are important for the performance of optimization algorithms. flacco 
Exploratory Data Analysis (EDA) 
In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA. 
Exploratory Factor Analysis (EFA) 
In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis. EFA is based on the common factor model. Within the common factor model, a function of common factors, unique factors, and errors of measurements expresses measured variables. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables. EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method. http://…/factornew.htm 
Exponential Moving Average (EMA) 
An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA), is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. 
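The exponentially decreasing weights can be written out directly. In the sketch below, alpha is the smoothing factor between 0 and 1, and the weight on an observation that is `age` steps old is taken to be alpha * (1 - alpha) ** age, which is the standard form for an EMA. The function name is made up for the example.

```python
def ema_weights(alpha, n):
    """Weights applied to the n most recent observations, newest first."""
    return [alpha * (1.0 - alpha) ** age for age in range(n)]


weights = ema_weights(alpha=0.5, n=5)
# Each weight is half the previous one, and none is ever exactly zero.
```

A larger alpha concentrates the weight on recent observations, while a smaller alpha spreads it further into the past.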
Exponential Random Graph Models (ERGM) 
Exponential random graph models (ERGMs) are a family of statistical models for analyzing data about social and other networks. Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity. However, these metrics describe the observed network, which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However, because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression. Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency of network substructures of theoretical interest, disambiguate the influence of confounding processes, efficiently represent complex structures, and link local-level processes to global-level properties. Degree Preserving Randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks. 
Exponential Smoothing  Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation, or to make forecasts. The time series data themselves are a sequence of observations. The observed phenomenon may be an essentially random process, or it may be an orderly, but noisy, process. Whereas in the simple moving average the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights over time. Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements. The simplest form of exponential smoothing should be used only for data without any systematic trend or seasonal components. 
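The contrast with the simple moving average can be made concrete with a short sketch of the simplest form of exponential smoothing. Starting the smoothed series at the first observation is one common convention, assumed here. The function names and the sample series are illustrative.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]  # assumed initialisation: start at the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def simple_moving_average(series, window):
    """Equal-weight average over each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


observations = [10, 20, 30]
smoothed = exponential_smoothing(observations, alpha=0.5)
averaged = simple_moving_average(observations, window=2)
```

Because the series has a trend, the smoothed values lag behind the observations, which is why the simplest form is recommended only for data without trend or seasonality.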
Exponential-Generalized Truncated Logarithmic (EGTL) 
In this paper, we introduce a new two-parameter lifetime distribution, called the exponential-generalized truncated logarithmic (EGTL) distribution, by compounding the exponential and generalized truncated logarithmic distributions. Our procedure generalizes the exponential-logarithmic (EL) distribution modelling the reliability of systems by the use of first-order concepts, where the minimum lifetime is considered (Tahmasbi 2008). In our approach, we assume that a system fails if a given number k of the components fails and then, we consider the kth-smallest value of lifetime instead of the minimum lifetime. The reliability and failure rate functions as well as their properties are presented for some special cases. The estimation of the parameters is attained by the maximum likelihood, the expectation maximization algorithm, the method of moments and the Bayesian approach, with a simulation study performed to illustrate the different methods of estimation. The application study is illustrated based on two real data sets used in many applications of reliability. 
Exposure  Machine learning models based on neural networks and deep learning are being rapidly adopted for many purposes. What those models learn, and what they may share, is a significant concern when the training data may contain secrets and the models are public, e.g., when a model helps users compose text messages using models trained on all users’ messages. This paper presents exposure: a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets. Using this metric, we show how to extract those secrets efficiently using black-box API access. Further, we show that unintended memorization occurs early, is not due to overfitting, and is a persistent issue across different types of models, hyperparameters, and training strategies. We experiment with both real-world models (e.g., a state-of-the-art translation model) and datasets (e.g., the Enron email dataset, which contains users’ credit card numbers) to demonstrate both the utility of measuring exposure and the ability to extract secrets. Finally, we consider many defenses, finding some ineffective (like regularization), and others to lack guarantees. However, by instantiating our own differentially-private recurrent model, we validate that by appropriately investing in the use of state-of-the-art techniques, the problem can be resolved, with high utility. 
Extended Autoregressive Model (EAR) 
Generative models (GMs) such as Generative Adversarial Network (GAN) and Variational AutoEncoder (VAE) have thrived in recent years and achieved high-quality results in generating new samples. Especially in Computer Vision, GMs have been used in image inpainting, denoising and completion, which can be treated as the inference from observed pixels to corrupted pixels. However, images are hierarchically structured, which makes them quite different from many real-world inference scenarios with non-hierarchical features. These inference scenarios contain heterogeneous stochastic variables and irregular mutual dependences. Traditionally they are modeled by Bayesian Network (BN). However, learning and inference in a BN model are NP-hard, so the number of stochastic variables in a BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time. We also propose an extended autoregressive (EAR) model and an EAR with adversary loss (EARA) model and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. Besides black-box analysis, we have also run a series of experiments on Markov border inference of GMs for white-box analysis and give theoretical results. 
Extended Autoregressive Model with Adversary Loss (EARA) 
Generative models (GMs) such as Generative Adversarial Network (GAN) and Variational AutoEncoder (VAE) have thrived in recent years and achieved high-quality results in generating new samples. Especially in Computer Vision, GMs have been used in image inpainting, denoising and completion, which can be treated as the inference from observed pixels to corrupted pixels. However, images are hierarchically structured, which makes them quite different from many real-world inference scenarios with non-hierarchical features. These inference scenarios contain heterogeneous stochastic variables and irregular mutual dependences. Traditionally they are modeled by Bayesian Network (BN). However, learning and inference in a BN model are NP-hard, so the number of stochastic variables in a BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time. We also propose an extended autoregressive (EAR) model and an EAR with adversary loss (EARA) model and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. Besides black-box analysis, we have also run a series of experiments on Markov border inference of GMs for white-box analysis and give theoretical results. 
Extended Bayesian Information Criterion (EBIC) 
The ordinary Bayes information criterion is too liberal for model selection when the model space is large. In this article, we reexamine the Bayesian paradigm for model selection and propose an extended family of Bayes information criteria. The new criteria take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayes information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayes information criteria are extremely useful for variable selection in problems with a moderate sample size but a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research. Some keywords: Bayesian paradigm; Consistency; Genome-wide association study; Tournament approach; Variable selection. 
Extended Fourier Amplitude Sensitivity Test  Excluding irrelevant features in a pattern recognition task plays an important role in maintaining a simpler machine learning model and optimizing the computational efficiency. Nowadays, with the rise of large-scale datasets, feature selection is in great demand as it becomes a central issue when facing high-dimensional datasets. The present study provides a new measure of saliency for features by employing a Sensitivity Analysis (SA) technique called the extended Fourier amplitude sensitivity test, and a well-trained Feedforward Neural Network (FNN) model, which ultimately leads to the selection of a promising optimal feature subset. The ideas of the paper are mainly demonstrated by adopting an FNN model for feature selection in classification problems. In the end, however, a generalization framework is discussed in order to give insights into the usage in regression problems and to show how other function approximation models can be deployed. The effectiveness of the proposed method is verified by result analysis and data visualization for a series of experiments over several well-known datasets drawn from the UCI machine learning repository. 
Extended Kalman Filter (EKF) 
In estimation theory, the extended Kalman filter (EKF) is the nonlinear version of the Kalman filter which linearizes about an estimate of the current mean and covariance. In the case of well defined transition models, the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS. 
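The linearization step can be sketched for a one-dimensional state, where the Jacobians reduce to ordinary derivatives. This is a minimal illustration of the standard predict/update recursion, not a production filter. The function names, the measurement model z = x² and the noise values are assumptions made for the example.

```python
def ekf_step(x, p, z, f, f_jac, h, h_jac, q, r):
    """One predict/update cycle of a scalar extended Kalman filter.

    x, p: current state estimate and its variance
    z: new measurement
    f, h: nonlinear transition and measurement functions
    f_jac, h_jac: their derivatives, used for the linearization
    q, r: process and measurement noise variances
    """
    # Predict: propagate the mean through f, the variance through its derivative.
    x_pred = f(x)
    big_f = f_jac(x)
    p_pred = big_f * p * big_f + q

    # Update: linearize h about the predicted mean.
    big_h = h_jac(x_pred)
    innovation_var = big_h * p_pred * big_h + r
    gain = p_pred * big_h / innovation_var
    x_new = x_pred + gain * (z - h(x_pred))
    p_new = (1.0 - gain * big_h) * p_pred
    return x_new, p_new


# Hypothetical example: a constant state observed through z = x**2.
x, p = 1.5, 1.0
for _ in range(10):
    x, p = ekf_step(x, p, z=4.0,
                    f=lambda s: s, f_jac=lambda s: 1.0,
                    h=lambda s: s * s, h_jac=lambda s: 2.0 * s,
                    q=0.0, r=0.01)
```

Repeated measurements of 4.0 pull the estimate towards 2 while the variance shrinks. For vector states the same steps apply, with matrices in place of the scalars.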
Extended PCA (XPCA) 
Principal component analysis (PCA) is arguably the most popular tool in multivariate exploratory data analysis. In this paper, we consider the question of how to handle heterogeneous variables that include continuous, binary, and ordinal. In the probabilistic interpretation of low-rank PCA, the data has a normal multivariate distribution and, therefore, normal marginal distributions for each column. If some marginals are continuous but not normal, the semiparametric copula-based principal component analysis (COCA) method is an alternative to PCA that combines a Gaussian copula with nonparametric marginals. If some marginals are discrete or semi-continuous, we propose a new extended PCA (XPCA) method that also uses a Gaussian copula and nonparametric marginals and accounts for discrete variables in the likelihood calculation by integrating over appropriate intervals. Like PCA, the factors produced by XPCA can be used to find latent structure in data, build predictive models, and perform dimensionality reduction. We present the new model, its induced likelihood function, and a fitting algorithm which can be applied in the presence of missing data. We demonstrate how to use XPCA to produce an estimated full conditional distribution for each data point, and use this to provide estimates for missing data that are automatically range-respecting. We compare the methods as applied to simulated and real-world data sets that have a mixture of discrete and continuous variables. 
Extended Space Forest  The Extended Space Forest is a new method for decision tree construction in which training is done with input vectors including all the original features and their random combinations. The combinations are generated with a difference operator applied to random pairs of original features. The experimental results show that extended space versions of ensemble algorithms have better performance than the original ensemble algorithms. To investigate the success dynamics of the Extended Space Forest, the individual accuracy and diversity creation powers of ensemble algorithms are compared. The Extended Space Forest creates more diversity when it uses all the input features than Bagging and Rotation Forest. It also results in more individual accuracy when it uses random selection of the features than Random Subspace and Random Forest methods. It needs more training time because of using more features than the original algorithms. But its testing time is lower than the others because it generates less complex base learners. 
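The feature extension described above might look like the following sketch: each input vector keeps its original features and gains the differences of randomly chosen feature pairs. The entry does not say how many pairs are drawn or how they are shared between base learners, so the `n_pairs` parameter, the seed handling and the function name are assumptions.

```python
import random


def extend_features(rows, n_pairs, seed=0):
    """Append differences of random feature pairs to every input vector."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Draw random pairs of distinct original features (count is an assumption).
    pairs = [tuple(rng.sample(range(n_features), 2)) for _ in range(n_pairs)]
    extended = [list(row) + [row[i] - row[j] for i, j in pairs] for row in rows]
    return extended, pairs


rows = [[1.0, 2.0, 4.0], [0.0, 5.0, 3.0]]
extended, pairs = extend_features(rows, n_pairs=2)
```

A base learner such as a decision tree would then be trained on `extended` instead of `rows`. In an ensemble, a different seed per base learner would give each one its own extended space.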
eXtensible Neural Machine Translation toolkit (XNMT) 
This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://…/xnmt 
Exterior Distance Function (EDF) 
We introduce and study the exterior distance function (EDF) and the corresponding exterior point method (EPM) for convex optimization. The EDF is a classical Lagrangian for an equivalent problem obtained from the initial one by monotone transformation of both the objective function and the constraints. The constraints transformation is scaled by a positive scaling parameter. Thus, the EDF is a particular realization of the Nonlinear Rescaling (NR) principle. Along with the ‘center’, the EDF has two extra tools: the barrier (scaling) parameter and the vector of Lagrange multipliers. We show that EPM generates a primal-dual sequence, which converges to the primal-dual solution in value under minimal assumptions on the input data. Moreover, the convergence takes place under any fixed interior point as a ‘center’ and any fixed positive scaling parameter, just due to the Lagrange multipliers update. If the second-order sufficient optimality condition is satisfied, then the EPM converges with Q-linear rate under any fixed interior point as a ‘center’ and any fixed, but large enough, positive scaling parameter. 
Exterior Point Method (EPM) 
➘ “Exterior Distance Function” 
Extract, Transform, Analyse and Load (ET(A)L) 
ET(A)L is another form of reduction mechanism, which is why the analytics aspect is included: to ensure that the data that gets through is the data that is needed, and that the junk and noise that have no recognisable value get cleaned out early and often. 
Extract-Transform-Load (ETL) 
In computing, extract, transform, and load (ETL) refers to a process in database usage and especially in data warehousing that: extracts data from outside sources; transforms it to fit operational needs, which can include quality levels; and loads it into the end target (database, more specifically, operational data store, data mart, or data warehouse). ETL systems are commonly used to integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example, a cost accounting system may combine data from payroll, sales and purchasing. 
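The three stages can be sketched with the standard library alone. Everything specific below (the CSV contents, the table name `costs`, the quality rule that drops rows without an amount) is hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two separate source systems.
PAYROLL_CSV = "employee,amount\nalice,1200.50\nbob,980.00\n"
SALES_CSV = "employee,amount\nAlice ,300\nbob,\n"


def extract(text):
    """Extract: read raw rows from a source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, source):
    """Transform: normalise names, convert types, drop rows failing a quality rule."""
    cleaned = []
    for row in rows:
        if not row["amount"]:  # quality rule (assumed): amount must be present
            continue
        cleaned.append((source, row["employee"].strip().lower(), float(row["amount"])))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS costs (source TEXT, employee TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO costs VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
for source, text in [("payroll", PAYROLL_CSV), ("sales", SALES_CSV)]:
    load(conn, transform(extract(text), source))

total_rows = conn.execute("SELECT COUNT(*) FROM costs").fetchone()[0]
alice_total = conn.execute(
    "SELECT SUM(amount) FROM costs WHERE employee = 'alice'"
).fetchone()[0]
```

Keeping the stages as separate functions mirrors the definition and makes it easy to add another source system, such as purchasing, without touching the load step.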
Extrapolation Compression  Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: quantization for low-bandwidth networks, and decentralization for high-latency networks. In this paper, we explore a natural question: can the combination of both decentralization and quantization lead to a system that is robust to both bandwidth and latency? Although the system implication of such a combination is trivial, the underlying theoretical principle and algorithm design are challenging: simply quantizing data sent in a decentralized training algorithm would accumulate the error. In this paper, we develop a framework of quantized, decentralized training and propose two different strategies, which we call extrapolation compression and difference compression. We analyze both algorithms and prove both converge at the rate of $O(1/\sqrt{nT})$ where $n$ is the number of workers and $T$ is the number of iterations, matching the convergence rate for full-precision, centralized training. We evaluate our algorithms on training deep learning models, and find that our proposed algorithm significantly outperforms the best of the merely decentralized and merely quantized algorithms for networks with both high latency and low bandwidth. 
Extremal Depth (ED) 
We propose a new notion called ‘extremal depth’ (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme ‘outlyingness’. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: a) the central region achieves the nominal (desired) simultaneous coverage probability; b) there is a correspondence between ED-based (simultaneous) central regions and appropriate pointwise central regions; and c) the method is resistant to certain classes of functional outliers. The paper examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems. 
Extreme Bounds Analysis (EBA) 
The basic idea of extreme bounds analysis is quite simple. We are interested in finding out which variables from the set X are robustly associated with the dependent variable y. To do so, we run a large number of regression models. Each has y as the dependent variable and includes a set of standard explanatory variables F that are included in each regression model. In addition, each model includes a different subset D of the variables in X. Following the convention in the literature, we will refer to F as the free variables and to X as the doubtful variables. Some subset of the doubtful variables X might be so-called focus variables that are of particular interest to the researcher. The doubtful variables whose regression coefficients retain their statistical significance in a large enough proportion of estimated models are declared to be robust, whereas those that do not are labelled fragile. ExtremeBounds 
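The procedure can be sketched as a loop over subsets D of the doubtful variables, fitting one ordinary least squares model per subset and recording how often each doubtful coefficient is significant. This is an illustrative reimplementation and not the API of the ExtremeBounds package. The subset size `k`, the critical value 1.96, the synthetic data and all function names are assumptions.

```python
import itertools

import numpy as np


def ols_t_stats(X, y):
    """OLS coefficients and their t-statistics (X must include the intercept)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, beta / std_err


def extreme_bounds(y, free, doubtful, k=2, t_crit=1.96):
    """Share of models in which each doubtful variable is significant."""
    n = len(y)
    names = list(doubtful)
    hits = {name: 0 for name in names}
    runs = {name: 0 for name in names}
    offset = 1 + len(free)  # intercept and free variables come first
    for subset in itertools.combinations(names, k):
        columns = [np.ones(n)] + list(free.values()) + [doubtful[s] for s in subset]
        _, t_stats = ols_t_stats(np.column_stack(columns), y)
        for pos, name in enumerate(subset):
            runs[name] += 1
            hits[name] += int(abs(t_stats[offset + pos]) > t_crit)
    return {name: hits[name] / runs[name] for name in names}


# Synthetic data: only x1 truly affects y (illustrative only).
rng = np.random.default_rng(0)
n = 200
free = {"f1": rng.normal(size=n)}
doubtful = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 1.0 + 0.5 * free["f1"] + 2.0 * doubtful["x1"] + rng.normal(size=n)

shares = extreme_bounds(y, free, doubtful)
```

A variable would then be called robust if its share exceeds a threshold chosen by the researcher, and fragile otherwise. The threshold is deliberately left out of the sketch.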
Extreme Function Theory  We introduce an extreme function theory as a novel method by which probabilistic novelty detection may be performed with functions, where the functions are represented by time series of (potentially multivariate) discrete observations. We set the method within the framework of Gaussian processes (GP), which offers a convenient means of constructing a distribution over functions. Whereas conventional novelty detection methods aim to identify individually extreme data points, with respect to a model of normality constructed using examples of ‘normal’ data points, the proposed method aims to identify extreme functions, with respect to a model of normality constructed using examples of ‘normal’ functions, where those functions are represented by time series of observations. The method is illustrated using synthetic data, physiological data acquired from a large clinical trial, and a benchmark time-series dataset. 
Extreme Gradient Boosting  Extreme Gradient Boosting is an efficient implementation of the gradient boosting framework. xgboost 
Extreme Learning Machine (ELM) 
Extreme learning machine (ELM) is a modification of the single-layer feedforward network (SLFN) in which learning is quite similar to reservoir computing. ELMR,elmNNRcpp 
Extreme Machine Learning (ELM) 
➘ “Extreme Learning Machine” 
Extreme Multi-Label Learning using Distributional Semantics (ExMLDS) 
We present a novel and scalable label embedding framework for large-scale multi-label learning, a.k.a. ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data. The framework can also be easily extended to incorporate auxiliary information such as label-label correlations; this is crucial especially when there are a lot of missing labels in the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and state-of-the-art methods for large-scale multi-label learning. 
Extreme Studentized Deviate (ESD) 
The generalized extreme Studentized deviate (ESD) test is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution. The primary limitation of the Grubbs test and the Tietjen-Moore test is that the suspected number of outliers, k, must be specified exactly. If k is not specified correctly, this can distort the conclusions of these tests. On the other hand, the generalized ESD test only requires that an upper bound for the suspected number of outliers be specified. Given the upper bound, r, the generalized ESD test essentially performs r separate tests: a test for one outlier, a test for two outliers, and so on up to r outliers. 
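The sequence of r tests can be illustrated with a minimal sketch that computes only the test statistics: at each step the observation farthest from the current mean is scored in units of the current sample standard deviation and then removed. The function name and the sample data are invented for this example, and the comparison of each statistic with its critical value (which needs Student t quantiles) is deliberately omitted, so this is not a complete test.

```python
import statistics


def esd_statistics(values, max_outliers):
    """Return [(candidate, statistic), ...] for up to `max_outliers` steps.

    Step i scores the most extreme remaining observation as
    |x - mean| / sd and then removes it before the next step.
    Critical values are NOT computed here.
    """
    remaining = list(values)
    results = []
    for _ in range(max_outliers):
        if len(remaining) < 3:
            break  # too few points left for a meaningful deviation
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)  # sample standard deviation
        if sd == 0:
            break  # all remaining values are identical
        idx = max(range(len(remaining)), key=lambda i: abs(remaining[i] - mean))
        candidate = remaining.pop(idx)
        results.append((candidate, abs(candidate - mean) / sd))
    return results


# Made-up measurements with one obviously large value.
data = [2.1, 2.3, 1.9, 2.0, 2.2, 9.5, 2.4, 1.8]
results = esd_statistics(data, max_outliers=2)
for candidate, statistic in results:
    print(f"candidate={candidate}, statistic={statistic:.3f}")
```

A full implementation would compare each statistic with the corresponding critical value and report the largest step at which the statistic exceeds it as the number of outliers.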
Extreme Summarization  We introduce extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question ‘What is the article about?’. We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans. 
Extreme Value Analysis (EVA) 
➘ “Extreme Value Theory” Introduction to Extreme Value Analysis https://…/Extremes.pdf hkevp,revdbayes,threshr 
Extreme Value Learning (EVL) 
Novel unseen classes can be formulated as the extreme values of known classes. This inspired recent work on open-set recognition \cite{Scheirer_2013_TPAMI,Scheirer_2014_TPAMIb,EVM}, which, however, has no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual features to the semantic space. To model the margin and coverage distributions of each class, Vocabulary-informed Learning (ViL) is adopted, using a vast open vocabulary in the semantic space. Essentially, by combining EVL and ViL, we propose for the first time a novel semantic embedding paradigm, Vocabulary-informed Extreme Value Learning (ViEVL), which embeds the visual features into the semantic space in a probabilistic way. The learned embedding can be directly used to solve supervised learning, zero-shot and open-set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed frameworks. 
Extreme Value Theory (EVT) 
Extreme value theory (EVT) or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly. Extreme Value Theory for Open Set Classification – GPD and GEV Classifiers 
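The 100-year flood example can be made concrete with a small sketch. It assumes, purely for illustration, that annual maxima follow a Gumbel distribution and fits it by the method of moments; neither choice comes from the entry above, and real analyses often fit a GEV or GPD model by maximum likelihood instead. The discharge values and function names are invented.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates (location, scale) for a Gumbel law."""
    mean = statistics.mean(annual_maxima)
    sd = statistics.stdev(annual_maxima)
    scale = sd * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_cdf(x, location, scale):
    """P(annual maximum <= x) under the fitted Gumbel distribution."""
    return math.exp(-math.exp(-(x - location) / scale))


def return_level(location, scale, period_years):
    """Level exceeded in any one year with probability 1 / period_years."""
    non_exceedance = 1.0 - 1.0 / period_years
    return location - scale * math.log(-math.log(non_exceedance))


# Invented annual maximum river discharges (arbitrary units).
annual_maxima = [310.0, 285.0, 402.0, 350.0, 298.0,
                 441.0, 367.0, 322.0, 389.0, 305.0]
location, scale = fit_gumbel_moments(annual_maxima)
level_50 = return_level(location, scale, 50)
level_100 = return_level(location, scale, 100)
print(f"50-year level:  {level_50:.1f}")
print(f"100-year level: {level_100:.1f}")
```

The return level is simply the quantile of the fitted distribution at probability 1 - 1/T, so a longer return period T always gives a higher design level.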