# WhatIs-E

 Eager Execution Eager execution is an imperative, define-by-run interface where operations are executed immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive. The benefits of eager execution include: · Fast debugging with immediate run-time errors and integration with Python tools · Support for dynamic models using easy-to-use Python control flow · Strong support for custom and higher-order gradients · Almost all of the available TensorFlow operations Eager execution is available now as an experimental feature, so we’re looking for feedback from the community to guide our direction. ➘ “Tensorflow” Eager Learning In artificial intelligence, eager learning is a learning method in which the system tries to construct a general, input independent target function during training of the system, as opposed to lazy learning, where generalization beyond the training data is delayed until a query is made to the system. The main advantage gained in employing an eager learning method, such as an artificial neural network, is that the target function will be approximated globally during training, thus requiring much less space than a lazy learning system. Eager learning systems also deal much better with noise in the training data. Eager learning is an example of offline learning, in which post-training queries to the system have no effect on the system itself, and thus the same query to the system will always produce the same result. The main disadvantage with eager learning is that it is generally unable to provide good local approximations in the target function. EagleEye Deep neural networks (DNNs) are inherently vulnerable to adversarial inputs: such maliciously crafted samples trigger DNNs to misbehave, leading to detrimental consequences for DNN-powered systems. The fundamental challenges of mitigating adversarial inputs stem from their adaptive and variable nature. Existing solutions attempt to improve DNN resilience against specific attacks; yet, such static defenses can often be circumvented by adaptively engineered inputs or by new attack variants. Here, we present EagleEye, an attack-agnostic adversarial tampering analysis engine for DNN-powered systems. Our design exploits the {\em minimality principle} underlying many attacks: to maximize the attack’s evasiveness, the adversary often seeks the minimum possible distortion to convert genuine inputs to adversarial ones. We show that this practice entails the distinct distributional properties of adversarial inputs in the input space. By leveraging such properties in a principled manner, EagleEye effectively discriminates adversarial inputs and even uncovers their correct classification outputs. Through extensive empirical evaluation using a range of benchmark datasets and DNN models, we validate EagleEye’s efficacy. We further investigate the adversary’s possible countermeasures, which implies a difficult dilemma for her: to evade EagleEye’s detection, excessive distortion is necessary, thereby significantly reducing the attack’s evasiveness regarding other detection mechanisms. EAP A good clustering algorithm should not only be able to discover clusters of arbitrary shapes (global view) but also provide additional information, which can be used to gain more meaningful insights into the internal structure of the clusters (local view). In this work we use the mathematical framework of factor graphs and message passing algorithms to optimize a pairwise similarity based cost function, in the same spirit as was done in Affinity Propagation. Using this framework we develop two variants of a new clustering algorithm, EAP and SHAPE. EAP/SHAPE can not only discover clusters of arbitrary shapes but also provide a rich local view in the form of meaningful local representatives (exemplars) and connections between these local exemplars. We discuss how this local information can be used to gain various insights about the clusters including varying relative cluster densities and indication of local strength in different regions of a cluster . We also discuss how this can help an analyst in discovering and resolving potential inconsistencies in the results. The efficacy of EAP/SHAPE is shown by applying it to various synthetic and real world benchmark datasets. EARL In order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking has been performed either as dependent sequential tasks or independent parallel tasks. In this paper, we propose a framework, called EARL, which performs entity linking and relation linking as a joint single task. EARL is modeled on an optimised variation of GeneralisedTravelling Salesperson Problem. The system determines the best semantic connection between all keywords of the question by referring to the knowledge graph. This is achieved by exploiting the connection density between entity candidates and relation candidates. We have empirically evaluated the framework on a dataset with 3000 complex questions. Our system surpasses state-of-the-art scores for entity linking task by reporting an accuracy of 0.67against 0.40 from the next best entity linker Early Stopping In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration. Up to a point, this improves the learner’s performance on data outside of the training set. Past that point, however, improving the learner’s fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation. Earnings Before Interest, Taxes, Depreciation and Amortization(EBIDTA) A company’s earnings before interest, taxes, depreciation, and amortization (EBITDA) is an accounting metric computed by considering a company’s earnings before interest payments, tax, depreciation, and amortization are subtracted for any final accounting of its income and expenses. The EBITDA of a business gives an indication of its current operational profitability, i.e., how much profit it makes with its present assets and its operations on the products it produces and sells. Earth Mover’s Distance(EMD) In computer science, the earth mover’s distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved. The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions. ➘ “Wasserstein Metric” ease.ml/ci Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development. Developing a machine learning model is no difference – it is an engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However, most, if not all, existing continuous integration engines do not support machine learning as first-class citizens. In this paper, we present ease.ml/ci, to our best knowledge, the first continuous integration system for machine learning. The challenge of building ease.ml/ci is to provide rigorous guarantees, e.g., single accuracy point error tolerance with 0.999 reliability, with a practical amount of labeling effort, e.g., 2K labels per test. We design a domain specific language that allows users to specify integration conditions with reliability constraints, and develop simple novel optimizations that can lower the number of labels required by up to two orders of magnitude for test conditions popularly used in real production systems. Easy Convolution and Random Pooling(ECP) Convolution operations dominate the overall execution time of Convolutional Neural Networks (CNNs). This paper proposes an easy yet efficient technique for both Convolutional Neural Network training and testing. The conventional convolution and pooling operations are replaced by Easy Convolution and Random Pooling (ECP). In ECP, we randomly select one pixel out of four and only conduct convolution operations of the selected pixel. As a result, only a quarter of the conventional convolution computations are needed. Experiments demonstrate that the proposed EasyConvPooling can achieve 1.45x speedup on training time and 1.64x on testing time. What’s more, a speedup of 5.09x on pure Easy Convolution operations is obtained compared to conventional convolution operations. Easy Data Augmentation(EDA) We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use. Easy Positive Triplet Mining Deep metric learning seeks to define an embedding where semantically similar images are embedded to nearby locations, and semantically dissimilar images are embedded to distant locations. Substantial work has focused on loss functions and strategies to learn these embeddings by pushing images from the same class as close together in the embedding space as possible. In this paper, we propose an alternative, loosened embedding strategy that requires the embedding function only map each training image to the most similar examples from the same class, an approach we call ‘Easy Positive’ mining. We provide a collection of experiments and visualizations that highlight that this Easy Positive mining leads to embeddings that are more flexible and generalize better to new unseen data. This simple mining strategy yields recall performance that exceeds state of the art approaches (including those with complicated loss functions and ensemble methods) on image retrieval datasets including CUB, Stanford Online Products, In-Shop Clothes and Hotels-50K. Easy Transfer Learning(EasyTL) Transfer learning aims at transferring knowledge from a well-labeled domain to a similar but different domain with limited or no labels. Unfortunately, existing learning-based methods often involve intensive model selection and hyperparameter tuning to obtain good results. Moreover, cross-validation is not possible for tuning hyperparameters since there are often no labels in the target domain. This would restrict wide applicability of transfer learning especially in computationally-constraint devices such as wearables. In this paper, we propose a practically Easy Transfer Learning (EasyTL) approach which requires no model selection and hyperparameter tuning, while achieving competitive performance. By exploiting intra-domain structures, EasyTL is able to learn both non-parametric transfer features and classifiers. Extensive experiments demonstrate that, compared to state-of-the-art traditional and deep methods, EasyTL satisfies the Occam’s Razor principle: it is extremely easy to implement and use while achieving comparable or better performance in classification accuracy and much better computational efficiency. Additionally, it is shown that EasyTL can increase the performance of existing transfer feature learning methods. EasyConvPooling ➚ “Easy Convolution and Random Pooling” EC3 Classification and clustering algorithms have been proved to be successful individually in different contexts. Both of them have their own advantages and limitations. For instance, although classification algorithms are more powerful than clustering methods in predicting class labels of objects, they do not perform well when there is a lack of sufficient manually labeled reliable data. On the other hand, although clustering algorithms do not produce label information for objects, they provide supplementary constraints (e.g., if two objects are clustered together, it is more likely that the same label is assigned to both of them) that one can leverage for label prediction of a set of unknown objects. Therefore, systematic utilization of both these types of algorithms together can lead to better prediction performance. In this paper, We propose a novel algorithm, called EC3 that merges classification and clustering together in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using an optimization function. We theoretically show the convexity and optimality of the problem and solve it by block coordinate descent method. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known standalone classifiers, 5 ensemble classifiers, and 2 existing methods that merge classification and clustering) on 13 standard benchmark datasets. We show that our methods outperform other baselines for every single dataset, achieving at most 10% higher AUC. Moreover our methods are faster (1.21 times faster than the best baseline), more resilient to noise and class imbalance than the best baseline method. ECC Many DNN-enabled vision applications constantly operate under severe energy constraints such as unmanned aerial vehicles, Augmented Reality headsets, and smartphones. Designing DNNs that can meet a stringent energy budget is becoming increasingly important. This paper proposes ECC, a framework that compresses DNNs to meet a given energy constraint while minimizing accuracy loss. The key idea of ECC is to model the DNN energy consumption via a novel bilinear regression function. The energy estimate model allows us to formulate DNN compression as a constrained optimization that minimizes the DNN loss function over the energy constraint. The optimization problem, however, has nontrivial constraints. Therefore, existing deep learning solvers do not apply directly. We propose an optimization algorithm that combines the essence of the Alternating Direction Method of Multipliers (ADMM) framework with gradient-based learning algorithms. The algorithm decomposes the original constrained optimization into several subproblems that are solved iteratively and efficiently. ECC is also portable across different hardware platforms without requiring hardware knowledge. Experiments show that ECC achieves higher accuracy under the same or lower energy budget compared to state-of-the-art resource-constrained DNN compression techniques. Echo State Network(ESN) Among the various architectures of Recurrent Neural Networks, Echo State Networks (ESNs) emerged due to their simplified and inexpensive training procedure. These networks are known to be sensitive to the setting of hyper-parameters, which critically affect their behaviour. Results show that their performance is usually maximized in a narrow region of hyper-parameter space called edge of chaos. Finding such a region requires searching in hyper-parameter space in a sensible way: hyper-parameter configurations marginally outside such a region might yield networks exhibiting fully developed chaos, hence producing unreliable computations. The performance gain due to optimizing hyper-parameters can be studied by considering the memory–nonlinearity trade-off, i.e., the fact that increasing the nonlinear behavior of the network degrades its ability to remember past inputs, and vice-versa. In this paper, we propose a model of ESNs that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, denotes nonlinear behaviour in phase space characterised by a large memory of past inputs, comparable to the one of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich-enough to approximate many common nonlinear systems used for benchmarking. Eclat Algorithm The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains. The basic idea for the eclat algorithm is use tidset intersections to compute the support of a candidate itemset avoiding the generation of subsets that does not exist in the prefix tree. Ecological Regression Ecological regression is a statistical technique used especially in political science and history to estimate group voting behavior from aggregate data. For example, if counties have a known Democratic vote (in percentage) D, and a known percentage of Catholics, C, then run the linear regression of dependent variable D against independent variable C. This gives D = a + bC. When C = 1 (100% Catholic) this gives the estimated Democratic vote as a+b. When C = 0 (0% Catholic), this gives the estimated non-Catholic vote as a. For example, if the regression gives D = .22 + .45C, then the estimated Catholic vote is 67% Democratic and the non-Catholic vote is 22% Democratic. The technique has been often used in litigation brought under the Voting Rights Act of 1965 to see how blacks and whites voted. Ecologically-Inspired GENetic Approach for Neural Network Structure Searching(EIGEN) Designing the structure of neural networks is considered one of the most challenging tasks in deep learning. Recently, a few approaches have been proposed to automatically search for the optimal structure of neural networks, however, they suffer from either prohibitive computation cost (e.g., 256 Hours on 250 GPU in [1]) or unsatisfactory performance compared to those of hand-crafted neural networks. In this paper, we propose an Ecologically-Inspired GENetic approach for neural network structure search (EIGEN), that includes succession, mimicry and gene duplication. Specifically, we first use primary succession to rapidly evolve a community of poor initialized neural network structures into a more diverse community, followed by a secondary succession stage for fine-grained searching based on the networks from the primary succession. Extinction is applied in both stages to reduce computational cost. Mimicry is employed during the entire evolution process to help the inferior networks imitate the behavior of a superior network and gene duplication is utilized to duplicate the learned blocks of novel structures, both of which help to find the better network structures. Extensive experimental results show that our proposed approach can achieve the similar or better performance compared to the existing genetic approaches with dramatically reduced computation cost. For example, the network discovered by our approach on CIFAR-100 dataset achieves 78.1% test accuracy under 120 GPU hours, compared to 77.0% test accuracy in more than 65, 536 GPU hours in [1]. Econometrics Econometrics is the application of mathematics, statistical methods, and, more recently, computer science, to economic data and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is “the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference.” An introductory economics textbook describes econometrics as allowing economists “to sift through mountains of data to extract simple relationships.” The first known use of the term “econometrics” (in cognate form) was by Polish economist Pawel Ciompa in 1910. Ragnar Frisch is credited with coining the term in the sense in which it is used today. Econometrics is the intersection of economics, mathematics, and statistics. Econometrics adds empirical content to economic theory allowing theories to be tested and used for forecasting and policy evaluation. Edge Intelligence With the breakthroughs in deep learning, the recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistant to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions Bytes of data at the network edge. Driving by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulted new inter-discipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy stage, and a dedicated venue for exchanging the recent advances of edge intelligence is highly desired by both the computer system and artificial intelligence communities. To this end, we conduct a comprehensive survey of the recent research efforts on edge intelligence. Specifically, we first review the background and motivation for artificial intelligence running at the network edge. We then provide an overview of the overarching architectures, frameworks and emerging key technologies for deep learning model towards training/inference at the network edge. Finally, we discuss future research opportunities on edge intelligence. We believe that this survey will elicit escalating attentions, stimulate fruitful discussions and inspire further research ideas on edge intelligence. Edge Source Coding Data compression is an efficient technique to save data storage and transmission costs. However, traditional data compression methods always ignore the impact of user preferences on the statistical distributions of symbols transmitted over the links. Notice that the development of big data technologies and popularization of smart devices enable analyses on user preferences based on data collected from personal handsets. This paper presents a user preference aware lossless data compression method, termed edge source coding, to compress data at the network edge. An optimization problem is formulated to minimize the expected number of bits needed to represent a requested content item in edge source coding. For edge source coding under discrete user preferences, DCA (difference of convex functions programming algorithm) based and k-means++ based algorithms are proposed to give codebook designs. For edge source coding under continuous user preferences, a sampling method is applied to give codebook designs. In addition, edge source coding is extended to the two-user case and codebooks are elaborately designed to utilize multicasting opportunities. Both theoretical analysis and simulations demonstrate the optimal codebook design should take into account user preferences. edge2vec Representation learning for networks provides a new way to mine graphs. Although current researches in this area are able to generate reliable results of node embeddings, they are still limited to homogeneous networks in which all nodes and edges are of the same type. While, increasingly, graphs are heterogeneous with multiple node- and edge- types in the real world. Existing heterogeneous embedding methods are mostly task-based or only able to deal with limited types of node & edge. To tackle this challenge, in this paper, an edge2vec model is proposed to represent nodes in ways that incorporate edge semantics represented as different edge-types in heterogeneous networks. An edge-type transition matrix is optimized from an Expectation-Maximization (EM) framework as an extra criterion of a biased node random walk on networks, and a biased skip-gram model is leveraged to learn node embeddings based on the random walks afterwards. edge2vec is validated and evaluated using three medical domain problems on an ensemble of complex medical networks (more than 10 node- \& edge- types): medical entity classification, compound-gene binding prediction, and medical information searching cost. Results show that by considering edge semantics, edge2vec significantly outperforms other state-of-art models on all three tasks. Edge-Conditioned Convolution(ECC) A graph is a powerful concept for representation of relations between pairs of entities. Data with underlying graph structure can be found across many disciplines and there is a natural desire for understanding such data better. Deep learning (DL) has achieved significant breakthroughs in a variety of machine learning tasks in recent years, especially where data is structured on a grid, such as in text, speech, or image understanding. However, surprisingly little has been done to explore the applicability of DL on arbitrary graph-structured data directly. The goal of this thesis is to investigate architectures for DL on graphs and study how to transfer, adapt or generalize concepts that work well on sequential and image data to this domain. We concentrate on two important primitives: embedding graphs or their nodes into a continuous vector space representation (encoding) and, conversely, generating graphs from such vectors back (decoding). To that end, we make the following contributions. First, we introduce Edge-Conditioned Convolutions (ECC), a convolution-like operation on graphs performed in the spatial domain where filters are dynamically generated based on edge attributes. The method is used to encode graphs with arbitrary and varying structure. Second, we propose SuperPoint Graph, an intermediate point cloud representation with rich edge attributes encoding the contextual relationship between object parts. Based on this representation, ECC is employed to segment large-scale point clouds without major sacrifice in fine details. Third, we present GraphVAE, a graph generator allowing us to decode graphs with variable but upper-bounded number of nodes making use of approximate graph matching for aligning the predictions of an autoencoder with its inputs. The method is applied to the task of molecule generation. Edge-Labeling Graph Neural Network(EGNN) In this paper, we propose a novel edge-labeling graph neural network (EGNN), which adapts a deep neural network on the edge-labeling graph, for few-shot learning. The previous graph neural network (GNN) approaches in few-shot learning have been based on the node-labeling framework, which implicitly models the intra-cluster similarity and the inter-cluster dissimilarity. In contrast, the proposed EGNN learns to predict the edge-labels rather than the node-labels on the graph that enables the evolution of an explicit clustering by iteratively updating the edge-labels with direct exploitation of both intra-cluster similarity and the inter-cluster dissimilarity. It is also well suited for performing on various numbers of classes without retraining, and can be easily extended to perform a transductive inference. The parameters of the EGNN are learned by episodic training with an edge-labeling loss to obtain a well-generalizable model for unseen low-data problem. On both of the supervised and semi-supervised few-shot image classification tasks with two benchmark datasets, the proposed EGNN significantly improves the performances over the existing GNNs. Edgent As the backbone technology of machine learning, deep neural networks (DNNs) have have quickly ascended to the spotlight. Running DNNs on resource-constrained mobile devices is, however, by no means trivial, since it incurs high performance and energy overhead. While offloading DNNs to the cloud for execution suffers unpredictable performance, due to the uncontrolled long wide-area network latency. To address these challenges, in this paper, we propose Edgent, a collaborative and on-demand DNN co-inference framework with device-edge synergy. Edgent pursues two design knobs: (1) DNN partitioning that adaptively partitions DNN computation between device and edge, in order to leverage hybrid computation resources in proximity for real-time DNN inference. (2) DNN right-sizing that accelerates DNN inference through early-exit at a proper intermediate DNN layer to further reduce the computation latency. The prototype implementation and extensive evaluations based on Raspberry Pi demonstrate Edgent’s effectiveness in enabling on-demand low-latency edge intelligence. EdgeSegNet In this study, we introduce EdgeSegNet, a compact deep convolutional neural network for the task of semantic segmentation. A human-machine collaborative design strategy is leveraged to create EdgeSegNet, where principled network design prototyping is coupled with machine-driven design exploration to create networks with customized module-level macroarchitecture and microarchitecture designs tailored for the task. Experimental results showed that EdgeSegNet can achieve semantic segmentation accuracy comparable with much larger and computationally complex networks (>20x} smaller model size than RefineNet) as well as achieving an inference speed of ~38.5 FPS on an NVidia Jetson AGX Xavier. As such, the proposed EdgeSegNet is well-suited for low-power edge scenarios. Edgeworth Series The Gram-Charlier A series (named in honor of Jørgen Pedersen Gram and Carl Charlier), and the Edgeworth series (named in honor of Francis Ysidro Edgeworth) are series that approximate a probability distribution in terms of its cumulants. The series are the same; but, the arrangement of terms (and thus the accuracy of truncating the series) differ. http://…/EdgeworthSeries.html EW edGNN The ability of a graph neural network (GNN) to leverage both the graph topology and graph labels is fundamental to building discriminative node and graph embeddings. Building on previous work, we theoretically show that edGNN, our model for directed labeled graphs, is as powerful as the Weisfeiler–Lehman algorithm for graph isomorphism. Our experiments support our theoretical findings, confirming that graph neural networks can be used effectively for inference problems on directed graphs with both node and edge labels. Code available at https://…/edGNN. eDiscovery Electronic discovery (or ‘eDiscovery’) is the process of identifying, preserving, collecting, analyzing, reviewing, and producing electronically stored information (ESI). Structured and unstructured data analysis is at the core of eDiscovery. Even routine matters regularly involve hundreds of gigabytes of data that much be analyzed for relevancy and privilege. Most often undertaken for litigation and regulatory compliance, eDiscovery processes and the underlying data mining technology are also deployed for internal investigations, due diligence, financial contract analysis, privacy impact assessments (including GDPR), and data breach responses. Undoubtedly, eDiscovery efforts are crucial to ongoing success in today’s modern corporation. For effective eDiscovery, enterprises need to be able to search through information across their entire enterprise, including both structured (e.g. databases) and unstructured data (e.g. emails, images), and effectively analyze content. The best eDiscovery software will integrate with existing systems and litigation-ready policies. It enables targeted data collections, sophisticated culling and de-duplication. In addition, the capabilities of the best eDiscovery software includes AI-enhanced analysis, full review and tagging, automated redactions, and DIY productions. EDISON Data Science Framework(EDSF) The EDISON Data Science Framework is a collection of documents that define the Data Science profession. Freely available, these documents have been developed to guide educators and trainers, emplyers and managers, and Data Scientists themselves. This collection of documents collectively breakdown the complexity of the skills and competences need to define Data Science as a professional practice. E-Divisive with Medians(EDM) E-Divisive with Medians (EDM) – employs energy statistics to detect divergence in mean. Note that EDM can also be used detect change in distribution in a given time series. EDM uses robust statistical metrics, viz., median, and estimates the statistical significance of a breakout through a permutation test. In addition, EDM is non-parametric. This is important since the distribution of production data seldom (if at all) follows the commonly assumed normal distribution or any other widely accepted model. EDRL We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed as the combination of some statistical estimator and a method for imputing a return distribution consistent with that set of statistics. With this new understanding, we are able to provide improved analyses of existing DRL algorithms as well as construct a new algorithm (EDRL) based upon estimation of the expectiles of the return distribution. We compare EDRL with existing methods on a variety of MDPs to illustrate concrete aspects of our analysis, and develop a deep RL variant of the algorithm, ER-DQN, which we evaluate on the Atari-57 suite of games. Educational Data Mining(EDM) Educational Data Mining (EDM) describes a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems). At a high level, the field seeks to develop and improve methods for exploring this data, which often has multiple levels of meaningful hierarchy, in order to discover new insights about how people learn in the context of such settings. In doing so, EDM has contributed to theories of learning investigated by researchers in educational psychology and the learning sciences. The field is closely tied to that of learning analytics, and the two have been compared and contrasted. Edward Probabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edward’s design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inferences about the model given data, and criticize the model’s fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale. Eesen Framework(Eesen) The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that compared with the standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly. EffAcTS Robust Policy Search is the problem of learning policies that do not degrade in performance when subject to unseen environment model parameters. It is particularly relevant for transferring policies learned in a simulation environment to the real world. Several existing approaches involve sampling large batches of trajectories which reflect the differences in various possible environments, and then selecting some subset of these to learn robust policies, such as the ones that result in the worst performance. We propose an active learning based framework, EffAcTS, to selectively choose model parameters for this purpose so as to collect only as much data as necessary to select such a subset. We apply this framework to an existing method, namely EPOpt, and experimentally validate the gains in sample efficiency and the performance of our approach on standard continuous control tasks. We also present a Multi-Task Learning perspective to the problem of Robust Policy Search, and draw connections from our proposed framework to existing work on Multi-Task Learning. Effect Size In statistics, an effect size is a quantitative measure of the strength of a phenomenon. Examples of effect sizes are the correlation between two variables, the regression coefficient, the mean difference, or even the risk with which something happens, such as how many people survive after a heart attack for every one person that does not survive. For each type of effect-size, a larger absolute value always indicates a stronger effect. Effect sizes complement statistical hypothesis testing, and play an important role in statistical power analyses, sample size planning, and in meta-analyses. Especially in meta-analysis, where the purpose is to combine multiple effect-sizes, the standard error of effect-size is of critical importance. The S.E. of effect-size is used to weight effect-sizes when combining studies, so that large studies are considered more important than small studies in the analysis. The S.E. of effect-size is calculated differently for each type of effect-size, but generally only requires knowing the study’s sample size (N), or the number of observations in each group (n’s). Reporting effect sizes is considered good practice when presenting empirical research findings in many fields. The reporting of effect sizes facilitates the interpretation of the substantive, as opposed to the statistical, significance of a research result. Effect sizes are particularly prominent in social and medical research. Relative and absolute measures of effect size convey different information, and can be used complementarily. Effective Applications of the R Language(EARL) EARL is a Conference for users and developers of the open source R programming language. The primary focus of the Conference will be the commercial usage of R across a range of industry sectors with the aim of sharing knowledge and applications of the language. The EARL Conference Team is located at Mango Solutions, a data analysis company headquartered in the UK. Effective Geometry Monte Carlo In this work, we address the systematic biases and random errors stemming from finite step sizes encountered in diffusion simulations. We introduce the Effective Geometry Monte Carlo (EG-MC) simulation algorithm which modifies the geometry of the receiver. We motivate our approach in a 1D toy model and then apply our findings to a spherical absorbing receiver in a 3D unbounded environment. We show that with minimal computational cost, the impulse response of this receiver can be precisely simulated using EG-MC. Afterwards, we demonstrate the accuracy of our simulations and give tight constraints on the single free parameter in EG-MC. Finally, we comment on the range of applicability of our results. While we present the EG-MC algorithm for the specific case of molecular diffusion, we believe that analogous methods with effective geometry manipulations can be utilized to approach a variety of problems in other branches of physics such as condensed matter physics and cosmological large scale structure simulations. Efficient Alternating Direction Multiplier Method(ADMM) Feature selection can efficiently identify the most informative features with respect to the target feature used in training. However, state-of-the-art vector-based methods are unable to encapsulate the relationships between feature samples into the feature selection process, thus leading to significant information loss. To address this problem, we propose a new graph-based structurally interacting elastic net method for feature selection. Specifically, we commence by constructing feature graphs that can incorporate pairwise relationship between samples. With the feature graphs to hand, we propose a new information theoretic criterion to measure the joint relevance of different pairwise feature combinations with respect to the target feature graph representation. This measure is used to obtain a structural interaction matrix where the elements represent the proposed information theoretic measure between feature pairs. We then formulate a new optimization model through the combination of the structural interaction matrix and an elastic net regression model for the feature subset selection problem. This allows us to a) preserve the information of the original vectorial space, b) remedy the information loss of the original feature space caused by using graph representation, and c) promote a sparse solution and also encourage correlated features to be selected. Because the proposed optimization problem is non-convex, we develop an efficient alternating direction multiplier method (ADMM) to locate the optimal solutions. Extensive experiments on various datasets demonstrate the effectiveness of the proposed methods. Efficient Convolutional Network for Online Video Understanding(ECO) The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video, therefore, it misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, the processing of the whole video is not efficient and hampers fast video retrieval or online classification of long-term activities. In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy, which exploits that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10x to 80x faster than state-of-the-art methods. Efficient Evolution of Neural Architecture(EENA) Latest algorithms for automatic neural architecture search perform remarkable but basically directionless in search space and computational expensive in the training of every intermediate architecture. In this paper, we propose a method for efficient architecture search called EENA (Efficient Evolution of Neural Architecture) with mutation and crossover operations guided by the information have already been learned to speed up this process and consume less computational effort by reducing redundant searching and training. On CIFAR-10 classification, EENA using minimal computational resources (0.65 GPU-days) can design highly effective neural architecture which achieves 2.56% test error with 8.47M parameters. Furthermore, The best architecture discovered is also transferable for CIFAR-100. Efficient Neural Architecture Search(ENAS) We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%. Efficient Tuning of Lasso(ET-Lasso) The L1 regularization (Lasso) has proven to be a versatile tool to select relevant features and estimate the model coefficients simultaneously. Despite its popularity, it is very challenging to guarantee the feature selection consistency of Lasso. One way to improve the feature selection consistency is to select an ideal tuning parameter. Traditional tuning criteria mainly focus on minimizing the estimated prediction error or maximizing the posterior model probability, such as cross-validation and BIC, which may either be time-consuming or fail to control the false discovery rate (FDR) when the number of features is extremely large. The other way is to introduce pseudo-features to learn the importance of the original ones. Recently, the Knockoff filter is proposed to control the FDR when performing feature selection. However, its performance is sensitive to the choice of the expected FDR threshold. Motivated by these ideas, we propose a new method using pseudo-features to obtain an ideal tuning parameter. In particular, we present the Efficient Tuning of Lasso (ET-Lasso) to separate active and inactive features by adding permuted features as pseudo-features in linear models. The pseudo-features are constructed to be inactive by nature, which can be used to obtain a cutoff to select the tuning parameter that separates active and inactive features. Experimental studies on both simulations and real-world data applications are provided to show that ET-Lasso can effectively and efficiently select active features under a wide range of different scenarios. Efficient Unitary Neural Network(EUNN) Using unitary (instead of general) matrices in artificial neural networks (ANNs) is a promising way to solve the gradient explosion/vanishing problem, as well as to enable ANNs to learn long-term correlations in the data. This approach appears particularly promising for Recurrent Neural Networks (RNNs). In this work, we present a new architecture for implementing an Efficient Unitary Neural Network (EUNNs); its main advantages can be summarized as follows. Firstly, the representation capacity of the unitary space in an EUNN is fully tunable, ranging from a subspace of SU(N) to the entire unitary space. Secondly, the computational complexity for training an EUNN is merely O(1) per parameter. Finally, we test the performance of EUNNs on the standard copying task, the pixel-permuted MNIST digit recognition benchmark as well as the Speech Prediction Test (TIMIT). We find that our architecture significantly outperforms both other state-of-the-art unitary RNNs and the LSTM architecture, in terms of the final performance and/or the wall-clock training speed. EUNNs are thus promising alternatives to RNNs and LSTMs for a wide variety of applications. EffNet With the ever increasing application of Convolutional Neural Networks to costumer products the need emerges for models which can efficiently run on embedded, mobile hardware. Slimmer models have therefore become a hot research topic with multiple different approaches which vary from binary networks to revised convolution layers. We offer our contribution to the latter and propose a novel convolution block which significantly reduces the computational burden while surpassing the current state-of-the-art. Our model, dubbed EffNet, is optimised for models which are slim to begin with and is created to tackle issues in existing models such as MobileNet and ShuffleNet. Egocentric Spatial Memory(ESM) Egocentric Spatial Memory EgoCoder Programming has been an important skill for researchers and practitioners in computer science and other related areas. To learn basic programing skills, a long-time systematic training is usually required for beginners. According to a recent market report, the computer software market is expected to continue expanding at an accelerating speed, but the market supply of qualified software developers can hardly meet such a huge demand. In recent years, the surge of text generation research works provides the opportunities to address such a dilemma through automatic program synthesis. In this paper, we propose to make our try to solve the program synthesis problem from a data mining perspective. To address the problem, a novel generative model, namely EgoCoder, will be introduced in this paper. EgoCoder effectively parses program code into abstract syntax trees (ASTs), where the tree nodes will contain the program code/comment content and the tree structure can capture the program logic flows. Based on a new unit model called Hsu, EgoCoder can effectively capture both the hierarchical and sequential patterns in the program ASTs. Extensive experiments will be done to compare EgoCoder with the state-of-the-art text generation methods, and the experimental results have demonstrated the effectiveness of EgoCoder in addressing the program synthesis problem. EgoNet EgoNet (Egocentric Network Study Software) for the collection and analysis of egocentric social network data. It helps the user to collect and analyse all the egocentric network data (all social network data of a website on the Internet), and provide general global network measures and data matrixes that can be used for further analysis by other software. The egonet is the result of the links that it gives and receives certain address on the Internet, and EgoNet is dedicated to collecting information about them and present it in a way useful to the users. Egonet is written in Java, so that the computer where it is going to be used must have the JRE installed. EgoNet is open source software, licensed under GPL. Anomaly detection in static networks using egonets Ehlers’s Autocorrelation Periodogram The point of the Ehlers Autocorrelation Periodogram is to dynamically set a period between a minimum and a maximum period length. While I leave the exact explanation of the mechanic to Dr. Ehlers’s book, for all practical intents and purposes, in my opinion, the punchline of this method is to attempt to remove a massive source of overfitting from trading system creation-namely specifying a lookback period. Eigen Eigen is a high-level C++ library of template headers for linear algebra, matrix and vector operations, numerical solvers and related algorithms. Eigen is an open source library licensed under MPL2 starting from version 3.1.1. Earlier versions were licensed under LGPL3+. Eigen is often noted for its elegant API, versatile fixed and dynamic matrix capabilities and a range of dense and sparse solvers. To achieve high performance, Eigen utilizes explicit vectorization for the SSE 2/3/4, ARM NEON, and AltiVec instruction sets. RcppEigen Eigenface Eigenfaces is the name given to a set of eigenvectors when they are used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby (1987) and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. EigenNetwork In many applications, the interdependencies among a set of $N$ time series ${ x_{nk}, k>0 }_{n=1}^{N}$ are well captured by a graph or network $G$. The network itself may change over time as well (i.e., as $G_k$). We expect the network changes to be at a much slower rate than that of the time series. This paper introduces eigennetworks, networks that are building blocks to compose the actual networks $G_k$ capturing the dependencies among the time series. These eigennetworks can be estimated by first learning the time series of graphs $G_k$ from the data, followed by a Principal Network Analysis procedure. Algorithms for learning both the original time series of graphs and the eigennetworks are presented and discussed. Experiments on simulated and real time series data demonstrate the performance of the learning and the interpretation of the eigennetworks. EigenPro 2.0 In recent years machine learning methods that nearly interpolate the data have achieved remarkable success. In many settings achieving near-zero training error leads to excellent test results. In this work we show how the mathematical and conceptual simplicity of interpolation can be harnessed to construct a framework for very efficient, scalable and accurate kernel machines. Our main innovation is in constructing kernel machines that output solutions mathematically equivalent to those obtained using standard kernels, yet capable of fully utilizing the available computing power of a parallel computational resource, such as GPU. Such utilization is key to strong performance since much of the computational resource capability is wasted by the standard iterative methods. The computational resource and data adaptivity of our learned kernels is based on theoretical convergence bounds. The resulting algorithm, which we call EigenPro 2.0, is accurate, principled and very fast. For example, using a single GPU, training on ImageNet with $1.3\times 10^6$ data points and $1000$ labels takes under an hour, while smaller datasets, such as MNIST, take seconds. Moreover, as the parameters are chosen analytically, based on the theory, little tuning beyond selecting the kernel and kernel parameter is needed, further facilitating the practical use of these methods. EigenRec Sparsity presents one of the major challenges of Collaborative Filtering. Graph-based methods are known to alleviate its effects, however their use is often computationally prohibitive; Latent-Factor methods, on the other hand, present a reasonable and viable alternative. In this paper, we introduce EigenRec; a versatile and efficient Latent-Factor framework for Top-N Recommendations, that generalizes the well-known PureSVD algorithm (a) providing intuition about its inner structure, (b) paving the path towards improving its efficacy and, at the same time, (c) reducing its complexity. One of our central goals in this work is to ensure the applicability of our method in realistic big-data scenarios. To this end, we propose building our model using a computationally efficient Lanczos-based procedure, we discuss its Parallel Implementation in distributed computing environments, and we verify its favourable performance using real-world datasets. Furthermore, from a qualitative point of view, a comprehensive set of experiments on the MovieLens and the Yahoo!R2Music datasets based on widely applied performance metrics, indicate that EigenRec outperforms several state-of-the-art algorithms, in terms of Standard and Long-Tail recommendation accuracy, exhibiting low susceptibility to sparsity, even in its most extreme manifestations the Cold-Start problems. Eigenvalue Corrected Noisy Natural Gradient Variational Bayesian neural networks combine the flexibility of deep learning with Bayesian uncertainty estimation. However, inference procedures for flexible variational posteriors are computationally expensive. A recently proposed method, noisy natural gradient, is a surprisingly simple method to fit expressive posteriors by adding weight noise to regular natural gradient updates. Noisy K-FAC is an instance of noisy natural gradient that fits a matrix-variate Gaussian posterior with minor changes to ordinary K-FAC. Nevertheless, a matrix-variate Gaussian posterior does not capture an accurate diagonal variance. In this work, we extend on noisy K-FAC to obtain a more flexible posterior distribution called eigenvalue corrected matrix-variate Gaussian. The proposed method computes the full diagonal re-scaling factor in Kronecker-factored eigenbasis. Empirically, our approach consistently outperforms existing algorithms (e.g., noisy K-FAC) on regression and classification tasks. Eigenvalues, Eigenvectors An eigenvector of a square matrix is a non-zero vector that, when the matrix is multiplied by , yields a constant multiple of , the multiplier being commonly denoted by d. That is Av = dv. The number d is called the eigenvalue of A corresponding to v. Eikosogram Eikosograms provide a nice visual representation of statistical correlation, because when the two variables are independent, then the value of one, say X, doesn’t affect the probability of the second, Y. This visually translates into a horizontal pattern, which easily contrasts with a staircase shape that occurs when the variables are dependent or correlated. http://…/eikosograms EILearn We propose an algorithm for incremental learning of classifiers. The proposed method enables an ensemble of classifiers to learn incrementally by accommodating new training data. We use an effective mechanism to overcome the stability-plasticity dilemma. In incremental learning, the general convention is to use only the knowledge acquired in the previous phase but not the previously seen data. We follow this convention by retaining the previously acquired knowledge which is relevant and using it along with the current data. The performance of each classifier is monitored to eliminate the poorly performing classifiers in the subsequent phases. Experimental results show that the proposed approach outperforms the existing incremental learning approaches. einops Deep learning operations rethinked (supports tf, pytorch, chainer, gluon and others) Elapsed Time based Dynamic Passes Combined-counting(ETDPC) ➘ “Variable Size based Fixed Passes Combined-counting” ELASTIC Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). We argue that the scale policy should be learned from data. In this paper, we introduce ELASTIC, a simple, efficient and yet very effective approach to learn instance-specific scale policy from data. We formulate the scaling policy as a non-linear function inside the network’s structure that (a) is learned from data, (b) is instance specific, (c) does not add extra computation, and (d) can be applied on any network architecture. We applied ELASTIC to several state-of-the-art network architectures and showed consistent improvement without extra (sometimes even lower) computation on ImageNet classification, MSCOCO multi-label classification, and PASCAL VOC semantic segmentation. Our results show major improvement for images with scale challenges e.g. images with several small objects or objects with large scale variations. Our code and models will be publicly available soon. Elastic Architecture Transfer Mechanism for Accelerating Large-Scale Neural Architecture Search(EAT-NAS) Neural architecture search (NAS) methods have been proposed to release human experts from tedious architecture engineering. However, most current methods are constrained in small-scale search due to the issue of computational resources. Meanwhile, directly applying architectures searched on small datasets to large-scale tasks often bears no performance guarantee. This limitation impedes the wide use of NAS on large-scale tasks. To overcome this obstacle, we propose an elastic architecture transfer mechanism for accelerating large-scale neural architecture search (EAT-NAS). In our implementations, architectures are first searched on a small dataset (the width and depth of architectures are taken into consideration as well), e.g., CIFAR-10, and the best is chosen as the basic architecture. Then the whole architecture is transferred with elasticity. We accelerate the search process on a large-scale dataset, e.g., the whole ImageNet dataset, with the help of the basic architecture. What we propose is not only a NAS method but a mechanism for architecture-level transfer. In our experiments, we obtain two final models EATNet-A and EATNet-B that achieve competitive accuracies, 73.8% and 73.7% on ImageNet, respectively, which also surpass the models searched from scratch on ImageNet under the same settings. For computational cost, EAT-NAS takes only less than 5 days on 8 TITAN X GPUs, which is significantly less than the computational consumption of the state-of-the-art large-scale NAS methods. Elastic Distributed Training With an increasing demand for training powers for deep learning algorithms and the rapid growth of computation resources in data centers, it is desirable to dynamically schedule different distributed deep learning tasks to maximize resource utilization and reduce cost. In this process, different tasks may receive varying numbers of machines at different time, a setting we call elastic distributed training. Despite the recent successes in large mini-batch distributed training, these methods are rarely tested in elastic distributed training environments and suffer degraded performance in our experiments, when we adjust the learning rate linearly immediately with respect to the batch size. One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes. We therefore propose to smoothly adjust the learning rate over time to alleviate the influence of the noisy momentum estimation. Our experiments on image classification, object detection and semantic segmentation have demonstrated that our proposed Dynamic SGD method achieves stabilized performance when varying the number of GPUs from 8 to 128. We also provide theoretical understanding on the optimality of linear learning rate scheduling and the effects of stochastic momentum. Elastic Functional Principal Component Regression We study regression using functional predictors in situations where these functions contain both phase and amplitude variability. In other words, the functions are misaligned due to errors in time measurements, and these errors can significantly degrade both model estimation and prediction performance. The current techniques either ignore the phase variability, or handle it via pre-processing, i.e., use an off-the-shelf technique for functional alignment and phase removal. We develop a functional principal component regression model which has comprehensive approach in handling phase and amplitude variability. The model utilizes a mathematical representation of the data known as the square-root slope function. These functions preserve the $\mathbf{L}^2$ norm under warping and are ideally suited for simultaneous estimation of regression and warping parameters. Using both simulated and real-world data sets, we demonstrate our approach and evaluate its prediction performance relative to current models. In addition, we propose an extension to functional logistic and multinomial logistic regression Elastic Gossip Distributing Neural Network training is of particular interest for several reasons including scaling using computing clusters, training at data sources such as IOT devices and edge servers, utilizing underutilized resources across heterogeneous environments, and so on. Most contemporary approaches primarily address scaling using computing clusters and require high network bandwidth and frequent communication. This thesis presents an overview of standard approaches to distribute training and proposes a novel technique involving pairwise-communication using Gossip-like protocols, called Elastic Gossip. This approach builds upon an existing technique known as Elastic Averaging SGD (EASGD), and is similar to another technique called Gossiping SGD which also uses Gossip-like protocols. Elastic Gossip is empirically evaluated against Gossiping SGD using the MNIST digit recognition and CIFAR-10 classification tasks, using commonly used Neural Network architectures spanning Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). It is found that Elastic Gossip, Gossiping SGD, and All-reduce SGD perform quite comparably, even though the latter entails a substantially higher communication cost. While Elastic Gossip performs better than Gossiping SGD in these experiments, it is possible that a more thorough search over hyper-parameter space, specific to a given application, may yield configurations of Gossiping SGD that work better than Elastic Gossip. Elastic Net Regularization In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. Elastic Network In this work we propose a framework for improving the performance of any deep neural network that may suffer from vanishing gradients. To address the vanishing gradient issue, we study a framework, where we insert an intermediate output branch after each layer in the computational graph and use the corresponding prediction loss for feeding the gradient to the early layers. The framework – which we name Elastic network – is tested with several well-known networks on CIFAR10 and CIFAR100 datasets, and the experimental results show that the proposed framework improves the accuracy on both shallow networks (e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We also identify the types of networks where the framework does not improve the performance and discuss the reasons. Finally, as a side product, the computational complexity of the resulting networks can be adjusted in an elastic manner by selecting the output branch according to current computational budget. Elastic Neural Network We propose a new framework for image classification with deep neural networks. The framework introduces intermediate outputs to the computational graph of a network. This enables flexible control of the computational load and balances the tradeoff between accuracy and execution time. Moreover, we present an interesting finding that the intermediate outputs can act as a regularizer at training time, improving the prediction accuracy. In the experimental section we demonstrate the performance of our proposed framework with various commonly used pretrained deep networks in the use case of apparent age estimation. Elasticsearch Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine. Elasticsearch, Logstash and Kibana(ELK Stack) ELK stands for Elasticsearch, Logstash and Kibana. Brief definitions: Logstash: It is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs. It is fully free and fully open source. Elasticsearch: Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Kibana: A nifty tool to visualize logs and timestamped data. Element-wiseAttention Gate(EleAttG) Recurrent neural networks (RNNs) are capable of modeling the temporal dynamics of complex sequential information. However, the structures of existing RNN neurons mainly focus on controlling the contributions of current and historical information but do not explore the different importance levels of different elements in an input vector of a time slot. We propose adding a simple yet effective Element-wiseAttention Gate (EleAttG) to an RNN block (e.g., all RNN neurons in a network layer) that empowers the RNN neurons to have the attentiveness capability. For an RNN block, an EleAttG is added to adaptively modulate the input by assigning different levels of importance, i.e., attention, to each element/dimension of the input. We refer to an RNN block equipped with an EleAttG as an EleAtt-RNN block. Specifically, the modulation of the input is content adaptive and is performed at fine granularity, being element-wise rather than input-wise. The proposed EleAttG, as an additional fundamental unit, is general and can be applied to any RNN structures, e.g., standard RNN, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU). We demonstrate the effectiveness of the proposed EleAtt-RNN by applying it to the action recognition tasks on both 3D human skeleton data and RGB videos. Experiments show that adding attentiveness through EleAttGs to RNN blocks significantly boosts the power of RNNs. Eligibility Traces Eligibility traces are one of the basic mechanisms of reinforcement learning. For example, in the popular TD(lambda) algorithm, the lambda refers to the use of an eligibility trace. Almost any temporal-difference (TD) method, such as Q-learning or Sarsa, can be combined with eligibility traces to obtain a more general method that may learn more efficiently. There are two ways to view eligibility traces. The more theoretical view, which we emphasize here, is that they are a bridge from TD to Monte Carlo methods. When TD methods are augmented with eligibility traces, they produce a family of methods spanning a spectrum that has Monte Carlo methods at one end and one-step TD methods at the other. In between are intermediate methods that are often better than either extreme method. In this sense eligibility traces unify TD and Monte Carlo methods in a valuable and revealing way. The other way to view eligibility traces is more mechanistic. From this perspective, an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action. The trace marks the memory parameters associated with the event as eligible for undergoing learning changes. When a TD error occurs, only the eligible states or actions are assigned credit or blame for the error. Thus, eligibility traces help bridge the gap between events and training information. Like TD methods themselves, eligibility traces are a basic mechanism for temporal credit assignment. ELimination Et Choix Traduisant la REalité(ELECTRE) ELECTRE is a family of multi-criteria decision analysis methods that originated in Europe in the mid-1960s. The acronym ELECTRE stands for: ELimination Et Choix Traduisant la REalité (ELimination and Choice Expressing REality). The method was first proposed by Bernard Roy and his colleagues at SEMA consultancy company. A team at SEMA was working on the concrete, multiple criteria, real-world problem of how firms could decide on new activities and had encountered problems using a weighted sum technique. Bernard Roy was called in as a consultant and the group devised the ELECTRE method. As it was first applied in 1965, the ELECTRE method was to choose the best action(s) from a given set of actions, but it was soon applied to three main problems: choosing, ranking and sorting. The method became more widely known when a paper by B. Roy appeared in a French operations research journal. It evolved into ELECTRE I (electre one) and the evolutions have continued with ELECTRE II, ELECTRE III, ELECTRE IV, ELECTRE IS and ELECTRE TRI (electre tree), to mention a few. Bernard Roy is widely recognized as the father of the ELECTRE method, which was one of the earliest approaches in what is sometimes known as the French School of decision making. It is usually classified as an “outranking method” of decision making. There are two main parts to an ELECTRE application: first, the construction of one or several outranking relations, which aims at comparing in a comprehensive way each pair of actions; second, an exploitation procedure that elaborates on the recommendations obtained in the first phase. The nature of the recommendation depends on the problem being addressed: choosing, ranking or sorting. Usually the Electre Methods are used to discard some alternatives to the problem, which are unacceptable. After that we can use another MCDA to select the best one. The Advantage of using the Electre Methods before is that we can apply another MCDA with a restricted set of alternatives saving much time. Criteria in ELECTRE methods have two distinct sets of parameters: the importance coefficients and the veto thresholds. OutrankingTools ElimiNet The task of Reading Comprehension with Multiple Choice Questions, requires a human (or machine) to read a given passage, question pair and select one of the n given options. The current state of the art model for this task first computes a question-aware representation for the passage and then selects the option which has the maximum similarity with this representation. However, when humans perform this task they do not just focus on option selection but use a combination of elimination and selection. Specifically, a human would first try to eliminate the most irrelevant option and then read the passage again in the light of this new information (and perhaps ignore portions corresponding to the eliminated option). This process could be repeated multiple times till the reader is finally ready to select the correct option. We propose ElimiNet, a neural network-based model which tries to mimic this process. Specifically, it has gates which decide whether an option can be eliminated given the passage, question pair and if so it tries to make the passage representation orthogonal to this eliminated option (akin to ignoring portions of the passage corresponding to the eliminated option). The model makes multiple rounds of partial elimination to refine the passage representation and finally uses a selection module to pick the best option. We evaluate our model on the recently released large scale RACE dataset and show that it outperforms the current state of the art model on 7 out of the $13$ question types in this dataset. Further, we show that taking an ensemble of our elimination-selection based method with a selection based method gives us an improvement of 3.1% over the best-reported performance on this dataset. ELiSH Deep Neural Networks have been shown to be beneficial for a variety of tasks, in particular allowing for end-to-end learning and reducing the requirement for manual design decisions. However, still many parameters have to be chosen in advance, also raising the need to optimize them. One important, but often ignored system parameter is the selection of a proper activation function. Thus, in this paper we target to demonstrate the importance of activation functions in general and show that for different tasks different activation functions might be meaningful. To avoid the manual design or selection of activation functions, we build on the idea of genetic algorithms to learn the best activation function for a given task. In addition, we introduce two new activation functions, ELiSH and HardELiSH, which can easily be incorporated in our framework. In this way, we demonstrate for three different image classification benchmarks that different activation functions are learned, also showing improved results compared to typically used baselines. Elite Based Guided Local Search(EB-GLS) Local search is a basic building block in memetic algorithms. Guided Local Search (GLS) can improve the efficiency of local search. By changing the guide function, GLS guides a local search to escape from locally optimal solutions and find better solutions. The key component of GLS is its penalizing mechanism which determines which feature is selected to penalize when the search is trapped in a locally optimal solution. The original GLS penalizing mechanism only makes use of the cost and the current penalty value of each feature. It is well known that many combinatorial optimization problems have a big valley structure, i.e., the better a solution is, the more the chance it is closer to a globally optimal solution. This paper proposes to use big valley structure assumption to improve the GLS penalizing mechanism. An improved GLS algorithm called Elite Biased GLS (EB-GLS) is proposed. EB-GLS records and maintains an elite solution as an estimate of the globally optimal solutions, and reduces the chance of penalizing the features in this solution. We have systematically tested the proposed algorithm on the symmetric traveling salesman problem. Experimental results show that EB-GLS is significantly better than GLS. ➘ “Guided Local Search” ELKI This paper documents the release of the ELKI data mining framework, version 0.7.5. ELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. We will first outline the motivation for this release, the plans for the future, and then give a brief overview over the new functionality in this version. We also include an appendix presenting an overview on the overall implemented functionality. Ellipsoid Method for Linear Programming In this paper, ellipsoid method for linear programming is derived using only minimal knowledge of algebra and matrices. Unfortunately, most authors first describe the algorithm, then later prove its correctness, which requires a good knowledge of linear algebra. Ellsberg’s Paradox The Ellsberg paradox is a paradox in decision theory in which people’s choices violate the postulates of subjective expected utility. It is generally taken to be evidence for ambiguity aversion. The paradox was popularized by Daniel Ellsberg, although a version of it was noted considerably earlier by John Maynard Keynes. The basic idea is that people overwhelmingly prefer taking on risk in situations where they know specific odds rather than an alternative risk scenario in which the odds are completely ambiguous – they will always choose a known probability of winning over an unknown probability of winning even if the known probability is low and the unknown probability could be a guarantee of winning. For example, given a choice of risks to take (such as bets), people ‘prefer the devil they know’ rather than assuming a risk where odds are difficult or impossible to calculate. Ellsberg proposed two separate thought experiments, the proposed choices in which contradict subjective expected utility. The 2-color problem involves bets on two urns, both of which contain balls of two different colors. The 3-color problem, described below, involves bets on a single urn, which contains balls of three different colors. ELM with Local Connections(ELM-LC) This paper is concerned with the sparsification of the input-hidden weights of ELM (Extreme Learning Machine). For ordinary feedforward neural networks, the sparsification is usually done by introducing certain regularization technique into the learning process of the network. But this strategy can not be applied for ELM, since the input-hidden weights of ELM are supposed to be randomly chosen rather than to be learned. To this end, we propose a modified ELM, called ELM-LC (ELM with local connections), which is designed for the sparsification of the input-hidden weights as follows: The hidden nodes and the input nodes are divided respectively into several corresponding groups, and an input node group is fully connected with its corresponding hidden node group, but is not connected with any other hidden node group. As in the usual ELM, the hidden-input weights are randomly given, and the hidden-output weights are obtained through a least square learning. In the numerical simulations on some benchmark problems, the new ELM-CL behaves better than the traditional ELM. Elo Rating System The Elo rating system is a method for calculating the relative skill levels of players in competitor-versus-competitor games such as chess. It is named after its creator Arpad Elo, a Hungarian-born American physics professor. The Elo system was invented as an improved chess rating system and is also used in many other games. It has also been adapted for use as a rating system for multiplayer competition in a number of video games, and has been adapted to team sports including soccer (association football), American college football, basketball, Major League Baseball, competitive programming, and E-Sports. The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other multiple times are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent’s is expected to win 64% of the time; if the difference is 200 points, then the expected win proportion for the stronger player is 76%. E-MAML ➘ “Krazy World” Embedded Index Coding(EIC) Motivated by applications in distributed storage and distributed computation, we introduce embedded index coding (EIC). EIC is a type of distributed index coding in which nodes in a distributed system act as both senders and receivers of information. We show how embedded index coding is related to index coding in general, and give characterizations and bounds on the communication costs of optimal embedded index codes. We also define task-based EIC, in which each sending node encodes and sends data blocks independently of the other nodes. Task-based EIC is more computationally tractable and has advantages in applications such as distributed storage, in which senders may complete their broadcasts at different times. Finally, we give heuristic algorithms for approximating optimal embedded index codes, and demonstrate empirically that these algorithms perform well. Embedded-Graph In this paper, we propose a new type of graph, denoted as ’embedded-graph’, and its theory, which employs a distributed representation to describe the relations on the graph edges. Embedded-graphs can express linguistic and complicated relations, which cannot be expressed by the existing edge-graphs or weighted-graphs. We introduce the mathematical definition of embedded-graph, translation, edge distance, and graph similarity. We can transform an embedded-graph into a weighted-graph and a weighted-graph into an edge-graph by the translation method and by threshold calculation, respectively. The edge distance of an embedded-graph is a distance based on the components of a target vector, and it is calculated through cosine similarity with the target vector. The graph similarity is obtained considering the relations with linguistic complexity. In addition, we provide some examples and data structures for embedded-graphs in this paper. Embedding Transformation Network with Attention(ETNA) Most companies utilize demographic information to develop their strategy in a market. However, such information is not available to most retail companies. Several studies have been conducted to predict the demographic attributes of users from their transaction histories, but they have some limitations. First, they focused on parameter sharing to predict all attributes but capturing task-specific features is also important in multi-task learning. Second, they assumed that all transactions are equally important in predicting demographic attributes. However, some transactions are more useful than others for predicting a certain attribute. Furthermore, decision making process of models cannot be interpreted as they work in a black-box manner. To address the limitations, we propose an Embedding Transformation Network with Attention (ETNA) model which shares representations at the bottom of the model structure and transforms them to task-specific representations using a simple linear transformation method. In addition, we can obtain more informative transactions for predicting certain attributes using the attention mechanism. The experimental results show that our model outperforms the previous models on all tasks. In our qualitative analysis, we show the visualization of attention weights, which provides business managers with some useful insights. EmbNum Among the fundamental questions in computer science, at least two have a deep impact on mathematics. What can computation compute How many steps does a computation require to solve an instance of the 3-SAT problem Our work addresses the first question, by introducing a new model called the x-machine. The x-machine executes Turing machine instructions and two special types of instructions. Quantum random instructions are physically realizable with a quantum random number generator. Meta instructions can add new states and add new instructions to the x-machine. A countable set of x-machines is constructed, each with a finite number of states and instructions; each x-machine can compute a Turing incomputable language, whenever the quantum randomness measurements behave like unbiased Bernoulli trials. In 1936, Alan Turing posed the halting problem for Turing machines and proved that this problem is unsolvable for Turing machines. Consider an enumeration E_a(i) = (M_i, T_i) of all Turing machines M_i and initial tapes T_i. Does there exist an x-machine X that has at least one evolutionary path X –> X_1 –> X_2 –> . . . –> X_m, so at the mth stage x-machine X_m can correctly determine for 0 <= i <= m whether M_i’s execution on tape T_i eventually halts We demonstrate an x-machine Q(x) that has one such evolutionary path. The existence of this evolutionary path suggests that David Hilbert was not misguided to propose in 1900 that mathematicians search for finite processes to help construct mathematical proofs. Our refinement is that we cannot use a fixed computer program that behaves according to a fixed set of mechanical rules. We must pursue methods that exploit randomness and self-modification so that the complexity of the program can increase as it computes. Embodied Question Answering(EmbodiedQA) Revisiting EmbodiedQA: A Simple Baseline and Beyond Embodied Visual Recognition Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded. In contrast, humans and other embodied agents have the ability to move in the environment, and actively control the viewing angle to better understand object shapes and semantics. In this work, we introduce the task of Embodied Visual Recognition (EVR): An agent is instantiated in a 3D environment close to an occluded target object, and is free to move in the environment to perform object classification, amodal object localization, and amodal object segmentation. To address this, we develop a new model called Embodied Mask R-CNN, for agents to learn to move strategically to improve their visual recognition abilities. We conduct experiments using the House3D environment. Experimental results show that: 1) agents with embodiment (movement) achieve better visual recognition performance than passive ones; 2) in order to improve visual recognition abilities, agents can learn strategical moving paths that are different from shortest paths. EmbraceNet Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationship effectively but also to ensure robustness against loss of part of data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning models, deals with cross-modal information carefully, and prevents performance degradation due to partial absence of data. We employ two datasets for multimodal classification tasks, build models based on our architecture and other state-of-the-art models, and analyze their performance on various situations. The results show that our architecture outperforms the other multimodal fusion architectures when some parts of data are not available. EML-NET In this work, we apply state-of-the-art Convolutional Neural Network(CNN) architectures for saliency prediction. Our results show that better saliency features can be delivered by a deeper CNN model. However, it is very space-consuming to apply a complex model due to the large size of input images. The space complexity becomes even more problematic when we extract features from multiple convolutional layers or different models. In this paper, we propose a modular saliency system which aims at splitting the whole network into small modules. The main difference in our approach s that the encoder and decoder can be separately trained for the scalability. Furthermore, the encoder can contain more than one CNN model to extract features and the models can have different architectures or pre-trained on different datasets. This parallel design allows us to better utilize the computational space in order to apply more powerful encoder. More importantly, our network can be easily expanded almost without extra spaces, other pre-trained CNN models can be combined for a wider range of visual knowledge. We denote our expandable multi-layer network as EML-NET in this paper. Our method is evaluated on three public saliency benchmarks, SALICON, MIT300 and CAT2000. The proposed EML-NET achieves state-of-the-art results on the metric of Normalized Scanpath Saliency using a modified loss function. Emotional Chatting Machine(EMC) Emotional intelligence is one of the key factors to the success of dialogue systems or conversational agents. In this paper, we propose Emotional Chatting Machine (ECM) which generates responses that are appropriate not only at the content level (relevant and grammatical) but also at the emotion level (consistent emotional expression). To the best of our knowledge, this is the first work that addresses the emotion factor in large-scale conversation generation. ECM addresses the factor in three ways: modeling high-level abstraction of emotion expression by embedding emotion categories, changing of implicit internal emotion states, and using explicit emotion expressions with an external emotion vocabulary. Experiments show that our model can generate responses appropriate not only in content but also in emotion. Emotional Word Vector(EVEC) It is important for machines to interpret human emotions properly for better human-machine communications, as emotion is an essential part of human-to-human communications. One aspect of emotion is reflected in the language we use. How to represent emotions in texts is a challenge in natural language processing (NLP). Although continuous vector representations like word2vec have become the new norm for NLP problems, their limitations are that they do not take emotions into consideration and can unintentionally contain bias toward certain identities like different genders. This thesis focuses on improving existing representations in both word and sentence levels by explicitly taking emotions inside text and model bias into account in their training process. Our improved representations can help to build more robust machine learning models for affect-related text classification like sentiment/emotion analysis and abusive language detection. We first propose representations called emotional word vectors (EVEC), which is learned from a convolutional neural network model with an emotion-labeled corpus, which is constructed using hashtags. Secondly, we extend to learning sentence-level representations with a huge corpus of texts with the pseudo task of recognizing emojis. Our results show that, with the representations trained from millions of tweets with weakly supervised labels such as hashtags and emojis, we can solve sentiment/emotion analysis tasks more effectively. Lastly, as examples of model bias in representations of existing approaches, we explore a specific problem of automatic detection of abusive language. We address the issue of gender bias in various neural network models by conducting experiments to measure and reduce those biases in the representations in order to build more robust classification models. Emphatic Temporal Difference(ETD) Should All Temporal Difference Learning Use Emphasis? ➘ “Emphatic Temporal-Difference Learning Algorithm” Emphatic Temporal-Difference Learning Algorithm(ETD) In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under \emph{off}-policy training (Sutton, Mahmood \& White 2016), but it is also a new algorithm for the \emph{on}-policy case. In both our on-policy and off-policy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of ‘bounce’. In the off-policy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower. Empirical Bayes Geometric Mean(EBGM) Adjusted estimate for the relative reporting ratio. Example: if EBGM=3.9 for acetaminophen-hepatic failure, then this drug-event combination occurred in the data 3.9 times more frequently than expected under the assumption of no association between the drug and the event. openEBGM Empirical Bayes Matrix Factorization(EBMF) Matrix factorization methods – including Factor analysis (FA), and Principal Components Analysis (PCA) – are widely used for inferring and summarizing structure in multivariate data. Many matrix factorization methods exist, corresponding to different assumptions on the elements of the underlying matrix factors. For example, many recent methods use a penalty or prior distribution to achieve sparse representations (‘Sparse FA/PCA’). Here we introduce a general Empirical Bayes approach to matrix factorization (EBMF), whose key feature is that it uses the observed data to estimate prior distributions on matrix elements. We derive a correspondingly-general variational fitting algorithm, which reduces fitting EBMF to solving a simpler problem – the so-called ‘normal means’ problem. We implement this general algorithm, but focus particular attention on the use of sparsity-inducing priors that are uni-modal at 0. This yields a sparse EBMF approach – essentially a version of sparse FA/PCA – that automatically adapts the amount of sparsity to the data. We demonstrate the benefits of our approach through both numerical comparisons with competing methods and through analysis of data from the GTEx (Genotype Tissue Expression) project on genetic associations across 44 human tissues. In numerical comparisons EBMF often provides more accurate inferences than other methods. In the GTEx data, EBMF identifies interpretable structure that concords with known relationships among human tissues. Software implementing our approach is available at https://…/flashr. Empirical Bayes Method Empirical Bayes methods are procedures for statistical inference in which the prior distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed. Despite this difference in perspective, empirical Bayes may be viewed as an approximation to a fully Bayesian treatment of a hierarchical model wherein the parameters at the highest level of the hierarchy are set to their most likely values, instead of being integrated out. Empirical Bayes, also known as maximum marginal likelihood, represents one approach for setting hyperparameters. Empirical Equilibrium We introduce empirical equilibrium, the prediction in a game that selects the Nash equilibria that can be approximated by a sequence of payoff-monotone distributions, a well-documented proxy for empirically plausible behavior. Then, we reevaluate implementation theory based on this equilibrium concept. We show that in a partnership dissolution environment with complete information, two popular auctions that are essentially equivalent for the Nash equilibrium prediction, can be expected to differ in fundamental ways when they are operated. Besides the direct policy implications, two general consequences follow. First, a mechanism designer may not be constrained by typical invariance properties. Second, a mechanism designer who does not account for the empirical plausibility of equilibria may inadvertently design implicitly biased mechanisms. Empirical Likelihood(EL) Empirical likelihood (EL) is an estimation method in statistics. Empirical likelihood estimates require few assumptions about the error distribution compared to similar methods like maximum likelihood. EL can handle data well as long as it is independent and identically distributed (iid). EL performs well even when the distribution is asymmetric or censored. EL methods are also useful since they can easily incorporate constraints and prior information. Art Owen pioneered work in this area with his 1988 paper. Empirical Orthogonal Function Analysis(EOF) In statistics, EOF analysis is known as Principal Component Analysis (PCA). As such, EOF analysis is sometimes classified as a multivariate statistical technique. Empirical Orthogonal Teleconnections(EOT) Calculating functions empirically and orthogonally from a given space-time dataset. The method is rooted in multiple linear regression and yields solutions that are orthogonal in one direction, either space or time. remote Empirical Requirements Research Classifier(ERRC) Research must be reproducible in order to make an impact on science and to contribute to the body of knowledge in our field. Yet studies have shown that 70% of research from academic labs cannot be reproduced. In software engineering, and more specifically requirements engineering (RE), reproducible research is rare, with datasets not always available or methods not fully described. This lack of reproducible research hinders progress, with researchers having to replicate an experiment from scratch. A researcher starting out in RE has to sift through conference papers, finding ones that are empirical, then must look through the data available from the empirical paper (if any) to make a preliminary determination if the paper can be reproduced. This paper addresses two parts of that problem, identifying RE papers and identifying empirical papers within the RE papers. Recent RE and empirical conference papers were used to learn features and to build an automatic classifier to identify RE and empirical papers. We introduce the Empirical Requirements Research Classifier (ERRC) method, which uses natural language processing and machine learning to perform supervised classification of conference papers. We compare our method to a baseline keyword-based approach. To evaluate our approach, we examine sets of papers from the IEEE Requirements Engineering conference and the IEEE International Symposium on Software Testing and Analysis. We found that the ERRC method performed better than the baseline method in all but a few cases. Empusa The RDF data model facilitates integration of diverse data available in structured and semi-structured formats. To obtain an RDF graph with a low amount of errors and internal redundancy, the chosen ontology must be consistently applied. However, with each addition of new diverse data the ontology must evolve thereby increasing its complexity, which could lead to accumulation of unintended erroneous composites. Thus, there is a need for a gatekeeping system that compares the intended content described in the ontology with the actual content of the resource. Here we present Empusa, a tool that has been developed to facilitate the creation of composite RDF resources from disparate sources. Empusa can be used to convert a schema into an associated application programming interface (API) that can be used to perform data consistency checks and generates Markdown documentation to make persistent URLs resolvable. In this way, the use of Empusa ensures consistency within and between the ontology (OWL), the Shape Expressions (ShEx) describing the graph structure, and the content of the resource. ENCI Although nonstationary data are more common in the real world, most existing causal discovery methods do not take nonstationarity into consideration. In this letter, we propose a kernel embedding-based approach, ENCI, for nonstationary causal model inference where data are collected from multiple domains with varying distributions. In ENCI, we transform the complicated relation of a cause-effect pair into a linear model of variables of which observations correspond to the kernel embeddings of the cause-and-effect distributions in different domains. In this way, we are able to estimate the causal direction by exploiting the causal asymmetry of the transformed linear model. Furthermore, we extend ENCI to causal graph discovery for multiple variables by transforming the relations among them into a linear nongaussian acyclic model. We show that by exploiting the nonstationarity of distributions, both cause-effect pairs and two kinds of causal graphs are identifiable under mild conditions. Experiments on synthetic and real-world data are conducted to justify the efficacy of ENCI over major existing methods. Enclosure Diagram The enclosure diagram is also space filling, using containment rather than adjacency to represent the hierarchy. Introduced by Ben Shneiderman in 1991, a treemap recursively subdivides area into rectangles. As with adjacency diagrams, the size of any node in the tree is quickly revealed. Encoder Based Lifelong Learning This paper introduces a new lifelong learning solution where a single model is trained for a sequence of tasks. The main challenge that vision systems face in this context is catastrophic forgetting: as they tend to adapt to the most recently seen task, they lose performance on the tasks that were learned previously. Our method aims at preserving the knowledge of the previous tasks while learning a new one by using autoencoders. For each task, an under-complete autoencoder is learned, capturing the features that are crucial for its achievement. When a new task is presented to the system, we prevent the reconstructions of the features with these autoencoders from changing, which has the effect of preserving the information on which the previous tasks are mainly relying. At the same time, the features are given space to adjust to the most recent environment as only their projection into a low dimension submanifold is controlled. The proposed system is evaluated on image classification tasks and shows a reduction of forgetting over the state-of-the-art Encoder CFG-Decoder Semantic parsing can be defined as the process of mapping natural language sentences into a machine interpretable, formal representation of its meaning. Semantic parsing using LSTM encoder-decoder neural networks have become promising approach. However, human automated translation of natural language does not provide grammaticality guarantees for the sentences generate such a guarantee is particularly important for practical cases where a data base query can cause critical errors if the sentence is ungrammatical. In this work, we propose an neural architecture called Encoder CFG-Decoder, whose output conforms to a given context-free grammar. Results are show for any implementation of such architecture display its correctness and providing benchmark accuracy levels better than the literature. Encog Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. Most Encog training algoritms are multi-threaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008. Encog: Library of Interchangeable Machine Learning Models for Java and C# End of Potential Line(EOPL) We introduce the problem EndOfPotentialLine and the corresponding complexity class EOPL of all problems that can be reduced to it in polynomial time. This class captures problems that admit a single combinatorial proof of their joint membership in the complexity classes PPAD of fixpoint problems and PLS of local search problems. EOPL is a combinatorially-defined alternative to the class CLS (for Continuous Local Search), which was introduced in with the goal of capturing the complexity of some well-known problems in PPAD $\cap$ PLS that have resisted, in some cases for decades, attempts to put them in polynomial time. Two of these are Contraction, the problem of finding a fixpoint of a contraction map, and P-LCP, the problem of solving a P-matrix Linear Complementarity Problem. We show that EndOfPotentialLine is in CLS via a two-way reduction to EndOfMeteredLine. The latter was defined in to show query and cryptographic lower bounds for CLS. Our two main results are to show that both PL-Contraction (Piecewise-Linear Contraction) and P-LCP are in EOPL. Our reductions imply that the promise versions of PL-Contraction and P-LCP are in the promise class UniqueEOPL, which corresponds to the case of a single potential line. This also shows that simple-stochastic, discounted, mean-payoff, and parity games are in EOPL. Using the insights from our reduction for PL-Contraction, we obtain the first polynomial-time algorithms for finding fixed points of contraction maps in fixed dimension for any $\ell_p$ norm, where previously such algorithms were only known for the $\ell_2$ and $\ell_\infty$ norms. Our reduction from P-LCP to EndOfPotentialLine allows a technique of Aldous to be applied, which in turn gives the fastest-known randomized algorithm for the P-LCP. End-Game-First Curriculum Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, usually one of the first concepts learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning end-games first is that once the actions which lead to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state – we call this an end-game-first curriculum. Currently the state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum; instead learning from the entire game at all times. By employing an end-game-first training curriculum to train an AlphaZero inspired player, we empirically show that the rate of learning of an artificial player can be improved during the early stages of training when compared to a player not using a training curriculum. Endogenous Variable In a statistical model, a parameter or variable is said to be endogenous when there is a correlation between the parameter or variable and the error term. Endogeneity can arise as a result of measurement error, autoregression with autocorrelated errors, simultaneity and omitted variables. Broadly, a loop of causality between the independent and dependent variables of a model leads to endogeneity. For example, in a simple supply and demand model, when predicting the quantity demanded in equilibrium, the price is endogenous because producers change their price in response to demand and consumers change their demand in response to price. In this case, the price variable is said to have total endogeneity once the demand and supply curves are known. In contrast, a change in consumer tastes or preferences would be an exogenous change on the demand curve. End-to-End Neural Matching Framework(EENMF) E-commerce sponsored search contributes an important part of revenue for the e-commerce company. In consideration of effectiveness and efficiency, a large-scale sponsored search system commonly adopts a multi-stage architecture. We name these stages as \textit{ad retrieval}, \textit{ad pre-ranking} and \textit{ad ranking}. \textit{Ad retrieval} and \textit{ad pre-ranking} are collectively referred to as \textit{ad matching} in this paper. We propose an end-to-end neural matching framework (EENMF) to model two tasks—\textit{vector-based ad retrieval} and \textit{neural networks based ad pre-ranking}. Under the deep \textit{matching} framework, \textit{vector-based ad retrieval} harnesses user recent behavior sequence to retrieve relevant ad candidates without the constraint of keyword bidding. Simultaneously, the deep model is employed to perform the global pre-ranking of ad candidates from multiple retrieval paths effectively and efficiently. Besides, the proposed model tries to optimize the pointwise cross-entropy loss which is consistent with the objective of predict models in the ranking stage. We conduct extensive evaluation to validate the performance of the proposed framework. In the real traffic of a large-scale e-commerce sponsored search, the proposed approach significantly outperforms the baseline. Energy-based Exploration of Random Features(EERF) The randomized-feature approach has been successfully employed in large-scale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing data-dependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomized-feature approach in supervised learning for good generalizability. We propose the Energy-based Exploration of Random Features (EERF) algorithm based on a data-dependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function with high probability recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires smaller number of random features to achieve a certain generalization error compared to the state-of-the-art while introducing negligible pre-processing overhead. EERF can be implemented in a few lines of code and requires no additional tuning parameters. EnergyNet We present ENERGYNET , a new framework for analyzing and building artificial neural network architectures. Our approach adaptively learns the structure of the networks in an unsupervised manner. The methodology is based upon the theoretical guarantees of the energy function of restricted Boltzmann machines (RBM) of infinite number of nodes. We present experimental results to show that the final network adapts to the complexity of a given problem. ENet The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18× faster, requires 75× less FLOPs, has 79× less parameters, and provides similar or better accuracy to existing models. We have tested it on CamVid, Cityscapes and SUN datasets and report on comparisons with existing state-of-the-art methods, and the trade-offs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster. Enhanced Concept Profiling Framework(ECPF) When concept drift is detected during classification in a data stream, a common remedy is to retrain a framework’s classifier. However, this loses useful information if the classifier has learnt the current concept well, and this concept will recur again in the future. Some frameworks retain and reuse classifiers, but it can be time-consuming to select an appropriate classifier to reuse. These frameworks rarely match the accuracy of state-of-the-art ensemble approaches. For many data stream tasks, speed is important: fast, accurate frameworks are needed for time-dependent applications. We propose the Enhanced Concept Profiling Framework (ECPF), which aims to recognise recurring concepts and reuse a classifier trained previously, enabling accurate classification immediately following a drift. The novelty of ECPF is in how it uses similarity of classifications on new data, between a new classifier and existing classifiers, to quickly identify the best classifier to reuse. It always trains both a new classifier and a reused classifier, and retains the more accurate classifier when concept drift occurs. Finally, it creates a copy of reused classifiers, so a classifier well-suited for a recurring concept will not be impacted by being trained on a different concept. In our experiments, ECPF classifies significantly more accurately than a state-of-the-art classifier reuse framework (Diversity Pool) and a state-of-the-art ensemble technique (Adaptive Random Forest) on synthetic datasets with recurring concepts. It classifies real-world datasets five times faster than Diversity Pool, and six times faster than Adaptive Random Forest and is not significantly less accurate than either. Enhanced Least Absolute Shrinkage Operator(ELASSO) elasso Enhanced Recurrent Neural Network(ERNN) Recently, neural networks have shown promising results for named entity recognition (NER), which needs a number of labeled data to for model training. When meeting a new domain (target domain) for NER, there is no or a few labeled data, which makes domain NER much more difficult. As NER has been researched for a long time, some similar domain already has well labelled data (source domain). Therefore, in this paper, we focus on domain NER by studying how to utilize the labelled data from such similar source domain for the new target domain. We design a kernel function based instance transfer strategy by getting similar labelled sentences from a source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) by adding an additional layer that combines the source domain labelled data into traditional RNN structure. Comprehensive experiments are conducted on two datasets. The comparison results among HMM, CRF and RNN show that RNN performs bette than others. When there is no labelled data in domain target, compared to directly using the source domain labelled data without selecting transferred instances, our enhanced RNN approach gets improvement from 0.8052 to 0.9328 in terms of F1 measure. Enhanced Representation through kNowledge IntEgration(ERNIE) We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration). Inspired by the masking strategy of BERT, ERNIE is designed to learn language representation enhanced by knowledge masking strategies, which includes entity-level masking and phrase-level masking. Entity-level strategy masks entities which are usually composed of multiple words.Phrase-level strategy masks the whole phrase which is composed of several words standing together as a conceptual unit.Experimental results show that ERNIE outperforms other baseline methods, achieving new state-of-the-art results on five Chinese natural language processing tasks including natural language inference, semantic similarity, named entity recognition, sentiment analysis and question answering. We also demonstrate that ERNIE has more powerful knowledge inference capacity on a cloze test. Enhanced Super-Resolution Generative Adversarial Network(ESRGAN) The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN – network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at https://…/ESRGAN . Ensemble Actor-Critic(EAC) We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Renyi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function based mechanism for high-level action selection. Empirically we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness. Ensemble Bayesian Optimization(EBO) Bayesian Optimization (BO) has been shown to be a very effective paradigm for tackling hard black-box and non-convex optimization problems encountered in Machine Learning. Despite these successes, the computational complexity of the underlying function approximation has restricted the use of BO to problems that can be handled with less than a few thousand function evaluations. Harder problems like those involving functions operating in very high dimensional spaces may require hundreds of thousands or millions of evaluations or more and become computationally intractable to handle using standard Bayesian Optimization methods. In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Further, we represent each GP model using tile coding random features and an additive function structure. Our approach generates speedups by parallelizing the time consuming hyper-parameter posterior inference and functional evaluations on hundreds of cores and aggregating the models in every iteration of BO. Our extensive experimental evaluation shows that EBO can speed up the posterior inference between 2-3 orders of magnitude (400 times in one experiment) compared to the state-of-the-art by putting data into Mondrian bins without sacrificing the sample quality. We demonstrate the ability of EBO to handle sample-intensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations. Ensemble Clustering Algorithm for Graphs(ECG) We propose an ensemble clustering algorithm for graphs (ECG), which is based on the Louvain algorithm and the concept of consensus clustering. We validate our approach by replicating a recently published study comparing graph clustering algorithms over artificial networks, showing that ECG outperforms the leading algorithms from that study. We also illustrate how the ensemble obtained with ECG can be used to quantify the presence of community structure in the graph. Ensemble Distribution Distillation(EnD^2) Ensemble of Neural Network (NN) models are known to yield improvements in accuracy. Furthermore, they have been empirically shown to yield robust measures of uncertainty, though without theoretical guarantees. However, ensembles come at high computational and memory cost, which may be prohibitive for certain application. There has been significant work done on the distillation of an ensemble into a single model. Such approaches decrease computational cost and allow a single model to achieve accuracy comparable to that of an ensemble. However, information about the \emph{diversity} of the ensemble, which can yield estimates of \emph{knowledge uncertainty}, is lost. Recently, a new class of models, called Prior Networks, has been proposed, which allows a single neural network to explicitly model a distribution over output distributions, effectively emulating an ensemble. In this work ensembles and Prior Networks are combined to yield a novel approach called \emph{Ensemble Distribution Distillation} (EnD$^2$), which allows distilling an ensemble into a single Prior Network. This allows a single model to retain both the improved classification performance as well as measures of diversity of the ensemble. In this initial investigation the properties of EnD$^2$ have been investigated and confirmed on an artificial dataset. Ensemble Empirical Mode Decomposition(EEMD) This approach consists of sifting an ensemble of white noise-added signal (data) and treats the mean as the final true result. Finite, not infinitesimal, amplitude white noise is necessary to force the ensemble to exhaust all possible solutions in the sifting process, thus making the different scale signals to collate in the proper intrinsic mode functions (IMF) dictated by the dyadic filter banks. As EEMD is a time-space analysis method, the added white noise is averaged out with sufficient number of trials; the only persistent part that survives the averaging process is the component of the signal (original data), which is then treated as the true and more physical meaningful answer. The effect of the added white noise is to provide a uniform reference frame in the time-frequency space; therefore, the added noise collates the portion of the signal of comparable scale in one IMF. With this ensemble mean, one can separate scales naturally without any a priori subjective criterion selection as in the intermittence test for the original EMD algorithm. This new approach utilizes the full advantage of the statistical characteristics of white noise to perturb the signal in its true solution neighborhood, and to cancel itself out after serving its purpose; therefore, it represents a substantial improvement over the original EMD and is a truly noise-assisted data analysis (NADA) method. Rlibeemd Ensemble Feature Selection Integrating Stability(EFSIS) Ensemble learning that can be used to combine the predictions from multiple learners has been widely applied in pattern recognition, and has been reported to be more robust and accurate than the individual learners. This ensemble logic has recently also been more applied in feature selection. There are basically two strategies for ensemble feature selection, namely data perturbation and function perturbation. Data perturbation performs feature selection on data subsets sampled from the original dataset and then selects the features consistently ranked highly across those data subsets. This has been found to improve both the stability of the selector and the prediction accuracy for a classifier. Function perturbation frees the user from having to decide on the most appropriate selector for any given situation and works by aggregating multiple selectors. This has been found to maintain or improve classification performance. Here we propose a framework, EFSIS, combining these two strategies. Empirical results indicate that EFSIS gives both high prediction accuracy and stability. Ensemble Forecast Framework(ENFF) An accurate load forecast is always important for the power industry and energy players as it enables stakeholders to make critical decisions. In addition, its importance is further increased with growing uncertainties in the generation sector due to the high penetration of renewable energy and the introduction of demand side management strategies. An incremental improvement in grid-level demand forecast of anomalous days can potentially save millions of dollars. However, due to an increasing penetration of renewable energy resources and their dependency on several meteorological and exogenous variables, accurate load forecasting of anomalous days has now become very challenging. To improve the prediction accuracy of the load forecasting, an ensemble forecast framework (ENFF) is proposed with a systematic combination of three multiple predictors, namely Elman neural network (ELM), feedforward neural network (FNN) and radial basis function (RBF) neural network. These predictors are trained using global particle swarm optimization (GPSO) to improve their prediction capability in the ENFF. The outputs of individual predictors are combined using a trim aggregation technique by removing forecasting anomalies. Real recorded data of New England ISO grid is used for training and testing of the ENFF for anomalous days. The forecast results of the proposed ENFF indicate a significant improvement in prediction accuracy in comparison to autoregressive integrated moving average (ARIMA) and back-propagation neural networks (BPNN) based benchmark models. Ensemble Methods In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble refers only to a concrete finite set of alternative models, but typically allows for much more flexible structure to exist between those alternatives. Ensemble Model Patching Two main obstacles preventing the widespread adoption of variational Bayesian neural networks are the high parameter overhead that makes them infeasible on large networks, and the difficulty of implementation, which can be thought of as ‘programming overhead.’ MC dropout [Gal and Ghahramani, 2016] is popular because it sidesteps these obstacles. Nevertheless, dropout is often harmful to model performance when used in networks with batch normalization layers [Li et al., 2018], which are an indispensable part of modern neural networks. We construct a general variational family for ensemble-based Bayesian neural networks that encompasses dropout as a special case. We further present two specific members of this family that work well with batch normalization layers, while retaining the benefits of low parameter and programming overhead, comparable to non-Bayesian training. Our proposed methods improve predictive accuracy and achieve almost perfect calibration on a ResNet-18 trained with ImageNet. Ensemble Partial Least Squares Regression(EnPLS) enpls EnsembleDAgger While imitation learning is often used in robotics, this approach often suffers from data mismatch and compounding errors. DAgger is an iterative algorithm that addresses these issues by aggregating training data from both the expert and novice policies, but does not consider the impact of safety. We present a probabilistic extension to DAgger, which attempts to quantify the confidence of the novice policy as a proxy for safety. Our method, EnsembleDAgger, approximates a GP using an ensemble of neural networks. Using the variance as a measure of confidence, we compute a decision rule that captures how much we doubt the novice, thus determining when it is safe to allow the novice to act. With this approach, we aim to maximize the novice’s share of actions, while constraining the probability of failure. We demonstrate improved safety and learning performance compared to other DAgger variants and classic imitation learning on an inverted pendulum and in the MuJoCo HalfCheetah environment. EnsembleNet Ensembling is a universally useful approach to boost the performance of machine learning models. However, individual models in an ensemble are typically trained independently in separate stages, without information access about the overall ensemble. In this paper, model ensembles are treated as first-class citizens, and their performance is optimized end-to-end with parameter sharing and a novel loss structure that improves generalization. On large-scale datasets including ImageNet, Youtube-8M, and Kinetics, we demonstrate a procedure that starts from a strongly performing single deep neural network, and constructs an EnsembleNet that has both a smaller size and better performance. Moreover, an EnsembleNet can be trained in one stage just like a single model without manual intervention. Enterprise Control Language / Data-Centric Programming Language(ECL) ECL is a declarative, data centric programming language designed in 2000 to allow a team of programmers to process big data across a high performance computing cluster without the programmer being involved in many of the lower level, imperative decisions. Enterprise Data Hub(EDH) Organizations everywhere are grappling with how to manage their growing big data sets from ERP and e-commerce systems, log files, sensor data, social media and more. Apache Hadoop provides a cost-effective enterprise data hub (EDH) to store, transform, cleanse, filter, analyze and gain new value from all kinds of data. ➚ “Data Lake” Enterprise Information Flow(EIF) What is Enterprise Information Flow? The concept is closely connected to its neighboring disciplines: Information Flow, Data Lineage Analysis and Metadata Management. But it’s not the same. This new buzzword is only beginning to be recognized, so let’s get a head start. Information Flow focuses on information processing when it comes to security, throughput optimization and transporters; Data Lineage Analysis studies the way data is transferred between systems; and Metadata Management is all about metadata structure and purpose. Why do we need a new concept then? · Big data is getting really big. From internal company systems, social networks and external data from partners to automatically collected data – a huge amount of information needs to be properly dealt with. The high volume of data is of course connected to the high volume of contributing sources: dozens of online channels, the previously mentioned social networks, portable devices, blogs, news and video content. And every source needs to be correctly described, attributed and integrated into the company’s Enterprise Information Flow. · Systems are getting more and more complicated. EIF needs to be ready not only for big data coming from a wide variety of channels, but also for the many different ways data is transformed and processed inside the system. Old school transformation methods like ETL and SQL scripts are easy and usually well-accounted for, but cracks start to show when it comes to the semantic analysis of non-structured data, Google’s search algorithms, Facebook’s preferential algorithms, automated quality assurance scripts or artificial intelligence methods used for predictive analysis. When it comes to transformations, it’s critical to know how security and other specific attributes change. Another key point is deciding if the information is created or just transformed. · New routes between systems. The number of different ways to transfer data between systems is rapidly growing. Classic ETL and extract transfers are joined by more complicated systems based on SOA, PBM and ESB. It’s also necessary to be ready for new approaches like Data Federation and Logical Data Warehouse, where data saving is not persistent. · Different data types. It’s not about relational data or text anymore. You need to be ready for NoSQL databases, hyperlinks, video, graphics, xml, semi-structured data and other types of information. In complicated environments like these, current solutions fail. New approaches need to be more complex, as the new systems are. It’s necessary to follow data not only on a physical level, but also through more layers of logical abstraction. Let’s sum it up into two main angles of Enterprise Information Flow: 1) New information necessary for decision making appears. Where does it come from? When was it created? Who’s responsible for its quality? 2) Who uses my information and how? Those two sets of questions are vital to Enterprise Information Flow which is a standard part of Enterprise Information Management. Any organization who takes its data seriously is searching for answers anyway, but EIF can provide a more comprehensive overview and merge existing solutions from currently separated fields into one complex policy. A complex solution is precisely what you need, when you’re dealing with complex systems. Entity and Features The article deals with the problem which led to Big Data. Big Data information technology is the set of methods and means of processing different types of structured and unstructured dynamic large amounts of data for their analysis and use of decision support. Features of NoSQL databases and categories are described. The developed Big Data Model ‘Entity and Features’ allows determining the distance between the sources of data on the availability of information about a particular entity. The information structure of Big Data has been devised. It became a basis for further research and for concentrating on a problem of development of diverse data without their preliminary integration. Entity Linking(EL) In natural language processing, entity linking, named entity linking (NEL), named entity disambiguation (NED), named entity recognition and disambiguation (NERD) or named entity normalization (NEN) is the task of determining the identity of entities mentioned in text. For example, given the sentence ‘Paris is the capital of France’, the idea is to determine that ‘Paris’ refers to the city of Paris and not to Paris Hilton or any other entity that could be referred as ‘Paris’. NED is different from named entity recognition (NER) in that NER identifies the occurrence or mention of a named entity in text but it does not identify which specific entity it is. Entity linking requires a knowledge base containing the entities to which entity mentions can be linked. A popular choice for entity linking on open domain text are knowledge-bases based on Wikipedia, in which each page is regarded as a named entity. NED using Wikipedia entities has been also called wikification (see Wikify! an early entity linking system ). A knowledge base may also be induced automatically from training text or manually built. Entity Neighbors Knowledge Graph Embedding (KGE) aims to represent entities and relations of knowledge graph in a low-dimensional continuous vector space. Recent works focus on incorporating structural knowledge with additional information, such as entity descriptions, relation paths and so on. However, common used additional information usually contains plenty of noise, which makes it hard to learn valuable representation. In this paper, we propose a new kind of additional information, called entity neighbors, which contain both semantic and topological features about given entity. We then develop a deep memory network model to encode information from neighbors. Employing a gating mechanism, representations of structure and neighbors are integrated into a joint representation. The experimental results show that our model outperforms existing KGE methods utilizing entity descriptions and achieves state-of-the-art metrics on 4 datasets. Entity Resolution(ER) Entity Resolution (ER), the problem of extracting, matching and resolving entity mentions in structured and unstructured data, is a long-standing challenge in database management, information retrieval, machine learning, natural language processing and statistics. Ironically, different subdisciplines refer to it by a variety of names, including record linkage, deduplication, co-reference resolution, reference reconciliation, object consolidation, identity uncertainty and database hardening. Accurate and fast ER has huge practical implications in a wide variety of commercial, scientific and security domains. Despite the long history of work on ER there is still a surprising diversity of approaches – including rule based methods, pair-wise classification, clustering approaches, and richer forms of probabilistic inference – and a lack of guiding theory. Meanwhile, in the age of big data, the need for high quality entity resolution is only growing. We are inundated with more and more data that needs to be integrated, aligned and matched before further utility can be extracted. Entity2Topic(E2T) A major proportion of a text summary includes important entities found in the original text. These entities build up the topic of the summary. Moreover, they hold commonsense information once they are linked to a knowledge base. Based on these observations, this paper investigates the usage of linked entities to guide the decoder of a neural text summarizer to generate concise and better summaries. To this end, we leverage on an off-the-shelf entity linking system (ELS) to extract linked entities and propose Entity2Topic (E2T), a module easily attachable to a sequence-to-sequence model that transforms a list of entities into a vector representation of the topic of the summary. Current available ELS’s are still not sufficiently effective, possibly introducing unresolved ambiguities and irrelevant entities. We resolve the imperfections of the ELS by (a) encoding entities with selective disambiguation, and (b) pooling entity vectors using firm attention. By applying E2T to a simple sequence-to-sequence model with attention mechanism as base model, we see significant improvements of the performance in the Gigaword (sentence to title) and CNN (long document to multi-sentence highlights) summarization datasets by at least 2 ROUGE points. Entity-Duet Neural Ranking Model(EDRM) This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. EDRM represents queries and documents by their words and entity annotations. The semantics from knowledge graphs are integrated in the distributed representations of their entities, while the ranking is conducted by interaction-based neural ranking networks. The two components are learned end-to-end, making EDRM a natural combination of entity-oriented search and neural information retrieval. Our experiments on a commercial search log demonstrate the effectiveness of EDRM. Our analyses reveal that knowledge graph semantics significantly improve the generalization ability of neural ranking models. Entity-Relationship Dependence Model(ERDM) Entity-Relationship (E-R) Search is a complex case of Entity Search where the goal is to search for multiple unknown entities and relationships connecting them. We assume that a E-R query can be decomposed as a sequence of sub-queries each containing keywords related to a specific entity or relationship. We adopt a probabilistic formulation of the E-R search problem. When creating specific representations for entities (e.g. context terms) and for pairs of entities (i.e. relationships) it is possible to create a graph of probabilistic dependencies between sub-queries and entity plus relationship representations. To the best of our knowledge this represents the first probabilistic model of E-R search. We propose and develop a novel supervised Early Fusion-based model for E-R search, the Entity-Relationship Dependence Model (ERDM). It uses Markov Random Field to model term dependencies of E-R sub-queries and entity/relationship documents. We performed experiments with more than 800M entities and relationships extractions from ClueWeb-09-B with FACC1 entity linking. We obtained promising results using 3 different query collections comprising 469 E-R queries, with results showing that it is possible to perform E-R search without using fix and pre-defined entity and relationship types, enabling a wide range of queries to be addressed. Entrofy Selecting a cohort from a set of candidates is a common task within and beyond academia. Admitting students, awarding grants, choosing speakers for a conference are situations where human biases may affect the make-up of the final cohort. We propose a new algorithm, Entrofy, designed to be part of a larger decision making strategy aimed at making cohort selection as just, quantitative, transparent, and accountable as possible. We suggest this algorithm be embedded in a two-step selection procedure. First, all application materials are stripped of markers of identity that could induce conscious or sub-conscious bias. During blind review, the committee selects all applicants, submissions, or other entities that meet their merit-based criteria. This often yields a cohort larger than the admissible number. In the second stage, the target cohort can be chosen from this meritorious pool via a new algorithm and software tool. Entrofy optimizes differences across an assignable set of categories selected by the human committee. Criteria could include gender, academic discipline, experience with certain technologies, or other quantifiable characteristics. The Entrofy algorithm yields the computational maximization of diversity by solving the tie-breaking problem with provable performance guarantees. We show how Entrofy selects cohorts according to pre-determined characteristics in simulated sets of applications and demonstrate its use in a case study. This cohort selection process allows human judgment to prevail when assessing merit, but assigns the assessment of diversity to a computational process less likely to be beset by human bias. Importantly, the stage at which diversity assessments occur is fully transparent and auditable with Entrofy. Splitting merit and diversity considerations into their own assessment stages makes it easier to explain why a given candidate was selected or rejected. Entropic Spectral Learning We present a novel algorithm for learning the spectral density of large scale networks using stochastic trace estimation and the method of maximum entropy. The complexity of the algorithm is linear in the number of non-zero elements of the matrix, offering a computational advantage over other algorithms. We apply our algorithm to the problem of community detection in large networks. We show state-of-the-art performance on both synthetic and real datasets. Entropy In information theory, entropy is a measure of the uncertainty in a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message. Entropy is typically measured in bits, nats, or bans. Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content. Shannon entropy provides an absolute limit on the best possible lossless encoding or compression of any communication, assuming that the communication may be represented as a sequence of independent and identically distributed random variables. Entropy Agglomeration(EA) ➘ “Entropy Agglomeration” Entropy c-Means(ECM) Fuzzy clustering methods identify naturally occurring clusters in a dataset, where the extent to which different clusters are overlapped can differ. Most methods have a parameter to fix the level of fuzziness. However, the appropriate level of fuzziness depends on the application at hand. This paper presents Entropy $c$-Means (ECM), a method of fuzzy clustering that simultaneously optimizes two contradictory objective functions, resulting in the creation of fuzzy clusters with different levels of fuzziness. This allows ECM to identify clusters with different degrees of overlap. ECM optimizes the two objective functions using two multi-objective optimization methods, Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Multiobjective Evolutionary Algorithm based on Decomposition (MOEA/D). We also propose a method to select a suitable trade-off clustering from the Pareto front. Experiments on challenging synthetic datasets as well as real-world datasets show that ECM leads to better cluster detection compared to the conventional fuzzy clustering methods as well as previously used multi-objective methods for fuzzy clustering. Entropy Search Enhancement of Energy-Based Swing-Up Controller via Entropy Search Entropy Signal Reflects the Malicious Document(ESRMD) Abstract-Email cyber-attacks based on malicious documents have become the popular techniques in today’s sophisticated attacks. In the past, persistent efforts have been made to detect such attacks. But there are still some common defects in the existing methods including unable to capture unknown attacks, high overhead of resource and time, and just can be used to detect specific formats of documents. In this study, a new Framework named ESRMD (Entropy signal Reflects the Malicious document) is proposed, which can detect malicious document based on the entropy distribution of the file. In essence, ESRMD is a machine learning classifier. What makes it distinctive is that it extracts global and structural entropy features from the entropy of the malicious documents rather than the structural data or metadata of the file, enduing it the ability to deal with various document formats and against the parser-confusion and obfuscated attacks. In order to assess the validity of the model, we conducted extensive experiments on a collected dataset with 10381 samples in it, which contains malware (51.47%) and benign (48.53%) samples. The results show that our model can achieve a good performance on the true positive rate, precision and ROC with the value of 96.00%, 96.69% and 99.2% respectively. We also compared ESRMD with some leading antivirus engines and prevalent tools. The results showed that our framework can achieve a better performance compared with these engines and tools. Envy-Free Classification In classic fair division problems such as cake cutting and rent division, envy-freeness requires that each individual (weakly) prefer his allocation to anyone else’s. On a conceptual level, we argue that envy-freeness also provides a compelling notion of fairness for classification tasks. Our technical focus is the generalizability of envy-free classification, i.e., understanding whether a classifier that is envy free on a sample would be almost envy free with respect to the underlying distribution with high probability. Our main result establishes that a small sample is sufficient to achieve such guarantees, when the classifier in question is a mixture of deterministic classifiers that belong to a family of low Natarajan dimension. Episodic Memory Deep Q-Network(EMDQN) Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environments to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch on good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method can lead to better sample efficiency and is more likely to find good policies. It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms. Episodic Memory Reader(EMR) We consider a novel question answering (QA) task where the machine needs to read from large streaming data (long documents or videos) without knowing when the questions will be given, in which case the existing QA methods fail due to lack of scalability. To tackle this problem, we propose a novel end-to-end reading comprehension method, which we refer to as Episodic Memory Reader (EMR) that sequentially reads the input contexts into an external memory, while replacing memories that are less important for answering unseen questions. Specifically, we train an RL agent to replace a memory entry when the memory is full in order to maximize its QA accuracy at a future timepoint, while encoding the external memory using the transformer architecture to learn representations that considers relative importance between the memory entries. We validate our model on a real-world large-scale textual QA task (TriviaQA) and a video QA task (TVQA), on which it achieves significant improvements over rule-based memory scheduling policies or an RL-based baseline that learns the query-specific importance of each memory independently. Epistemology Epistemology is the study of the nature of knowledge, justification, and the rationality of belief. Much debate in epistemology centers on four areas: (1) the philosophical analysis of the nature of knowledge and how it relates to such concepts as truth, belief, and justification, (2) various problems of skepticism, (3) the sources and scope of knowledge and justified belief, and (4) the criteria for knowledge and justification. Epistemology addresses such questions as: ‘What makes justified beliefs justified?’, ‘What does it mean to say that we know something?’, and fundamentally ‘How do we know that we know?’. Epsilon-Greedy Algorithm To get you started thinking algorithmically about the Explore-Exploit dilemma, we’re going to teach you how to code up one of the simplest possible algorithms for trading off exploration and exploitation. This algorithm is called the epsilon-Greedy algorithm. In computer science, a greedy algorithm is an algorithm that always takes whatever action seems best at the present moment, even when that decision might lead to bad long term consequences. The epsilon-Greedy algorithm is almost a greedy algorithm because it generally exploits the best available option, but every once in a while the epsilon-Greedy algorithm explores the other available options. As we’ll see, the term epsilon in the algorithm’s name refers to the odds that the algorithm explores instead of exploiting. Let’s be more specific. The epsilon-Greedy algorithm works by randomly oscillating between Cynthia’s vision of purely randomized experimentation and Bob’s instinct to maximize profits. The epsilon-Greedy algorithm is one of the easiest bandit algorithms to understand because it tries to be fair to the two opposite goals of exploration and exploitation by using a mechanism that even a little kid could understand: it just flips a coin. While there are a few details we’ll have to iron out to make that statement precise, the big idea behind the epsilon-Greedy algorithm really is that simple: if you flip a coin and it comes up heads, you … epsilon-ResNet A family of super deep networks, referred to as residual networks or ResNet, achieved record-beating performance in various visual tasks such as image recognition, object detection, and semantic segmentation. The ability to train very deep networks naturally pushed the researchers to use enormous resources to achieve the best performance. Consequently, in many applications super deep residual networks were employed for just a marginal improvement in performance. In this paper, we propose $\epsilon$-ResNet that allows us to automatically discard redundant layers, which produces responses that are smaller than a threshold $\epsilon$, without any loss in performance. The $\epsilon$-ResNet architecture can be achieved using a few additional rectified linear units in the original ResNet. Our method does not use any additional variables nor numerous trials like other hyper-parameter optimization techniques. The layer selection is achieved using a single training process and the evaluation is performed on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. In some instances, we achieve about 80\% reduction in the number of parameters. Epsilon-Subgradient Descent Minimax optimization plays a key role in adversarial training of machine learning algorithms, such as learning generative models, domain adaptation, privacy preservation, and robust learning. In this paper, we demonstrate the failure of alternating gradient descent in minimax optimization problems due to the discontinuity of solutions of the inner maximization. To address this, we propose a new epsilon-subgradient descent algorithm that addresses this problem by simultaneously tracking K candidate solutions. Practically, the algorithm can find solutions that previous saddle-point algorithms cannot find, with only a sublinear increase of complexity in K. We analyze the conditions under which the algorithm converges to the true solution in detail. A significant improvement in stability and convergence speed of the algorithm is observed in simple representative problems, GAN training, and domain-adaptation problems. Equilibrated Recurrent Neural Network(ERNN) We propose a novel {\it Equilibrated Recurrent Neural Network} (ERNN) to combat the issues of inaccuracy and instability in conventional RNNs. Drawing upon the concept of autapse in neuroscience, we propose augmenting an RNN with a time-delayed self-feedback loop. Our sole purpose is to modify the dynamics of each internal RNN state and, at any time, enforce it to evolve close to the equilibrium point associated with the input signal at that time. We show that such self-feedback helps stabilize the hidden state transitions leading to fast convergence during training while efficiently learning discriminative latent features that result in state-of-the-art results on several benchmark datasets at test-time. We propose a novel inexact Newton method to solve fixed-point conditions given model parameters for generating the latent features at each hidden state. We prove that our inexact Newton method converges locally with linear rate (under mild conditions). We leverage this result for efficient training of ERNNs based on backpropagation. Equilibrium-Independent Passivity-Short System(EIPS) Maximal equilibrium-independent passivity (MEIP) is a recently introduced system property which has acquired special attention in the study of networked dynamical systems. MEIP requires a system to be passive with respect to any forced equilibrium configuration and the associated steady-state input-output map must be maximally monotone. In practice, however, most of the systems are not well behaved and possess shortage of passivity or non-passiveness in their operation. In this paper, we consider a class of passivity-short systems, namely equilibrium-independent passivity-short (EIPS) systems, and presents an input-output transformation based generalized passivation approach to ensure their MEIP properties. We characterize the steady-state input-output relations of the EIPS systems and establish their connection with that of the transformed MEIP systems. We further study the diffusively-coupled networked interactions of such EIPS systems and explore their connection to a pair of dual network optimization problems, under the proposed matrix transformation. A simulation example is given to illustrate the theoretical results. Equivalent Class Optimization(EC-Opt) It has been widely observed that many activation functions and pooling methods of neural network models have (positive-) rescaling-invariant property, including ReLU, PReLU, max-pooling, and average pooling, which makes fully-connected neural networks (FNNs) and convolutional neural networks (CNNs) invariant to (positive) rescaling operation across layers. This may cause unneglectable problems with their optimization: (1) different NN models could be equivalent, but their gradients can be very different from each other; (2) it can be proven that the loss functions may have many spurious critical points in the redundant weight space. To tackle these problems, in this paper, we first characterize the rescaling-invariant properties of NN models using equivalent classes and prove that the dimension of the equivalent class space is significantly smaller than the dimension of the original weight space. Then we represent the loss function in the compact equivalent class space and develop novel algorithms that conduct optimization of the NN models directly in the equivalent class space. We call these algorithms Equivalent Class Optimization (abbreviated as EC-Opt) algorithms. Moreover, we design efficient tricks to compute the gradients in the equivalent class, which almost have no extra computational complexity as compared to standard back-propagation (BP). We conducted experimental study to demonstrate the effectiveness of our proposed new optimization algorithms. In particular, we show that by using the idea of EC-Opt, we can significantly improve the accuracy of the learned model (for both FNN and CNN), as compared to using conventional stochastic gradient descent algorithms. Equivariant Relational Layer(ERL) Due to its extensive use in databases, the relational model is ubiquitous in representing big-data. We propose to apply deep learning to this type of relational data by introducing an Equivariant Relational Layer (ERL), a neural network layer derived from the entity-relationship model of the database. Our layer relies on identification of exchangeabilities in the relational data(base), and their expression as a permutation group. We prove that an ERL is an optimal parameter-sharing scheme under the given exchangeability constraints, and subsumes recently introduced deep models for sets, exchangeable tensors, and graphs. The proposed model has a linear complexity in the size of the relational data, and it can be used for both inductive and transductive reasoning in databases, including the prediction of missing records, and database embedding. This opens the door to the application of deep learning to one of the most abundant forms of data. Equivariant Transformer(ET) How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network? We propose Equivariant Transformers (ETs), a family of differentiable image-to-image mappings that improve the robustness of models towards pre-defined continuous transformation groups. Through the use of specially-derived canonical coordinate systems, ETs incorporate functions that are equivariant by construction with respect to these transformations. We show empirically that ETs can be flexibly composed to improve model robustness towards more complicated transformation groups in several parameters. On a real-world image classification task, ETs improve the sample efficiency of ResNet classifiers, achieving relative improvements in error rate of up to 15% in the limited data regime while increasing model parameter count by less than 1%. Erase Rectified Linear Unit(EraseReLU) For most state-of-the-art architectures, Rectified Linear Unit (ReLU) becomes a standard component accompanied by each layer. Although ReLU can ease the network training to an extent, the character of blocking negative values may suppress the propagation of useful information and leads to the difficulty of optimizing very deep Convolutional Neural Networks (CNNs). Moreover, stacking of layers with nonlinear activations is hard to approximate the intrinsic linear transformations between feature representations. In this paper, we investigate the effect of erasing ReLUs of certain layers and apply it to various representative architectures. We name our approach as ‘EraseReLU’. It can ease the optimization and improve the generalization performance for very deep CNN models. In experiments, this method successfully improves the performance of various representative architectures, and we report the improved results on SVHN, CIFAR-10/100, and ImageNet-1k. By using EraseReLU, we achieve state-of-the-art single-model performance on CIFAR-100 with 83.47% accuracy. Codes will be released soon. Ergodic Inference Approximate inference algorithm is one of the fundamental research fields in machine learning. The two dominant theoretical inference frameworks in machine learning are variational inference (VI) and Markov chain Monte Carlo (MCMC). However, because of the fundamental limitation in the theory, it is very challenging to improve existing VI and MCMC methods on both the computational scalability and statistical efficiency. To overcome this obstacle, we propose a new theoretical inference framework called ergodic Inference based on the fundamental property of ergodic transformations. The key contribution of this work is to establish the theoretical foundation of ergodic inference for the development of practical algorithms in future work. E-RL^2 ➘ “Krazy World” Error Correction Model(ECM) An error correction model belongs to a category of multiple time series models most commonly used for data where the underlying variables have a long-run stochastic trend, also known as cointegration. ECMs are a theoretically-driven approach useful for estimating both short-term and long-term effects of one time series on another. The term error-correction relates to the fact that last-periods deviation from a long-run equilibrium, the error, influences its short-run dynamics. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables. ecm Error Matrix ➚ “Confusion Matrix” Error-Correcting Output Codes – Neural Language Modelling(ECOC-NLM) In this paper we propose a novel neural language modelling (NLM) method based on \textit{error-correcting output codes} (ECOC), abbreviated as ECOC-NLM (error-correcting output codes – neural language modelling). This latent variable based approach provides a principled way to choose a varying amount of latent output codes and avoids exact softmax normalization. Instead of minimizing measures between the predicted probability distribution and true distribution, we use error-correcting codes to represent both predictions and outputs. Secondly, we propose multiple ways to improve accuracy and convergence rates by maximizing the separability between codes that correspond to classes proportional to word embedding similarities. Lastly, we introduce a novel method called \textit{Latent Mixture Sampling}, a technique that is used to mitigate exposure bias and can be integrated into training latent-based neural language models. This involves mixing the latent codes (i.e variables) of past predictions and past targets in one of two ways: (1) according to a predefined sampling schedule or (2) a differentiable sampling procedure whereby the mixing probability is learned throughout training by replacing the greedy argmax operation with a smooth approximation. In evaluating Codeword Mixture Sampling for ECOC-NLM, we also baseline it against CWMS in a closely related Hierarhical Softmax-based NLM. Error-Robust Multi-View Clustering(EMVC) In the era of big data, data may come from multiple sources, known as multi-view data. Multi-view clustering aims at generating better clusters by exploiting complementary and consistent information from multiple views rather than relying on the individual view. Due to inevitable system errors caused by data-captured sensors or others, the data in each view may be erroneous. Various types of errors behave differently and inconsistently in each view. More precisely, error could exhibit as noise and corruptions in reality. Unfortunately, none of the existing multi-view clustering approaches handle all of these error types. Consequently, their clustering performance is dramatically degraded. In this paper, we propose a novel Markov chain method for Error-Robust Multi-View Clustering (EMVC). By decomposing each view into a shared transition probability matrix and error matrix and imposing structured sparsity-inducing norms on error matrices, we characterize and handle typical types of errors explicitly. To solve the challenging optimization problem, we propose a new efficient algorithm based on Augmented Lagrangian Multipliers and prove its convergence rigorously. Experimental results on various synthetic and real-world datasets show the superiority of the proposed EMVC method over the baseline methods and its robustness against different types of errors. Escape Room Domain(ERD) Recent successes in Reinforcement Learning have encouraged a fast-growing network of RL researchers and a number of breakthroughs in RL research. As the RL community and the body of RL work grows, so does the need for widely applicable benchmarks that can fairly and effectively evaluate a variety of RL algorithms. This need is particularly apparent in the realm of Hierarchical Reinforcement Learning (HRL). While many existing test domains may exhibit hierarchical action or state structures, modern RL algorithms still exhibit great difficulty in solving domains that necessitate hierarchical modeling and action planning, even when such domains are seemingly trivial. These difficulties highlight both the need for more focus on HRL algorithms themselves, and the need for new testbeds that will encourage and validate HRL research. Existing HRL testbeds exhibit a Goldilocks problem; they are often either too simple (e.g. Taxi) or too complex (e.g. Montezuma’s Revenge from the Arcade Learning Environment). In this paper we present the Escape Room Domain (ERD), a new flexible, scalable, and fully implemented testing domain for HRL that bridges the ‘moderate complexity’ gap left behind by existing alternatives. ERD is open-source and freely available through GitHub, and conforms to widely-used public testing interfaces for simple integration and testing with a variety of public RL agent implementations. We show that the ERD presents a suite of challenges with scalable difficulty to provide a smooth learning gradient from Taxi to the Arcade Learning Environment. Escort Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it brings sparsity in the weight matrix, and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we propose Escort, an efficient sparse convolutional neural networks on GPUs. Instead of using the lowering method, we choose to compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluation on NVIDIA GPUs show that Escort can improve sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.38x and 1.60x, compared to CUBLAS and CUSPARSE respectively. ESN Recurrent Autoencoder(ESN-RAE) It is a widely accepted fact that data representations intervene noticeably in machine learning tools. The more they are well defined the better the performance results are. Feature extraction-based methods such as autoencoders are conceived for finding more accurate data representations from the original ones. They efficiently perform on a specific task in terms of 1) high accuracy, 2) large short term memory and 3) low execution time. Echo State Network (ESN) is a recent specific kind of Recurrent Neural Network which presents very rich dynamics thanks to its reservoir-based hidden layer. It is widely used in dealing with complex non-linear problems and it has outperformed classical approaches in a number of tasks including regression, classification, etc. In this paper, the noticeable dynamism and the large memory provided by ESN and the strength of Autoencoders in feature extraction are gathered within an ESN Recurrent Autoencoder (ESN-RAE). In order to bring up sturdier alternative to conventional reservoir-based networks, not only single layer basic ESN is used as an autoencoder, but also Multi-Layer ESN (ML-ESN-RAE). The new features, once extracted from ESN’s hidden layer, are applied to classification tasks. The classification rates rise considerably compared to those obtained when applying the original data features. An accuracy-based comparison is performed between the proposed recurrent AEs and two variants of an ELM feed-forward AEs (Basic and ML) in both of noise free and noisy environments. The empirical study reveals the main contribution of recurrent connections in improving the classification performance results. e-SNLI In order for machine learning to garner widespread public adoption, models must be able to provide interpretable and robust explanations for their decisions, as well as learn from human-provided explanations at train time. In this work, we extend the Stanford Natural Language Inference dataset with an additional layer of human-annotated natural language explanations of the entailment relations. We further implement models that incorporate these explanations into their training process and output them at test time. We show how our corpus of explanations, which we call e-SNLI, can be used for various goals, such as obtaining full sentence justifications of a model’s decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets. Our dataset thus opens up a range of research directions for using natural language explanations, both for improving models and for asserting their trust. ESPnet This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains a major architecture of this software platform, several important functionalities, which differentiate ESPnet from other open source ASR toolkits, and experimental results with major ASR benchmarks. ESPNetv2 We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. Our network uses group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with fewer FLOPs and parameters. The performance of our network is evaluated on three different tasks: (1) object classification, (2) semantic segmentation, and (3) language modeling. Experiments on these tasks, including image classification on the ImageNet and language modeling on the PenTree bank dataset, demonstrate the superior performance of our method over the state-of-the-art methods. Our network has better generalization properties than ShuffleNetv2 when tested on the MSCOCO multi-object classification task and the Cityscapes urban scene semantic segmentation task. Our experiments show that ESPNetv2 is much more power efficient than existing state-of-the-art efficient methods including ShuffleNets and MobileNets. Our code is open-source and available at \url{https://…/ESPNetv2} Espresso There are many applications scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) needs to be optimized. Binary Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bit-wise computations for efficient execution. These techniques provide a speed-up of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks ($\approx$ 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://…/espresso. Esri Esri´s GIS (geographic information systems) mapping software helps you understand and visualize data to make decisions based on the best information and analysis. Essence Vector Model(EV) In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively less work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, those stop or function words that occur frequently may mislead the embedding learning process to produce a misty paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative low-dimensional vector representation for the paragraph. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition. Essential Histogram The histogram is widely used as a simple, exploratory display of data, but it is usually not clear how to choose the number and size of bins for this purpose. We construct a confidence set of distribution functions that optimally address the two main tasks of the histogram: estimating probabilities and detecting features such as increases and (anti)modes in the distribution. We define the essential histogram as the histogram in the confidence set with the fewest bins. Thus the essential histogram is the simplest visualization of the data that optimally achieves the main tasks of the histogram. We provide a fast algorithm for computing a slightly relaxed version of the essential histogram, which still possesses most of its beneficial theoretical properties, and we illustrate our methodology with examples. An R-package is available online. Estimability Here we consider, in the context of causal inference, the general question: ‘what can be estimated from data?’. We call this the question of estimability. We consider the usual definition adopted in the causal inference literature — identifiability — in a general mathematical setting and show why it is an inadequate formal translation of the concept of estimability. Despite showing that identifiability implies the existence of a Fisher-consistent estimator, we show that this estimator may be discontinuous, hence unstable, in general. The source of the difficulty is that the general form of the causal inference problem is an ill-posed inverse problem. Inverse problems have three conditions which must be satisfied in order to be considered well-posed: existence, uniqueness, and stability of solutions. We illustrate how identifiability corresponds to the question of uniqueness; in contrast, we take estimability to mean satisfaction of all three conditions, i.e. well-posedness. Well-known results from the inverse problems literature imply that mere identifiability does not guarantee well-posedness of the causal inference procedure, i.e. estimability, and apparent solutions to causal inference problems can be essentially useless with even the smallest amount of imperfection. Analogous issues with attempts to apply standard statistical procedures to very general settings were raised in the statistical literature as far back as the 60s and 70s. Thus, in addition to giving general characterisations of identifiability and estimability, we demonstrate how the issues raised in both the theory of inverse problems and in the theory of statistical inference lead to concerns over the stability of general nonparametric approaches to causal inference. These apply, in particular, to those that focus on identifiability while excluding the additional stability requirements required for estimability. estimability Estimation of Distribution Algorithm(EDA) Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding the uniform distribution over admissible solutions and ending with the model that generates only the global optima. EDAs belong to the class of evolutionary algorithms. The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an implicit distribution defined by one or more variation operators, whereas EDAs use an explicit probability distribution encoded by a Bayesian network, a multivariate normal distribution, or another model class. Similarly as other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a number of representations from vectors to LISP style S expressions, and the quality of candidate solutions is often evaluated using one or more objective functions. Level-Based Analysis of the Univariate Marginal Distribution Algorithm ETNLP In this paper, we introduce a comprehensive toolkit, ETNLP, which can evaluate, extract, and visualize multiple sets of pre-trained word embeddings. First, for evaluation, ETNLP analyses the quality of pre-trained embeddings based on an input word analogy list. Second, for extraction ETNLP provides a subset of the embeddings to be used in the downstream NLP tasks. Finally, ETNLP has a visualization module which is for exploring the embedded words interactively. We demonstrate the effectiveness of ETNLP on our pre-trained word embeddings in Vietnamese. Specifically, we create a large Vietnamese word analogy list to evaluate the embeddings. We then utilize the pre-trained embeddings for the name entity recognition (NER) task in Vietnamese and achieve the new state-of-the-art results on a benchmark dataset for the NER task. A video demonstration of ETNLP is available at https://…/317599106. The source code and data are available at https: //github.com/vietnlp/etnlp. Euclidean Distance In mathematics, the Euclidean distance or Euclidean metric is the ‘ordinary’ distance between two points that one would measure with a ruler, and is given by the Pythagorean formula. By using this formula as distance, Euclidean space (or even any inner product space) becomes a metric space. The associated norm is called the Euclidean norm. EvalAI We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence algorithms (AI) at scale. EvalAI is built to provide a scalable solution to the research community to fulfill the critical need of evaluating machine learning models and agents acting in an environment against annotations or with a human-in-the-loop. This will help researchers, students, and data scientists to create, collaborate, and participate in AI challenges organized around the globe. By simplifying and standardizing the process of benchmarking these models, EvalAI seeks to lower the barrier to entry for participating in the global scientific effort to push the frontiers of machine learning and artificial intelligence, thereby increasing the rate of measurable progress in this domain. EvalNE In this paper we present EvalNE, a Python toolbox for evaluating network embedding methods on link prediction tasks. Link prediction is one of the most popular choices for evaluating the quality of network embeddings. However, the complexity of this task requires a carefully designed evaluation pipeline in order to provide consistent, reproducible and comparable results. EvalNE simplifies this process by providing automation and abstraction of tasks such as hyper-parameter tuning and model validation, edge sampling and negative edge sampling, computation of edge embeddings from node embeddings, and evaluation metrics. The toolbox allows for the evaluation of any off-the-shelf embedding method without the need to write extra code. Moreover, it can also be used for evaluating any other link prediction method, and integrates several link prediction heuristics as baselines. EvalNorm Batch normalization (BN) has been very effective for deep learning and is widely used. However, when training with small minibatches, models using BN exhibit a significant degradation in performance. In this paper we study this peculiar behavior of BN to gain a better understanding of the problem, and identify a potential cause based on a statistical insight. We propose EvalNorm’ to address the issue by estimating corrected normalization statistics to use for BN during evaluation. EvalNorm supports online estimation of the corrected statistics while the model is being trained, and it does not affect the training scheme of the model. As a result, an added advantage of EvalNorm is that it can be used with existing pre-trained models allowing them to benefit from our method. EvalNorm yields large gains for models trained with smaller batches. Our experiments show that EvalNorm performs 6.18% (absolute) better than vanilla BN for a batchsize of 2 on ImageNet validation set and from 1.5 to 7.0 points (absolute) gain on the COCO object detection benchmark across a variety of setups. Evaluating Quantitative Understanding Aptitude in Textual Entailment(EQUATE) Quantitative reasoning is an important component of reasoning that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new dataset to evaluate the ability of models to reason with quantities in textual entailment (including not only arithmetic and algebraic computation, but also other phenomena such as range comparisons and verbal reasoning with quantities). The average performance of 7 published textual entailment models on EQUATE does not exceed a majority class baseline, indicating that current models do not implicitly learn to reason with quantities. We propose a new baseline Q-REAS that manipulates quantities symbolically, achieving some success on numerical reasoning, but struggling at more verbal aspects of the task. We hope our evaluation framework will support the development of new models of quantitative reasoning in language understanding. EvE ➘ “GENESYS” Even Initialization In this paper, we propose a new weight initialization method called even initialization for wide and deep nonlinear neural networks with the ReLU activation function. We prove that no poor local minimum exists in the initial loss landscape in the wide and deep nonlinear neural network initialized by the even initialization method that we propose. Specifically, in the initial loss landscape of such a wide and deep ReLU neural network model, the following four statements hold true: 1) the loss function is non-convex and non-concave; 2) every local minimum is a global minimum; 3) every critical point that is not a global minimum is a saddle point; and 4) bad saddle points exist. We also show that the weight values initialized by the even initialization method are contained in those initialized by both of the (often used) standard initialization and He initialization methods. Event Driven Architeture(EDA) Event-driven architecture (EDA) is a software architecture pattern promoting the production, detection, consumption of, and reaction to events. An event can be defined as “a significant change in state”. For example, when a consumer purchases a car, the car’s state changes from “for sale” to “sold”. A car dealer’s system architecture may treat this state change as an event whose occurrence can be made known to other applications within the architecture. From a formal perspective, what is produced, published, propagated, detected or consumed is a (typically asynchronous) message called the event notification, and not the event itself, which is the state change that triggered the message emission. Event Extraction(EE) One common application of text mining is event extraction, which encompasses deducing specific knowledge concerning incidents referred to in texts. Event extraction can be applied to various types of written text, e.g., (online) news messages, blogs, and manuscripts. Event Graph Recent advances in data collection and storage have allowed both researchers and industry alike to collect data in real time. Much of this data comes in the form of ‘events’, or timestamped interactions, such as email and social media posts, website clickstreams, or protein-protein interactions. This of type data poses new challenges for modelling, especially if we wish to preserve all temporal features and structure. We propose a generalised framework to explore temporal networks using second-order time-unfolded models, called event graphs. Through examples we demonstrate how event graphs can be used to understand the higher-order topological-temporal structure of temporal networks and capture properties of the network that are unobserved when considering either a static (or time-aggregated) model. Furthermore, we show that by modelling a temporal network as an event graph our analysis extends easily to consider non-dyadic interactions, known as hyper-events. Event History Analysis Event history analysis deals with data obtained by observing individuals over time, focusing on events occurring for the individuals under observation. Important applications are to life events of humans in demography, life insurance mathematics, epidemiology, and sociology. The basic data are the times of occurrence of the events and the types of events that occur. The standard approach to the analysis of such data is to use multistate models; a basic example is finite-state Markov processes in continuous time. Censoring and truncation are defining features of the area. This review comments specifically on three areas that are current subjects of active development, all motivated by demands from applications: sampling patterns, the possibility of causal interpretation of the analyses, and the levels and interpretation of variability. eha Event Schema Induction(ESI) Event Sourcing(ES) An architectural pattern which warrants that your entities (as per Eric Evans’ definition) do not track their internal state by means of direct serialization or O/R mapping, but by means of reading and committing events to an event store. Where ES is combined with CQRS and DDD, aggregate roots are responsible for thoroughly validating and applying commands (often by means having their instance methods invoked from a Command Handler), and then publishing a single or a set of events which is also the foundation upon which the aggregate roots base their logic for dealing with method invocations. Hence, the input is a command and the output is one or many events which are transactionally (single commit) saved to an event store, and then often published on a message broker for the benefit of those interested (often the views are interested; they are then queried using Query-messages). When modeling your aggregate roots to output events, you can isolate the internal state event further than would be possible when projecting read-data from your entities, as is done in standard n-tier data-passing architectures. One significant benefit from this is that tooling such as axiomatic theorem provers (e.g. Microsoft Contracts or CHESS) are easier to apply, as the aggregate root comprehensively hides its internal state. Events are often persisted based on the version of the aggregate root instance, which yields a domain model that synchronizes in distributed systems around the concept of optimistic concurrency. Event Store An event store is a type of database optimized for storage of events. Conceptually, in an event store, only the events of a dossier or policy are stored. The idea behind it is that the dossier or policy can be derived from these events. The events (and their corresponding data) are the only ‘real’ facts that should be stored in the database. The instantiation of all other objects can be derived from these events. The code instantiates these objects in memory. In an event store database, this means that all objects that should be instantiated, are not stored in the database. Instead these objects are instantiated ‘on the fly’ in memory by the code based on the events. After usage of these objects (e.g. shown in a user interface), the instantiated objects are removed from memory. For example, the event store concept of a database can be applied to insurance policies or pension dossiers. In these policies or dossiers the instantiation of each object that make up the dossier or policy (the person, partner(s), employments, etc.) can be derived and can be instantiated in memory based on the real world events. Event Stream Processing(ESP) Event stream processing, or ESP, is a set of technologies designed to assist the construction of event-driven information systems. ESP technologies include event visualization, event databases, event-driven middleware, and event processing languages, or complex event processing (CEP). In practice, the terms ESP and CEP are often used interchangeably. ESP deals with the task of processing streams of event data with the goal of identifying the meaningful pattern within those streams, employing techniques such as detection of relationships between multiple events, event correlation, event hierarchies, and other aspects such as causality, membership and timing. ESP enables many different applications such as algorithmic trading in financial services, RFID event processing applications, fraud detection, process monitoring, and location-based services in telecommunications. Event2Vec Network representation learning (NRL) has been widely used to help analyze large-scale networks through mapping original networks into a low-dimensional vector space. However, existing NRL methods ignore the impact of properties of relations on the object relevance in heterogeneous information networks (HINs). To tackle this issue, this paper proposes a new NRL framework, called Event2vec, for HINs to consider both quantities and properties of relations during the representation learning process. Specifically, an event (i.e., a complete semantic unit) is used to represent the relation among multiple objects, and both event-driven first-order and second-order proximities are defined to measure the object relevance according to the quantities and properties of relations. We theoretically prove how event-driven proximities can be preserved in the embedding space by Event2vec, which utilizes event embeddings to facilitate learning the object embeddings. Experimental studies demonstrate the advantages of Event2vec over state-of-the-art algorithms on four real-world datasets and three network analysis tasks (including network reconstruction, link prediction, and node classification). Event-Centric Temporal Knowledge Graph(EventKG) One of the key requirements to facilitate semantic analytics of information regarding contemporary and historical events on the Web, in the news and in social media is the availability of reference knowledge repositories containing comprehensive representations of events and temporal relations. Existing knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata, focus mostly on entity-centric information and are insufficient in terms of their coverage and completeness with respect to events and temporal relations. EventKG presented in this paper is a multilingual event-centric temporal knowledge graph that addresses this gap. EventKG incorporates over 690 thousand contemporary and historical events and over 2.3 million temporal relations extracted from several large-scale knowledge graphs and semi-structured sources and makes them available through a canonical representation. EventKG One of the key requirements to facilitate the semantic analytics of information regarding contemporary and historical events on the Web, in the news and in social media is the availability of reference knowledge repositories containing comprehensive representations of events, entities and temporal relations. Existing knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata, focus mostly on entity-centric information and are insufficient in terms of their coverage and completeness with respect to events and temporal relations. In this article we address this limitation, formalise the concept of a temporal knowledge graph and present its instantiation – EventKG. EventKG is a multilingual event-centric temporal knowledge graph that incorporates over 690 thousand events and over 2.3 million temporal relations obtained from several large-scale knowledge graphs and semi-structured sources and makes them available through a canonical RDF representation. Whereas popular entities often possess hundreds of relations within a temporal knowledge graph such as EventKG, generating a concise overview of the most important temporal relations for a given entity is a challenging task. In this article we demonstrate an application of EventKG to biographical timeline generation, where we adopt a distant supervision method to identify relations most relevant for an entity biography. Our evaluation results provide insights on the characteristics of EventKG and demonstrate the effectiveness of the proposed biographical timeline generation method. EventKG+TL The provision of multilingual event-centric temporal knowledge graphs such as EventKG enables structured access to representations of a large number of historical and contemporary events in a variety of language contexts. Timelines provide an intuitive way to facilitate an overview of events related to a \textit{query entity} – i.e. an entity or an event of user interest – over a certain period of time. In this paper, we present \eventTL{} – a novel system that generates cross-lingual event timelines using EventKG and facilitates an overview of the language-specific event relevance and popularity along with the cross-lingual differences. Event-Triggered Control(ETC) Recent developments in computer and communication technologies have led to a new type of large-scale resource-constrained wireless embedded control systems. It is desirable in these systems to limit the sensor and control computation and/or communication to instances when the system needs attention. However, classical sampled-data control is based on performing sensing and actuation periodically rather than when the system needs attention. This paper provides an introduction to event- and self-triggered control systems where sensing and actuation is performed when needed. Event-triggered control is reactive and generates sensor sampling and control actuation when, for instance, the plant state deviates more than a certain threshold from a desired value. Self-triggered control, on the other hand, is proactive and computes the next sampling or actuation instance ahead of time. The basics of these control strategies are introduced together with a discussion on the differences between state feedback and output feedback for event-triggered control. It is also shown how event- and self-triggered control can be implemented using existing wireless communication technology. Some applications to wireless control in process industry are discussed as well. Deep Reinforcement Learning for Event-Triggered Control Event-Triggered and Self-Triggered Control Event-triggered Learning Efficient exchange of information is an essential aspect of intelligent collective behavior. Event-triggered control and estimation achieve some efficiency by replacing continuous data exchange between agents with intermittent, or event-triggered communication. Typically, model-based predictions are used at times of no data transmission, and updates are sent only when the prediction error grows too large. The effectiveness in reducing communication thus strongly depends on the quality of the prediction model. In this article, we propose event-triggered learning as a novel concept to reduce communication even further and to also adapt to changing dynamics. By monitoring the actual communication rate and comparing it to the one that is induced by the model, we detect mismatch between model and reality and trigger learning of a new model when needed. Specifically for linear Gaussian dynamics, we derive different classes of learning triggers solely based on a statistical analysis of inter-communication times and formally prove their effectiveness with the aid of concentration inequalities. Evidence based Data Analysis What I think the statistical community needs to invest time and energy into is what I call “evidence-based data analysis”. What do I mean by this? Most data analyses are not the simple classroom exercises that we’ve all done involving linear regression or two-sample t-tests. Most of the time, you have to obtain the data, clean that data, remove outliers, impute missing values, transform variables and on and on, even before you fit any sort of model. Then there’s model selection, model fitting, diagnostics, sensitivity analysis, and more. So a data analysis is really pipeline of operations where the output of one stage becomes the input of another. The basic idea behind evidence-based data analysis is that for each stage of that pipeline, we should be using the best method, justified by appropriate statistical research that provides evidence favoring one method over another. If we cannot reasonable agree on a best method for a given stage in the pipeline, then we have a gap that needs to be filled. So we fill it! Evidence Lower Bound(ELBO) (Section: 2.2 The Evidence Lower Bound) Filtering Variational Objectives ➘ “Filtering Variational Objectives” Evidence Transfer for Clustering In this paper we introduce evidence transfer for clustering, a deep learning method that can incrementally manipulate the latent representations of an autoencoder, according to external categorical evidence, in order to improve a clustering outcome. It is deployed on a baseline solution to reduce the cross entropy between the external evidence and an extension of the latent space. By evidence transfer we define the process by which the categorical outcome of an external, auxiliary task is exploited to improve a primary task, in this case representation learning for clustering. Our proposed method makes no assumptions regarding the categorical evidence presented, nor the structure of the latent space. We compare our method, against the baseline solution by performing k-means clustering before and after its deployment. Experiments with three different kinds of evidence show that our method effectively manipulates the latent representations when introduced with real corresponding evidence, while remaining robust when presented with low quality evidence. Evidence-Driven State-Merging(EDSM) Human in the Loop: Interactive Passive Automata Learning via Evidence-Driven State-Merging Algorithms Evidential C-Medoids(ECMdd) In this work, a new prototype-based clustering method named Evidential C-Medoids (ECMdd), which belongs to the family of medoid-based clustering for proximity data, is proposed as an extension of Fuzzy C-Medoids (FCMdd) on the theoretical framework of belief functions. In the application of FCMdd and original ECMdd, a single medoid (prototype), which is supposed to belong to the object set, is utilized to represent one class. For the sake of clarity, this kind of ECMdd using a single medoid is denoted by sECMdd. In real clustering applications, using only one pattern to capture or interpret a class may not adequately model different types of group structure and hence limits the clustering performance. In order to address this problem, a variation of ECMdd using multiple weighted medoids, denoted by wECMdd, is presented. Unlike sECMdd, in wECMdd objects in each cluster carry various weights describing their degree of representativeness for that class. This mechanism enables each class to be represented by more than one object. Experimental results in synthetic and real data sets clearly demonstrate the superiority of sECMdd and wECMdd. Moreover, the clustering results by wECMdd can provide richer information for the inner structure of the detected classes with the help of prototype weights. Evolution Gene The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments’ distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach can not only achieve a good prediction results (e.g., averagely +10.56% in terms of F1), but is also able to provide explanations of the results. Evolution Strategy(ES) In computer science, an evolution strategy (ES) is an optimization technique based on ideas of adaptation and evolution. It belongs to the general class of evolutionary computation or artificial evolution methodologies. Evolutionary Algorithm(EA) In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators. Artificial evolution (AE) describes a process involving individual evolutionary algorithms; EAs are individual components that participate in an AE. ➘ “Loss Function” Evolutionary Attention-based LSTM(EA-LSTM) Time series prediction with deep learning methods, especially long short-term memory neural networks (LSTMs), have scored significant achievements in recent years. Despite the fact that the LSTMs can help to capture long-term dependencies, its ability to pay different degree of attention on sub-window feature within multiple time-steps is insufficient. To address this issue, an evolutionary attention-based LSTM training with competitive random search is proposed for multivariate time series prediction. By transferring shared parameters, an evolutionary attention learning approach is introduced to the LSTMs model. Thus, like that for biological evolution, the pattern for importance-based attention sampling can be confirmed during temporal relationship mining. To refrain from being trapped into partial optimization like traditional gradient-based methods, an evolutionary computation inspired competitive random search method is proposed, which can well configure the parameters in the attention layer. Experimental results have illustrated that the proposed model can achieve competetive prediction performance compared with other baseline methods. Evolutionary Cost-Sensitive Deep Belief Network Imbalanced data with a skewed class distribution are common in many real-world applications. Deep Belief Network (DBN) is a machine learning technique that is effective in classification tasks. However, conventional DBN does not work well for imbalanced data classification because it assumes equal costs for each class. To deal with this problem, cost-sensitive approaches assign different misclassification costs for different classes without disrupting the true data sample distributions. However, due to lack of prior knowledge, the misclassification costs are usually unknown and hard to choose in practice. Moreover, it has not been well studied as to how cost-sensitive learning could improve DBN performance on imbalanced data problems. This paper proposes an evolutionary cost-sensitive deep belief network (ECS-DBN) for imbalanced classification. ECS-DBN uses adaptive differential evolution to optimize the misclassification costs based on training data, that presents an effective approach to incorporating the evaluation measure (i.e. G-mean) into the objective function. We first optimize the misclassification costs, then apply them to deep belief network. Adaptive differential evolution optimization is implemented as the optimization algorithm that automatically updates its corresponding parameters without the need of prior domain knowledge. The experiments have shown that the proposed approach consistently outperforms the state-of-the-art on both benchmark datasets and real-world dataset for fault diagnosis in tool condition monitoring. Evolutionary DEep Networks(EDEN) Deep neural networks continue to show improved performance with increasing depth, an encouraging trend that implies an explosion in the possible permutations of network architectures and hyperparameters for which there is little intuitive guidance. To address this increasing complexity, we propose Evolutionary DEep Networks (EDEN), a computationally efficient neuro-evolutionary algorithm which interfaces to any deep neural network platform, such as TensorFlow. We show that EDEN evolves simple yet successful architectures built from embedding, 1D and 2D convolutional, max pooling and fully connected layers along with their hyperparameters. Evaluation of EDEN across seven image and sentiment classification datasets shows that it reliably finds good networks — and in three cases achieves state-of-the-art results — even on a single GPU, in just 6-24 hours. Our study provides a first attempt at applying neuro-evolution to the creation of 1D convolutional networks for sentiment analysis including the optimisation of the embedding layer. Evolutionary eXploration of Augmenting Memory Model(EXAMM) This paper presents a new algorithm, Evolutionary eXploration of Augmenting Memory Models (EXAMM), which is capable of evolving recurrent neural networks (RNNs) using a wide variety of memory structures, such as Delta-RNN, GRU, LSTM, MGU and UGRNN cells. EXAMM evolved RNNs to perform prediction of large-scale, real world time series data from the aviation and power industries. These data sets consist of very long time series (thousands of readings), each with a large number of potentially correlated and dependent parameters. Four different parameters were selected for prediction and EXAMM runs were performed using each memory cell type alone, each cell type with feed forward nodes, and with all possible memory cell types. Evolved RNN performance was measured using repeated k-fold cross validation, resulting in 1210 EXAMM runs which evolved 2,420,000 RNNs in 12,100 CPU hours on a high performance computing cluster. Generalization of the evolved RNNs was examined statistically, providing interesting findings that can help refine the RNN memory cell design as well as inform future neuro-evolution algorithms development. Evolutionary Generative Adversarial Network(E-GAN) Generative adversarial networks (GAN) have been effective for learning generative models for real-world data. However, existing GANs (GAN and its variants) tend to suffer from training problems such as instability and mode collapse. In this paper, we propose a novel GAN framework called evolutionary generative adversarial networks (E-GAN) for stable GAN training and improved generative performance. Unlike existing GANs, which employ a pre-defined adversarial objective function alternately training a generator and a discriminator, we utilize different adversarial training objectives as mutation operations and evolve a population of generators to adapt to the environment (i.e., the discriminator). We also utilize an evaluation mechanism to measure the quality and diversity of generated samples, such that only well-performing generator(s) are preserved and used for further training. In this way, E-GAN overcomes the limitations of an individual adversarial training objective and always preserves the best offspring, contributing to progress in and the success of GANs. Experiments on several datasets demonstrate that E-GAN achieves convincing generative performance and reduces the training problems inherent in existing GANs. Evolutionary Graph Recurrent Network(EGRN) Time series modeling aims to capture the intrinsic factors underpinning observed data and its evolution. However, most existing studies ignore the evolutionary relations among these factors, which are what cause the combinatorial evolution of a given time series. In this paper, we propose to represent time-varying relations among intrinsic factors of time series data by means of an evolutionary state graph structure. Accordingly, we propose the Evolutionary Graph Recurrent Networks (EGRN) to learn representations of these factors, along with the given time series, using a graph neural network framework. The learned representations can then be applied to time series classification tasks. From our experiment results, based on six real-world datasets, it can be seen that our approach clearly outperforms ten state-of-the-art baseline methods (e.g. +5% in terms of accuracy, and +15% in terms of F1 on average). In addition, we demonstrate that due to the graph structure’s improved interpretability, our method is also able to explain the logical causes of the predicted events. Evolutionary Reinforcement Learning(ERL) Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real world problems. Evolutionary Algorithms (EAs), a class of black box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer with high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and reinserts the RL agent into the EA population periodically to inject gradient information into the EA. ERL inherits EA’s ability of temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and stability of a population-based approach and complements it with off-policy DRL’s ability to leverage gradients for higher sample efficiency and faster learning. Experiments in a range of challenging continuous control benchmark tasks demonstrate that ERL significantly outperforms prior DRL and EA methods, achieving state-of-the-art performances. Evolutionary Stochastic Gradient Descent(ESGD) We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures. Evolutionary Strategy Procedure(ES Procedure) Evolutionary Strategies (ES) are a popular family of black-box zeroth-order optimization algorithms which rely on search distributions to efficiently optimize a large variety of objective functions. This paper investigates the potential benefits of using highly flexible search distributions in classical ES algorithms, in contrast to standard ones (typically Gaussians). We model such distributions with Generative Neural Networks (GNNs) and introduce a new training algorithm that leverages their expressiveness to accelerate the Evolutionary Strategy – ES procedure. We show that this tailored algorithm can readily incorporate existing ES algorithms, and outperforms the state-of-the-art on diverse objective functions. Evolutionary Subspace Clustering The problem of organizing data that evolves over time into clusters is encountered in a number of practical settings. We introduce evolutionary subspace clustering, a method whose objective is to cluster a collection of evolving data points that lie on a union of low-dimensional evolving subspaces. To learn the parsimonious representation of the data points at each time step, we propose a non-convex optimization framework that exploits the self-expressiveness property of the evolving data while taking into account representation from the preceding time step. To find an approximate solution to the aforementioned non-convex optimization problem, we develop a scheme based on alternating minimization that both learns the parsimonious representation as well as adaptively tunes and infers a smoothing parameter reflective of the rate of data evolution. The latter addresses a fundamental challenge in evolutionary clustering — determining if and to what extent one should consider previous clustering solutions when analyzing an evolving data collection. Our experiments on both synthetic and real-world datasets demonstrate that the proposed framework outperforms state-of-the-art static subspace clustering algorithms and existing evolutionary clustering schemes in terms of both accuracy and running time, in a range of scenarios. Evolutionary-Neural Hybrid Agent(Evo-NAS) Neural Architecture Search has recently shown potential to automate the design of Neural Networks. The use of Neural Network agents trained with Reinforcement Learning can offer the possibility to learn complex patterns, as well as the ability to explore a vast and compositional search space. On the other hand, evolutionary algorithms offer the greediness and sample efficiency needed for such an application, as each sample requires a considerable amount of resources. We propose a class of Evolutionary-Neural hybrid agents (Evo-NAS), that retain the best qualities of the two approaches. We show that the Evo-NAS agent can outperform both Neural and Evolutionary agents, both on a synthetic task, and on architecture search for a suite of text classification datasets. Evolving Fuzzy System(EFS) Evolving fuzzy systems (EFS) can be defined as self-developing, self-learning fuzzy rule-based or neuro-fuzzy systems that have both their parameters but also (more importantly) their structure self-adapting on-line. They are usually associated with streaming data and on-line (often real-time) modes of operation. In a narrower sense they can be seen as adaptive fuzzy systems. The difference is that evolving fuzzy systems assume on-line adaptation of system structure in addition to the parameter adaptation which is usually associated with the term adaptive. They also allow for adaptation of the learning mechanism. Therefore, evolving assumes a higher level of adaptation. In this definition the English word evolving is used with its core meaning as described in the Oxford dictionary (Hornby, 1974; p.294), namely unfolding; developing; being developed, naturally and gradually. Often evolving is used in relation to so called evolutionary and genetic algorithms. The meaning of the term evolutionary is defined in the Oxford dictionary as development of more complicated forms of life (plants, animals) from earlier and simpler forms. EFS consider a gradual development of the underlying (fuzzy or neuro-fuzzy) system structure and do not deal with such phenomena specific for the evolutionary and genetic algorithms as chromosomes crossover, mutation, selection and reproduction, parents and off-springs. ➘ “Evolving Intelligent System” Evolving Graph Convolutional Network(EvolveGCN) Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspire various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. For this case, combining the GNN with a recurrent neural network (RNN, broadly speaking) is a natural idea. Existing approaches typically learn one single graph model for all the graphs, by using the RNN to capture the dynamism of the output node embeddings and to implicitly regulate the graph model. In this work, we propose a different approach, coined EvolveGCN, that uses the RNN to evolve the graph model itself over time. This model adaptation approach is model oriented rather than node oriented, and hence is advantageous in the flexibility on the input. For example, in the extreme case, the model can handle at a new time step, a completely new set of nodes whose historical information is unknown, because the dynamism has been carried over to the GNN parameters. We evaluate the proposed approach on tasks including node classification, edge classification, and link prediction. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. Evolving Intelligent System(EIS) The term Evolving was first used to describe an intelligent system in 1996 by B. Carse, T. Fogarty and A Munro for a fuzzy rule-based controller where its parameters and structure were learnt simultaneously using a Genetic Algorithm. Years later, alternative methods to learn an evolving intelligent system (EIS) via Incremental learning were suggested as a neuro-fuzzy algorithm by N. Kasabov in 1998 and a rule-based model by P. Angelov in 1999. EIS are usually associated with, streaming data and on-line (often real-time) modes of operation. They can be seen as adaptive intelligent systems. EIS assumes on-line adaptation of system structure in addition to the parameter adaptation which is usually associated with the term ‘incremental’ from Incremental learning. They have been studied as a methodological solution to learn from streaming data exhibiting non-stationary behaviours by M. Sayed-Mouchaweh and E. Lughofer. An important sub-area of EIS is represented by Evolving Fuzzy Systems (EFS) (a comprehensive survey written by E. Lughofer including real-world applications can be found in ), which rely on fuzzy systems architecture and incrementally update, evolve and prune fuzzy sets and fuzzy rules on demand and on-the-fly. One of the major strengths of EFS, compared to other forms of evolving system models, is that they are able to support some sort of interpretability and understandability for experts and users. This opens possibilities for enriched human-machine interaction’s scenarios, where the users may ‘communicate’ with an on-line evolving system in form of knowledge exchange (active learning (machine learning) and teaching). This concept is currently motivated and discussed in the evolving systems community under the term Human-Inspired Evolving Machines and respected as ‘one future’ generation of ‘EIS’. ➘ “PANFIS++” Evoplex Evoplex is a fast, robust and extensible platform for developing agent-based models and multi-agent systems on networks. Each agent is represented as a node and interacts with its neighbors, as defined by the network structure. Evoplex is ideal for modeling complex systems, for example in evolutionary game theory and computational social science. In Evoplex, the models are not coupled to the execution parameters or the visualization tools, and there is a user-friendly graphical user interface which makes it easy for all users, ranging from newcomers to experienced, to create, analyze, replicate and reproduce the experiments. eWhoring In this paper, we describe a new type of online fraud, referred to as ‘eWhoring’ by offenders. This crime script analysis provides an overview of the ‘eWhoring’ business model, drawing on more than 6,500 posts crawled from an online underground forum. This is an unusual fraud type, in that offenders readily share information about how it is committed in a way that is almost prescriptive. There are economic factors at play here, as providing information about how to make money from ‘eWhoring’ can increase the demand for the types of images that enable it to happen. We find that sexualised images are typically stolen and shared online. While some images are shared for free, these can quickly become ‘saturated’, leading to the demand for (and trade in) more exclusive ‘packs’. These images are then sold to unwitting customers who believe they have paid for a virtual sexual encounter. A variety of online services are used for carrying out this fraud type, including email, video, dating sites, social media, classified advertisements, and payment platforms. This analysis reveals potential interventions that could be applied to each stage of the crime commission process to prevent and disrupt this crime type. Exact Soft Confidence-Weighted Learning In this paper, we propose a new Soft Confidence-Weighted (SCW) online learning scheme, which enables the conventional confidence-weighted learning method to handle non-separable cases. Unlike the previous confidence-weighted learning algorithms, the proposed soft confidence-weighted learning method enjoys all the four salient properties: (i) large margin training, (ii) confidence weighting, (iii) capability to handle non-separable data, and (iv) adaptive margin. Our experimental results show that the proposed SCW algorithms significantly outperform the original CW algorithm. When comparing with a variety of state-of-theart algorithms (including AROW, NAROW and NHERD), we found that SCW generally achieves better or at least comparable predictive accuracy, but enjoys significant advantage of computational efficiency (i.e., smaller number of updates and lower time cost). Excess Relative Risk Model rERR Excess Risk In statistics, excess risk is a measure of the relationship between a specified risk factor and a specified outcome (such as contracting a disease). It is the difference between two proportions. In epidemiology it is typically defined to be the difference between the proportion of subjects in a population with a particular disease who were exposed to a specified risk factor and the proportion of subjects with that same disease who were not exposed. Excessive Gap Technique In compressed sensing, the sensing matrix is assumed perfectly known. However, there exists perturbation in the sensing matrix in reality due to sensor offsets or noise disturbance. Directions-of-arrival (DoA) estimation with off-grid effect satisfies this situation, and can be formulated into a (non)convex optimization problem with linear inequalities constraints, which can be solved by the interior point method (using the CVX tools), but at a large computational cost. In this work, in order to design efficient algorithms, we consider various alternative formulations, such as unconstrained formulation, primal-dual formulation, or conic formulation to develop group-sparsity promoted solvers. First, the consensus alternating direction method of multipliers (C-ADMM) is applied. Then, iterative algorithms for the BPDN formulation is proposed by combining the Nesterov smoothing technique with accelerated proximal gradient method, and the convergence analysis of the method is conducted as well. We also developed a variant of EGT (Excessive Gap Technique)-based primal-dual method to systematically reduce the smoothing parameter sequentially. Finally, we propose algorithms for quadratically constrained L2-L1 mixed norm minimization problem by using the smoothed dual conic optimization (SDCO) and continuation technique. The performance of accuracy and convergence for all the proposed methods are demonstrated in the numerical simulations. ExcitNet This paper proposes a WaveNet-based neural excitation model (ExcitNet) for statistical parametric speech synthesis systems. Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework. However, they often suffer from noisy outputs because of the difficulties in capturing the complicated time-varying nature of speech signals. To improve modeling efficiency, the proposed ExcitNet vocoder employs an adaptive inverse filter to decouple spectral components from the speech signal. The residual component, i.e. excitation signal, is then trained and generated within the WaveNet framework. In this way, the quality of the synthesized speech signal can be further improved since the spectral component is well represented by a deep learning framework and, moreover, the residual component is efficiently generated by the WaveNet framework. Experimental results show that the proposed ExcitNet vocoder, trained both speaker-dependently and speaker-independently, outperforms traditional linear prediction vocoders and similarly configured conventional WaveNet vocoders. ExFaKT Fact checking is a crucial task for accurately populating, updating and curating knowledge graphs. Manually validating candidate facts is time-consuming. Prior work on automating this task focuses on estimating truthfulness using numerical scores which are not human-interpretable. Others extract explicit mentions of the candidate fact in the text as an evidence for the candidate fact, which can be hard to directly spot. In our work, we introduce ExFaKT, a framework focused on generating human-comprehensible explanations for candidate facts. ExFaKT uses background knowledge encoded in the form of Horn clauses to rewrite the fact in question into a set of other easier-to-spot facts. The final output of our framework is a set of semantic traces for the candidate fact from both text and knowledge graphs. The experiments demonstrate that our rewritings significantly increase the recall of fact spotting while preserving high precision. Moreover, we show that the explanations effectively help humans to perform fact-checking and can also perform well when used for automated fact-checking. ExGUtils The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are usually modeled through the ex-Gaussian distribution, because it provides a good fit to multiple empirical data. The complexity of this distribution makes the use of computational tools an essential element in the field. Therefore, there is a strong need for efficient and versatile computational tools for the research in this area. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools, programmed for python, developed for numerical analysis of data involving the ex-Gaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages and differences between the least squares and maximum likelihood methods and quantitatively evaluate the goodness of the obtained fits (which is usually an overlooked point in most literature in the area). The analysis done allows one to identify outliers in the empirical datasets and criteriously determine if there is a need for data trimming and at which points it should be done. Exogenous Variable Exogenous variables in causal modeling are the variables with no causal links (arrows) leading to them from other variables in the model. In other words, exogenous variables have no explicit causes within the model. The concept of exogenous variable is fundamental in path analysis and structural equation modeling. The complementary concept is endogenous variable. Expansive Automata Network An Automata Network is a map ${f:Q^n\rightarrow Q^n}$ where $Q$ is a finite alphabet. It can be viewed as a network of $n$ entities, each holding a state from $Q$, and evolving according to a deterministic synchronous update rule in such a way that each entity only depends on its neighbors in the network’s graph, called interaction graph. A major trend in automata network theory is to understand how the interaction graph affects dynamical properties of $f$. In this work we introduce the following property called expansivity: the observation of the sequence of states at any given node is sufficient to determine the initial configuration of the whole network. Our main result is a characterization of interaction graphs that allow expansivity. Moreover, we show that this property is generic among linear automata networks over such graphs with large enough alphabet. We show however that the situation is more complex when the alphabet is fixed independently of the size of the interaction graph: no alphabet is sufficient to obtain expansivity on all admissible graphs, and only non-linear solutions exist in some cases. Finally, among other results, we consider a stronger version of expansivity where we ask to determine the initial configuration from any large enough observation of the system. We show that it can be achieved for any number of nodes and naturally gives rise to maximum distance separable codes. Expansivity An Automata Network is a map ${f:Q^n\rightarrow Q^n}$ where $Q$ is a finite alphabet. It can be viewed as a network of $n$ entities, each holding a state from $Q$, and evolving according to a deterministic synchronous update rule in such a way that each entity only depends on its neighbors in the network’s graph, called interaction graph. A major trend in automata network theory is to understand how the interaction graph affects dynamical properties of $f$. In this work we introduce the following property called expansivity: the observation of the sequence of states at any given node is sufficient to determine the initial configuration of the whole network. Our main result is a characterization of interaction graphs that allow expansivity. Moreover, we show that this property is generic among linear automata networks over such graphs with large enough alphabet. We show however that the situation is more complex when the alphabet is fixed independently of the size of the interaction graph: no alphabet is sufficient to obtain expansivity on all admissible graphs, and only non-linear solutions exist in some cases. Finally, among other results, we consider a stronger version of expansivity where we ask to determine the initial configuration from any large enough observation of the system. We show that it can be achieved for any number of nodes and naturally gives rise to maximum distance separable codes. Expectation Maximization(EM) In statistics, an expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step. Expectation Propagation(EP) Expectation Propagation (EP) is a technique in Bayesian machine learning. EP finds approximations to a probability distribution. It uses an iterative approach that leverages the factorization structure of the target distribution. It differs from other Bayesian approximation approaches such as Variational Bayesian methods. Expectation-Biasing State-of-the-art forecasting methods using Recurrent Neural Net- works (RNN) based on Long-Short Term Memory (LSTM) cells have shown exceptional performance targeting short-horizon forecasts, e.g given a set of predictor features, forecast a target value for the next few time steps in the future. However, in many applications, the performance of these methods decays as the forecasting horizon extends beyond these few time steps. This paper aims to explore the challenges of long-horizon forecasting using LSTM networks. Here, we illustrate the long-horizon forecasting problem in datasets from neuroscience and energy supply management. We then propose expectation-biasing, an approach motivated by the literature of Dynamic Belief Networks, as a solution to improve long-horizon forecasting using LSTMs. We propose two LSTM architectures along with two methods for expectation biasing that significantly outperforms standard practice. Expected Accuracy We empirically investigate the (negative) expected accuracy as an alternative loss function to cross entropy (negative log likelihood) for classification tasks. Coupled with softmax activation, it has small derivatives over most of its domain, and is therefore hard to optimize. A modified, leaky version is evaluated on a variety of classification tasks, including digit recognition, image classification, sequence tagging and tree tagging, using a variety of neural architectures such as logistic regression, multilayer perceptron, CNN, LSTM and Tree-LSTM. We show that it yields comparable or better accuracy compared to cross entropy. Furthermore, the proposed objective is shown to be more robust to label noise. Expected Energy-Based Restricted Boltzmann Machine(EE-RBM) Expected Improvement(EI) Improving the Expected Improvement Algorithm Expected Policy Gradient(EPG) We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadric critics and then extend it to an analytical method for the universal case, covering a broad class of actors and critics, including Gaussian, exponential families, and reparameterised policies with bounded support. For Gaussian policies, we show that it is optimal to explore using covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. EPG also provides a general framework for reasoning about policy gradient methods, which we use to establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we show that EPG outperforms existing approaches on six challenging domains involving the simulated control of physical systems. Expected Regret Minimization Bayesian optimization has demonstrated impressive success in finding the optimum location $x^{*}$ and value $f^{*}=f(x^{*})=\max_{x\in\mathcal{X}}f(x)$ of the black-box function $f$. In some applications, however, the optimum value is known in advance and the goal is to find the corresponding optimum location. Existing work in Bayesian optimization (BO) has not effectively exploited the knowledge of $f^{*}$ for optimization. In this paper, we consider a new setting in BO in which the knowledge of the optimum value is available. Our goal is to exploit the knowledge about $f^{*}$ to search for the location $x^{*}$ efficiently. To achieve this goal, we first transform the Gaussian process surrogate using the information about the optimum value. Then, we propose two acquisition functions, called confidence bound minimization and expected regret minimization, which exploit the knowledge about the optimum value to identify the optimum location efficiently. We show that our approaches work both intuitively and quantitatively achieve better performance against standard BO methods. We demonstrate real applications in tuning a deep reinforcement learning algorithm on the CartPole problem and XGBoost on Skin Segmentation dataset in which the optimum values are publicly available. EXPected Similarity Estimation(EXPoSE) We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state of the art algorithms for anomaly detection while being significant faster than techniques with the same discriminant power. Expected Utility Hypothesis(EUH) In economics, game theory, and decision theory the expected utility hypothesis is a hypothesis concerning people’s preferences with regard to choices that have uncertain outcomes (gambles). This hypothesis states that if specific axioms are satisfied, the subjective value associated with an individual’s gamble is the statistical expectation of that individual’s valuations of the outcomes of that gamble. This hypothesis has proved useful to explain some popular choices that seem to contradict the expected value criterion (which takes into account only the sizes of the payouts and the probabilities of occurrence), such as occur in the contexts of gambling and insurance. Daniel Bernoulli initiated this hypothesis in 1738. Until the mid-twentieth century, the standard term for the expected utility was the moral expectation, contrasted with ‘mathematical expectation’ for the expected value. The von Neumann-Morgenstern utility theorem provides necessary and sufficient conditions under which the expected utility hypothesis holds. From relatively early on, it was accepted that some of these conditions would be violated by real decision-makers in practice but that the conditions could be interpreted nonetheless as ‘axioms’ of rational choice. Work by Anand (1993) argues against this normative interpretation and shows that ‘rationality’ does not require transitivity, independence or completeness. This view is now referred to as the ‘modern view’ and Anand argues that despite the normative and evidential difficulties the general theory of decision-making based on expected utility is an insightful first order approximation that highlights some important fundamental principles of choice, even if it imposes conceptual and technical limits on analysis which need to be relaxed in real world settings where knowledge is less certain or preferences are more sophisticated. Expected Value of Partial Perfect Information(EVPPI) ➘ “Expected Value of Perfect Information” http://…/WP130003.pdf http://…/seqposterSMDMfina.pdf BCEA Expected Value of Perfect Information(EVPI) In decision theory, the expected value of perfect information (EVPI) is the price that one would be willing to pay in order to gain access to perfect information. http://…/seqposterSMDMfina.pdf Expedition Archives are an important source of study for various scholars. Digitization and the web have made archives more accessible and led to the development of several time-aware exploratory search systems. However these systems have been designed for more general users rather than scholars. Scholars have more complex information needs in comparison to general users. They also require support for corpus creation during their exploration process. In this paper we present Expedition – a time-aware exploratory search system that addresses the requirements and information needs of scholars. Expedition possesses a suite of ad-hoc and diversity based retrieval models to address complex information needs; a newspaper-style user interface to allow for larger textual previews and comparisons; entity filters to more naturally refine a result list and an interactive annotated timeline which can be used to better identify periods of importance. Experience-Based Planning Domain(EBPD) Experience-based planning domains (EBPDs) have been recently proposed to improve problem solving by learning from experience. EBPDs provide important concepts for long-term learning and planning in robotics. They rely on acquiring and using task knowledge, i.e., activity schemata, for generating concrete solutions to problem instances in a class of tasks. Using Three-Valued Logic Analysis (TVLA), we extend previous work to generate a set of conditions as the scope of applicability for an activity schema. The inferred scope is a bounded representation of a set of problems of potentially unbounded size, in the form of a 3-valued logical structure, which allows an EBPD system to automatically find an applicable activity schema for solving task problems. We demonstrate the utility of our approach in a set of classes of problems in a simulated domain and a class of real world tasks in a fully physically simulated PR2 robot in Gazebo. Experimental Design Problem Experimental design is a classical problem in statistics and has also found new applications in machine learning. In the experimental design problem, the aim is to estimate an unknown vector x in m-dimensions from linear measurements where a Gaussian noise is introduced in each measurement. The goal is to pick k out of the given n experiments so as to make the most accurate estimate of the unknown parameter x. Given a set S of chosen experiments, the most likelihood estimate x’ can be obtained by a least squares computation. ➚ “Design of Experiments” Expert Core Aggregating responses from crowd workers is a fundamental task in the process of crowdsourcing. In cases where a few experts are overwhelmed by a large number of non-experts, most answer aggregation algorithms such as the majority voting fail to identify the correct answers. Therefore, it is crucial to extract reliable experts from the crowd workers. In this study, we introduce the notion of ‘expert core’, which is a set of workers that is very unlikely to contain a non-expert. We design a graph-mining-based efficient algorithm that exactly computes the expert core. To answer the aggregation task, we propose two types of algorithms. The first one incorporates the expert core into existing answer aggregation algorithms such as the majority voting, whereas the second one utilizes information provided by the expert core extraction algorithm pertaining to the reliability of workers. We then give a theoretical justification for the first type of algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our proposed answer aggregation algorithms outperform state-of-the-art algorithms. Expert Iteration Solving sequential decision making problems, such as text parsing, robotic control, and game playing, requires a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration, a novel algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. In contrast, standard Deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that our method substantially outperforms Policy Gradients in the board game Hex, winning 84.4% of games against it when trained for equal time. Expert-Augmented Machine Learning(EAML) Machine Learning is proving invaluable across disciplines. However, its successis often limited by the quality and quantity of available data, while its adoption by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We use a large dataset of intensive care patient data to predict mortality and show that we can extract expert knowledge using an online platform, help reveal hidden confounders, improve generalizability ona different population and learn using less data. EAML presents a novel framework for high performance and dependable machine learning in critical applications. Experts Model Estimating the intensity of emotion has gained significance as modern textual inputs in potential applications like social media, e-retail markets, psychology, advertisements etc., carry a lot of emotions, feelings, expressions along with its meaning. However, the approaches of traditional sentiment analysis primarily focuses on classifying the sentiment in general (positive or negative) or at an aspect level (very positive, low negative, etc.) and cannot exploit the intensity information. Moreover, automatically identifying emotions like anger, fear, joy, sadness, disgust etc., from text introduces challenging scenarios where single tweet may contain multiple emotions with different intensities and some emotions may even co-occur in some of the tweets. In this paper, we propose an architecture, Experts Model, inspired from the standard Mixture of Experts (MoE) model. The key idea here is each expert learns different sets of features from the feature vector which helps in better emotion detection from the tweet. We compared the results of our Experts Model with both baseline results and top five performers of SemEval-2018 Task-1, Affect in Tweets (AIT). The experimental results show that our proposed approach deals with the emotion detection problem and stands at top-5 results. ExperTwin Even though the advent of the Web coupled with powerful search engines has empowered the knowledge workers to quickly find the needed information, it still is a time-consuming operation. Presently there are no readily available tools that can create and maintain an up-to-date personal knowledge base that can be readily consulted when needed. While organizing the entire Web as a semantic network is a long-term goal, creation of a semantic network of personal knowledge sources that are continuously updated by crawlers and other devices is an attainable task. We created an app titled ExperTwin, that collects personally relevant knowledge units (known as JANs) from the Web, Email correspondence, and locally stored files, organize them as a semantic network that can be easily queried and visualized in many formats – just in time – when performing a knowledge-based task. The architecture of ExperTwin is based on the model of a ‘Society of Intelligent Agents’, where each agent is responsible for a specific task. Collection of JANs from multiple sources, establishing the relevancy, and creation of the personal semantic network are some of the many tasks performed by the individual agents. Tensorflow and Natural Language Processing (NLP) tools have been implemented to let ExperTwin learn from users. Document the design and deployment of ExperTwin as a ‘Knowledge Advantage Machine’ able to search for relevant information while performing a knowledge-based task, is the main goal of the research presented in this post. Explain to Fix(E2X) Explaining predictions of deep neural networks (DNNs) is an important and nontrivial task. In this paper, we propose a practical approach to interpret decisions made by a DNN object detector that has fidelity comparable to state-of-the-art methods and sufficient computational efficiency to process large datasets. Our method relies on recent theory and approximates Shapley feature importance values. We qualitatively and quantitatively show that the proposed explanation method can be used to find image features which cause failures in DNN object detection. The developed software tool combined into the ‘Explain to Fix’ (E2X) framework has a factor of 10 higher computational efficiency than prior methods and can be used for cluster processing using graphics processing units (GPUs). Lastly, we propose a potential extension of the E2X framework where the discovered missing features can be added into training dataset to overcome failures after model retraining. Explain3D Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today’s data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries’ differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4)~We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently. Explainability ➚ “Collaborative Black-box and RUle Set Hybrid” Explainable Artificial Intelligence(XAI) This paper introduces the ‘grasp-ability test’ as a ‘goodness’ criteria by which to compare which explanation is more or less meaningful than others for users to understand the automated algorithmic data processing. Explainable Security(XSec) The Defense Advanced Research Projects Agency (DARPA) recently launched the Explainable Artificial Intelligence (XAI) program that aims to create a suite of new AI techniques that enable end users to understand, appropriately trust, and effectively manage the emerging generation of AI systems. In this paper, inspired by DARPA’s XAI program, we propose a new paradigm in security research: Explainable Security (XSec). We discuss the “Six Ws” of XSec (Who What Where When Why and How ) and argue that XSec has unique and complex characteristics: XSec involves several different stakeholders (i.e., the system’s developers, analysts, users and attackers) and is multi-faceted by nature (as it requires reasoning about system model, threat model and properties of security, privacy and trust as well as about concrete attacks, vulnerabilities and countermeasures). We define a roadmap for XSec that identifies several possible research directions. Explainable Time Series Tweaking Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions. ExplaiNE Networks are powerful data structures, but are challenging to work with for conventional machine learning methods. Network Embedding (NE) methods attempt to resolve this by learning vector representations for the nodes, for subsequent use in downstream machine learning tasks. Link Prediction (LP) is one such downstream machine learning task that is an important use case and popular benchmark for NE methods. Unfortunately, while NE methods perform exceedingly well at this task, they are lacking in transparency as compared to simpler LP approaches. We introduce ExplaiNE, an approach to offer counterfactual explanations for NE-based LP methods, by identifying existing links in the network that explain the predicted links. ExplaiNE is applicable to a broad class of NE algorithms. An extensive empirical evaluation for the NE method Conditional Network Embedding’ in particular demonstrates its accuracy and scalability. eXplaining Aggregates for eXploratory Analytics(XAXA) Analysts wishing to explore multivariate data spaces, typically pose queries involving selection operators, i.e., range or radius queries, which define data subspaces of possible interest and then use aggregation functions, the results of which determine their exploratory analytics interests. However, such aggregate query (AQ) results are simple scalars and as such, convey limited information about the queried subspaces for exploratory analysis. We address this shortcoming aiding analysts to explore and understand data subspaces by contributing a novel explanation mechanism coined XAXA: eXplaining Aggregates for eXploratory Analytics. XAXA’s novel AQ explanations are represented using functions obtained by a three-fold joint optimization problem. Explanations assume the form of a set of parametric piecewise-linear functions acquired through a statistical learning model. A key feature of the proposed solution is that model training is performed by only monitoring AQs and their answers on-line. In XAXA, explanations for future AQs can be computed without any database (DB) access and can be used to further explore the queried data subspaces, without issuing any more queries to the DB. We evaluate the explanation accuracy and efficiency of XAXA through theoretically grounded metrics over real-world and synthetic datasets and query workloads. ExplainIt! We present ExplainIt!, a declarative, unsupervised root-cause analysis engine that uses time series monitoring data from large complex systems such as data centres. ExplainIt! empowers operators to succinctly specify a large number of causal hypotheses to search for causes of interesting events. ExplainIt! then ranks these hypotheses and summarises causal dependencies between hundreds of thousands of variables for human understanding. We show how a declarative language, such as SQL, can be effective in declaratively enumerating hypotheses that probe the structure of an unknown probabilistic graphical causal model of the underlying system. Our thesis is that databases are in a unique position to enable users to rapidly explore the possible causal mechanisms in data collected from diverse sources. We empirically demonstrate how ExplainIt! had helped us resolve over 30 performance issues in a commercial product since late 2014, of which we discuss a few cases in detail. Explanation-assisted Guess(ExAG) While there have been many proposals on how to make AI algorithms more transparent, few have attempted to evaluate the impact of AI explanations on human performance on a task using AI. We propose a Twenty-Questions style collaborative image guessing game, Explanation-assisted Guess Which (ExAG) as a method of evaluating the efficacy of explanations in the context of Visual Question Answering (VQA) – the task of answering natural language questions on images. We study the effect of VQA agent explanations on the game performance as a function of explanation type and quality. We observe that ‘effective’ explanations are not only conducive to game performance (by almost 22% for ‘excellent’ rated explanations), but also helpful when VQA system answers are erroneous or noisy (by almost 30% compared to no explanations). We also see that players develop a preference for explanations even when penalized and that the explanations are mostly rated as ‘helpful’. Explanatory Artificial Intelligence(XAI) ➚ “Explainable Artificial Intelligence” Explanatory Graph This paper introduces a graphical model, namely an explanatory graph, which reveals the knowledge hierarchy hidden inside conv-layers of a pre-trained CNN. Each filter in a conv-layer of a CNN for object classification usually represents a mixture of object parts. We develop a simple yet effective method to disentangle object-part pattern components from each filter. We construct an explanatory graph to organize the mined part patterns, where a node represents a part pattern, and each edge encodes co-activation relationships and spatial relationships between patterns. More crucially, given a pre-trained CNN, the explanatory graph is learned without a need of annotating object parts. Experiments show that each graph node consistently represented the same object part through different images, which boosted the transferability of CNN features. We transferred part patterns in the explanatory graph to the task of part localization, and our method significantly outperformed other approaches. EXplicit interAction Model(EXAM) Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed as EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate other researches. Explicit Semantic Analysis(ESA) In natural language processing and information retrieval, explicit semantic analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf-idf matrix of the text corpus and a document (string of words) is represented as the centroid of the vectors representing its words. Typically, the text corpus is Wikipedia, though other corpora including the Open Directory Project have been used. ESA was designed by Evgeniy Gabrilovich and Shaul Markovitch as a means of improving text categorization and has been used by this pair of researchers to compute what they refer to as ‘semantic relatedness’ by means of cosine similarity between the aforementioned vectors, collectively interpreted as a space of ‘concepts explicitly defined and described by humans’, where Wikipedia articles (or ODP entries, or otherwise titles of documents in the knowledge base corpus) are equated with concepts. The name ‘explicit semantic analysis’ contrasts with latent semantic analysis (LSA), because the use of a knowledge base makes it possible to assign human-readable labels to the concepts that make up the vector space. ESA, as originally posited by Gabrilovich and Markovitch, operates under the assumption that the knowledge base contains topically orthogonal concepts. However, it was later shown by Anderka and Stein that ESA also improves the performance of information retrieval systems when it is based not on Wikipedia, but on the Reuters corpus of newswire articles, which does not satisfy the orthogonality property; in their experiments, Anderka and Stein used newswire stories as ‘concepts’. To explain this observation, links have been shown between ESA and the generalized vector space model. Gabrilovich and Markovitch replied to Anderka and Stein by pointing out that their experimental result was achieved using ‘a single application of ESA (text similarity)’ and ‘just a single, extremely small and homogenous test collection of 50 news documents’. Cross-language explicit semantic analysis (CL-ESA) is a multilingual generalization of ESA. CL-ESA exploits a document-aligned multilingual reference collection (e.g., again, Wikipedia) to represent a document as a language-independent concept vector. The relatedness of two documents in different languages is assessed by the cosine similarity between the corresponding vector representations. http://…-explicit-semantic-analysis-esa-explained Exploding and Vanishing Gradient Problem(EVGP) h-detach: Modifying the LSTM Gradient Towards Better Optimization Exploration Potential We introduce exploration potential, a quantity for that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem’s reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multi-armed bandits use exploration potential to illustrate how different algorithms make the tradeoff between exploration and exploitation. Explorative Landscape Analysis(ELA) Exploratory Landscape Analysis subsumes a number of techniques employed to obtain knowledge about the properties of an unknown optimization problem, especially insofar as these properties are important for the performance of optimization algorithms. Wher flacco Exploratory Data Analysis(EDA) In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA. Exploratory Factor Analysis(EFA) In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis. EFA is based on the common factor model. Within the common factor model, a function of common factors, unique factors, and errors of measurements expresses measured variables. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables. EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method. http://…/factornew.htm Survey on establishing the optimal number of factors in exploratory factor analysis applied to data mining Explore-Exploit Interactive user interfaces need to continuously evolve based on the interactions that a user has (or does not have) with the system. This may require constant exploration of various options that the system may have for the user and obtaining signals of user preferences on those. However, such an exploration, especially when the set of available options itself can change frequently, can lead to sub-optimal user experiences. We present Explore-Exploit: a framework designed to collect and utilize user feedback in an interactive and online setting that minimizes regressions in end-user experience. This framework provides a suite of online learning operators for various tasks such as personalization ranking, candidate selection and active learning. We demonstrate how to integrate this framework with run-time services to leverage online and interactive machine learning out-of-the-box. We also present results demonstrating the efficiencies that can be achieved using the Explore-Exploit framework. Exponential Moving Average(EMA) An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA), is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. The graph at right shows an example of the weight decrease. Exponential Multivariate Hawkes Model In recent years methods of data analysis for point processes have received some attention, for example, by Cox & Lewis (1966) and Lewis (1964). In particular Bartlett (1963a, b) has introduced methods of analysis based on the point spectrum. Theoretical models are relatively sparse. In this paper the theoretical properties of a class of processes with particular reference to the point spectrum or corresponding covariance density functions are discussed. A particular result is a self-exciting process with the same second-order properties as a certain doubly stochastic process. These are not distinguishable by methods of data analysis based on these properties. emhawkes Exponential Random Graph Models(ERGM) Exponential random graph models (ERGMs) are a family of statistical models for analyzing data about social and other networks. Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity. However, these metrics describe the observed network which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression. Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency about network substructures of theoretical interest, disambiguating the influence of confounding processes, efficiently representing complex structures, and linking local-level processes to global-level properties. Degree Preserving Randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks. Exponential Smoothing Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed data for presentation, or to make forecasts. The time series data themselves are a sequence of observations. The observed phenomenon may be an essentially random process, or it may be an orderly, but noisy, process. Whereas in the simple moving average the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights over time. Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements. The simplest form of exponential smoothing should be used only for data without any systematic trend or seasonal components. Exponential-Generalized Truncated Logarithmic(EGTL) In this paper, we introduce a new two-parameter lifetime distribution, called the exponential-generalized truncated logarithmic (EGTL) distribution, by compounding the exponential and generalized truncated logarithmic distributions. Our procedure generalizes the exponential-logarithmic (EL) distribution modelling the reliability of systems by the use of first-order concepts, where the minimum lifetime is considered (Tahmasbi 2008). In our approach, we assume that a system fails if a given number k of the components fails and then, we consider the kth-smallest value of lifetime instead of the minimum lifetime. The reliability and failure rate functions as well as their properties are presented for some special cases. The estimation of the parameters is attained by the maximum likelihood, the expectation maximization algorithm, the method of moments and the Bayesian approach, with a simulation study performed to illustrate the different methods of estimation. The application study is illustrated based on two real data sets used in many applications of reliability. Exposure Machine learning models based on neural networks and deep learning are being rapidly adopted for many purposes. What those models learn, and what they may share, is a significant concern when the training data may contain secrets and the models are public — e.g., when a model helps users compose text messages using models trained on all users’ messages. This paper presents exposure: a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets. Using this metric, we show how to extract those secrets efficiently using black-box API access. Further, we show that unintended memorization occurs early, is not due to over-fitting, and is a persistent issue across different types of models, hyperparameters, and training strategies. We experiment with both real-world models (e.g., a state-of-the-art translation model) and datasets (e.g., the Enron email dataset, which contains users’ credit card numbers) to demonstrate both the utility of measuring exposure and the ability to extract secrets. Finally, we consider many defenses, finding some ineffective (like regularization), and others to lack guarantees. However, by instantiating our own differentially-private recurrent model, we validate that by appropriately investing in the use of state-of-the-art techniques, the problem can be resolved, with high utility. Extended Autoregressive Model(EAR) Generative models (GMs) such as Generative Adversary Network (GAN) and Variational Auto-Encoder (VAE) have thrived these years and achieved high quality results in generating new samples. Especially in Computer Vision, GMs have been used in image inpainting, denoising and completion, which can be treated as the inference from observed pixels to corrupted pixels. However, images are hierarchically structured which are quite different from many real-world inference scenarios with non-hierarchical features. These inference scenarios contain heterogeneous stochastic variables and irregular mutual dependences. Traditionally they are modeled by Bayesian Network (BN). However, the learning and inference of BN model are NP-hard thus the number of stochastic variables in BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time.We also propose an extended autoregressive (EAR) model and an EAR with adversary loss (EARA) model and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. Except for black box analysis, we’ve also done a serial of experiments on Markov border inference of GMs for white box analysis and give theoretical results. Extended Autoregressive Model with Adversary Loss(EARA) Generative models (GMs) such as Generative Adversary Network (GAN) and Variational Auto-Encoder (VAE) have thrived these years and achieved high quality results in generating new samples. Especially in Computer Vision, GMs have been used in image inpainting, denoising and completion, which can be treated as the inference from observed pixels to corrupted pixels. However, images are hierarchically structured which are quite different from many real-world inference scenarios with non-hierarchical features. These inference scenarios contain heterogeneous stochastic variables and irregular mutual dependences. Traditionally they are modeled by Bayesian Network (BN). However, the learning and inference of BN model are NP-hard thus the number of stochastic variables in BN is highly constrained. In this paper, we adapt typical GMs to enable heterogeneous learning and inference in polynomial time.We also propose an extended autoregressive (EAR) model and an EAR with adversary loss (EARA) model and give theoretical results on their effectiveness. Experiments on several BN datasets show that our proposed EAR model achieves the best performance in most cases compared to other GMs. Except for black box analysis, we’ve also done a serial of experiments on Markov border inference of GMs for white box analysis and give theoretical results. Extended Bayesian Information Criterion(EBIC) The ordinary Bayes information criterion is too liberal for model selection when the model space is large. In this article, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayes information criteria. The new criteria take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to in nity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayes information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayes information criteria are extremely useful for variable selection in problems with a moderate sample size but a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research. Some keywords: Bayesian paradigm; Consistency; Genome-wide association study; Tour- nament approach; Variable selection. Extended Fourier Amplitude Sensitivity Test Excluding irrelevant features in a pattern recognition task plays an important role in maintaining a simpler machine learning model and optimizing the computational efficiency. Nowadays with the rise of large scale datasets, feature selection is in great demand as it becomes a central issue when facing high-dimensional datasets. The present study provides a new measure of saliency for features by employing a Sensitivity Analysis (SA) technique called the extended Fourier amplitude sensitivity test, and a well-trained Feedforward Neural Network (FNN) model, which ultimately leads to the selection of a promising optimal feature subset. Ideas of the paper are mainly demonstrated based on adopting FNN model for feature selection in classification problems. But in the end, a generalization framework is discussed in order to give insights into the usage in regression problems as well as expressing how other function approximate models can be deployed. Effectiveness of the proposed method is verified by result analysis and data visualization for a series of experiments over several well-known datasets drawn from UCI machine learning repository. Extended Isolation Forest We present an extension to the model-free anomaly detection algorithm, Isolation Forest. This extension, named Extended Isolation Forest (EIF), improves the consistency and reliability of the anomaly score produced for a given data point. We show that the standard Isolation Forest produces inconsistent scores using score maps. The score maps suffer from an artifact generated as a result of how the criteria for branching operation of the binary tree is selected. We propose two different approaches for improving the situation. First we propose transforming the data randomly before creation of each tree, which results in averaging out the bias introduced in the algorithm. Second, which is the preferred way, is to allow the slicing of the data to use hyperplanes with random slopes. This approach results in improved score maps. We show that the consistency and reliability of the algorithm is much improved using this method by looking at the variance of scores of data points distributed along constant score lines. We find no appreciable difference in the rate of convergence nor in computation time between the standard Isolation Forest and EIF, which highlights its potential as anomaly detection algorithm. Extended Kalman Filter(EKF) In estimation theory, the extended Kalman filter (EKF) is the nonlinear version of the Kalman filter which linearizes about an estimate of the current mean and covariance. In the case of well defined transition models, the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS. Extended PCA(XPCA) Principal component analysis (PCA) is arguably the most popular tool in multivariate exploratory data analysis. In this paper, we consider the question of how to handle heterogeneous variables that include continuous, binary, and ordinal. In the probabilistic interpretation of low-rank PCA, the data has a normal multivariate distribution and, therefore, normal marginal distributions for each column. If some marginals are continuous but not normal, the semiparametric copula-based principal component analysis (COCA) method is an alternative to PCA that combines a Gaussian copula with nonparametric marginals. If some marginals are discrete or semi-continuous, we propose a new extended PCA (XPCA) method that also uses a Gaussian copula and nonparametric marginals and accounts for discrete variables in the likelihood calculation by integrating over appropriate intervals. Like PCA, the factors produced by XPCA can be used to find latent structure in data, build predictive models, and perform dimensionality reduction. We present the new model, its induced likelihood function, and a fitting algorithm which can be applied in the presence of missing data. We demonstrate how to use XPCA to produce an estimated full conditional distribution for each data point, and use this to produce to provide estimates for missing data that are automatically range respecting. We compare the methods as applied to simulated and real-world data sets that have a mixture of discrete and continuous variables. Extended Space Forest The Extended Space Forest is a new method for decision tree construction in which training is done with input vectors including all the original features and their random combinations. The combinations are generated with a difference operator applied to random pairs of original features. The experimental results show that extended space versions of ensemble algorithms have better performance than the original ensemble algorithms. To investigate the success dynamics of the Extended Space Forest, the individual accuracy and diversity creation powers of ensemble algorithms are compared. The Extended Space Forest creates more diversity when it uses all the input features than Bagging and Rotation Forest. It also results in more individual accuracy when it uses random selection of the features than Random Subspace and Random Forest methods. It needs more training time because of using more features than the original algorithms. But its testing time is lower than the others because it generates less complex base learners. eXtensible Neural Machine Translation toolkit(XNMT) This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distin- guishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://…/xnmt Exterior Distance Function(EDF) We introduce and study exterior distance function (EDF) and correspondent exterior point method (EPM) for convex optimization. The EDF is a classical Lagrangian for an equivalent problem obtained from the initial one by monotone transformation of both the objective function and the constraints. The constraints transformation is scaled by a positive scaling parameter. Thus, the EDF is a particular realization of the Nonlinear Rescaling (NR) principle. Along with the ‘center’, the EDF has two extra tools: the barrier (scaling) parameter and the vector of Lagrange multipliers. We show that EPM generates primal – dual sequence, which converges to the primal – dual solution in value under minimum assumption on the input data. Moreover, the convergence is taking place under any fixed interior point as a ‘center’ and any fixed positive scaling parameter, just due to the Lagrange multipliers update. If the second order sufficient optimality condition is satisfied, then the EPM converges with Q-linear rate under any fixed interior point as a ‘center’ and any fixed, but large enough positive scaling parameter. Exterior Point Method(EPM) ➘ “Exterior Distance Function” EXTRA State-of-the-art in network science of teams offers effective recommendation methods to answer questions like who is the best replacement, what is the best team expansion strategy, but lacks intuitive ways to explain why the optimization algorithm gives the specific recommendation for a given team optimization scenario. To tackle this problem, we develop an interactive prototype system, EXTRA, as the first step towards addressing such a sense-making challenge, through the lens of the underlying network where teams embed, to explain the team recommendation results. The main advantages are (1) Algorithm efficacy: we propose an effective and fast algorithm to explain random walk graph kernel, the central technique for networked team recommendation; (2) Intuitive visual explanation: we present intuitive visual analysis of the recommendation results, which can help users better understand the rationality of the underlying team recommendation algorithm. Extract, Transform, Analyse and Load(ET(A)L) The ET(AL) is another form of reduction mechanism, which is why the analytics aspect is included to ensure that the data that gets through is the data that is needed, and that the junk and noise that has no recognisable value, gets cleaned out early and often. Extract-Transform-Load(ETL) In computing, extract, transform, and load (ETL) refers to a process in database usage and especially in data warehousing that: *Extracts data from outside sources *Transforms it to fit operational needs, which can include quality levels *Loads it into the end target (database, more specifically, operational data store, data mart, or data warehouse) ETL systems are commonly used to integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer hardware. The disparate systems containing the original data are frequently managed and operated by different employees. For example a cost accounting system may combine data from payroll, sales and purchasing. Extrapolation Compression Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: {\em quantization} for low bandwidth networks, and {\em decentralization} for high latency networks. In this paper, we explore a natural question: {\em can the combination of both decentralization and quantization lead to a system that is robust to both bandwidth and latency?} Although the system implication of such combination is trivial, the underlying theoretical principle and algorithm design is challenging: simply quantizing data sent in a decentralized training algorithm would accumulate the error. In this paper, we develop a framework of quantized, decentralized training and propose two different strategies, which we call {\em extrapolation compression} and {\em difference compression}. We analyze both algorithms and prove both converge at the rate of $O(1/\sqrt{nT})$ where $n$ is the number of workers and $T$ is the number of iterations, matching the {\rc convergence} rate for full precision, centralized training. We evaluate our algorithms on training deep learning models, and find that our proposed algorithm outperforms the best of merely decentralized and merely quantized algorithm significantly for networks with {\em both} high latency and low bandwidth. Extremal Depth(ED) We propose a new notion called extremal depth’ (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme outlyingness’. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: a) the central region achieves the nominal (desired) simultaneous coverage probability; b) there is a correspondence between ED-based (simultaneous) central regions and appropriate point-wise central regions; and c) the method is resistant to certain classes of functional outliers. The paper examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems. Extreme Bounds Analysis(EBA) The basic idea of extreme bounds analysis is quite simple. We are interested in finding out which variables from the set X are robustly associated with the dependent variable y. To do so, we run a large number of regression models. Each has y as the dependent variable and includes a set of standard explanatory variables F that are included in each regression model. In addition, each model includes a different subset D of the variables in X. Following the convention in the literature, we will refer to F as the free variables and to X as the doubtful variables. Some subset of the doubtful variables X might be socalled focus variables that are of particular interest to the researcher. The doubtful variables 4 ExtremeBounds: Extreme Bounds Analysis in R whose regression coefficients retain their statistical significance in a large enough proportion of estimated models are declared to be robust, whereas those that do not are labelled fragile. ExtremeBounds Extreme Function Theory We introduce an extreme function theory as a novel method by which probabilistic novelty detection may be performed with functions, where the functions are represented by time-series of (potentially multivariate) discrete observations. We set the method within the framework of Gaussian processes (GP), which offers a convenient means of constructing a distribution over functions. Whereas conventional novelty detection methods aim to identify individually extreme data points, with respect to a model of normality constructed using examples of ‘normal’ data points, the proposed method aims to identify extreme functions, with respect to a model of normality constructed using examples of ‘normal’ functions, where those functions are represented by time-series of observations. The method is illustrated using synthetic data, physiological data acquired from a large clinical trial, and a benchmark time-series dataset. Extreme Gradient Boosting Extreme Gradient Boosting, which is an efficient implementation of gradient boosting framework. xgboost Extreme Learning Machine(ELM) Extreme learning machine (ELM) is a modification of single layer feedforward network (SLFN) where learning is quite similar to the reservoir computing. ELMR,elmNNRcpp Extreme Machine Learning(ELM) ➘ “Extreme Learning Machine” Extreme Multi-Label Classification(XML) Ranking-Based Autoencoder for Extreme Multi-label Classification Extreme Multi-Label Learning using Distributional Semantics(ExMLDS) We present a novel and scalable label embedding framework for large-scale multi-label learning a.k.a ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data. The framework can also be easily extended to incorporate auxiliary information such as label-label correlations; this is crucial especially when there are a lot of missing labels in the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and state-of-the-art methods for large-scale multi-label learning. Extreme Studentized Deviate(ESD) The generalized extreme Studentized deviate (ESD) test is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution. The primary limitation of the Grubbs test and the Tietjen-Moore test is that the suspected number of outliers, k, must be specified exactly. If k is not specified correctly, this can distort the conclusions of these tests. On the other hand, the generalized ESD test only requires that an upper bound for the suspected number of outliers be specified. Given the upper bound, r, the generalized ESD test essentially performs r separate tests: a test for one outlier, a test for two outliers, and so on up to r outliers. Extreme Summarization We introduce extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question ‘What is the article about?’. We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans. Extreme Tensoring State-of-the-art models are now trained with billions of parameters, reaching hardware limits in terms of memory consumption. This has created a recent demand for memory-efficient optimizers. To this end, we investigate the limits and performance tradeoffs of memory-efficient adaptively preconditioned gradient methods. We propose extreme tensoring for high-dimensional stochastic optimization, showing that an optimizer needs very little memory to benefit from adaptive preconditioning. Our technique applies to arbitrary models (not necessarily with tensor-shaped parameters), and is accompanied by regret and convergence guarantees, which shed light on the tradeoffs between preconditioner quality and expressivity. On a large-scale NLP model, we reduce the optimizer memory overhead by three orders of magnitude, without degrading performance. Extreme Value Analysis(EVA) ➘ “Extreme Value Theory” Introduction to Extreme Value Analysis https://…/Extremes.pdf hkevp,revdbayes,threshr Extreme Value Learning(EVL) The novel unseen classes can be formulated as the extreme values of known classes. This inspired the recent works on open-set recognition \cite{Scheirer_2013_TPAMI,Scheirer_2014_TPAMIb,EVM}, which however can have no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual feature to semantic space. To model the margin and coverage distributions of each class, the Vocabulary-informed Learning (ViL) is adopted by using vast open vocabulary in the semantic space. Essentially, by incorporating the EVL and ViL, we for the first time propose a novel semantic embedding paradigm — Vocabulary-informed Extreme Value Learning (ViEVL), which embeds the visual features into semantic space in a probabilistic way. The learned embedding can be directly used to solve supervised learning, zero-shot and open set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of proposed frameworks. Extreme Value Theory(EVT) Extreme value theory (EVT) or extreme value analysis (EVA) is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in the field of hydrology to estimate the probability of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly. Extreme Value Theory for Open Set Classification – GPD and GEV Classifiers Extreme View Synthesis We present Extreme View Synthesis, a solution for novel view extrapolation when the number of input images is small. Occlusions and depth uncertainty, in this context, are two of the most pressing issues, and worsen as the degree of extrapolation increases. State-of-the-art methods approach this problem by leveraging explicit geometric constraints, or learned priors. Our key insight is that only by modeling both depth uncertainty and image priors can the extreme cases be solved. We first generate a depth probability volume for the novel view and synthesize an estimate of the sought image. Then, we use learned priors combined with depth uncertainty, to refine it. Our method is the first to show visually pleasing results for baseline magnifications of up to 30X.