A Swift Approximate PatternMiner  While there has been a tremendous interest in processing data that has an underlying graph structure, existing distributed graph processing systems take several minutes or even hours to mine simple patterns on graphs. This paper presents ASAP, a fast, approximate computation engine for graph pattern mining. ASAP leverages stateoftheart results in graph approximation theory, and extends it to general graph patterns in distributed settings. To enable the users to navigate the tradeoff between the result accuracy and latency, we propose a novel approach to build the ErrorLatency Profile (ELP) for a given computation. We have implemented ASAP on a generalpurpose distributed dataflow platform, and evaluated it extensively on several graph patterns. Our experimental results show that ASAP outperforms existing exact pattern mining solutions by up to 77×. Further, ASAP can scale to graphs with billions of edges without the need for large clusters. 
A/B Testing  In marketing, A/B testing is a simple randomized experiment with two variants, A and B, which are the control and treatment in the controlled experiment. It is a form of statistical hypothesis testing. Other names include randomized controlled experiments, online controlled experiments, and split testing. In online settings, such as web design (especially user experience design), the goal is to identify changes to web pages that increase or maximize an outcome of interest (e.g., clickthrough rate for a banner advertisement). Why does designing a simple A/B test seem so complicated acp 
AAMP  Matrix profile has been recently proposed as a promising technique to the problem of allpairssimilarity search on time series. Efficient algorithms have been proposed for computing it, e.g., STAMP, STOMP and SCRIMP++. All these algorithms use the znormalized Euclidean distance to measure the distance between subsequences. However, as we observed, for some datasets other Euclidean measurements are more useful for knowledge discovery from time series. In this paper, we propose efficient algorithms for computing matrix profile for a general class of Euclidean distances. We first propose a simple but efficient algorithm called AAMP for computing matrix profile with the ‘pure’ (nonnormalized) Euclidean distance. Then, we extend our algorithm for the pnorm distance. We also propose an algorithm, called ACAMP, that uses the same principle as AAMP, but for the case of znormalized Euclidean distance. We implemented our algorithms, and evaluated their performance through experimentation. The experiments show excellent performance results. For example, they show that AAMP is very efficient for computing matrix profile for nonnormalized Euclidean distances. The results also show that the ACAMP algorithm is significantly faster than SCRIMP++ (the state of the art matrix profile algorithm) for the case of znormalized Euclidean distance. 
ABC Analysis  In materials management, the ABC analysis (or Selective Inventory Control) is an inventory categorization technique. ABC analysis divides an inventory into three categories “A items” with very tight control and accurate records, “B items” with less tightly controlled and good records, and “C items” with the simplest controls possible and minimal records. The ABC analysis provides a mechanism for identifying items that will have a significant impact on overall inventory cost, while also providing a mechanism for identifying different categories of stock that will require different management and controls. The ABC analysis suggests that inventories of an organization are not of equal value. Thus, the inventory is grouped into three categories (A, B, and C) in order of their estimated importance. ‘A’ items are very important for an organization. Because of the high value of these ‘A’ items, frequent value analysis is required. In addition to that, an organization needs to choose an appropriate order pattern (e.g. ‘Just in time’) to avoid excess capacity. ‘B’ items are important, but of course less important than ‘A’ items and more important than ‘C’ items. Therefore ‘B’ items are intergroup items. ‘C’ items are marginally important. adaptDA 
ABC Shadow  This paper presents an original ABC algorithm, ABC Shadow, that can be applied to sample posterior densities that are continuously differentiable. The proposed algorithm solves the main condition to be fulfilled by any ABC algorithm, in order to be useful in practice. This condition requires enough samples in the parameter space region, induced by the observed statistics. The algorithm is tuned on the posterior of a Gaussian model which is entirely known, and then, it is applied for the statistical analysis of several spatial patterns. These patterns are issued or assumed to be outcomes of point processes. The considered models are: Strauss, Candy and areainteraction. A simulated annealing procedure based on the ABC Shadow algorithm for statistical inference of point processes 
ABCDStrategy  Determining the causal structure of a set of variables is critical for both scientific inquiry and decisionmaking. However, this is often challenging in practice due to limited interventional data. Given that randomized experiments are usually expensive to perform, we propose a general framework and theory based on optimal Bayesian experimental design to select experiments for targeted causal discovery. That is, we assume the experimenter is interested in learning some function of the unknown graph (e.g., all descendants of a target node) subject to design constraints such as limits on the number of samples and rounds of experimentation. While it is in general computationally intractable to select an optimal experimental design strategy, we provide a tractable implementation with provable guarantees on both approximation and optimization quality based on submodularity. We evaluate the efficacy of our proposed method on both synthetic and real datasets, thereby demonstrating that our method realizes considerable performance gains over baseline strategies such as random sampling. 
Abductive Reasoning  Abductive reasoning (also called abduction, abductive inference, or retroduction) is a form of logical inference which starts with an observation or set of observations then seeks to find the simplest and most likely explanation. In abductive reasoning, unlike in deductive reasoning, the premises do not guarantee the conclusion. One can understand abductive reasoning as inference to the best explanation, although not all uses of the terms abduction and inference to the best explanation are exactly equivalent. In the 1990s, as computing power grew, the fields of law, computer science, and artificial intelligence research spurred renewed interest in the subject of abduction. Diagnostic expert systems frequently employ abduction. 
Abnormal Event Detection Network (AEDNet) 
It is challenging to detect the anomaly in crowded scenes for quite a long time. In this paper, a selfsupervised framework, abnormal event detection network (AEDNet), which is composed of PCAnet and kernel principal component analysis (kPCA), is proposed to address this problem. Using surveillance video sequences of different scenes as raw data, PCAnet is trained to extract highlevel semantics of crowd’s situation. Next, kPCA,a oneclass classifier, is trained to determine anomaly of the scene. In contrast to some prevailing deep learning methods,the framework is completely selfsupervised because it utilizes only video sequences in a normal situation. Experiments of global and local abnormal event detection are carried out on UMN and UCSD datasets, and competitive results with higher EER and AUC compared to other stateoftheart methods are observed. Furthermore, by adding local response normalization (LRN) layer, we propose an improvement to original AEDNet. And it is proved to perform better by promoting the framework’s generalization capacity according to the experiments. 
Abstaining Classification  Abstaining classification aims to reject to classify the easily misclassified examples, so it is an effective approach to increase the clasificaiton reliability and reduce the misclassification risk in the costsensitive applications. ➘ “BoundedAbstention Method With two Constraints of Reject Rates” 
Abstract Dialectical Framework (ADF) 
Abstract Dialectical Frameworks (ADFs) generalize Dung’s argumentation frameworks allowing various relationships among arguments to be expressed in a systematic way. We further generalize ADFs so as to accommodate arbitrary acceptance degrees for the arguments. This makes ADFs applicable in domains where both the initial status of arguments and their relationship are only insufficiently specified by Boolean functions. We define all standard ADF semantics for the weighted case, including grounded, preferred and stable semantics. We illustrate our approach using acceptance degrees from the unit interval and show how other valuation structures can be integrated. In each case it is sufficient to specify how the generalized acceptance conditions are represented by formulas, and to specify the information ordering underlying the characteristic ADF operator. We also present complexity results for problems related to weighted ADFs. 
Abstract Meaning Representation (AMR) 
We describe AbstractMeaning Representation (AMR), a semantic representation language in which we are writing down the meanings of thousands of English sentences. We hope that a sembank of simple, wholesentence semantic structures will spur new work in statistical natural language understanding and generation, like the Penn Treebank encouraged work on statistical parsing. This paper gives an overview of AMR and tools associated with it. Abstract Meaning Representation (AMR) Annotation Release 1.0, Linguistic Data Consortium (LDC) Catalog Number LDC2014T12 and ISBN 1585636770, was developed by LDC, SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 13,000 English natural language sentences from newswire, weblogs and web discussion forums. AMR captures ‘who is doing what to whom’ in a sentence. Each sentence is paired with a graph that represents its wholesentence meaning in a treestructure. AMR utilizes PropBank frames, noncore semantic roles, withinsentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax. AMR Specification Abstract Meaning Representation: A survey ADDT 
Abstraction Learning  There has been a gap between artificial intelligence and human intelligence. In this paper, we identify three key elements forming human intelligence, and suggest that abstraction learning combines these elements and is thus a way to bridge the gap. Prior researches in artificial intelligence either specify abstraction by human experts, or take abstraction as a qualitative explanation for the model. This paper aims to learn abstraction directly. We tackle three main challenges: representation, objective function, and learning algorithm. Specifically, we propose a partition structure that contains preallocated abstraction neurons; we formulate abstraction learning as a constrained optimization problem, which integrates abstraction properties; we develop a network evolution algorithm to solve this problem. This complete framework is named ONE (Optimization via Network Evolution). In our experiments on MNIST, ONE shows elementary humanlike intelligence, including low energy consumption, knowledge sharing, and lifelong learning. 
ABtree  An Algorithm for SubgroupBased Treatment Assignment. Given two possible treatments, there may exist subgroups who benefit greater from one treatment than the other. This problem is relevant to the field of marketing, where treatments may correspond to different ways of selling a product. It is similarly relevant to the field of public policy, where treatments may correspond to specific government programs ADPclust,densityClust 
ACAMP  Matrix profile has been recently proposed as a promising technique to the problem of allpairssimilarity search on time series. Efficient algorithms have been proposed for computing it, e.g., STAMP, STOMP and SCRIMP++. All these algorithms use the znormalized Euclidean distance to measure the distance between subsequences. However, as we observed, for some datasets other Euclidean measurements are more useful for knowledge discovery from time series. In this paper, we propose efficient algorithms for computing matrix profile for a general class of Euclidean distances. We first propose a simple but efficient algorithm called AAMP for computing matrix profile with the ‘pure’ (nonnormalized) Euclidean distance. Then, we extend our algorithm for the pnorm distance. We also propose an algorithm, called ACAMP, that uses the same principle as AAMP, but for the case of znormalized Euclidean distance. We implemented our algorithms, and evaluated their performance through experimentation. The experiments show excellent performance results. For example, they show that AAMP is very efficient for computing matrix profile for nonnormalized Euclidean distances. The results also show that the ACAMP algorithm is significantly faster than SCRIMP++ (the state of the art matrix profile algorithm) for the case of znormalized Euclidean distance. 
Accelerated Alternating Projection  We study robust PCA for the fully observed setting, which is about separating a low rank matrix \BL \BL and a sparse matrix \BS \BS from their sum \BD=\BL+\BS \BD=\BL+\BS . In this paper, a new algorithm, dubbed accelerated alternating projections, is introduced for robust PCA which significantly improves the computational efficiency of the existing alternating projections proposed in (Netrapalli et al., 2014) when updating the low rank factor. The acceleration is achieved by first projecting a matrix onto some low dimensional subspace before obtaining a new estimate of the low rank matrix via truncated SVD. Exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other stateoftheart algorithms for robust PCA. 
Accelerated Alternating Projections  We study robust PCA for the fully observed setting, which is about separating a low rank matrix $\boldsymbol{L}$ and a sparse matrix $\boldsymbol{S}$ from their sum $\boldsymbol{D}=\boldsymbol{L}+\boldsymbol{S}$. In this paper, a new algorithm, termed accelerated alternating projections, is introduced for robust PCA which accelerates existing alternating projections proposed in [Netrapalli, Praneeth, et al., 2014]. Let $\boldsymbol{L}_k$ and $\boldsymbol{S}_k$ be the current estimates of the low rank matrix and the sparse matrix, respectively. The algorithm achieves significant acceleration by first projecting $\boldsymbol{D}\boldsymbol{S}_k$ onto a low dimensional subspace before obtaining the new estimate of $\boldsymbol{L}$ via truncated SVD. Exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other stateoftheart algorithms for robust PCA. 
Accelerated Bayesian Additive Regression Trees  Although less widely known than random forests or boosted regression trees, Bayesian additive regression trees (BART) \citep{chipman2010bart} is a powerful predictive model that often outperforms those betterknown alternatives at outofsample prediction. BART is especially wellsuited to settings with unstructured predictor variables and substantial sources of unmeasured variation as is typical in the social, behavioral and health sciences. This paper develops a modified version of BART that is amenable to fast posterior estimation. We present a stochastic hill climbing algorithm that matches the remarkable predictive accuracy of previous BART implementations, but is orders of magnitude faster and uses a fraction of the memory. Simulation studies show that the new method is comparable in computation time and more accurate at function estimation than both random forests and gradient boosting. 
Accelerated Coordinate Descent (ACD) 
Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on largedimensional problems. It achieves stateoftheart complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (ACD) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. If all coordinates are updated in each iteration, our method reduces to the classical accelerated gradient descent method AGD of Nesterov. If a single coordinate is updated in each iteration, and we pick probabilities proportional to the square roots of the coordinatewise Lipschitz constants, our method reduces to the currently fastest coordinate descent method NUACDM of AllenZhu, Qu, Richt\'{a}rik and Yuan. While minibatch variants of ACD are more popular and relevant in practice, there is no importance sampling for ACD that outperforms the standard uniform minibatch sampling. Through insights enabled by our general analysis, we design new importance sampling for minibatch ACD which significantly outperforms previous stateoftheart minibatch ACD in practice. We prove a rate that is at most ${\cal O}(\sqrt{\tau})$ times worse than the rate of minibatch ACD with uniform sampling, but can be ${\cal O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau={\cal O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch nonaccelerated CD as well, achieving improvements on previous best rates. 
Accelerated Destructive Degradation Test (ADDT) 
Degradation data analysis is a powerful tool for reliability assessment. Useful reliability information is available from degradation data when there are few or even no failures. For some applications the degradation measurement process destroys or changes the physical/mechanical characteristics of test units. In such applications, only one meaningful measurement can be can be taken on each test unit. This is known as ‘destructive degradation’. Degradation tests are often accelerated by testing at higher than usual levels of accelerating variables like temperature. aftgee 
Accelerated Distributed Nesterov Gradient Descent (AccDNGD) 
This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an Accelerated Distributed Nesterov Gradient Descent (AccDNGD) method. When the objective function is convex and $L$smooth, we show that it achieves a $O(\frac{1}{t^{1.4\epsilon}})$ convergence rate for all $\epsilon\in(0,1.4)$. We also show the convergence rate can be improved to $O(\frac{1}{t^2})$ if the objective function is a composition of a linear map and a stronglyconvex and smooth function. When the objective function is $\mu$strongly convex and $L$smooth, we show that it achieves a linear convergence rate of $O([ 1 – O( (\frac{\mu}{L})^{5/7} )]^t)$, where $\frac{L}{\mu}$ is the condition number of the objective. 
Accelerated Failure Time Model (AFT) 
In the statistical area of survival analysis, an accelerated failure time model (AFT model) is a parametric model that provides an alternative to the commonly used proportional hazards models. Whereas a proportional hazards model assumes that the effect of a covariate is to multiply the hazard by some constant, an AFT model assumes that the effect of a covariate is to accelerate or decelerate the life course of a disease by some constant. This is especially appealing in a technical context where the ‘disease’ is a result of some mechanical process with a known sequence of intermediary stages. AHR 
Accelerated Gradient Boosting (AGB) 
Gradient tree boosting is a prediction algorithm that sequentially produces a model in the form of linear combinations of decision trees, by solving an infinitedimensional optimization problem. We combine gradient boosting and Nesterov’s accelerated descent to design a new algorithm, which we call AGB (for Accelerated Gradient Boosting). Substantial numerical evidence is provided on both synthetic and reallife data sets to assess the excellent performance of the method in a large variety of prediction problems. It is empirically shown that AGB is much less sensitive to the shrinkage parameter and outputs predictors that are considerably more sparse in the number of trees, while retaining the exceptional performance of gradient boosting. 
Accelerated Gradient Boosting Machine (AGBM) 
Gradient Boosting Machine (GBM) is an extremely powerful supervised learning algorithm that is widely used in practice. GBM routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In this work, we propose Accelerated Gradient Boosting Machine (AGBM) by incorporating Nesterov’s acceleration techniques into the design of GBM. The difficulty in accelerating GBM lies in the fact that weak (inexact) learners are commonly used, and therefore the errors can accumulate in the momentum term. To overcome it, we design a ‘corrected pseudo residual’ and fit best weak learner to this corrected pseudo residual, in order to perform the zupdate. Thus, we are able to derive novel computational guarantees for AGBM. This is the first GBM type of algorithm with theoreticallyjustified accelerated convergence rate. Finally we demonstrate with a number of numerical experiments the effectiveness of AGBM over conventional GBM in obtaining a model with good training and/or testing data fidelity. 
Accelerated Gradient Descent (AGD) 
In the world of optimization, we have a space and a convex objective function f we wish to minimize. We have seen that gradient descent is a simple greedy algorithm that works to minimize the objective function at some convergence rate (in this post we shall remain in discrete time). But the world is always stranger than we think. Indeed, there is a phenomenon of acceleration in convex optimization, in which we can boost the performance of some gradientbased algorithms by subtly modifying their implementation. In particular, we will discuss accelerated gradient descent, proposed by Yurii Nesterov in 1983, which achieves a faster – and optimal – convergence rate under the same assumption as gradient descent. Acceleration has received renewed research interests in recent years, leading to many proposed interpretations and further generalizations. Nevertheless, there is still a sense of mystery to what acceleration is doing and why it works; these are the questions that we want to understand better. 
Accelerated Hierarchical Density Clustering  We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult to tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density based clustering. Library available at: https://…/hdbscan 
Accelerated MiniBatch kMeans  We propose an accelerated MiniBatch kmeans algorithm which combines three key improvements. The first is a modified center update which results in convergence to a local minimum in fewer iterations. The second is an adaptive increase of batchsize to meet an increasing requirement for centroid accuracy. The third is the inclusion of distance bounds based on the triangle inequality, which are used to eliminate distance calculations along the same lines as Elkan’s algorithm. The combination of the two latter constitutes a very powerful scheme to reuse computation already done over samples until statistical accuracy requires the use of additional data points. alineR 
Accelerated Proximal Boosting  Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics a gradient descent on a functional variable. This paper proposes to build upon the proximal point algorithm when the empirical risk to minimize is not differentiable. In addition, the novel boosting approach, called accelerated proximal boosting, benefits from Nesterov’s acceleration in the same way as gradient boosting [Biau et al., 2018]. Advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and realworld data. In particular, we exhibit a favorable comparison over gradient boosting regarding convergence rate and prediction accuracy. 
Accelerated Proximal Gradient (APG) 
More Efficient Accelerated Proximal Algorithm for Nonconvex Problems allanvar 
Accelerated Proximal Stochastic Variance Reduced Gradient (ASVRG) 
This paper proposes an accelerated proximal stochastic variance reduced gradient (ASVRG) method, in which we design a simple and effective momentum acceleration trick. Unlike most existing accelerated stochastic variance reduction methods such as Katyusha, ASVRG has only one additional variable and one momentum parameter. Thus, ASVRG is much simpler than those methods, and has much lower periteration complexity. We prove that ASVRG achieves the best known oracle complexities for both strongly convex and nonstrongly convex objectives. In addition, we extend ASVRG to minibatch and nonsmooth settings. We also empirically verify our theoretical results and show that the performance of ASVRG is comparable with, and sometimes even better than that of the stateoftheart stochastic methods. 
Accelerated Sparse Discriminant Analysis  accSDA 
Accelerated Sparse Subspace Clustering  Stateoftheart algorithms for sparse subspace clustering perform spectral clustering on a similarity matrix typically obtained by representing each data point as a sparse combination of other points using either basis pursuit (BP) or orthogonal matching pursuit (OMP). BPbased methods are often prohibitive in practice while the performance of OMPbased schemes are unsatisfactory, especially in settings where data points are highly similar. In this paper, we propose a novel algorithm that exploits an accelerated variant of orthogonal leastsquares to efficiently find the underlying subspaces. We show that under certain conditions the proposed algorithm returns a subspacepreserving solution. Simulation results illustrate that the proposed method compares favorably with BPbased method in terms of running time while being significantly more accurate than OMPbased schemes. 
AcceptanceRejection Method  ➘ “Rejection Sampling” 
AccUDNN  Typically, Ultradeep neural network(UDNN) tends to yield highquality model, but its training process is usually resource intensive and timeconsuming. Modern GPU’s scarce DRAM capacity is the primary bottleneck that hinders the trainability and the training efficiency of UDNN. In this paper, we present ‘AccUDNN’, an accelerator that aims to make the utmost use of finite GPU memory resources to speed up the training process of UDNN. AccUDNN mainly includes two modules: memory optimizer and hyperparameter tuner. Memory optimizer develops a performancemodel guided dynamic swap out/in strategy, by offloading appropriate data to host memory, GPU memory footprint can be significantly slashed to overcome the restriction of trainability of UDNN. After applying the memory optimization strategy, hyperparameter tuner is designed to explore the efficiencyoptimal minibatch size and the matched learning rate. Evaluations demonstrate that AccUDNN cuts down the GPU memory requirement of ResNet152 from more than 24GB to 8GB. In turn, given 12GB GPU memory budget, the efficiencyoptimal minibatch size can reach 4.2x larger than original Caffe. Benefiting from better utilization of single GPU’s computing resources and fewer parameter synchronization of large minibatch size, 7.7x speedup is achieved by 8 GPUs’ cluster without any communication optimization and no accuracy losses. 
Accumulated Gradient Normalization  This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of firstorder gradients to a parameter server. This implies that the magnitude of a worker delta is smaller compared to an accumulated gradient, and provides a better direction towards a minimum compared to firstorder gradients, which in turn also forces possible implicit momentum fluctuations to be more aligned since we make the assumption that all workers contribute towards a single minima. As a result, our approach mitigates the parameter staleness problem more effectively since staleness in asynchrony induces (implicit) momentum, and achieves a better convergence rate compared to other optimizers such as asynchronous EASGD and DynSGD, which we show empirically. 
Accumulated Local Effects (ALE) 
ALEPlot 
Accumulated Total Derivative Effects Plot  Interpreting a nonparametric regression model with many predictors is known to be a challenging problem. There has been renewed interest in this topic due to the extensive use of machine learning algorithms and the difficulty in understanding and explaining their inputoutput relationships. This paper develops a unified framework using a derivativebased approach for existing tools in the literature, including the partialdependence plots, marginal plots and accumulated effects plots. It proposes a new interpretation technique called the accumulated total derivative effects plot and demonstrates how its components can be used to develop extensive insights in complex regression models with correlated predictors. The techniques are illustrated through simulation results. 
Accumulation Chart  We propose a new graphical format for instantrunoff voting election results. We call this proposal an ‘accumulation chart.’ This model, a modification of standard bar charts, is easy to understand, clearly indicates the winner, depicts the instantrunoff algorithm, and summarizes the votes cast. Moreover, it includes the pedigree of each accumulated vote and gives a clear depiction of candidates’ coalitions. 
Accuracy  In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. alphaOutlier 
Accuracy and Precision  Accuracy and precision are defined in terms of systematic and random errors. The more common definition associates accuracy with systematic errors and precision with random errors. Another definition, advanced by ISO, associates trueness with systematic errors and precision with random errors, and defines accuracy as the combination of both trueness and precision. ANOM 
Accuracy Booster  Convolution Neural Networks (CNN) have been extremely successful in solving intensive computer vision tasks. The convolutional filters used in CNNs have played a major role in this success, by extracting useful features from the inputs. Recently researchers have tried to boost the performance of CNNs by recalibrating the feature maps produced by these filters, e.g., SqueezeandExcitation Networks (SENets). These approaches have achieved better performance by \textit{Exciting} up the important channels or feature maps while diminishing the rest. However, in the process, architectural complexity has increased. We propose an architectural block that introduces much lower complexity than the existing methods of CNN performance boosting while performing significantly better than them. We carry out experiments on the CIFAR, ImageNet and MSCOCO datasets, and show that the proposed block can challenge the stateoftheart results. Our method boosts the ResNet50 architecture to perform comparably to the ResNet152 architecture, which is a three times deeper network, on classification. We also show experimentally that our method is not limited to classification but also generalizes well to other tasks such as object detection. 
Accuracy Paradox  The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. It may be better to avoid the accuracy metric in favor of other metrics such as precision and recall. Accuracy is often the starting point for analyzing the quality of a predictive model, as well as an obvious criterion for prediction. Accuracy measures the ratio of correct predictions to the total number of cases evaluated. It may seem obvious that the ratio of correct predictions to cases should be a key metric. A predictive model may have high accuracy, but be useless. 
Accurate Tracking by Overlap Maximization (ATOM) 
While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been severely limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. Instead, the majority of methods resort to simple multiscale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited as target estimation is a complex task, requiring highlevel knowledge about the object. We thus address the problem of target state estimation in tracking. We propose a novel tracking architecture consisting of dedicated target estimation and classification components. Due to the complex nature of target estimation, we propose a component that can be entirely trained offline on largescale datasets. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating targetspecific information in the prediction, our approach achieves previously unseen bounding box accuracy. Furthermore, we integrate a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework, comprised of a unified multitask architecture, sets a new stateoftheart on four challenging benchmarks. On the largescale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15%, while running at over 30 FPS. 
AccurateML  The growing demands of processing massive datasets have promoted irresistible trends of running machine learning applications on MapReduce. When processing large input data, it is often of greater values to produce fast and accurate enough approximate results than slow exact results. Existing techniques produce approximate results by processing parts of the input data, thus incurring large accuracy losses when using short job execution times, because all the skipped input data potentially contributes to result accuracy. We address this limitation by proposing AccurateML that aggregates information of input data in each map task to create small aggregated data points. These aggregated points enable all map tasks producing initial outputs quickly to save computation times and decrease the outputs’ size to reduce communication times. Our approach further identifies the parts of input data most related to result accuracy, thus first using these parts to improve the produced outputs to minimize accuracy losses. We evaluated AccurateML using real machine learning applications and datasets. The results show: (i) it reduces execution times by 30 times with small accuracy losses compared to exact results; (ii) when using the same execution times, it achieves 2.71 times reductions in accuracy losses compared to existing approximate processing techniques. 
AceKG  Most existing knowledge graphs (KGs) in academic domains suffer from problems of insufficient multirelational information, name ambiguity and improper data format for largescale machine pro cessing. In this paper, we present AceKG, a new largescale KG in academic domain. AceKG not only provides clean academic information, but also offers a largescale benchmark dataset for researchers to conduct challenging data mining projects including link prediction, community detection and scholar classification. Specifically, AceKG describes 3.13 billion triples of academic facts based on a consistent ontology, including necessary properties of papers, authors, fields of study, venues and institutes, as well as the relations among them. To enrich the proposed knowledge graph, we also perform entity alignment with existing databases and rulebased inference. Based on AceKG, we conduct experiments of three typical academic data mining tasks and evaluate several stateof theart knowledge embedding and network representation learning approaches on the benchmark datasets built from AceKG. Finally, we discuss several promising research directions that benefit from AceKG. 
AclNet  We propose an efficient endtoend convolutional neural network architecture, AclNet, for audio classification. When trained with our data augmentation and regularization, we achieved stateoftheart performance on the ESC50 corpus with 85:65% accuracy. Our network allows configurations such that memory and compute requirements are drastically reduced, and a tradeoff analysis of accuracy and complexity is presented. The analysis shows high accuracy at significantly reduced computational complexity compared to existing solutions. For example, a configuration with only 155k parameters and 49:3 million multiplyadds per second is 81:75%, exceeding human accuracy of 81:3%. This improved efficiency can enable alwayson inference in energyefficient platforms. 
Acquisition Thompson Sampling (ATS) 
This paper presents Acquisition Thompson Sampling (ATS), a novel algorithm for batch Bayesian Optimization (BO) based on the idea of sampling multiple acquisition functions from a stochastic process. We define this process through the dependency of the acquisition functions on a set of model parameters. ATS is conceptually simple, straightforward to implement and, unlike other batch BO methods, it can be employed to parallelize any sequential acquisition function. In order to improve performance for multimodal tasks, we show that ATS can be combined with existing techniques in order to realize different exploreexploit tradeoffs and take into account pending function evaluations. We present experiments on a variety of benchmark functions and on the hyperparameter optimization of a popular gradient boosting tree algorithm. These demonstrate the competitiveness of our algorithm with two stateoftheart batch BO methods, and its advantages to classical parallel Thompson Sampling for BO. 
Act2Vec  We introduce Act2Vec, a general framework for learning contextbased action representation for Reinforcement Learning. Representing actions in a vector space help reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these for augmenting state representations as well as improving function approximation of Qvalues. We visualize and test action embeddings in three domains including a drawing task, a high dimensional navigation task, and the large action space domain of StarCraft II. 
Action Point Process VAE (APPVAE) 
We propose a novel probabilistic generative model for action sequences. The model is termed the Action Point Process VAE (APPVAE), a variational autoencoder that can capture the distribution over the times and categories of action sequences. Modeling the variety of possible action sequences is a challenge, which we show can be addressed via the APPVAE’s use of latent representations and nonlinear functions to parameterize distributions over which event is likely to occur next in a sequence and at what time. We empirically validate the efficacy of APPVAE for modeling action sequences on the MultiTHUMOS and Breakfast datasets. 
Action Schema Network (ASNet) 
In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weightsharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet’s learning capability allows it to significantly outperform traditional nonlearning planners in several challenging domains. 
Action2Vec  We describe a novel crossmodal embedding space for actions, named Action2Vec, which combines linguistic cues from class labels with spatiotemporal features derived from video clips. Our approach uses a hierarchical recurrent network to capture the temporal structure of video features. We train our embedding using a joint loss that combines classification accuracy with similarity to Word2Vec semantics. We evaluate Action2Vec by performing zero shot action recognition and obtain state of the art results on three standard datasets. In addition, we present two novel analogy tests which quantify the extent to which our joint embedding captures distributional semantics. This is the first joint embedding space to combine verbs and action videos, and the first to be thoroughly evaluated with respect to its distributional semantics. 
ActionalStructural Graph Convolution Network (ASGCN) 
Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoderdecoder structure, called Alink inference module, to capture actionspecific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higherorder dependencies, i.e. structural links. Combing the two types of links into a generalized skeleton graph, we further propose the actionalstructural graph convolution network (ASGCN), which stacks actionalstructural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through selfsupervision. We validate ASGCN in action recognition using two skeleton data sets, NTURGB+D and Kinetics. The proposed ASGCN achieves consistently large improvement compared to the stateoftheart methods. As a side product, ASGCN also shows promising results for future pose prediction. 
ActionAttending Graphic Neural Network (A2GNN) 
The motion analysis of human skeletons is crucial for human action recognition, which is one of the most active topics in computer vision. In this paper, we propose a fully endtoend actionattending graphic neural network (A$^2$GNN) for skeletonbased action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract highlevel semantic representation from skeletons, we perform the local spectral graph filtering on the constructed attribute graphs like the standard image convolution operation. Considering not all joints are informative for action analysis, we design an actionattending layer to detect those salient action units (AUs) by adaptively weighting skeletal joints. Herein the filtering responses are parameterized into a weighting function irrelevant to the order of input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, actionattending and recurrent temporal encoding are integrated together to jointly train for the sake of robust action recognition as well as the intelligibility of human actions. To evaluate our A$^2$GNN, we conduct extensive experiments on four benchmark skeletonbased action datasets, including the largescale challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves the stateoftheart performances. 
ActionConvolution Deep QNetwork  ➘ “Deep Curiosity Loop” 
ActionElimination Deep QNetwork (AEDQN) 
Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the ActionElimination Deep QNetwork (AEDQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates suboptimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in textbased games with over a thousand discrete actions. 
ActionSpecific Deep Recurrent QNetwork (ADRQN) 
Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Actionspecific Deep Recurrent QNetwork (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an actionobservation pair. The time series of actionobservation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Qvalues as in conventional Deep QNetworks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games. 
Activation Atlases  Activation Atlases (in collaboration with Google researchers), is a new technique for visualizing what interactions between neurons can represent. 
Activation Ensemble  Many activation functions have been proposed in the past, but selecting an adequate one requires trial and error. We propose a new methodology of designing activation functions within a neural network at each layer. We call this technique an ‘activation ensemble’ because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, $\alpha$, at each activation layer of a network to allow for multiple activation functions to be active at each neuron. By design, activations with larger $\alpha$ values at a neuron is equivalent to having the largest magnitude. Hence, those higher magnitude activations are ‘chosen’ by the network. We implement the activation ensembles on a variety of datasets using an array of Feed Forward and Convolutional Neural Networks. By using the activation ensemble, we achieve superior results compared to traditional techniques. In addition, because of the flexibility of this methodology, we more deeply explore activation functions and the features that they capture. 
Active Adversarial Domain Adaptation (AADA) 
We propose an active learning approach for transferring representations across domains. Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains. The former uses a domain discriminative model to align domains, while the latter utilizes it to weigh samples to account for distribution shifts. Specifically, our importance weight promotes samples with large uncertainty in classification and diversity from labeled examples, thus serves as a sample selection scheme for active learning. We show that these two views can be unified in one framework for domain adaptation and transfer learning when the source domain has many labeled examples while the target domain does not. AADA provides significant improvements over finetuning based approaches and other sampling methods when the two domains are closely related. Results on challenging domain adaptation tasks, e.g., object detection, demonstrate that the advantage over baseline approaches is retained even after hundreds of examples being actively annotated. 
Active and Adaptive Sequential Learning  A framework is introduced for actively and adaptively solving a sequence of machine learning problems, which are changing in bounded manner from one time step to the next. An algorithm is developed that actively queries the labels of the most informative samples from an unlabeled data pool, and that adapts to the change by utilizing the information acquired in the previous steps. Our analysis shows that the proposed active learning algorithm based on stochastic gradient descent achieves a nearoptimal excess risk performance for maximum likelihood estimation. Furthermore, an estimator of the change in the learning problems using the active learning samples is constructed, which provides an adaptive sample size selection rule that guarantees the excess risk is bounded for sufficiently large number of time steps. Experiments with synthetic and real data are presented to validate our algorithm and theoretical results. 
Active Betweenness Cardinality  Centrality rankings such as degree, closeness, betweenness, Katz, PageRank, etc. are commonly used to identify critical nodes in a graph. These methods are based on two assumptions that restrict their wider applicability. First, they assume the exact topology of the network is available. Secondly, they do not take into account the activity over the network and only rely on its topology. However, in many applications, the network is autonomous, vast, and distributed, and it is hard to collect the exact topology. At the same time, the underlying pairwise activity between node pairs is not uniform and node criticality strongly depends on the activity on the underlying network. In this paper, we propose active betweenness cardinality, as a new measure, where the node criticalities are based on not the static structure, but the activity of the network. We show how this metric can be computed efficiently by using only local information for a given node and how we can find the most critical nodes starting from only a few nodes. We also show how this metric can be used to monitor a network and identify failed nodes.We present experimental results to show effectiveness by demonstrating how the failed nodes can be identified by measuring active betweenness cardinality of a few nodes in the system. 
Active Domain Randomization  Domain randomization is a popular technique for improving domain transfer, often used in a zeroshot setting when the target domain is unknown or cannot easily be used for training. In this work, we empirically examine the effects of domain randomization on agent generalization. Our experiments show that domain randomization may lead to suboptimal, highvariance policies, which we attribute to the uniform sampling of environment parameters. We propose Active Domain Randomization, a novel algorithm that learns a parameter sampling strategy. Our method looks for the most informative environment variations within the given randomization ranges by leveraging the discrepancies of policy rollouts in randomized and reference environment instances. We find that training more frequently on these instances leads to better overall agent generalization. In addition, when domain randomization and policy transfer fail, Active Domain Randomization offers more insight into the deficiencies of both the chosen parameter ranges and the learned policy, allowing for more focused debugging. Our experiments across various physicsbased simulated and a realrobot task show that this enhancement leads to more robust, consistent policies. 
Active Exploration in Markov Decision Processes  We introduce the active exploration problem in Markov decision processes (MDPs). Each state of the MDP is characterized by a random value and the learner should gather samples to estimate the mean value of each state as accurately as possible. Similarly to active exploration in multiarmed bandit (MAB), states may have different levels of noise, so that the higher the noise, the more samples are needed. As the noise level is initially unknown, we need to trade off the exploration of the environment to estimate the noise and the exploitation of these estimates to compute a policy maximizing the accuracy of the mean predictions. We introduce a novel learning algorithm to solve this problem showing that active exploration in MDPs may be significantly more difficult than in MAB. We also derive a heuristic procedure to mitigate the negative effect of slowly mixing policies. Finally, we validate our findings on simple numerical simulations. 
Active Function CrossEntropy Clustering (afCEC) 
Active function crossentropy clustering partitions the ndimensional data into the clusters by finding the parameters of the mixed generalized multivariate normal distribution, that optimally approximates the scattering of the data in the ndimensional space, whose density function is of the form: p_1*N(mi_1,^sigma_1,sigma_1,f_1)+…+p_k*N(mi_k,^sigma_k,sigma_k,f_k). The abovementioned generalization is performed by introducing so called ‘fadapted Gaussian densities’ (i.e. the ordinary Gaussian densities adapted by the ‘active function’). Additionally, the active function crossentropy clustering performs the automatic reduction of the unnecessary clusters. For more information please refer to P. Spurek, J. Tabor, K.Byrski, ‘Active function CrossEntropy Clustering’ (2017) <doi:10.1016/j.eswa.2016.12.011>. afCEC 
Active Information Store (AIS) 
The goal of the AIS project is to provide a scalable information repository to support dataintensive information worker and decision support applications. AIS can help businesses to enhance, structure and navigate data to discover information and relationships that are in existing data stores but not readily available or obvious. (see also HANA Graph Engine) 
Active Learning  Active learning is a special case of semisupervised machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points. In statistics literature it is sometimes also called optimal experimental design. There are situations in which unlabeled data is abundant but manually labeling is expensive. In such a scenario, learning algorithms can actively query the user/teacher for labels. This type of iterative supervised learning is called active learning. 
Active Learning in Python (ALiPy) 
Supervised machine learning methods usually require a large set of labeled examples for model training. However, in many real applications, there are plentiful unlabeled data but limited labeled data; and the acquisition of labels is costly. Active learning (AL) reduces the labeling cost by iteratively selecting the most valuable data to query their labels from the annotator. This article introduces a Python toobox ALiPy for active learning. ALiPy provides a module based implementation of active learning framework, which allows users to conveniently evaluate, compare and analyze the performance of active learning methods. In the toolbox, multiple options are available for each component of the learning framework, including data process, active selection, label query, results visualization, etc. In addition to the implementations of more than 20 stateoftheart active learning algorithms, ALiPy also supports users to easily configure and implement their own approaches under different active learning settings, such as AL for multilabel data, AL with noisy annotators, AL with different costs and so on. The toolbox is welldocumented and opensource on Github, and can be easily installed through PyPI. 
Active Learning with Partial Feedback  In the largescale multiclass setting, assigning labels often consists of answering multiple questions to drill down through a hierarchy of classes. Here, the labor required per annotation scales with the number of questions asked. We propose active learning with partial feedback. In this setup, the learner asks the annotator if a chosen example belongs to a (possibly composite) chosen class. The answer eliminates some classes, leaving the agent with a partial label. Success requires (i) a sampling strategy to choose (example, class) pairs, and (ii) learning from partial labels. Experiments on the TinyImageNet dataset demonstrate that our most effective method achieves a 21% relative improvement in accuracy for a 200k binary question budget. Experiments on the TinyImageNet dataset demonstrate that our most effective method achieves a 26% relative improvement (8.1% absolute) in top1 classification accuracy for a 250k (or 30%) binary question budget, compared to a naive baseline. Our work may also impact traditional data annotation. For example, our best method fully annotates TinyImageNet with only 482k (with EDC though, ERC is 491) binary questions (vs 827k for naive method). 
Active Long Term Memory Network (ALTM) 
Continual Learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially. This paper introduces the Active Long Term Memory Networks (ALTM), a model of sequential multitask deep learning that is able to maintain previously learned association between sensory input and behavioral output while acquiring knew knowledge. ALTM exploits the nonconvex nature of deep neural networks and actively maintains knowledge of previously learned, inactive tasks using a distillation loss. Distortions of the learned inputoutput map are penalized but hidden layers are free to transverse towards new local optima that are more favorable for the multitask objective. We reframe the McClelland’s seminal Hippocampal theory with respect to Catastrophic Inference (CI) behavior exhibited by modern deep architectures trained with backpropagation and inhomogeneous sampling of latent factors across epochs. We present empirical results of nontrivial CI during continual learning in Deep Linear Networks trained on the same task, in Convolutional Neural Networks when the task shifts from predicting semantic to graphical factors and during domain adaptation from simple to complex environments. We present results of the ALTM model’s ability to maintain viewpoint recognition learned in the highly controlled iLab20M dataset with 10 object categories and 88 camera viewpoints, while adapting to the unstructured domain of Imagenet with 1,000 object categories. 
Active Neural Localizer  Localization is the problem of estimating the location of an autonomous agent from an observation and a map of the environment. Traditional methods of localization, which filter the belief based on the observations, are suboptimal in the number of steps required, as they do not decide the actions taken by the agent. We propose ‘Active Neural Localizer’, a fully differentiable neural network that learns to localize accurately and efficiently. The proposed model incorporates ideas of traditional filteringbased localization methods, by using a structured belief of the state with multiplicative interactions to propagate belief, and combines it with a policy model to localize accurately while minimizing the number of steps required for localization. Active Neural Localizer is trained endtoend with reinforcement learning. We use a variety of simulation environments for our experiments which include random 2D mazes, random mazes in the Doom game engine and a photorealistic environment in the Unreal game engine. The results on the 2D environments show the effectiveness of the learned policy in an idealistic setting while results on the 3D environments demonstrate the model’s capability of learning the policy and perceptual model jointly from rawpixel based RGB observations. We also show that a model trained on random textures in the Doom environment generalizes well to a photorealistic office space environment in the Unreal engine. 
Active Rotating Filters (ARF) 
Deep Convolution Neural Networks (DCNNs) are capable of learning unprecedentedly effective image representations. However, their ability in handling significant local and global image rotations remains limited. In this paper, we propose Active Rotating Filters (ARFs) that actively rotate during convolution and produce feature maps with location and orientation explicitly encoded. An ARF acts as a virtual filter bank containing the filter itself and its multiple unmaterialised rotated versions. During backpropagation, an ARF is collectively updated using errors from all its rotated versions. DCNNs using ARFs, referred to as Oriented Response Networks (ORNs), can produce withinclass rotationinvariant deep features while maintaining interclass discrimination for classification tasks. The oriented response produced by ORNs can also be used for image and object orientation estimation tasks. Over multiple stateoftheart DCNN architectures, such as VGG, ResNet, and STN, we consistently observe that replacing regular filters with the proposed ARFs leads to significant reduction in the number of network parameters and improvement in classification performance. We report the best results on several commonly used benchmarks. 
Active Sampler  Recent years have witnessed amazing outcomes from ‘Big Models’ trained by ‘Big Data’. Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more ‘learning value’ to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a lightweight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of largescale data analytics, and can be applied to most analytics models trained by stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality by 1.62.2x. 
Active Scene Learning  Sketch recognition allows natural and efficient interaction in penbased interfaces. A key obstacle to building accurate sketch recognizers has been the difficulty of creating large amounts of annotated training data. Several authors have attempted to address this issue by creating synthetic data, and by building tools that support efficient annotation. Two prominent sets of approaches stand out from the rest of the crowd. They use interim classifiers trained with a small set of labeled data to aid the labeling of the remainder of the data. The first set of approaches uses a classifier trained with a partially labeled dataset to automatically label unlabeled instances. The others, based on active learning, save annotation effort by giving priority to labeling informative data instances. The former is suboptimal since it doesn’t prioritize the order of labeling to favor informative instances, while the latter makes the strong assumption that unlabeled data comes in an already segmented form (i.e. the ink in the training data is already assembled into groups forming isolated object instances). In this paper, we propose an active learning framework that combines the strengths of these methods, while addressing their weaknesses. In particular, we propose two methods for deciding how batches of unsegmented sketch scenes should be labeled. The first method, scenewise selection, assesses the informativeness of each drawing (sketch scene) as a whole, and asks the user to annotate all objects in the drawing. The latter, segmentwise selection, attempts more precise targeting to locate informative fragments of drawings for user labeling. We show that both selection schemes outperform random selection. Furthermore, we demonstrate that precise targeting yields superior performance. Overall, our approach allows reaching top accuracy figures with up to 30% savings in annotation cost. 
Active Semisupervised Transfer Learning (ASTL) 
Singletrial classification of eventrelated potentials in electroencephalogram (EEG) signals is a very important paradigm of braincomputer interface (BCI). Because of individual differences, usually some subjectspecific calibration data are required to tailor the classifier for each subject. Transfer learning has been extensively used to reduce such calibration data requirement, by making use of auxiliary data from similar/relevant subjects/tasks. However, all previous research assumes that all auxiliary data have been labeled. This paper considers a more general scenario, in which part of the auxiliary data could be unlabeled. We propose active semisupervised transfer learning (ASTL) for offline BCI calibration, which integrates active learning, semisupervised learning, and transfer learning. Using a visual evoked potential oddball task and three different EEG headsets, we demonstrate that ASTL can achieve consistently good performance across subjects and headsets, and it outperforms some stateoftheart approaches in the literature. 
Active Testing  Much recent work on visual recognition aims to scale up learning to massive, noisilyannotated datasets. We address the problem of scalingup the evaluation of such models to largescale datasets with noisy labels. Current protocols for doing so require a human user to either vet (reannotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are found through manual inspection of results. In this work, we reformulate the problem as one of active testing, and examine strategies for efficiently querying a user so as to obtain an accurate performance estimate with minimal vetting. We demonstrate the effectiveness of our proposed active testing framework on estimating two performance metrics, Precision@K and mean Average Precision, for two popular computer vision tasks, multilabel classification and instance segmentation. We further show that our approach is able to save significant human annotation effort and is more robust than alternative evaluation protocols. 
Active Transfer Learning Network  Deep learning has recently attracted significant attention in the field of hyperspectral images (HSIs) classification. However, the construction of an efficient deep neural network (DNN) mostly relies on a large number of labeled samples being available. To address this problem, this paper proposes a unified deep network, combined with active transfer learning that can be welltrained for HSIs classification using only minimally labeled training data. More specifically, deep joint spectralspatial feature is first extracted through hierarchical stacked sparse autoencoder (SSAE) networks. Active transfer learning is then exploited to transfer the pretrained SSAE network and the limited training samples from the source domain to the target domain, where the SSAE network is subsequently finetuned using the limited labeled samples selected from both source and target domain by corresponding active learning strategies. The advantages of our proposed method are threefold: 1) the network can be effectively trained using only limited labeled samples with the help of novel active learning strategies; 2) the network is flexible and scalable enough to function across various transfer situations, including crossdataset and intraimage; 3) the learned deep joint spectralspatial feature representation is more generic and robust than many joint spectralspatial feature representation. Extensive comparative evaluations demonstrate that our proposed method significantly outperforms many stateoftheart approaches, including both traditional and deep networkbased methods, on three popular datasets. 
ActiveClean  Data cleaning is often an important step to ensure that predictive models, such as regression and classification, are not affected by systematic errors such as inconsistent, outofdate, or outlier data. Identifying dirty data is often a manual and iterative process, and can be challenging on large datasets. However, many data cleaning workflows can introduce subtle biases into the training processes due to violation of independence assumptions. We propose ActiveClean, a progressive cleaning approach where the model is updated incrementally instead of retraining and can guarantee accuracy on partially cleaned data. ActiveClean supports a popular class of models called convex loss models (e.g., linear regression and SVMs). ActiveClean also leverages the structure of a user’s model to prioritize cleaning those records likely to affect the results. We evaluate ActiveClean on five realworld datasets UCI Adult, UCI EEG, MNIST, Dollars For Docs, and WorldBank with both real and synthetic errors. Our results suggest that our proposed optimizations can improve model accuracy by upto 2.5x for the same amount of data cleaned. Furthermore for a fixed cleaning budget and on all real dirty datasets, ActiveClean returns more accurate models than uniform sampling and Active Learning. 
Activities, States, Events, and Their Relations (ASER) 
Understanding human’s language requires complex world knowledge. However, existing largescale knowledge graphs mainly focus on knowledge about entities while ignoring knowledge about activities, states, or events, which are used to describe how entities or things act in the real world. To fill this gap, we develop ASER (activities, states, events, and their relations), a largescale eventuality knowledge graph extracted from more than 11billiontoken unstructured textual data. ASER contains 15 relation types belonging to five categories, 194million unique eventualities, and 64million unique edges among them. Both human and extrinsic evaluations demonstrate the quality and effectiveness of ASER. 
activity2vec  Sufficient physical activity and restful sleep play a major role in the prevention and cure of many chronic conditions. Being able to proactively screen and monitor such chronic conditions would be a big step forward for overall health. The rapid increase in the popularity of wearable devices provides a significant new source, making it possible to track the user’s lifestyle realtime. In this paper, we propose a novel unsupervised representation learning technique called activity2vec that learns and ‘summarizes’ the discretevalued activity timeseries. It learns the representations with three components: (i) the cooccurrence and magnitude of the activity levels in a timesegment, (ii) neighboring context of the timesegment, and (iii) promoting subjectinvariance with adversarial training. We evaluate our method on four disorder prediction tasks using linear classifiers. Empirical evaluation demonstrates that our proposed method scales and performs better than many strong baselines. The adversarial regime helps improve the generalizability of our representations by promoting subject invariant features. We also show that using the representations at the level of a day works the best since human activity is structured in terms of daily routines 
Actor Ensemble Algorithm (ACE) 
In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning. In ACE, we use actor ensemble (i.e., multiple actors) to search the global maxima of the critic. Besides the ensemble perspective, we also formulate ACE in the option framework by extending the optioncritic architecture with deterministic intraoption policies, revealing a relationship between ensemble and options. Furthermore, we perform a lookahead tree search with those actors and a learned value prediction model, resulting in a refined value estimation. We demonstrate a significant performance boost of ACE over DDPG and its variants in challenging physical robot simulators. 
Actuarial Science  Actuarial science is the discipline that applies mathematical and statistical methods to assess risk in insurance, finance and other industries and professions. Actuaries are professionals who are qualified in this field through education and experience. In many countries, actuaries must demonstrate their competence by passing a series of rigorous professional examinations. Actuarial science includes a number of interrelated subjects, including probability, mathematics, statistics, finance, economics, financial economics, and computer programming. Historically, actuarial science used deterministic models in the construction of tables and premiums. The science has gone through revolutionary changes during the last 30 years due to the proliferation of high speed computers and the union of stochastic actuarial models with modern financial theory (Frees 1990). Many universities have undergraduate and graduate degree programs in actuarial science. In 2010, a study published by job search website CareerCast ranked actuary as the #1 job in the United States (Needleman 2010). The study used five key criteria to rank jobs: environment, income, employment outlook, physical demands, and stress. A similar study by U.S. News & World Report in 2006 included actuaries among the 25 Best Professions that it expects will be in great demand in the future (Nemko 2006). apc 
Acumos  Applying Machine Learning (ML) to business applications for automation usually faces difficulties when integrating diverse ML dependencies and services, mainly because of the lack of a common ML framework. In most cases, the ML models are developed for applications which are targeted for specific business domain use cases, leading to duplicated effort, and making reuse impossible. This paper presents Acumos, an open platform capable of packaging ML models into portable containerized microservices which can be easily shared via the platform’s catalog, and can be integrated into various business applications. We present a case study of packaging sentiment analysis and classification ML models via the Acumos platform, permitting easy sharing with others. We demonstrate that the Acumos platform reduces the technical burden on application developers when applying machine learning models to their business applications. Furthermore, the platform allows the reuse of readily available ML microservices in various business domains. 
AdaBoost+SVM  The AdaBoost algorithm has the superiority of resisting overfitting. Understanding the mysteries of this phenomena is a very fascinating fundamental theoretical problem. Many studies are devoted to explaining it from statistical view and margin theory. In this paper, we illustrate it from feature learning viewpoint, and propose the AdaBoost+SVM algorithm, which can explain the resistant to overfitting of AdaBoost directly and easily to understand. Firstly, we adopt the AdaBoost algorithm to learn the base classifiers. Then, instead of directly weighted combination the base classifiers, we regard them as features and input them to SVM classifier. With this, the new coefficient and bias can be obtained, which can be used to construct the final classifier. We explain the rationality of this and illustrate the theorem that when the dimension of these features increases, the performance of SVM would not be worse, which can explain the resistant to overfitting of AdaBoost. 
AdaCos  The cosinebased softmax losses and their variants achieve great success in deep learning based face recognition. However, hyperparameter settings in these losses have significant influences on the optimization path as well as the final recognition performance. Manually tuning those hyperparameters heavily relies on user experience and requires many training tricks. In this paper, we investigate in depth the effects of two important hyperparameters of cosinebased softmax losses, the scale parameter and angular margin parameter, by analyzing how they modulate the predicted classification probability. Based on these analysis, we propose a novel cosinebased softmax loss, AdaCos, which is hyperparameterfree and leverages an adaptive scale parameter to automatically strengthen the training supervisions during the training process. We apply the proposed AdaCos loss to largescale face verification and identification datasets, including LFW, MegaFace, and IJBC 1:1 Verification. Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy. Our method outperforms stateoftheart softmax losses on all the three datasets. 
AdaDepth  Supervised deep learning methods have shown promising results for the task of monocular depth estimation; but acquiring ground truth is costly, and prone to noise as well as inaccuracies. While synthetic datasets have been used to circumvent above problems, the resultant models do not generalize well to natural scenes due to the inherent domain shift. Recent adversarial approaches for domain adaption have performed well in mitigating the differences between the source and target domains. But these methods are mostly limited to a classification setup and do not scale well for fullyconvolutional architectures. In this work, we propose AdaDepth – an unsupervised domain adaptation strategy for the pixelwise regression task of monocular depth estimation. The proposed approach is devoid of above limitations through a) adversarial learning and b) explicit imposition of content consistency on the adapted target representation. Our unsupervised approach performs competitively with other established approaches on depth estimation tasks and achieves stateoftheart results in a semisupervised setting. 
AdaFlow  We tackle unsupervised anomaly detection (UAD), a problem of detecting data that significantly differ from normal data. UAD is typically solved by using density estimation. Recently, deep neural network (DNN)based density estimators, such as Normalizing Flows, have been attracting attention. However, one of their drawbacks is the difficulty in adapting them to the change in the normal data’s distribution. To address this difficulty, we propose AdaFlow, a new DNNbased density estimator that can be easily adapted to the change of the distribution. AdaFlow is a unified model of a Normalizing Flow and Adaptive BatchNormalizations, a module that enables DNNs to adapt to new distributions. AdaFlow can be adapted to a new distribution by just conducting forward propagation once per sample; hence, it can be used on devices that have limited computational resources. We have confirmed the effectiveness of the proposed model through an anomaly detection in a sound task. We also propose a method of applying AdaFlow to the unpaired crossdomain translation problem, in which one has to train a crossdomain translation model with only unpaired samples. We have confirmed that our model can be used for the crossdomain translation problem through experiments on image datasets. 
AdaGAN  Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every step we add a new component into a mixture model by running a GAN algorithm on a reweighted sample. This is inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. We prove that such an incremental procedure leads to convergence to the true distribution in a finite number of steps if each step is optimal, and convergence at an exponential rate otherwise. We also illustrate experimentally that this procedure addresses the problem of missing modes. 
ADAGE  The ability to generalize across visual domains is crucial for the robustness of visual recognition systems in the wild. Several works have been dedicated to close the gap between a single labeled source domain and a target domain with transductive access to its data. In this paper we focus on the wider domain generalization task involving multiple sources and seamlessly extending to unsupervised domain adaptation when unlabeled target samples are available at training time. We propose a hybrid architecture that we name ADAGE: it gracefully maps different source data towards an agnostic visual domain through pixeladaptation based on a novel incremental architecture, and closes the remaining domain gap through feature adaptation. Both the adaptive processes are guided by adversarial learning. Extensive experiments show remarkable improvements compared to the state of the art. 
AdaGraph  The ability to categorize is a cornerstone of visual intelligence, and a key functionality for artificial, autonomous visual machines. This problem will never be solved without algorithms able to adapt and generalize across visual domains. Within the context of domain adaptation and generalization, this paper focuses on the predictive domain adaptation scenario, namely the case where no target data are available and the system has to learn to generalize from annotated source images plus unlabeled samples with associated metadata from auxiliary domains. Our contributionis the first deep architecture that tackles predictive domainadaptation, able to leverage over the information broughtby the auxiliary domains through a graph. Moreover, we present a simple yet effective strategy that allows us to take advantage of the incoming target data at test time, in a continuous domain adaptation scenario. Experiments on three benchmark databases support the value of our approach. 
ADAM  We introduce Adam, an algorithm for firstorder gradientbased optimization of stochastic objective functions, based on adaptive estimates of lowerorder moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for nonstationary objectives and problems with very noisy and/or sparse gradients. The hyperparameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm. ➘ “GENESYS” SAdam: A Variant of Adam for Strongly Convex Functions 
AdaNet  AdaNet is a lightweight and scalable TensorFlow AutoML framework for training and deploying adaptive neural networks using the AdaNet algorithm [Cortes et al. ICML 2017]. AdaNet combines several learned subnetworks in order to mitigate the complexity inherent in designing effective neural networks. AdaNet: Adaptive Structural Learning of Artificial Neural Networks 
Adaptable and Automated Shrinkage Estimation (AAShNet) 
This paper considers improved forecasting in possibly nonlinear dynamic settings, with highdimension predictors (‘big data’ environments). To overcome the curse of dimensionality and manage data and model complexity, we examine shrinkage estimation of a backpropagation algorithm of a deep neural net with skiplayer connections. We expressly include both linear and nonlinear components. This is a highdimensional learning approach including both sparsity L1 and smoothness L2 penalties, allowing highdimensionality and nonlinearity to be accommodated in one step. This approach selects significant predictors as well as the topology of the neural network. We estimate optimal values of shrinkage hyperparameters by incorporating a gradientbased optimization technique resulting in robust predictions with improved reproducibility. The latter has been an issue in some approaches. This is statistically interpretable and unravels some network structure, commonly left to a black box. An additional advantage is that the nonlinear part tends to get pruned if the underlying process is linear. In an application to forecasting equity returns, the proposed approach captures nonlinear dynamics between equities to enhance forecast performance. It offers an appreciable improvement over current univariate and multivariate models by RMSE and actual portfolio performance. 
Adapted Geographically Weighted Lasso (AdaGWL) 
Ridership estimation at station level plays a critical role in metro transportation planning. Among various existing ridership estimation methods, direct demand model has been recognized as an effective approach. However, existing direct demand models including Geographically Weighted Regression (GWR) have rarely included local model selection in ridership estimation. In practice, acquiring insights into metro ridership under multiple influencing factors from a local perspective is important for passenger volume management and transportation planning operations adapting to local conditions. In this study, we propose an Adapted Geographically Weighted Lasso (AdaGWL) framework for modelling metro ridership, which performs regressioncoefficient shrinkage and local model selection. It takes metro network connection intermedia into account and adopts networkbased distance metric instead of Euclideanbased distance metric, making it socalled adapted to the context of metro networks. The realworld case of Shenzhen Metro is used to validate the superiority of our proposed model. The results show that the AdaGWL model performs the best compared with the global model (Ordinary Least Square (OLS), GWR, GWR calibrated with networkbased distance metric and GWL in terms of estimation error of the dependent variable and goodnessoffit. Through understanding the variation of each coefficient across space (elasticities) and variables selection of each station, it provides more realistic conclusions based on local analysis. Besides, through clustering analysis of the stations according to the regression coefficients, clusters’ functional characteristics are found to be in compliance with the facts of the functional land use policy of Shenzhen. These results of the proposed AdaGWL model demonstrate a great spatial explanatory power in transportation planning. 
Adapted Increasingly Rarely Markov Chain Monte Carlo (AirMCMC) 
We introduce a class of Adapted Increasingly Rarely Markov Chain Monte Carlo (AirMCMC) algorithms where the underlying Markov kernel is allowed to be changed based on the whole available chain output but only at specific time points separated by an increasing number of iterations. The main motivation is the ease of analysis of such algorithms. Under the assumption of either simultaneous or (weaker) local simultaneous geometric drift condition, or simultaneous polynomial drift we prove the $L_2$convergence, Weak and Strong Laws of Large Numbers (WLLN, SLLN), Central Limit Theorem (CLT), and discuss how our approach extends the existing results. We argue that many of the known Adaptive MCMC algorithms may be transformed into the corresponding Air versions, and provide an empirical evidence that performance of the Air version stays virtually the same. 
AdapterNet  Deep neural networks have demonstrated impressive performance in various machine learning tasks. However, they are notoriously sensitive to changes in data distribution. Often, even a slight change in the distribution can lead to drastic performance reduction. Artificially augmenting the data may help to some extent, but in most cases, fails to achieve model invariance to the data distribution. Some examples where this subclass of domain adaptation can be valuable are various imaging modalities such as thermal imaging, Xray, ultrasound, and MRI, where changes in acquisition parameters or acquisition device manufacturer will result in different representation of the same input. Our work shows that standard finetuning fails to adapt the model in certain important cases. We propose a novel method of adapting to a new data source, and demonstrate near perfect adaptation on a customized ImageNet benchmark. 
Adapting to Changing Environment (ACE) 
Deep neural networks exhibit exceptional accuracy when they are trained and tested on the same data distributions. However, neural classifiers are often extremely brittle when confronted with domain shift—changes in the input distribution that occur over time. We present ACE, a framework for semantic segmentation that dynamically adapts to changing environments over the time. By aligning the distribution of labeled training data from the original source domain with the distribution of incoming data in a shifted domain, ACE synthesizes labeled training data for environments as it sees them. This stylized data is then used to update a segmentation model so that it performs well in new environments. To avoid forgetting knowledge from past environments, we introduce a memory that stores feature statistics from previously seen domains. These statistics can be used to replay images in any of the previously observed domains, thus preventing catastrophic forgetting. In addition to standard batch training using stochastic gradient decent (SGD), we also experiment with fast adaptation methods based on adaptive metalearning. Extensive experiments are conducted on two datasets from SYNTHIA, the results demonstrate the effectiveness of the proposed approach when adapting to a number of tasks. 
ADaPTION Toolbox  Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are useful for many practical tasks in machine learning. Synaptic weights, as well as neuron activation functions within the deep network are typically stored with highprecision formats, e.g. 32 bit floating point. However, since storage capacity is limited and each memory access consumes power, both storage capacity and memory access are two crucial factors in these networks. Here we present a method and present the ADaPTION toolbox to extend the popular deep learning library Caffe to support training of deep CNNs with reduced numerical precision of weights and activations using fixed point notation. ADaPTION includes tools to measure the dynamic range of weights and activations. Using the ADaPTION tools, we quantized several CNNs including VGG16 down to 16bit weights and activations with only 0.8% drop in Top1 accuracy. The quantization, especially of the activations, leads to increase of up to 50% of sparsity especially in early and intermediate layers, which we exploit to skip multiplications with zero, thus performing faster and computationally cheaper inference. 
ADaptive Algorithm for Betweenness via Random Approximation (KADABRA) 
We present KADABRA, a new algorithm to approximate betweenness centrality in directed and undirected graphs, which significantly outperforms all previous approaches on realworld complex networks. The efficiency of the new algorithm relies on two new theoretical contribution, of independent interest. The first contribution focuses on sampling shortest paths, a subroutine used by most algorithms that approximate betweenness centrality. We show that, on realistic random graph models, we can perform this task in time $E^{\frac{1}{2}+o(1)}$ with high probability, obtaining a significant speedup with respect to the $\Theta(E)$ worstcase performance. We experimentally show that this new technique achieves similar speedups on realworld complex networks, as well. The second contribution is a new rigorous application of the adaptive sampling technique. This approach decreases the total number of shortest paths that need to be sampled to compute all betweenness centralities with a given absolute error, and it also handles more general problems, such as computing the $k$ most central nodes. Furthermore, our analysis is general, and it might be extended to other settings, as well. 
Adaptive Automatic Threat Recognition (AATR) 
This work addresses the question whether it is possible to design a computervision based automatic threat recognition (ATR) system so that it can adapt to changing specifications of a threat without having to create a new ATR each time. The changes in threat specifications, which may be warranted by intelligence reports and world events, are typically regarding the physical characteristics of what constitutes a threat: its material composition, its shape, its method of concealment, etc. Here we present our design of an AATR system (Adaptive ATR) that can adapt to changing specifications in materials characterization (meaning density, as measured by its xray attenuation coefficient), its mass, and its thickness. Our design uses a twostage cascaded approach, in which the first stage is characterized by a high recall rate over the entire range of possibilities for the threat parameters that are allowed to change. The purpose of the second stage is to then finetune the performance of the overall system for the current threat specifications. The computational effort for this finetuning for achieving a desired PD/PFA rate is far less than what it would take to create a new classifier with the same overall performance for the new set of threat specifications. 
Adaptive BAll COver for Classification (ABACOC) 
Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments. Moreover, the learning system must be parameterless —traditional tuning methods are problematic in streaming settings— and avoid requiring prior knowledge of the number of distinct class labels occurring in the stream. In this paper, we introduce a new algorithmic approach for nonparametric learning in data streams. Our approach addresses all above mentioned challenges by learning a model that covers the input space using simple local classifiers. The distribution of these classifiers dynamically adapts to the local (unknown) complexity of the classification problem, thus achieving a good balance between model complexity and predictive accuracy. We design four variants of our approach of increasing adaptivity. By means of an extensive empirical evaluation against standard nonparametric baselines, we show stateoftheart results in terms of accuracy versus model size. For the variant that imposes a strict bound on the model size, we show better performance against all other methods measured at the same model size value. Our empirical analysis is complemented by a theoretical performance guarantee which does not rely on any stochastic assumption on the source generating the stream. ART 
Adaptive Batch Size (AdaBatch) 
Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes. We analyse our approach using the standard AlexNet, ResNet, and VGG networks operating on the popular CIFAR10, CIFAR100, and ImageNet datasets. Our results demonstrate that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while changing accuracy by less than 1% relative to training with fixed batch sizes. 
Adaptive Blending Unit (ABU) 
The most widely used activation functions in current deep feedforward neural networks are rectified linear units (ReLU), and many alternatives have been successfully applied, as well. However, none of the alternatives have managed to consistently outperform the rest and there is no unified theory connecting properties of the task and network with properties of activation functions for most efficient training. A possible solution is to have the network learn its preferred activation functions. In this work, we introduce Adaptive Blending Units (ABUs), a trainable linear combination of a set of activation functions. Since ABUs learn the shape, as well as the overall scaling of the activation function, we also analyze the effects of adaptive scaling in common activation functions. We experimentally demonstrate advantages of both adaptive scaling and ABUs over common activation functions across a set of systematically varied network specifications. We further show that adaptive scaling works by mitigating covariate shifts during training, and that the observed advantages in performance of ABUs likewise rely largely on the activation function’s ability to adapt over the course of training. 
Adaptive Boosting (AdaBoost) 
AdaBoost, short for “Adaptive Boosting”, is a machine learning metaalgorithm formulated by Yoav Freund and Robert Schapire who won the prestigious “Gödel Prize” in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms (‘weak learners’) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (i.e., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner. While every learning algorithm will tend to suit some problem types better than others, and will typically have many different parameters and configurations to be adjusted before achieving optimal performance on a dataset, AdaBoost (with decision trees as the weak learners) is often referred to as the best outofthebox classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative ‘hardness’ of each training sample is fed into the tree growing algorithm such that later trees tend to focus on harder to classify examples. artfima 
Adaptive Boosting LSTM with Attention  Relation extraction has been widely studied to extract new relational facts from open corpus. Previous relation extraction methods are faced with the problem of wrong labels and noisy data, which substantially decrease the performance of the model. In this paper, we propose an ensemble neural network model – Adaptive Boosting LSTMs with Attention, to more effectively perform relation extraction. Specifically, our model first employs the recursive neural network LSTMs to embed each sentence. Then we import attention into LSTMs by considering that the words in a sentence do not contribute equally to the semantic meaning of the sentence. Next via adaptive boosting, we build strategically several such neural classifiers. By ensembling multiple such LSTM classifiers with adaptive boosting, we could build a more effective and robust joint ensemble neural networks based relation extractor. Experiment results on real dataset demonstrate the superior performance of the proposed model, improving F1score by about 8% compared to the stateoftheart models. The code of this work is publicly available on https://…/re. 
Adaptive Collaborative Similarity Learning (ACSL) 
In this paper, we investigate the research problem of unsupervised multiview feature selection. Conventional solutions first simply combine multiple preconstructed viewspecific similarity structures into a collaborative similarity structure, and then perform the subsequent feature selection. These two processes are separate and independent. The collaborative similarity structure remains fixed during feature selection. Further, the simple undirected view combination may adversely reduce the reliability of the ultimate similarity structure for feature selection, as the viewspecific similarity structures generally involve noises and outlying entries. To alleviate these problems, we propose an adaptive collaborative similarity learning (ACSL) for multiview feature selection. We propose to dynamically learn the collaborative similarity structure, and further integrate it with the ultimate feature selection into a unified framework. Moreover, a reasonable rank constraint is devised to adaptively learn an ideal collaborative similarity structure with proper similarity combination weights and desirable neighbor assignment, both of which could positively facilitate the feature selection. An effective solution guaranteed with the proved convergence is derived to iteratively tackle the formulated optimization problem. Experiments demonstrate the superiority of the proposed approach. 
Adaptive Computation Steps (ACS) 
In this paper, we present Adaptive Computation Steps (ACS) algorithm, which enables endtoend speech recognition models to dynamically decide how many frames should be processed to predict a linguistic output. The ACS equipped model follows the classic encoderdecoder framework, while unlike the attentionbased models, it produces alignments independently at the encoder side using the correlation between adjacent frames. Thus, predictions can be made as soon as sufficient interframe information is received, which makes the model applicable in online cases. We verify the ACS algorithm on an opensource Mandarin speech corpus AIShell1, and it achieves a parity of 35.2% CER with the attentionbased model in the online occasion. To fully demonstrate the advantage of ACS algorithm, offline experiments are conducted, in which our ACS model achieves 21.6% and 20.1% CERs with and without language model, both outperforming the attentionbased counterpart. Index Terms: Adaptive Computation Steps, EncoderDecoder Recurrent Neural Networks, EndtoEnd Training. 
Adaptive Computation Time (ACT) 
➘ “Universal Transformer” 
Adaptive Coordinate Descent  Adaptive coordinate descent is an extension of the coordinate descent algorithm to nonseparable optimization. The adaptive coordinate descent approach gradually builds a transformation of the coordinate system such that the new coordinates are as decorrelated as possible with respect to the objective function. The adaptive coordinate descent was shown to be competitive to the stateoftheart evolutionary algorithms and has the following invariance properties: 1. Invariance with respect to monotonous transformations of the function (scaling) 2. Invariance with respect to orthogonal transformations of the search space (rotation). CMAlike Adaptive Encoding Update mostly based on principal component analysis is used to extend the coordinate descent method to the optimization of nonseparable problems. The adaptation of an appropriate coordinate system allows adaptive coordinate descent to outperform coordinate descent on nonseparable functions. ARTIVA 
Adaptive CrossModal FewShot Learning  Metricbased metalearning techniques have successfully been applied to fewshot classification problems. However, leveraging crossmodal information in a fewshot setting has yet to be explored. When the support from visual information is limited in fewshot image classification, semantic representatins (learned from unsupervised text corpora) can provide strong prior knowledge and context to help learning. Based on this intuition, we design a model that is able to leverage visual and semantic features in the context of fewshot classification. We propose an adaptive mechanism that is able to effectively combine both modalities conditioned on categories. Through a series of experiments, we show that our method boosts the performance of metricbased approaches by effectively exploiting language structure. Using this extra modality, our model bypass current unimodal stateoftheart methods by a large margin on two important benchmarks: miniImageNet and tieredImageNet. The improvement in performance is particularly large when the number of shots are small. 
Adaptive Density Peak Detection (ADPclust) 
ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work is built and improved upon Rodriguez and Laio’s idea. ADPclust clusters data by finding density peaks in a densitydistance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroids selection and parameter optimization algorithm, which finds the number of clusters and cluster centroids by comparing average silhouettes on a grid of testing clustering results; It also includes an user interactive algorithm that allows the user to manually selects cluster centroids from a two dimensional ‘densitydistance plot’. asremlPlus 
Adaptive DensityBased Spatial Clustering of Applications with Noise (ADBSCAN) 
Densitybased spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm which has the highperformance rate for dataset where clusters have the constant density of data points. One of the significant attributes of this algorithm is noise cancellation. However, DBSCAN demonstrates reduced performances for clusters with different densities. Therefore, in this paper, an adaptive DBSCAN is proposed which can work significantly well for identifying clusters with varying densities. 
Adaptive Feeding (AF) 
Object detection aims at high speed and accuracy simultaneously. However, fast models are usually less accurate, while accurate models cannot satisfy our need for speed. A fast model can be 10 times faster but 50\% less accurate than an accurate model. In this paper, we propose Adaptive Feeding (AF) to combine a fast (but less accurate) detector and an accurate (but slow) detector, by adaptively determining whether an image is easy or hard and choosing an appropriate detector for it. In practice, we build a cascade of detectors, including the AF classifier which make the easy vs. hard decision and the two detectors. The AF classifier can be tuned to obtain different tradeoff between speed and accuracy, which has negligible training time and requires no additional training data. Experimental results on the PASCAL VOC, MS COCO and Caltech Pedestrian datasets confirm that AF has the ability to achieve comparable speed as the fast detector and comparable accuracy as the accurate one at the same time. As an example, by combining the fast SSD300 with the accurate SSD500 detector, AF leads to 50\% speedup over SSD500 with the same precision on the VOC2007 test set. 
Adaptive FunctiononScalar Regression with a Smoothing Elastic Net (AFSSEN) 
This paper presents a new methodology, called AFSSEN, to simultaneously select significant predictors and produce smooth estimates in a highdimensional functiononscalar linear model with a subGaussian errors. Outcomes are assumed to lie in a general real separable Hilbert space, H, while parameters lie in a subspace known as a Cameron Martin space, K, which are closely related to Reproducing Kernel Hilbert Spaces, so that parameter estimates inherit particular properties, such as smoothness or periodicity, without enforcing such properties on the data. We propose a regularization method in the style of an adaptive Elastic Net penalty that involves mixing two types of functional norms, providing a fine tune control of both the smoothing and variable selection in the estimated model. Asymptotic theory is provided in the form of a functional oracle property, and the paper concludes with a simulation study demonstrating the advantage of using AFSSEN over existing methods in terms of prediction error and variable selection. 
Adaptive Generalized PCA (adaptive gPCA) 
When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results which are unstable and difficult to interpret. To mitigate these problems, we have developed a method which allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a lowdimensional representation of the samples which both describes the latent structure and which has axes which are interpretable in terms of groups of closely related variables. The method is derived by putting a prior encoding the relationships between the variables on the data and following through the analysis on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data and we also demonstrate the method on a dataset investigating the effects of antibiotics on the composition of bacteria in the human gut. adaptiveGPCA 
Adaptive gPCA  When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results which are unstable and difficult to interpret. To mitigate these problems, we have developed a method which allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a lowdimensional representation of the samples which both describes the latent structure and which has axes which are interpretable in terms of groups of closely related variables. The method is derived by putting a prior encoding the relationships between the variables on the data and following through the analysis on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data and we also demonstrate the method on a dataset investigating the effects of antibiotics on the composition of bacteria in the human gut. 
Adaptive Graph Convolutional Neural Network  Graph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs to handle graph data such as molecular data, point could and social networks. Current filters in graph CNNs are built for fixed and shared graph structure. However, for most real data, the graph structures varies in both size and connectivity. The paper proposes a generalized and flexible graph CNN taking data of arbitrary graph structure as input. In that way a taskdriven adaptive graph is learned for each graph data while training. To efficiently learn the graph, a distance metric learning is proposed. Extensive experiments on nine graphstructured datasets have demonstrated the superior performance improvement on both convergence speed and predictive accuracy. ➘ “Graph Convolutional Neural Network” 
Adaptive Huber Regression  This paper investigates the theoretical underpinnings of two fundamental statistical inference problems, the construction of confidence sets and largescale simultaneous hypothesis testing, in the presence of heavytailed data. With heavytailed observation noise, finite sample properties of the least squaresbased methods, typified by the sample mean, are suboptimal both theoretically and empirically. In this paper, we demonstrate that the adaptive Huber regression, integrated with the multiplier bootstrap procedure, provides a useful robust alternative to the method of least squares. Our theoretical and empirical results reveal the effectiveness of the proposed method, and highlight the importance of having inference methods that are robust to heavy tailedness. 
Adaptive Intelligence Optimizer (AIO) 
Particle Swarm Optimization (PSO) is an Evolutionary Algorithm (EA) that utilizes a swarm of particles to solve an optimization problem. Slow Intelligence System (SIS) is a learning framework which slowly learns the solution to a problem performing a series of operations. Moreover, Learning Automata (LA) are minuscule but effective decision making entities which are best suited to act as a controller component. In this paper, we combine two isolate populations of PSO to forge the Adaptive Intelligence Optimizer (AIO) which harnesses the advantages of a bipopulation PSO to escape from the local minimum and avoid premature convergence. Furthermore, using the rich framework of SIS and the nifty control theory that LA derived from, we find the perfect matching between SIS and LA where acting slowly is the pillar of both of them. Both SIS and LA need time to converge to the optimal decision where this enables AIO to outperform standard PSO having an incomparable performance on evolutionary optimization benchmark functions. 
Adaptive Labeled MultiBernoulli Filter  This paper proposes a new multiBernoulli filter called the Adaptive Labeled MultiBernoulli filter. It combines the relative strengths of the known DeltaGeneralized Labeled MultiBernoulli and the Labeled MultiBernoulli filter. The proposed filter provides a more precise target tracking in critical situations, where the Labeled MultiBernoulli filter looses information through the approximation error in the update step. In noncritical situations it inherits the advantage of the Labeled MultiBernoulli filter to reduce the computational complexity by using the LMB approximation. 
Adaptive Learning for MultiAgent Navigation (ALAN) 
In multiagent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and static obstacles, often without communication with each other. Existing methods compute motions that are optimal locally but do not account for the aggregated motions of all agents, producing inefficient global behavior especially when agents move in a crowded space. In this work, we develop methods to allow agents to dynamically adapt their behavior to their local conditions. We accomplish this by formulating the multiagent navigation problem as an actionselection problem, and propose an approach, ALAN, that allows agents to compute timeefficient and collisionfree motions. ALAN is highly scalable because each agent makes its own decisions on how to move using a set of velocities optimized for a variety of navigation tasks. Experimental results show that the agents using ALAN, in general, reach their destinations faster than using ORCA, a stateoftheart collision avoidance framework, the Social Forces model for pedestrian navigation, and a Predictive collision avoidance model. 
Adaptive Learning Rate (ADADELTA) 
Automatically set learning rate for each neuron in a neural network based on it´s training history. asVPC 
Adaptive Least Absolute Shrinkage and Screening Operator (ALADE) 
➘ “Least Absolute Shrinkage and Screening Operator” 
Adaptive Linear Neuron (ADALINE) 
ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element) is an early singlelayer artificial neural network and the name of the physical device that implemented this network. The network uses memistors. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCullochPitts neuron. It consists of a weight, a bias and a summation function. The difference between Adaline and the standard (McCullochPitts) perceptron is that in the learning phase the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function’s output is used for adjusting the weights. There also exists an extension known as Madaline. ATE 
Adaptive Locality Preserving Regression (ALPR) 
This paper proposes a novel discriminative regression method, called adaptive locality preserving regression (ALPR) for classification. In particular, ALPR aims to learn a more flexible and discriminative projection that not only preserves the intrinsic structure of data, but also possesses the properties of feature selection and interpretability. To this end, we introduce a target learning technique to adaptively learn a more discriminative and flexible target matrix rather than the predefined strict zeroone label matrix for regression. Then a locality preserving constraint regularized by the adaptive learned weights is further introduced to guide the projection learning, which is beneficial to learn a more discriminative projection and avoid overfitting. Moreover, we replace the conventional `Frobenius norm’ with the special l21 norm to constrain the projection, which enables the method to adaptively select the most important features from the original highdimensional data for feature extraction. In this way, the negative influence of the redundant features and noises residing in the original data can be greatly eliminated. Besides, the proposed method has good interpretability for features owing to the rowsparsity property of the l21 norm. Extensive experiments conducted on the synthetic database with manifold structure and many realworld databases prove the effectiveness of the proposed method. 
Adaptive Massively Parallel Computation (AMPC) 
We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round, all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms using MapReduce and a distributed hash table service. This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in $O(1)$ rounds and connectivity/minimum spanning tree in $O(\log\log_{m/n} n)$ rounds both using $O(n^\delta)$ space per machine for constant $\delta < 1$. In the same memory regime for MPC, the best known algorithms for these problems require polylog $n$ rounds. Our results imply that the 2Cycle conjecture, which is widely believed to hold in the MPC model, does not hold in the AMPC model. 
Adaptive Memory Network (AMN) 
We present Adaptive Memory Networks (AMN) that processes inputquestion pairs to dynamically construct a network architecture optimized for lower inference times for Question Answering (QA) tasks. AMN processes the input story to extract entities and stores them in memory banks. Starting from a single bank, as the number of input entities increases, AMN learns to create new banks as the entropy in a single bank becomes too high. Hence, after processing an inputquestion(s) pair, the resulting network represents a hierarchical structure where entities are stored in different banks, distanced by question relevance. At inference, one or few banks are used, creating a tradeoff between accuracy and performance. AMN is enabled by dynamic networks that allow input dependent network creation and efficiency in dynamic minibatching as well as our novel bank controller that allows learning discrete decision making with high accuracy. In our results, we demonstrate that AMN learns to create variable depth networks depending on task complexity and reduces inference times for QA tasks. 
Adaptive Mixture Discriminant Analysis (AMDA) 
In supervised learning, an important issue usually not taken into account by classical methods is the possibility of having in the test set individuals belonging to a class which has not been observed during the learning phase. Classical supervised algorithms will automatically label such observations as belonging to one of the known classes in the training set and will not be able to detect new classes. This work introduces a modelbased discriminant analysis method, called adaptive mixture discriminant analysis (AMDA), which is able to detect several unobserved groups of points and to adapt the learned classifier to the new situation. Two EMbased procedures are proposed for parameter estimation and model selection criteria are used for selecting the actual number of classes. Experiments on artificial and real data demonstrate the ability of the proposed method to deal with complex and real word problems. The proposed approach is also applied to the detection of unobserved communities in social network analysis. bacr 
Adaptive Moving Average (AMA) 
The Adaptive Moving Average (AMA) study is similar to the exponential moving average (EMA), except the AMA uses a scalable constant instead of a fixed constant for smoothing the data. Barnard 
Adaptive Moving Selforganizing Map (AMSOM) 
SelfOrganizing Map (SOM) is a neural network model which is used to obtain a topologypreserving mapping from the (usually high dimensional) input/feature space to an output/map space of fewer dimensions (usually two or three in order to facilitate visualization). Neurons in the output space are connected with each other but this structure remains fixed throughout training and learning is achieved through the updating of neuron reference vectors in feature space. Despite the fact that growing variants of SOM overcome the fixed structure limitation they increase computational cost and also do not allow the removal of a neuron after its introduction. In this paper, a variant of SOM is proposed called AMSOM (Adaptive Moving SelfOrganizing Map) that on the one hand creates a more flexible structure where neuron positions are dynamically altered during training and on the other hand tackles the drawback of having a predefined grid by allowing neuron addition and/or removal during training. Experiments using multiple literature datasets show that the proposed method improves training performance of SOM, leads to a better visualization of the input dataset and provides a framework for determining the optimal number and structure of neurons. 
Adaptive Network Scaling (ANS) 
This work provides a thorough study on how reward scaling can affect performance of deep reinforcement learning agents. In particular, we would like to answer the question that how does reward scaling affect nonsaturating ReLU networks in RL? This question matters because ReLU is one of the most effective activation functions for deep learning models. We also propose an Adaptive Network Scaling framework to find a suitable scale of the rewards during learning for better performance. We conducted empirical studies to justify the solution. 
Adaptive Neural Tree (ANT) 
Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with prespecified architectures, while the latter is characterised by learning hierarchies over prespecified features with datadriven architectures. We unite the two via adaptive neural trees (ANTs), a model that incorporates representation learning into edges, routing functions and leaf nodes of a decision tree, along with a backpropagationbased training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). We demonstrate that, whilst achieving over 99% and 90% accuracy on MNIST and CIFAR10 datasets, ANTs benefit from (i) faster inference via conditional computation, (ii) increased interpretability via hierarchical clustering e.g. learning meaningful class associations, such as separating natural vs. manmade objects, and (iii) a mechanism to adapt the architecture to the size and complexity of the training dataset. 
AdaPtive Noisy Data Augmentation (PANDA) 
We propose PANDA, an AdaPtive Noise Augmentation technique to regularize estimating and constructing undirected graphical models (UGMs). PANDA iteratively solves MLEs given noise augmented data in the regressionbased framework until convergence to achieve the designed regularization effects. The augmented noises can be designed to achieve various regularization effects on graph estimation, including the bridge, elastic net, adaptive lasso, and SCAD penalization; it can also offer group lasso and fused ridge when some nodes belong to the same group. We establish theoretically that the noiseaugmented loss functions and its minimizer converge almost surely to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized regression coefficients through PANDA in GLMs, based on which, the inferences for the parameters can be obtained simultaneously with variable selection. Our empirical results suggest the inferences achieve nominal or nearnominal coverage and are far more efficient compared to some existing postselection procedures. On the algorithm level, PANDA can be easily programmed in any standard software without resorting to complicated optimization techniques. We show the noninferior performance of PANDA in constructing graphs of different types in simulation studies and also apply PANDA to the autism spectrum disorder data to construct a mixednode graph. 
Adaptive Nonparametric Clustering  ➘ “Adaptive Weights Clustering” 
Adaptive Principal Component Monitoring (APC) 
The highdimensionality and volume of large scale multistream data has inhibited significant research progress in developing an integrated monitoring and diagnostics (M&D) approach. This data, also categorized as big data, is becoming common in manufacturing plants. In this paper, we propose an integrated M\&D approach for large scale streaming data. We developed a novel monitoring method named Adaptive Principal Component monitoring (APC) which adaptively chooses PCs that are most likely to vary due to the change for early detection. Importantly, we integrate a novel diagnostic approach, Principal Component Signal Recovery (PCSR), to enable a streamlined SPC. This diagnostics approach draws inspiration from Compressed Sensing and uses Adaptive Lasso for identifying the sparse change in the process. We theoretically motivate our approaches and do a performance evaluation of our integrated M&D method through simulations and case studies. 
Adaptive pvalue Thresholding (AdaPT) 
We consider the problem of multiple hypothesis testing with generic side information: for each hypothesis $H_i$ we observe both a pvalue $p_i$ and some predictor $x_i$ encoding contextual information about the hypothesis. For largescale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple testing procedures. We propose a general iterative framework for this problem, called the Adaptive pvalue Thresholding (AdaPT) procedure, which adaptively estimates a Bayesoptimal pvalue rejection threshold and controls the false discovery rate (FDR) in finite samples. At each iteration of the procedure, the analyst proposes a rejection threshold and observes partially censored pvalues, estimates the false discovery proportion (FDP) below the threshold, and either stops to reject or proposes another threshold, until the estimated FDP is below $\alpha$. Our procedure is adaptive in an unusually strong sense, permitting the analyst to use any statistical or machine learning method she chooses to estimate the optimal threshold, and to switch between different models at each iteration as information accrues. 
Adaptive Quantile LowRank Matrix Factorization (AQLRMF) 
Lowrank matrix factorization (LRMF) has received much popularity owing to its successful applications in both computer vision and data mining. By assuming the noise term to come from a Gaussian, Laplace or a mixture of Gaussian distributions, significant efforts have been made on optimizing the (weighted) $L_1$ or $L_2$norm loss between an observed matrix and its bilinear factorization. However, the type of noise distribution is generally unknown in real applications and inappropriate assumptions will inevitably deteriorate the behavior of LRMF. On the other hand, real data are often corrupted by skew rather than symmetric noise. To tackle this problem, this paper presents a novel LRMF model called AQLRMF by modeling noise with a mixture of asymmetric Laplace distributions. An efficient algorithm based on the expectationmaximization (EM) algorithm is also offered to estimate the parameters involved in AQLRMF. The AQLRMF model possesses the advantage that it can approximate noise well no matter whether the real noise is symmetric or skew. The core idea of AQLRMF lies in solving a weighted $L_1$ problem with weights being learned from data. The experiments conducted with synthetic and real datasets show that AQLRMF outperforms several stateoftheart techniques. Furthermore, AQLRMF also has the superiority over the other algorithms that it can capture local structural information contained in real images. 
Adaptive Quantile Sparse Image (AQuaSI) 
Inverse problems play a central role for many classical computer vision and image processing tasks. A key challenge in solving an inverse problem is to find an appropriate prior to convert an illposed problem into a wellposed task. Many of the existing priors, like total variation, are based on adhoc assumptions that have difficulties to represent the actual distribution of natural images. In this work, we propose the Adaptive Quantile Sparse Image (AQuaSI) prior. It is based on a quantile filter, can be used as a joint filter on guidance data, and be readily plugged into a wide range of numerical optimization algorithms. We demonstrate the efficacy of the proposed prior in joint RGB/depth upsampling, on RGB/NIR image restoration, and in a comparison with related regularization by denoising approaches. 
Adaptive Resonance Theory (ART) 
Adaptive Resonance Theory, or ART, is a cognitive and neural theory of how the brain autonomously learns to categorize, recognize, and predict objects and events in a changing world. This article reviews classical and recent developments of ART, and provides a synthesis of concepts, principles, mechanisms, architectures, and the interdisciplinary data bases that they have helped to explain and predict. The review illustrates that ART is currently the most highly developed cognitive and neural theory available, with the broadest explanatory and predictive range. Central to ART’s predictive power is its ability to carry out fast, incremental, and stable unsupervised and supervised learning in response to a changing world. ART specifies mechanistic links between processes of consciousness, learning, expectation, attention, resonance, and synchrony during both unsupervised and supervised learning. ART provides functional and mechanistic explanations of such diverse topics as laminar cortical circuitry; invariant object and scenic gist learning and recognition; prototype, surface, and boundary attention; gamma and beta oscillations; learning of entorhinal grid cells and hippocampal place cells; computation of homologous spatial and temporal mechanisms in the entorhinalhippocampal system; vigilance breakdowns during autism and medial temporal amnesia; cognitiveemotional interactions that focus attention on valued objects in an adaptively timed way; itemorderrank working memories and learned list chunks for the planning and control of sequences of linguistic, spatial, and motor information; conscious speech percepts that are influenced by future context; auditory streaming in noise during source segregation; and speaker normalization. Brain regions that are functionally described include visual and auditory neocortex; specific and nonspecific thalamic nuclei; inferotemporal, parietal, prefrontal, entorhinal, hippocampal, parahippocampal, perirhinal, and motor cortices; frontal eye fields; supplementary eye fields; amygdala; basal ganglia: cerebellum; and superior colliculus. Due to the complementary organization of the brain, ART does not describe many spatial and motor behaviors whose matching and learning laws differ from those of ART. ART algorithms for engineering and technology are listed, as are comparisons with other types of models. 
Adaptive Robust Control  In this paper we propose a new methodology for solving an uncertain stochastic Markovian control problem in discrete time. We call the proposed methodology the adaptive robust control. We demonstrate that the uncertain control problem under consideration can be solved in terms of associated adaptive robust Bellman equation. The success of our approach is to the great extend owed to the recursive methodology for construction of relevant confidence regions. We illustrate our methodology by considering an optimal portfolio allocation problem, and we compare results obtained using the adaptive robust control method with some other existing methods. 
Adaptive Scaling  This paper focuses on detection tasks in information extraction, where positive instances are sparsely distributed and models are usually evaluated using Fmeasure on positive classes. These characteristics often result in deficient performance of neural network based detection models. In this paper, we propose adaptive scaling, an algorithm which can handle the positive sparsity problem and directly optimize over Fmeasure via dynamic costsensitive learning. To this end, we borrow the idea of marginal utility from economics and propose a theoretical framework for instance importance measuring without introducing any additional hyperparameters. Experiments show that our algorithm leads to a more effective and stable training of neural network based detection models. 
Adaptive Sequence Submodularity  In many machine learning applications, one needs to interactively select a sequence of items (e.g., recommending movies based on a user’s feedback) or make sequential decisions in certain orders (e.g., guiding an agent through a series of states). Not only do sequences already pose a dauntingly large search space, but we must take into account past observations, as well as the uncertainty of future outcomes. Without further structure, finding an optimal sequence is notoriously challenging, if not completely intractable. In this paper, we introduce adaptive sequence submodularity, a rich framework that generalizes the notion of submodularity to adaptive policies that explicitly consider sequential dependencies between items. We show that once such dependencies are encoded by a directed graph, an adaptive greedy policy is guaranteed to achieve a constant factor approximation guarantee, where the constant naturally depends on the structural properties of the underlying graph. Additionally, to demonstrate the practical utility of our results, we run experiments on Amazon product recommendation and Wikipedia link prediction tasks. 
Adaptive Sequential Machine Learning  A framework previously introduced in [3] for solving a sequence of stochastic optimization problems with bounded changes in the minimizers is extended and applied to machine learning problems such as regression and classification. The stochastic optimization problems arising in these machine learning problems is solved using algorithms such as stochastic gradient descent (SGD). A method based on estimates of the change in the minimizers and properties of the optimization algorithm is introduced for adaptively selecting the number of samples at each time step to ensure that the excess risk, i.e., the expected gap between the loss achieved by the approximate minimizer produced by the optimization algorithm and the exact minimizer, does not exceed a target level. A bound is developed to show that the estimate of the change in the minimizers is nontrivial provided that the excess risk is small enough. Extensions relevant to the machine learning setting are considered, including a costbased approach to select the number of samples with a cost budget over a fixed horizon, and an approach to applying crossvalidation for model selection. Finally, experiments with synthetic and real data are used to validate the algorithms. 
Adaptive Shivers Sort (ASS) 
We present a stable mergesort, called~\ASS, that exploits the existence of monotonic runs for sorting efficiently partially sorted data. We also prove that, although this algorithm is simple to implement, its computational cost, in number of comparisons performed, is optimal up to an additive linear term. 
Adaptive Skip Interval (ASI) 
We introduce a method which enables a recurrent dynamics model to be temporally abstract. Our approach, which we call Adaptive Skip Intervals (ASI), is based on the observation that in many sequential prediction tasks, the exact time at which events occur is irrelevant to the underlying objective. Moreover, in many situations, there exist prediction intervals which result in particularly easytopredict transitions. We show that there are prediction tasks for which we gain both computational efficiency and prediction accuracy by allowing the model to make predictions at a sampling rate which it can choose itself. 
Adaptive Software  http://…/Adaptive_software_development batteryreduction 
Adaptive Stress Testing (AST) 
Finding the most likely path to a set of failure states is important to the analysis of safetycritical dynamic systems. While efficient solutions exist for certain classes of systems, a scalable general solution for stochastic, partiallyobservable, and continuousvalued systems remains challenging. Existing approaches in formal and simulationbased methods either cannot scale to large systems or are computationally inefficient. This paper presents adaptive stress testing (AST), a framework for searching a simulator for the most likely path to a failure event. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulationbased and does not require internal knowledge of the system. As a result, the approach is very suitable for black box testing of large systems. We present formulations for both systems where the state is fullyobservable and partiallyobservable. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can be used to find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where one is concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where we stress test a prototype aircraft collision avoidance system to find highprobability scenarios of near midair collisions. 
Adaptive SVM+  Incorporating additional knowledge in the learning process can be beneficial for several computer vision and machine learning tasks. Whether privileged information originates from a source domain that is adapted to a target domain, or as additional features available at training time only, using such privileged (i.e., auxiliary) information is of high importance as it improves the recognition performance and generalization. However, both primary and privileged information are rarely derived from the same distribution, which poses an additional challenge to the recognition task. To address these challenges, we present a novel learning paradigm that leverages privileged information in a domain adaptation setup to perform visual recognition tasks. The proposed framework, named Adaptive SVM+, combines the advantages of both the learning using privileged information (LUPI) paradigm and the domain adaptation framework, which are naturally embedded in the objective function of a regular SVM. We demonstrate the effectiveness of our approach on the publicly available Animals with Attributes and INTERACT datasets and report stateoftheart results in both of them. 
Adaptive System  The term adaptation is used in biology in relation to how living beings adapt to their environments, but with two different meanings. First, the continuous adaptation of an organism to its environment, so as to maintain itself in a viable state, through sensory feedback mechanisms. Second, the development (through evolutionary steps) of an adaptation (an anatomic structure, physiological process or behavior characteristic) that increases the probability of an organism reproducing itself (although sometimes not directly). Generally speaking, an adaptive system is a set of interacting or interdependent entities, real or abstract, forming an integrated whole that together are able to respond to environmental changes or changes in the interacting parts. Feedback loops represent a key feature of adaptive systems, allowing the response to changes; examples of adaptive systems include: natural ecosystems, individual organisms, human communities, human organizations, and human families. Some artificial systems can be adaptive as well; for instance, robots employ control systems that utilize feedback loops to sense new conditions in their environment and adapt accordingly. BayesBridge 
Adaptive Sznajd Model  Understanding the way in which human opinion changes along time and space constitutes one of the great challenges in complex systems research. Among the several approaches that have been attempted at studying this problem, the Sznajd model provides some particularly interesting features, such as its simplicity and ability to represent some of the mechanisms believed to be involved in opinion dynamics. The standard Sznajd model at zero temperature is characterized by converging to one stable state, implying null diversity of opinions. In the present work, we develop an approach — namely the adaptive Sznajd model — in which changes of opinion by an individual (i.e. a network node) implies in possible alterations in the network topology. This is accomplished by allowing agents to change their connections preferentially to other neighbors with the same state. The diversity of opinions along time is quantified in terms of the exponential of the entropy of the opinions density. Several interesting results are reported, including the possible formation of echo chambers or social bubbles. Also, depending on the parameters configuration, the dynamics may converge to different equilibrium states for the same parameter setting. We also investigate the dynamics of the proposed adaptive model at nonnull temperatures. 
Adaptive ThoulessAndersonPalmer Mean Field Approach (ADATAP) 
We develop an advanced mean field method for approximating averages in probabilistic data models that is based on the TAP approach of disorder physics. In contrast to conventional TAP, where the knowledge of the distribution of couplings between the random variables is required, our method adapts to the concrete couplings. We demonstrate the validity of our approach, which is sofar restricted to models with nonglassy behaviour, by replica calculations for a wide class of models as well as by simulations for a real data set. http://…nergybeliefpropagationandsparsity.pdf BayesLCA 
Adaptive UpperConfidenceBound Algorithm (AdaLinUCB) 
In this paper, we propose and study opportunistic contextual bandits – a special case of contextual bandits where the exploration cost varies under different environmental conditions, such as network load or return variation in recommendations. When the exploration cost is low, so is the actual regret of pulling a suboptimal arm (e.g., trying a suboptimal recommendation). Therefore, intuitively, we could explore more when the exploration cost is relatively low and exploit more when the exploration cost is relatively high. Inspired by this intuition, for opportunistic contextual bandits with Linear payoffs, we propose an Adaptive UpperConfidenceBound algorithm (AdaLinUCB) to adaptively balance the explorationexploitation tradeoff for opportunistic learning. We prove that AdaLinUCB achieves O((log T)^2) problemdependent regret upper bound, which has a smaller coefficient than that of the traditional LinUCB algorithm. Moreover, based on both synthetic and realworld dataset, we show that AdaLinUCB significantly outperforms other contextual bandit algorithms, under large exploration cost fluctuations. 
Adaptive Virtual Patient (AVP) 
Deep neural networks have achieved great success in multiple learning problems, and attracted increasing attention from the medicine community. In reality, however, the limited availability and high costs of medical data is a major challenge of applying deep neural networks to computeraided diagnosis and treatment planning. We address this challenge with adaptive virtual patients (AVPs) and the associated physicsinformed learning framework. Specifically, the original training dataset is fused with an additional dataset of AVPs, which are generated by a datadriven model and the associated supervision (e.g., labels) is obtained by a physicsbased approach. A key novelty in the proposed framework is the bidirectional and uncoupled generative invertible networks (GIN), which can extract pathophysiological features from the training medical image and generate pathophysiologically meaningful virtual patients. In order to mitigate the possibly high labeling cost of physical experiments, a $\mu$measure design is conducted: this allows the AVPs to not only further explore the uncertain regions, but also balance the label distribution. We then discuss the pathophysiological interpretability of GIN both theoretically and experimentally, and demonstrate the effectiveness of AVPs using a real medical image dataset, in which the proposed AVPs lower the labeling cost by 90% while achieving a 15% improvement in prediction accuracy. 
Adaptive Weighted SuperResolution Network (AWSRN) 
Deep learning has been successfully applied to the singleimage superresolution (SISR) task with great performance in recent years. However, most convolutional neural network based SR models require heavy computation, which limit their realworld applications. In this work, a lightweight SR network, named Adaptive Weighted SuperResolution Network (AWSRN), is proposed for SISR to address this issue. A novel local fusion block (LFB) is designed in AWSRN for efficient residual learning, which consists of stacked adaptive weighted residual units (AWRU) and a local residual fusion unit (LRFU). Moreover, an adaptive weighted multiscale (AWMS) module is proposed to make full use of features in reconstruction layer. AWMS consists of several different scale convolutions, and the redundancy scale branch can be removed according to the contribution of adaptive weights in AWMS for lightweight network. The experimental results on the commonly used datasets show that the proposed lightweight AWSRN achieves superior performance on x2, x3, x4, and x8 scale factors to stateoftheart methods with similar parameters and computational overhead. Code is avaliable at: https://…/AWSRN 
Adaptive Weights Clustering (AWC) 
This paper presents a new approach to nonparametric cluster analysis called Adaptive Weights Clustering (AWC). The idea is to identify the clustering structure by checking at different points and for different scales on departure from local homogeneity. The proposed procedure describes the clustering structure in terms of weights \( w_{ij} \) each of them measures the degree of local inhomogeneity for two neighbor local clusters using statistical tests of ‘no gap’ between them. % The procedure starts from very local scale, then the parameter of locality grows by some factor at each step. The method is fully adaptive and does not require to specify the number of clusters or their structure. The clustering results are not sensitive to noise and outliers, the procedure is able to recover different clusters with sharp edges or manifold structure. The method is scalable and computationally feasible. An intensive numerical study shows a stateoftheart performance of the method in various artificial examples and applications to text data. Our theoretical study states optimal sensitivity of AWC to local inhomogeneity. 
Adaptive Windowbased Streaming Edge Partitioning (ADWISE) 
In recent years, the graph partitioning problem gained importance as a mandatory preprocessing step for distributed graph processing on very large graphs. Existing graph partitioning algorithms minimize partitioning latency by assigning individual graph edges to partitions in a streaming manner — at the cost of reduced partitioning quality. However, we argue that the mere minimization of partitioning latency is not the optimal design choice in terms of minimizing total graph analysis latency, i.e., the sum of partitioning and processing latency. Instead, for complex and longrunning graph processing algorithms that run on very large graphs, it is beneficial to invest more time into graph partitioning to reach a higher partitioning quality — which drastically reduces graph processing latency. In this paper, we propose ADWISE, a novel windowbased streaming partitioning algorithm that increases the partitioning quality by always choosing the best edge from a set of edges for assignment to a partition. In doing so, ADWISE controls the partitioning latency by adapting the window size dynamically at runtime. Our evaluations show that ADWISE can reach the sweet spot between graph partitioning latency and graph processing latency, reducing the total latency of partitioning plus processing by up to 2347 percent compared to the stateoftheart. 
Adaptive Wing Loss  Heatmap regression has became one of the mainstream approaches to localize facial landmarks. As Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) are becoming popular in solving computer vision tasks, extensive research has been done on these architectures. However, the loss function for heatmap regression is rarely studied. In this paper, we analyze the ideal loss function properties for heatmap regression in face alignment problems. Then we propose a novel loss function, named Adaptive Wing loss, that is able to adapt its shape to different types of ground truth heatmap pixels. This adaptability decreases the loss to zero on foreground pixels while leaving some loss on background pixels. To address the imbalance between foreground and background pixels, we also propose Weighted Loss Map, which assigns high weights on foreground and difficult background pixels to help training process focus more on pixels that are crucial to landmark localization. To further improve face alignment accuracy, we introduce boundary prediction and CoordConv with boundary coordinates. Extensive experiments on different benchmarks, including COFW, 300W and WFLW, show our approach outperforms the stateoftheart by a significant margin on various evaluation metrics. Besides, the Adaptive Wing loss also helps other heatmap regression tasks. Code will be made publicly available. 
AdaptiveDynamic PCA  mvMonitoring 
Adaptively Connected Neural Network (ACNet) 
This paper presents a novel adaptively connected neural network (ACNet) to improve the traditional convolutional neural networks (CNNs) {in} two aspects. First, ACNet employs a flexible way to switch global and local inference in processing the internal feature representations by adaptively determining the connection status among the feature nodes (e.g., pixels of the feature maps) \footnote{In a computer vision domain, a node refers to a pixel of a feature map{, while} in {the} graph domain, a node denotes a graph node.}. We can show that existing CNNs, the classical multilayer perceptron (MLP), and the recently proposed nonlocal network (NLN) \cite{nonlocalnn17} are all special cases of ACNet. Second, ACNet is also capable of handling nonEuclidean data. Extensive experimental analyses on {a variety of benchmarks (i.e.,} ImageNet1k classification, COCO 2017 detection and segmentation, CUHK03 person reidentification, CIFAR analysis, and Cora document categorization) demonstrate that {ACNet} cannot only achieve stateoftheart performance but also overcome the limitation of the conventional MLP and CNN \footnote{Corresponding author: Liang Lin (linliang@ieee.org)}. The code is available at \url{https://…/AdaptivelyConnectedNeuralNetworks}. 
Adaptively Scaled Recurrent Neural Network (ASRNN) 
Recent advancements in recurrent neural network (RNN) research have demonstrated the superiority of utilizing multiscale structures in learning temporal representations of time series. Currently, most of multiscale RNNs use fixed scales, which do not comply with the nature of dynamical temporal patterns among sequences. In this paper, we propose Adaptively Scaled Recurrent Neural Networks (ASRNN), a simple but efficient way to handle this problem. Instead of using predefined scales, ASRNNs are able to learn and adjust scales based on different temporal contexts, making them more flexible in modeling multiscale patterns. Compared with other multiscale RNNs, ASRNNs are bestowed upon dynamical scaling capabilities with much simpler structures, and are easy to be integrated with various RNN cells. The experiments on multiple sequence modeling tasks indicate ASRNNs can efficiently adapt scales based on different sequence contexts and yield better performances than baselines without dynamical scaling abilities. 
Adaptively Transforming Graph Matching (ATGM) 
Recently, many graph matching methods that incorporate pairwise constraint and that can be formulated as a quadratic assignment problem (QAP) have been proposed. Although these methods demonstrate promising results for the graph matching problem, they have high complexity in space or time. In this paper, we introduce an adaptively transforming graph matching (ATGM) method from the perspective of functional representation. More precisely, under a transformation formulation, we aim to match two graphs by minimizing the discrepancy between the original graph and the transformed graph. With a linear representation map of the transformation, the pairwise edge attributes of graphs are explicitly represented by unary node attributes, which enables us to reduce the space and time complexity significantly. Due to an efficient FrankWolfe methodbased optimization strategy, we can handle graphs with hundreds and thousands of nodes within an acceptable amount of time. Meanwhile, because transformation map can preserve graph structures, a domain adaptationbased strategy is proposed to remove the outliers. The experimental results demonstrate that our proposed method outperforms the stateoftheart graph matching algorithms. 
AdaptiveMiner  Extraction of valuable data from extensive datasets is a standout amongst the most vital exploration issues. Association rule mining is one of the highly used methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent itemsets) is most crucial part of the association rule mining task. Many singlemachine based association rule mining algorithms exist but the massive amount of data available these days is above the capacity of a single machine based algorithm. Therefore, to meet the demands of this evergrowing enormous data, there is a need for distributed association rule mining algorithm which can run on multiple machines. For these types of parallel/distributed applications, MapReduce is one of the best faulttolerant frameworks. Hadoop is one of the most popular opensource software frameworks with MapReduce based approach for distributed storage and processing of large datasets using standalone clusters built from commodity hardware. But heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce based platforms are being developed for parallel computing in recent years. Among them, a platform, namely, Spark have attracted a lot of attention because of its inbuilt support to distributed computations. Therefore, we implemented a distributed association rule mining algorithm on Spark named as AdaptiveMiner which uses adaptive approach for finding frequent patterns with higher accuracy and efficiency. AdaptiveMiner uses an adaptive strategy based on the partial processing of datasets. AdaptiveMiner makes execution plans before every iteration and goes with the best suitable plan to minimize time and space complexity. AdpativeMiner is a dynamic association rule mining algorithm which change its approach based on the nature of dataset. Therefore, it is different and better than stateoftheart static association rule mining algorithms. We conduct indepth experiments to gain insight into the effectiveness, efficiency, and scalability of the AdaptiveMiner algorithm on Spark. Available: https ://githu b.com/sanja ysing hrath i/ Adapt iveMiner 
adaQN  Recurrent Neural Networks (RNNs) are powerful models that achieve unparalleled performance on several pattern recognition problems. However, training of RNNs is a computationally difficult task owing to the wellknown ‘vanishing/exploding’ gradient problems. In recent years, several algorithms have been proposed for training RNNs. These algorithms either: exploit no (or limited) curvature information and have cheap periteration complexity; or attempt to gain significant curvature information at the cost of increased periteration cost. The former set includes diagonallyscaled firstorder methods such as ADAM and ADAGRAD while the latter consists of secondorder algorithms like HessianFree Newton and KFAC. In this paper, we present an novel stochastic quasiNewton algorithm (adaQN) for training RNNs. Our approach retains a low periteration cost while allowing for nondiagonal scaling through a stochastic LBFGS updating scheme. The method is judicious in storing and retaining LBFGS curvature pairs which is indirectly used as a means of controlling the quality of the steps. We present numerical experiments on two language modeling tasks and show that adaQN performs at par, if not better, than popular RNN training algorithms. These results suggest that quasiNewton algorithms have the potential to be a viable alternative to first and secondorder methods for training RNNs. 
ADARES  Virtual execution environments allow for consolidation of multiple applications onto the same physical server, thereby enabling more efficient use of server resources. However, users often statically configure the resources of virtual machines through guesswork, resulting in either insufficient resource allocations that hinder VM performance, or excessive allocations that waste precious data center resources. In this paper, we first characterize realworld resource allocation and utilization of VMs through the analysis of an extensive dataset, consisting of more than 250k VMs from over 3.6k private enterprise clusters. Our largescale analysis confirms that VMs are often misconfigured, either overprovisioned or underprovisioned, and that this problem is pervasive across a wide range of private clusters. We then propose ADARES, an adaptive system that dynamically adjusts VM resources using machine learning techniques. In particular, ADARES leverages the contextual bandits framework to effectively manage the adaptations. Our system exploits easily collectible data, at the cluster, node, and VM levels, to make more sensible allocation decisions, and uses transfer learning to safely explore the configurations space and speed up training. Our empirical evaluation shows that ADARES can significantly improve system utilization without sacrificing performance. For instance, when compared to threshold and predictionbased baselines, it achieves more predictable VMlevel performance and also reduces the amount of virtual CPUs and memory provisioned by up to 35% and 60% respectively for synthetic workloads on real clusters. 
ADAReverse Engineering (ADARE) 
This paper addresses detection of a reverse engineering (RE) attack targeting a deep neural network (DNN) image classifier; by querying, RE’s aim is to discover the classifier’s decision rule. RE can enable testtime evasion attacks, which require knowledge of the classifier. Recently, we proposed a quite effective approach (ADA) to detect testtime evasion attacks. In this paper, we extend ADA to detect RE attacks (ADARE). We demonstrate our method is successful in detecting ‘stealthy’ RE attacks before they learn enough to launch effective testtime evasion attacks. 
ADATE  Deep learning and deep architectures are emerging as the best machine learning methods so far in many practical applications such as reducing the dimensionality of data, image classification, speech recognition or object segmentation. In fact, many leading technology companies such as Google, Microsoft or IBM are researching and using deep architectures in their systems to replace other traditional models. Therefore, improving the performance of these models could make a strong impact in the area of machine learning. However, deep learning is a very fastgrowing research domain with many core methodologies and paradigms just discovered over the last few years. This thesis will first serve as a short summary of deep learning, which tries to include all of the most important ideas in this research area. Based on this knowledge, we suggested, and conducted some experiments to investigate the possibility of improving the deep learning based on automatic programming (ADATE). Although our experiments did produce good results, there are still many more possibilities that we could not try due to limited time as well as some limitations of the current ADATE version. I hope that this thesis can promote future work on this topic, especially when the next version of ADATE comes out. This thesis also includes a short analysis of the power of ADATE system, which could be useful for other researchers who want to know what it is capable of. 
ADCrowdNet  We propose an attentioninjective deformable convolutional network called ADCrowdNet for crowd understanding that can address the accuracy degradation problem of highly congested noisy scenes. ADCrowdNet contains two concatenated networks. An attentionaware network called Attention Map Generator (AMG) first detects crowd regions in images and computes the congestion degree of these regions. Based on detected crowd regions and congestion priors, a multiscale deformable network called Density Map Estimator (DME) then generates highquality density maps. With the attentionaware training scheme and multiscale deformable convolutional scheme, the proposed ADCrowdNet achieves the capability of being more effective to capture the crowd features and more resistant to various noises. We have evaluated our method on four popular crowd counting datasets (ShanghaiTech, UCF_CC_50, WorldEXPO’10, and UCSD) and an extra vehicle counting dataset TRANCOS, our approach overwhelmingly beats existing approaches on all of these datasets. 
Additive Latent Effect Model (ALE) 
The past decade has seen a growth in the development and deployment of educational technologies for assisting collegegoing students in choosing majors, selecting courses and acquiring feedback based on past academic performance. Grade prediction methods seek to estimate a grade that a student may achieve in a course that she may take in the future (e.g., next term). Accurate and timely prediction of students’ academic grades is important for developing effective degree planners and early warning systems, and ultimately improving educational outcomes. Existing grade pre diction methods mostly focus on modeling the knowledge components associated with each course and student, and often overlook other factors such as the difficulty of each knowledge component, course instructors, student interest, capabilities and effort. In this paper, we propose additive latent effect models that incorporate these factors to predict the student nextterm grades. Specifically, the proposed models take into account four factors: (i) student’s academic level, (ii) course instructors, (iii) student global latent factor, and (iv) latent knowledge factors. We compared the new models with several stateoftheart methods on students of various characteristics (e.g., whether a student transferred in or not). The experimental results demonstrate that the proposed methods significantly outperform the baselines on grade prediction problem. Moreover, we perform a thorough analysis on the importance of different factors and how these factors can practically assist students in course selection, and finally improve their academic performance. 
Additive Noise Models (ANM) 
The core idea of these socalled Additive Noise Models (ANM) is that if X>Y, then the variability observed in Y will be either explained by X, or by some noise that is independent of X. BCEE 
Additive Polynomial Design Matrix (AP) 
An implementation of the additive polynomial (AP) design matrix. It constructs and appends an AP design matrix to a data frame for use with longitudinal data subject to seasonality. apdesign 
Additive Principal Components (APC) 
Additive principal components are a nonlinear generalization of linear principal components. 
Additive Smoothing  In statistics, additive smoothing, also called Laplace smoothing (not to be confused with Laplacian smoothing), or Lidstone smoothing, is a technique used to smooth categorical data. bcpa,BreakoutDetection 
Additive White Gaussian Noise (AWGN) 
Additive white Gaussian noise (AWGN) is a basic noise model used in Information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics: • Additive because it is added to any noise that might be intrinsic to the information system. • White refers to the idea that it has uniform power across the frequency band for the information system. It is an analogy to the color white which has uniform emissions at all frequencies in the visible spectrum. • Gaussian because it has a normal distribution in the time domain with an average time domain value of zero. Wideband noise comes from many natural noise, such as the thermal vibrations of atoms in conductors (referred to as thermal noise or JohnsonNyquist noise), shot noise, blackbody radiation from the earth and other warm objects, and from celestial sources such as the Sun. The central limit theorem of probability theory indicates that the summation of many random processes will tend to have distribution called Gaussian or Normal. 
ADef Algorithm  While deep neural networks have proven to be a powerful tool for many recognition and classification tasks, their stability properties are still not well understood. In the past, image classifiers have been shown to be vulnerable to socalled adversarial attacks, which are created by additively perturbing the correctly classified image. In this paper, we propose the ADef algorithm to construct a different kind of adversarial attack created by iteratively applying small deformations to the image, found through a gradient descent step. We demonstrate our results on MNIST with a convolutional neural network and on ImageNet with Inceptionv3 and ResNet101. 
Adjacency Diagram  The adjacency diagram is a spacefilling variant of the nodelink diagram; rather than drawing a link between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and their placement relative to adjacent nodes reveals their position in the hierarchy. The icicle layout in figure 4D is similar to the first nodelink diagram in that the root node appears at the top, with child nodes underneath. Because the nodes are now spacefilling, however, we can use a length encoding for the size of software classes and packages. This reveals an additional dimension that would be difficult to show in a nodelink diagram. bdvis 
Adjacency Matrix  In mathematics and computer science, an adjacency matrix is a means of representing which vertices (or nodes) of a graph are adjacent to which other vertices. Another matrix representation for a graph is the incidence matrix. bentcableAR 
Adjustment of Recommendation List (ARL) 
Recommender system is a critically important tool in online commercial system and provide users with personalized recommendation on items. So far, numerous recommendation algorithms have been made to further improve the recommendation performance in a singlestep recommendation, while the longterm recommendation performance is neglected. In this paper, we proposed an approach called Adjustment of Recommendation List (ARL) to enhance the longterm recommendation accuracy. In order to observe the longterm accuracy, we developed an evolution model of network to simulate the interaction between the recommender system and user’s behaviour. The result shows that not only longterm recommendation accuracy can be enhanced significantly but the diversity of item in online system maintains healthy. Notably, an optimal parameter n* of ARL existed in longterm recommendation, indicating that there is a tradeoff between keeping diversity of item and user’s preference to maximize the longterm recommendation accuracy. Finally, we confirmed that the optimal parameter n* is stable during evolving network, which reveals the robustness of ARL method. 
Admixture Models  The core of the admixture model, proposed by Smith (1953) to deal with heterogeneous traits in genetic linkage analysis, is the hypothesis that a fraction lambda of pedigrees is linked to the marker locus being tested while a fraction 1 – lambda is unlinked. This model leads naturally to the vexing problem of testing for a mixture and has given rise to a large literature. bipartite,lpbrim 
ADNet  Online video advertising gives content providers the ability to deliver compelling content, reach a growing audience, and generate additional revenue from online media. Recently, advertising strategies are designed to look for original advert(s) in a video frame, and replacing them with new adverts. These strategies, popularly known as product placement or embedded marketing, greatly help the marketing agencies to reach out to a wider audience. However, in the existing literature, such detection of candidate frames in a video sequence for the purpose of advert integration, is done manually. In this paper, we propose a deeplearning architecture called ADNet, that automatically detects the presence of advertisements in video frames. Our approach is the first of its kind that automatically detects the presence of adverts in a video frame, and achieves stateoftheart results on a public dataset. 
Adometry  Adometry by Google solves the complex challenge of integrating, measuring, and optimizing marketing data across all channels – both online and offline – so you can generate actionable insights that improve ROI. BlandAltmanLeh 
ADSaS  Since with massive data growth, the need for autonomous and generic anomaly detection system is increased. However, developing one standalone generic anomaly detection system that is accurate and fast is still a challenge. In this paper, we propose conventional timeseries analysis approaches, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and Seasonal Trend decomposition using Loess (STL), to detect complex and various anomalies. Usually, SARIMA and STL are used only for stationary and periodic timeseries, but by combining, we show they can detect anomalies with high accuracy for data that is even noisy and nonperiodic. We compared the algorithm to Long Short Term Memory (LSTM), a deeplearningbased algorithm used for anomaly detection system. We used a total of seven realworld datasets and four artificial datasets with different timeseries properties to verify the performance of the proposed algorithm. 
Advanced Analysis of Variance (AANOVA) 
Book: Advanced Analysis of Variance 
Advanced Analytics  There is an increasing use of the term advanced analytics, typically used to describe the technical aspects of analytics, especially predictive modeling, machine learning techniques, and neural networks. BLCOP 
Advanced LSTM (ALSTM) 
Long shortterm memory (LSTM) is normally used in recurrent neural network (RNN) as basic recurrent unit. However,conventional LSTM assumes that the state at current time step depends on previous time step. This assumption constraints the time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (ALSTM), for better temporal context modeling. We employ ALSTM in weighted pooling RNN for emotion recognition. The ALSTM outperforms the conventional LSTM by 5.5% relatively. The ALSTM based weighted pooling RNN can also complement the stateoftheart emotion classification framework. This shows the advantage of ALSTM. 
Advanced Supervised Principal Component Analysis (ASPCA) 
We present a straightforward noniterative method for shallowing of deep Convolutional Neural Network (CNN) by combination of several layers of CNNs with Advanced Supervised Principal Component Analysis (ASPCA) of their outputs. We tested this new method on a practically important case of `friendorfoe’ face recognition. This is the backyard dog problem: the dog should (i) distinguish the members of the family from possible strangers and (ii) identify the members of the family. Our experiments revealed that the method is capable of drastically reducing the depth of deep learning CNNs, albeit at the cost of mild performance deterioration. 
AdvBNN  We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. First, although recent work has demonstrated that fusing randomness can improve the robustness of neural networks (Liu 2017), we noticed that adding noise blindly to all the layers is not the optimal way to incorporate randomness. Instead, we model randomness under the framework of Bayesian Neural Network (BNN) to formally learn the posterior distribution of models in a scalable way. Second, we formulate the minimax problem in BNN to learn the best model distribution under adversarial attacks, leading to an adversarialtrained Bayesian neural net. Experiment results demonstrate that the proposed algorithm achieves stateoftheart performance under strong attacks. On CIFAR10 with VGG network, our model leads to 14\% accuracy improvement compared with adversarial training (Madry 2017) and random selfensemble (Liu 2017) under PGD attack with $0.035$ distortion, and the gap becomes even larger on a subset of ImageNet. 
ADVENT  Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at runtime. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixelwise predictions. To this end, we propose two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively. We demonstrate stateoftheart performance in semantic segmentation on two challenging ‘synthetic2real’ setups and show that the approach can also be used for detection. 
AdvEntuRe  We consider the problem of learning textual entailment models with limited supervision (5K10K training examples), and present two complementary approaches for it. First, we propose knowledgeguided adversarial example generators for incorporating large lexical resources in entailment models via only a handful of rule templates. Second, to make the entailment model – a discriminator – more robust, we propose the first GANstyle approach for training it using a natural language example generator that iteratively adjusts based on the discriminator’s performance. We demonstrate effectiveness using two entailment datasets, where the proposed methods increase accuracy by 4.7% on SciTail and by 2.8% on a 1% training subsample of SNLI. Notably, even a single handwritten rule, negate, improves the accuracy on the negation examples in SNLI by 6.1%. 
Adversarial Clustering  Nowadays more and more data are gathered for detecting and preventing cyber attacks. In cyber security applications, data analytics techniques have to deal with active adversaries that try to deceive the data analytics models and avoid being detected. The existence of such adversarial behavior motivates the development of robust and resilient adversarial learning techniques for various tasks. Most of the previous work focused on adversarial classification techniques, which assumed the existence of a reasonably large amount of carefully labeled data instances. However, in practice, labeling the data instances often requires costly and timeconsuming human expertise and becomes a significant bottleneck. Meanwhile, a large number of unlabeled instances can also be used to understand the adversaries’ behavior. To address the above mentioned challenges, in this paper, we develop a novel grid based adversarial clustering algorithm. Our adversarial clustering algorithm is able to identify the core normal regions, and to draw defensive walls around the centers of the normal objects utilizing game theoretic ideas. Our algorithm also identifies subclusters of attack objects, the overlapping areas within clusters, and outliers which may be potential anomalies. 
Adversarial Complementary Learning (ACoL) 
In this work, we propose Adversarial Complementary Learning (ACoL) to automatically localize integral objects of semantic interest with weak supervision. We first mathematically prove that class localization maps can be obtained by directly selecting the classspecific feature maps of the last convolutional layer, which paves a simple way to identify object regions. We then present a simple network architecture including two parallelclassifiers for object localization. Specifically, we leverage one classification branch to dynamically localize some discriminative object regions during the forward pass. Although it is usually responsive to sparse parts of the target objects, this classifier can drive the counterpart classifier to discover new and complementary object regions by erasing its discovered regions from the feature maps. With such an adversarial learning, the two parallelclassifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together. The merits of ACoL are mainly twofold: 1) it can be trained in an endtoend manner; 2) dynamically erasing enables the counterpart classifier to discover complementary object regions more effectively. We demonstrate the superiority of our ACoL approach in a variety of experiments. In particular, the Top1 localization error rate on the ILSVRC dataset is 45.14%, which is the new stateoftheart. 
Adversarial Constraint Learning  Constraintbased learning reduces the burden of collecting labels by having users specify general properties of structured outputs, such as constraints imposed by physical laws. We propose a novel framework for simultaneously learning these constraints and using them for supervision, bypassing the difficulty of using domain expertise to manually specify constraints. Learning requires a blackbox simulator of structured outputs, which generates valid labels, but need not model their corresponding inputs or the inputlabel relationship. At training time, we constrain the model to produce outputs that cannot be distinguished from simulated labels by adversarial training. Providing our framework with a small number of labeled inputs gives rise to a new semisupervised structured prediction model; we evaluate this model on multiple tasks — tracking, pose estimation and time series prediction — and find that it achieves high accuracy with only a small number of labeled inputs. In some cases, no labels are required at all. 
Adversarial Contrastive Estimation  Learning by contrasting positive and negative samples is a general strategy adopted by many methods. Noise contrastive estimation (NCE) for word embeddings and translating embeddings for knowledge graphs are examples in NLP employing this approach. In this work, we view contrastive learning as an abstraction of all such methods and augment the negative sampler into a mixture distribution containing an adversarially learned sampler. The resulting adaptive sampler finds harder negative examples, which forces the main model to learn a better representation of the data. We evaluate our proposal on learning word embeddings, order embeddings and knowledge graph embeddings and observe both faster convergence and improved results on multiple metrics. 
Adversarial Data Programming (ADP) 
Paucity of large curated handlabeled training data for every domainofinterest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in nearconstant time. In this work, we present Adversarial Data Programming (ADP), which presents an adversarial methodology to generate data as well as a curated aggregated label has given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many stateoftheart models. We conducted extensive experiments to study its usefulness, as well as showed how the proposed ADP framework can be used for transfer learning as well as multitask learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a gametheoretic perspective, as well as explore the performance of the method on more complex datasets. 
Adversarial Discriminative Domain Generalization (ADDoG) 
Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have issues converging, and only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier to train ‘meet in the middle’ approach. The model iteratively moves representations learned for each dataset closer to one another, improving crossdataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which is able to extend the proposed method to more than two datasets, simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline stateoftheart methods when target dataset labels are added and inthewild data are considered. Even though our experiments focus on crosscorpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings. 
Adversarial Dropout  Successful application processing sequential data, such as text and speech, requires an improved generalization performance of recurrent neural networks (RNNs). Dropout techniques for RNNs were introduced to respond to these demands, but we conjecture that the dropout on RNNs could have been improved by adopting the adversarial concept. This paper investigates ways to improve the dropout for RNNs by utilizing intentionally generated dropout masks. Specifically, the guided dropout used in this research is called as adversarial dropout, which adversarially disconnects neurons that are dominantly used to predict correct targets over time. Our analysis showed that our regularizer, which consists of a gap between the original and the reconfigured RNNs, was the upper bound of the gap between the training and the inference phases of the random dropout. We demonstrated that minimizing our regularizer improved the effectiveness of the dropout for RNNs on sequential MNIST tasks, semisupervised text classification tasks, and language modeling tasks. 
Adversarial Dual Autoencoder (ADAE) 
Semisupervised and unsupervised Generative Adversarial Networks (GAN)based methods have been gaining popularity in anomaly detection task recently. However, GAN training is somewhat challenging and unstable. Inspired from previous work in GANbased image generation, we introduce a GANbased anomaly detection framework – Adversarial Dual Autoencoders (ADAE) – consists of two autoencoders as generator and discriminator to increase training stability. We also employ discriminator reconstruction error as anomaly score for better detection performance. Experiments across different datasets of varying complexity show strong evidence of a robust model that can be used in different scenarios, one of which is brain tumor detection. 
Adversarial Eliminating With GAN (AEGAN) 
Although Neural networks could achieve stateoftheart performance while recongnizing images, they often suffer a tremendous defeat from adversarial examples–inputs generated by utilizing imperceptible but intentional perturbations to samples from the datasets. How to defense against adversarial examples is an important problem which is well worth to research. So far, only two wellknown methods adversarial training and defensive distillation have provided a significant defense. In contrast to existing methods mainly based on model itself, we address the problem purely based on the adversarial examples itself. In this paper, a novel idea and the first framework based Generative Adversarial Nets named AEGAN capable of resisting adversarial examples are proposed. Extensive experiments on benchmark datasets indicate that AEGAN is able to defense against adversarial examples effectively. 
Adversarial Erasing Embedding Network With the Guidance of HighOrder Attributes (AEENHOA) 
In this paper, an adversarial erasing embedding network with the guidance of highorder attributes (AEENHOA) is proposed for going further to solve the challenging ZSL/GZSL task. AEENHOA consists of two branches, i.e., the upper stream is capable of erasing some initially discovered regions, then the highorder attribute supervision is incorporated to characterize the relationship between the class attributes. Meanwhile, the bottom stream is trained by taking the current background regions to train the same attribute. As far as we know, it is the first time of introducing the erasing operations into the ZSL task. In addition, we first propose a class attribute activation map for the visualization of ZSL output, which shows the relationship between class attribute feature and attention map. Experiments on four standard benchmark datasets demonstrate the superiority of AEENHOA framework. 
Adversarial Feature Genome (AFG) 
Convolutional neural networks (CNNs) are easily spoofed by adversarial examples which lead to wrong classification result. Most of the oneway defense methods focus only on how to improve the robustness of a CNN or to identify adversarial examples. They are incapable of identifying and correctly classifying adversarial examples simultaneously due to the lack of an effective way to quantitatively represent changes in the characteristics of the sample within the network. We find that adversarial examples and original ones have diverse representation in the feature space. Moreover, this difference grows as layers go deeper, which we call Adversarial Feature Separability (AFS). Inspired by AFS, we propose an Adversarial Feature Genome (AFG) based adversarial examples defense framework which can detect adversarial examples and correctly classify them into original category simultaneously. First, we extract the representations of adversarial examples and original ones with labels by the group visualization method. Then, we encode the representations into the feature database AFG. Finally, we model adversarial examples recognition as a multilabel classification or prediction problem by training a CNN for recognizing adversarial examples and original examples on the AFG. Experiments show that the proposed framework can not only effectively identify the adversarial examples in the defense process, but also correctly classify adversarial examples with mean accuracy up to 63\%. Our framework potentially gives a new perspective, i.e. datadriven way, to adversarial examples defense. We believe that adversarial examples defense research may benefit from a large scale AFG database which is similar to ImageNet. The database and source code can be visited at https://…/Adversarial_Feature_Genome. 
Adversarial Feature Learning (AFL) 
The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing generators learn to ‘linearize semantics’ in the latent space of such models. Intuitively, such latent spaces may serve as useful feature representations for auxiliary problems where semantics are relevant. However, in their existing form, GANs have no means of learning the inverse mapping — projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and selfsupervised feature learning. 
Adversarial Gain  Adversarial examples can be defined as inputs to a model which induce a mistake – where the model output is different than that of an oracle, perhaps in surprising or malicious ways. Original models of adversarial attacks are primarily studied in the context of classification and computer vision tasks. While several attacks have been proposed in natural language processing (NLP) settings, they often vary in defining the parameters of an attack and what a successful attack would look like. The goal of this work is to propose a unifying model of adversarial examples suitable for NLP tasks in both generative and classification settings. We define the notion of adversarial gain: based in control theory, it is a measure of the change in the output of a system relative to the perturbation of the input (caused by the socalled adversary) presented to the learner. This definition, as we show, can be used under different feature spaces and distance conditions to determine attack or defense effectiveness across different intuitive manifolds. This notion of adversarial gain not only provides a useful way for evaluating adversaries and defenses, but can act as a building block for future work in robustness under adversaries due to its rooted nature in stability and manifold theory. 
Adversarial Gated Network (Gated GAN) 
Style transfer describes the rendering of an image semantic content as different artistic styles. Recently, generative adversarial networks (GANs) have emerged as an effective approach in style transfer by adversarially training the generator to synthesize convincing counterfeits. However, traditional GAN suffers from the mode collapse issue, resulting in unstable training and making style transfer quality difficult to guarantee. In addition, the GAN generator is only compatible with one style, so a series of GANs must be trained to provide users with choices to transfer more than one kind of style. In this paper, we focus on tackling these challenges and limitations to improve style transfer. We propose adversarial gated networks (Gated GAN) to transfer multiple styles in a single model. The generative networks have three modules: an encoder, a gated transformer, and a decoder. Different styles can be achieved by passing input images through different branches of the gated transformer. To stabilize training, the encoder and decoder are combined as an autoencoder to reconstruct the input images. The discriminative networks are used to distinguish whether the input image is a stylized or genuine image. An auxiliary classifier is used to recognize the style categories of transferred images, thereby helping the generative networks generate images in multiple styles. In addition, Gated GAN makes it possible to explore a new style by investigating styles learned from artists or genres. Our extensive experiments demonstrate the stability and effectiveness of the proposed model for multistyle transfer. 
Adversarial Generalized Method of Moments  We provide an approach for learning deep neural net representations of models described via conditional moment restrictions. Conditional moment restrictions are widely used, as they are the language by which social scientists describe the assumptions they make to enable causal inference. We formulate the problem of estimating the underling model as a zerosum game between a modeler and an adversary and apply adversarial training. Our approach is similar in nature to Generative Adversarial Networks (GAN), though here the modeler is learning a representation of a function that satisfies a continuum of moment conditions and the adversary is identifying violating moments. We outline ways of constructing effective adversaries in practice, including kernels centered by kmeans clustering, and random forests. We examine the practical performance of our approach in the setting of nonparametric instrumental variable regression. 
Adversarial GeneratorEncoder Networks  We present a new autoencodertype architecture, that is trainable in an unsupervised mode, sustains both generation and inference, and has the quality of conditional and unconditional samples boosted by adversarial learning. Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning. The game objective compares the divergences of each of the real and the generated data distributions with the canonical distribution in the latent space. We show that direct generatorvsencoder game leads to a tight coupling of the two components, resulting in samples and reconstructions of a comparable quality to some recentlyproposed more complex architectures. 
Adversarial Graphical Model (AGM) 
In many structured prediction problems, complex relationships between variables are compactly defined using graphical structures. The most prevalent graphical prediction methods—probabilistic graphical models and large margin methods—have their own distinct strengths but also possess significant drawbacks. Conditional random fields (CRFs) are Fisher consistent, but they do not permit integration of customized loss metrics into their learning process. Largemargin models, such as structured support vector machines (SSVMs), have the flexibility to incorporate customized loss metrics, but lack Fisher consistency guarantees. We present adversarial graphical models (AGM), a distributionally robust approach for constructing a predictor that performs robustly for a class of data distributions defined using a graphical structure. Our approach enjoys both the flexibility of incorporating customized loss metrics into its design as well as the statistical guarantee of Fisher consistency. We present exact learning and prediction algorithms for AGM with time complexity similar to existing graphical models and show the practical benefits of our approach with experiments. 
Adversarial Hashing Network (HashGAN) 
As the rapid growth of multimodal data, hashing methods for crossmodal retrieval have received considerable attention. Deepnetworksbased crossmodal hashing methods are appealing as they can integrate feature learning and hash coding into endtoend trainable frameworks. However, it is still challenging to find content similarities between different modalities of data due to the heterogeneity gap. To further address this problem, we propose an adversarial hashing network with attention mechanism to enhance the measurement of content similarities by selectively focusing on informative parts of multimodal data. The proposed new adversarial network, HashGAN, consists of three building blocks: 1) the feature learning module to obtain feature representations, 2) the generative attention module to generate an attention mask, which is used to obtain the attended (foreground) and the unattended (background) feature representations, 3) the discriminative hash coding module to learn hash functions that preserve the similarities between different modalities. In our framework, the generative module and the discriminative module are trained in an adversarial way: the generator is learned to make the discriminator cannot preserve the similarities of multimodal data w.r.t. the background feature representations, while the discriminator aims to preserve the similarities of multimodal data w.r.t. both the foreground and the background feature representations. Extensive evaluations on several benchmark datasets demonstrate that the proposed HashGAN brings substantial improvements over other stateoftheart crossmodal hashing methods. 
Adversarial Label Learning  We consider the task of training classifiers without labels. We propose a weakly supervised method—adversarial label learning—that trains classifiers to perform well against an adversary that chooses labels for training data. The weak supervision constrains what labels the adversary can choose. The method therefore minimizes an upper bound of the classifier’s error rate using projected primaldual subgradient descent. Minimizing this bound protects against bias and dependencies in the weak supervision. Experiments on three real datasets show that our method can train without labels and outperforms other approaches for weakly supervised learning. 
Adversarial Machine Learning (AML) 
Adversarial machine learning is the formal name for studying what happens when conceding even a slightly more realistic alternative to assumptions of these types (harmlessly called ‘relaxing assumptions’ …. Adversarial Machine Learning 
ADversarial MetaLearner (ADML) 
Metalearning enables a model to learn from very limited data to undertake a new task. In this paper, we study the general metalearning with adversarial samples. We present a metalearning algorithm, ADML (ADversarial MetaLearner), which leverages clean and adversarial samples to optimize the initialization of a learning model in an adversarial manner. ADML leads to the following desirable properties: 1) it turns out to be very effective even in the cases with only clean samples; 2) it is modelagnostic, i.e., it is compatible with any learning model that can be trained with gradient descent; and most importantly, 3) it is robust to adversarial samples, i.e., unlike other metalearning methods, it only leads to a minor performance degradation when there are adversarial samples. We show via extensive experiments that ADML delivers the stateoftheart performance on two widelyused image datasets, MiniImageNet and CIFAR100, in terms of both accuracy and robustness. 
Adversarial Metric Learning (AML) 
In the past decades, intensive efforts have been put to design various loss functions and metric forms for metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on the ambiguous test pairs due to the distribution bias between training set and test set. To address this problem, the Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the distribution bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e. confusion and distinguishment. In confusion stage, the ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In distinguishment stage, a metric is exhaustively learned to try its best to distinguish both the adversarial pairs and the original training pairs. Thanks to the challenges posed by the confusion stage in such competing process, the AML model is able to grasp plentiful difficult knowledge that has not been contained by the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated into optimization framework, of which the global convergence is theoretically proved. The experimental results on toy data and practical datasets clearly demonstrate the superiority of AML to the representative stateoftheart metric learning methodologies. 
Adversarial Multimedia Recommendation (AMR) 
With the prevalence of multimedia content on the Web, developing recommender solutions that can effectively leverage the rich signal in multimedia data is in urgent need. Owing to the success of deep neural networks in representation learning, recent advance on multimedia recommendation has largely focused on exploring deep learning methods to improve the recommendation accuracy. To date, however, there has been little effort to investigate the robustness of multimedia representation and its impact on the performance of multimedia recommendation. In this paper, we shed light on the robustness of multimedia recommender system. Using the stateoftheart recommendation framework and deep image features, we demonstrate that the overall system is not robust, such that a small (but purposeful) perturbation on the input image will severely decrease the recommendation accuracy. This implies the possible weakness of multimedia recommender system in predicting user preference, and more importantly, the potential of improvement by enhancing its robustness. To this end, we propose a novel solution named Adversarial Multimedia Recommendation (AMR), which can lead to a more robust multimedia recommender model by using adversarial learning. The idea is to train the model to defend an adversary, which adds perturbations to the target image with the purpose of decreasing the model’s accuracy. We conduct experiments on two representative multimedia recommendation tasks, namely, image recommendation and visuallyaware product recommendation. Extensive results verify the positive effect of adversarial learning and demonstrate the effectiveness of our AMR method. Source codes are available in https://…/AMR. 
Adversarial Negative Sampling  In recent years, the Word2Vec model trained with the Negative Sampling loss function has shown stateoftheart results in a number of machine learning tasks, including language modeling tasks, such as word analogy and word similarity, and in recommendation tasks, through Prod2Vec, an extension that applies to modeling user shopping activity and user preferences. Several methods that aim to improve upon the standard Negative Sampling loss have been proposed. In our paper we pursue more sophisticated Negative Sampling, by leveraging ideas from the field of Generative Adversarial Networks (GANs), and propose Adversarial Negative Sampling. We build upon the recent progress made in stabilizing the training objective of GANs in the discrete data setting, and introduce a new GANWord2Vec model.We evaluate our model on the task of basket completion, and show significant improvements in performance over Word2Vec trained using standard loss functions, including Noise Contrastive Estimation and Negative Sampling. 
Adversarial Network Embedding  Learning lowdimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other highorder proximities. However, except for objectives to capture network structural properties, most of them suffer from lack of additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate its contribution in learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to stateoftheart approaches on benchmark network embedding tasks. 
Adversarial Noise Layer (ANL) 
In this paper, we introduce a novel regularization method called Adversarial Noise Layer (ANL), which significantly improve the CNN’s generalization ability by adding adversarial noise in the hidden layers. ANL is easy to implement and can be integrated with most of the CNNbased models. We compared the impact of the different type of noise and visually demonstrate that adversarial noise guide CNNs to learn to extract cleaner feature maps, further reducing the risk of overfitting. We also conclude that the model trained with ANL is more robust to FGSM and IFGSM attack. Code is available at: https://…/ANL 
Adversarial Nonlinear ICA (ANICA) 
Nonlinear ICA based on CramerWold metric 
Adversarial Oversampling  Cardiovascular diseases are one of the most common causes of death in the world. Prevention, knowledge of previous cases in the family, and early detection is the best strategy to reduce this fact. Different machine learning approaches to automatic diagnostic are being proposed to this task. As in most health problems, the imbalance between examples and classes is predominant in this problem and affects the performance of the automated solution. In this paper, we address the classification of heartbeats images in different cardiovascular diseases. We propose a twodimensional Convolutional Neural Network for classification after using a InfoGAN architecture for generating synthetic images to unbalanced classes. We call this proposal Adversarial Oversampling and compare it with the classical oversampling methods as SMOTE, ADASYN, and RandomOversampling. The results show that the proposed approach improves the classifier performance for the minority classes without harming the performance in the balanced classes. 
Adversarial Personalized Ranking (APR) 
Item recommendation is a personalized ranking task. To this end, many recommender systems optimize models with pairwise ranking objectives, such as the Bayesian Personalized Ranking (BPR). Using matrix Factorization (MF) — the most widely used model in recommendation — as a demonstration, we show that optimizing it with BPR leads to a recommender model that is not robust. In particular, we find that the resultant model is highly vulnerable to adversarial perturbations on its model parameters, which implies the possibly large error in generalization. To enhance the robustness of a recommender model and thus improve its generalization performance, we propose a new optimization framework, namely Adversarial Personalized Ranking (APR). In short, our APR enhances the pairwise ranking method BPR by performing adversarial training. It can be interpreted as playing a minimax game, where the minimization of the BPR objective function meanwhile defends an adversary, which adds adversarial perturbations on model parameters to maximize the BPR objective function. To illustrate how it works, we implement APR on MF by adding adversarial perturbations on the embedding vectors of users and items. Extensive experiments on three public realworld datasets demonstrate the effectiveness of APR — by optimizing MF with APR, it outperforms BPR with a relative improvement of 11.2% on average and achieves stateoftheart performance for item recommendation. Our implementation is available at: https://…/adversarial_personalized_ranking. 
Adversarial REward Learning (AREL) 
Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a littletapped problem. Different from captions, stories have more expressive language styles and contain many imaginary concepts that do not appear in the images. Thus it poses challenges to behavioral cloning algorithms. Furthermore, due to the limitations of automatic metrics on evaluating story quality, reinforcement learning methods with handcrafted rewards also face difficulties in gaining an overall performance boost. Therefore, we propose an Adversarial REward Learning (AREL) framework to learn an implicit reward function from human demonstrations, and then optimize policy search with the learned reward function. Though automatic evaluation indicates slight performance boost over stateoftheart (SOTA) methods in cloning expert behaviors, human evaluation shows that our approach achieves significant improvement in generating more humanlike stories than SOTA systems. 
Adversarial Robustness Toolbox (ART) 
Adversarial examples have become an indisputable threat to the security of modern AI systems based on deep neural networks (DNNs). The Adversarial Robustness Toolbox (ART) is a Python library designed to support researchers and developers in creating novel defence techniques, as well as in deploying practical defences of realworld AI systems. Researchers can use ART to benchmark novel defences against the stateoftheart. For developers, the library provides interfaces which support the composition of comprehensive defence systems using individual methods as building blocks. The Adversarial Robustness Toolbox supports machine learning models (and deep neural networks (DNNs) specifically) implemented in any of the most popular deep learning frameworks (TensorFlow, Keras, PyTorch). Currently, the library is primarily intended to improve the adversarial robustness of visual recognition systems, however, future releases that will comprise adaptations to other data modes (such as speech, text or time series) are envisioned. The ART source code is released (https://…/adversarialrobustnesstoolbox ) under an MIT license. The release includes code examples and extensive documentation (http://adversarialrobustnesstoolbox.readthedocs.io ) to help researchers and developers get quickly started. 
Adversarial Training (AT) 

Adversarial Transferring on Generative Adversarial Net (ATGAN) 
Recent studies have discovered the vulnerability of Deep Neural Networks (DNNs) to adversarial examples, which are imperceptible to humans but can easily fool DNNs. Existing methods for crafting adversarial examples are mainly based on adding smallmagnitude perturbations to the original images so that the generated adversarial examples are constrained by the benign examples within a small matrix norm. In this work, we propose a new attack method called ATGAN that directly generates the adversarial examples from random noise using generative adversarial nets (GANs). The key idea is to transfer a pretrained GAN to generate adversarial examples for the target classifier to be attacked. Once the model is transferred for attack, ATGAN can generate diverse adversarial examples efficiently, making it helpful to potentially accelerate the adversarial training on defenses. We evaluate ATGAN in both semiwhitebox and blackbox settings under typical defense methods on the MNIST handwritten digit database. Empirical comparisons with existing attack baselines demonstrate that ATGAN can achieve a higher attack success rate. 
Adversarial Transformation Network (ATN) 
Time series classification models have been garnering significant importance in the research community. However, not much research has been done on generating adversarial samples for these models. These adversarial samples can become a security concern. In this paper, we propose utilizing an adversarial transformation network (ATN) on a distilled model to attack various time series classification models. The proposed attack on the classification model utilizes a distilled model as a surrogate that mimics the behavior of the attacked classical time series classification models. Our proposed methodology is applied onto 1Nearest Neighbor Dynamic Time Warping (1NN ) DTW, a Fully Connected Network and a Fully Convolutional Network (FCN), all of which are trained on 43 University of California Riverside (UCR) datasets. In this paper, we show both models were susceptible to attacks on all 43 datasets. To the best of our knowledge, such an attack on time series classification models has never been done before. Finally, we recommend future researchers that develop time series classification models to incorporating adversarial data samples into their training data sets to improve resilience on adversarial samples and to consider model robustness as an evaluative metric. 
Adversarial Transformation Networks  Multiple different approaches of generating adversarial examples have been proposed to attack deep neural networks. These approaches involve either directly computing gradients with respect to the image pixels, or directly solving an optimization on the image pixels. In this work, we present a fundamentally new method for generating adversarial examples that is fast to execute and provides exceptional diversity of output. We efficiently train feedforward neural networks in a selfsupervised manner to generate adversarial examples against a target network or set of networks. We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier’s outputs given the original input, while constraining the new classification to match an adversarial target class. We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as well as the latest stateoftheart ImageNet classifier Inception ResNet v2. 
Adversarial Variational Inference and Learning (AVIL) 
Markov random fields (MRFs) find applications in a variety of machine learning areas, while the inference and learning of such models are challenging in general. In this paper, we propose the Adversarial Variational Inference and Learning (AVIL) algorithm to solve the problems with a minimal assumption about the model structure of an MRF. AVIL employs two variational distributions to approximately infer the latent variables and estimate the partition function, respectively. The variational distributions, which are parameterized as neural networks, provide an estimate of the negative log likelihood of the MRF. On one hand, the estimate is in an intuitive form of approximate contrastive free energy. On the other hand, the estimate is a minimax optimization problem, which is solved by stochastic gradient descent in an alternating manner. We apply AVIL to various undirected generative models in a fully blackbox manner and obtain better results than existing competitors on several real datasets. 
Adversarially Learned Anomaly Detection (ALAD) 
Anomaly detection is a significant and hence wellstudied problem. However, developing effective anomaly detection methods for complex and highdimensional data remains a challenge. As Generative Adversarial Networks (GANs) are able to model the complex highdimensional distributions of realworld data, they offer a promising approach to address this challenge. In this work, we propose an anomaly detection method, Adversarially Learned Anomaly Detection (ALAD) based on bidirectional GANs, that derives adversarially learned features for the anomaly detection task. ALAD then uses reconstruction errors based on these adversarially learned features to determine if a data sample is anomalous. ALAD builds on recent advances to ensure dataspace and latentspace cycleconsistencies and stabilize GAN training, which results in significantly improved anomaly detection performance. ALAD achieves stateoftheart performance on a range of image and tabular datasets while being several hundredfold faster at test time than the only published GANbased method. 
Adversarially Learned Mixture Model (AMM) 
The Adversarially Learned Mixture Model (AMM) is a generative model for unsupervised or semisupervised data clustering. The AMM is the first adversarially optimized method to model the conditional dependence between inferred continuous and categorical latent variables. Experiments on the MNIST and SVHN datasets show that the AMM allows for semantic separation of complex data when little or no labeled data is available. The AMM achieves a stateoftheart unsupervised clustering error rate of 2.86% on the MNIST dataset. A semisupervised extension of the AMM yields competitive results on the SVHN dataset. 
Adversarially Robust Distillation (ARD) 
Knowledge distillation is effective for producing small highperformance neural networks for classification, but these small networks are vulnerable to adversarial attacks. We first study how robustness transfers from robust teacher to student network during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto small student networks. ARD is an analogue of adversarial training but for distillation. In addition to producing small models with high test accuracy like conventional distillation, ARD also passes the superior robustness of large networks onto the student. In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture on robust accuracy. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation. 
AdversariallyTrained Normalized NoisyFeature AutoEncoder (ATNNFAE) 
This article proposes AdversariallyTrained Normalized NoisyFeature AutoEncoder (ATNNFAE) for bytelevel text generation. An ATNNFAE consists of an autoencoder where the internal code is normalized on the unit sphere and corrupted by additive noise. Simultaneously, a replica of the decoder (sharing the same parameters as the AE decoder) is used as the generator and fed with random latent vectors. An adversarial discriminator is trained to distinguish training samples reconstructed from the AE from samples produced through the randominput generator, making the entire generatordiscriminator path differentiable for discrete data like text. The combined effect of noise injection in the code and shared weights between the decoder and the generator can prevent the mode collapsing phenomenon commonly observed in GANs. Since perplexity cannot be applied to nonsequential text generation, we propose a new evaluation method using the total variance distance between frequencies of hashcoded bytelevel ngrams (NGTVD). NGTVD is a single benchmark that can characterize both the quality and the diversity of the generated texts. Experiments are offered in 6 largescale datasets in Arabic, Chinese and English, with comparisons against ngram baselines and recurrent neural networks (RNNs). Ablation study on both the noise level and the discriminator is performed. We find that RNNs have trouble competing with the ngram baselines, and the ATNNFAE results are generally competitive. 
AdversarialNeural Topic Model (ATM) 
Topic models are widely used for thematic structure discovery in text. But traditional topic models often require dedicated inference procedures for specific tasks at hand. Also, they are not designed to generate wordlevel semantic representations. To address these limitations, we propose a topic modeling approach based on Generative Adversarial Nets (GANs), called Adversarialneural Topic Model (ATM). The proposed ATM models topics with Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. Meanwhile, the generator could also produce wordlevel semantic representations. To illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM for open domain event extraction. Our experimental results on the two public corpora show that ATM generates more coherence topics, outperforming a number of competitive baselines. Moreover, ATM is able to extract meaningful events from news articles. 
Adversary Model  In computer science, an online algorithm measures its competitiveness against different adversary models. For deterministic algorithms, the adversary is the same, the adaptive offline adversary. For randomized online algorithms competitiveness can depend upon the adversary model used. BMA,dga 
advertorch  advertorch is a toolbox for adversarial robustness research. It contains various implementations for attacks, defenses and robust training methods. advertorch is built on PyTorch (Paszke et al., 2017), and leverages the advantages of the dynamic computational graph to provide concise and efficient reference implementations. The code is licensed under the LGPL license and is open sourced at https://…/advertorch . 
Adviser Problem  Humans have an unparalleled visual intelligence and can overcome visual ambiguities that machines currently cannot. Recent works have shown that incorporating guidance from humans during inference for realworld, challenging tasks like viewpointestimation and finegrained classification, can help overcome difficult cases in which the computeralone would have otherwise failed. These hybrid intelligence approaches are hence gaining traction. However, deciding what question to ask the human in the loop at inference time remains an unknown for these problems. We address this question by formulating it as what we call the Adviser Problem: can we learn a mapping from the input to a specific question to ask the human in the loop so as to maximize the expected positive impact to the overall task? We formulate a solution to the adviser problem using a deep network and apply it to the viewpoint estimation problem where the question asks for the location of a specific keypoint in the input image. We show that by using the keypoint guidance from the Adviser Network and the human, the model is able to outperform the previous hybridintelligence stateoftheart by 3.27%, and outperform the computeronly stateoftheart by 10.44% absolute. 
AE1WELM  In this paper, we propose a novel approach based on costsensitive ensemble weighted extreme learning machine; we call this approach AE1WELM. We apply this approach to text classification. AE1WELM is an algorithm including balanced and imbalanced multiclassification for text classification. Weighted ELM assigning the different weights to the different samples improves the classification accuracy to a certain extent, but weighted ELM considers the differences between samples in the different categories only and ignores the differences between samples within the same categories. We measure the importance of the documents by the sample information entropy, and generate costsensitive matrix and factor based on the document importance, then embed the costsensitive weighted ELM into the AdaBoost.M1 framework seamlessly. Vector space model(VSM) text representation produces the high dimensions and sparse features which increase the burden of ELM. To overcome this problem, we develop a text classification framework combining the word vector and AE1WELM. The experimental results show that our method provides an accurate, reliable and effective solution for text classification. 
AEQUITAS  Recent work has raised concerns on the risk of unintended bias in algorithmic decision making systems being used nowadays that can affect individuals unfairly based on race, gender or religion, among other possible characteristics. While a lot of bias metrics and fairness definitions have been proposed in recent years, there is no consensus on which metric/definition should be used and there are very few available resources to operationalize them. Therefore, despite recent awareness, auditing for bias and fairness when developing and deploying algorithmic decision making systems is not yet a standard practice. We present Aequitas, an open source bias and fairness audit toolkit that is an intuitive and easy to use addition to the machine learning workflow, enabling users to seamlessly test models for several bias and fairness metrics in relation to multiple population subgroups. We believe Aequitas will facilitate informed and equitable decisions around developing and deploying algorithmic decision making systems for both data scientists, machine learning researchers and policymakers. 
aESIM  Attention mechanism has been proven effective on natural language processing. This paper proposes an attention boosted natural language inference model named aESIM by adding word attention and adaptive directionoriented attention mechanisms to the traditional BiLSTM layer of natural language inference models, e.g. ESIM. This makes the inference model aESIM has the ability to effectively learn the representation of words and model the local subsentential inference between pairs of premise and hypothesis. The empirical studies on the SNLI, MultiNLI and Quora benchmarks manifest that aESIM is superior to the original ESIM model. 
AFELREC  In this paper, we present preliminary results of AFELREC, a recommender system for social learning environments. AFELREC is build upon a scalable software architecture to provide recommendations of learning resources in near realtime. Furthermore, AFELREC can cope with any kind of data that is present in social learning environments such as resource metadata, user interactions or social tags. We provide a preliminary evaluation of three recommendation use cases implemented in AFELREC and we find that utilizing social data in form of tags is helpful for not only improving recommendation accuracy but also coverage. This paper should be valuable for both researchers and practitioners interested in providing resource recommendations in social learning environments. 
Affective Computing  Affective computing (sometimes called artificial emotional intelligence, or emotion AI) is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While the origins of the field may be traced as far back as to early philosophical inquiries into emotion, the more modern branch of computer science originated with Rosalind Picard’s 1995 paper on affective computing. A motivation for the research is the ability to simulate empathy. The machine should interpret the emotional state of humans and adapt its behavior to them, giving an appropriate response to those emotions. The difference between sentiment analysis and affective analysis is that the latter detects the different emotions instead of identifying only the polarity of the phrase. 
AffectLM  Human verbal communication includes affective messages which are conveyed through use of emotionally colored words. There has been a lot of research in this direction but the problem of integrating stateoftheart neural language models with affective information remains an area ripe for exploration. In this paper, we propose an extension to an LSTM (Long ShortTerm Memory) language model for generating conversational text, conditioned on affect categories. Our proposed model, AffectLM enables us to customize the degree of emotional content in generated sentences through an additional design parameter. Perception studies conducted using Amazon Mechanical Turk show that AffectLM generates naturally looking emotional sentences without sacrificing grammatical correctness. AffectLM also learns affectdiscriminative word representations, and perplexity experiments show that additional affective information in conversational text can improve language model prediction. 
Affine Forward Variance (AFV) 
We introduce the class of affine forward variance (AFV) models of which both the conventional Heston model and the rough Heston model are special cases. We show that AFV models can be characterized by the affine form of their cumulant generating function, which can be obtained as solution of a convolution Riccati equation. We further introduce the class of affine forward order flow intensity (AFI) models, which are structurally similar to AFV models, but driven by jump processes, and which include Hawkestype models. We show that the cumulant generating function of an AFI model satisfies a generalized convolution Riccati equation and that a highfrequency limit of AFI models converges in distribution to the AFV model. 
Affine Variational Autoencoder (AVAE) 
In this study, we propose the Affine Variational Autoencoder (AVAE), a variant of Variational Autoencoder (VAE) designed to improve robustness by overcoming the inability of VAEs to generalize to distributional shifts in the form of affine perturbations. By optimizing an affine transform to maximize ELBO, the proposed AVAE transforms an input to the training distribution without the need to increase model complexity to model the full distribution of affine transforms. In addition, we introduce a training procedure to create an efficient model by learning a subset of the training distribution, and using the AVAE to improve generalization and robustness to distributional shift at test time. Experiments on affine perturbations demonstrate that the proposed AVAE significantly improves generalization and robustness to distributional shift in the form of affine perturbations without an increase in model complexity. 
Affinity  In the field of social networking services, finding similar users based on profile data is common practice. Smartphones harbor sensor and personal context data that can be used for user profiling. Yet, one vast source of personal data, that is text messaging data, has hardly been studied for user profiling. We see three reasons for this: First, private text messaging data is not shared due to their intimate character. Second, the definition of an appropriate privacypreserving similarity measure is nontrivial. Third, assessing the quality of a similarity measure on text messaging data representing a potentially infinite set of topics is nontrivial. In order to overcome these obstacles we propose affinity, a system that assesses the similarity between text messaging histories of users reliably and efficiently in a privacypreserving manner. Private texting data stays on user devices and data for comparison is compared in a latent format that neither allows to reconstruct the comparison words nor any original private plain text. We evaluate our approach by calculating similarities between Twitter histories of 60 US senators. The resulting similarity network reaches an average 85.0% accuracy on a political party classification task. 
Affinity Analysis  Affinity analysis is a data analysis and data mining technique that discovers cooccurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of crossselling and upselling, in addition to influencing sales promotions, loyalty programs, store design, and discount plans. ➘ “Market Basket Analysis” bnclassify 
agate  agate is a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that helps you solve realworld problems CAM 
Age of Information (AoI) 
Age of information (AoI) was introduced in the early 2010s as a notion to characterize the freshness of the knowledge a system has about a process observed remotely. AoI was shown to be a fundamentally novel metric of timeliness, significantly different, to existing ones such as delay and latency. The importance of such a tool is paramount, especially in contexts other than transport of information, since communication takes place also to control, or to compute, or to infer, and not just to reproduce messages of a source. This volume comes to present and discuss the first body of works on AoI and discuss future directions that could yield more challenging and interesting research. Age of Information in Poisson Networks On the Role of AgeofInformation in Internet of Things 
Age Period Cohort Model (APC) 
AgePeriodCohort models is a class of models for demographic rates (mortality/morbidity/fertility/…) observed for a broad age range over a reasonably long time period, and classified by age and date of followup (period) and date of birth (cohort). This type of followup can be shown in a Lexisdiagram; a coordinate system with data of followup along the xaxis, and age along the yaxis. A single persons lifetrajectory is therefore a straight line with slope 1 (as calender time and age advance at the same pace). Tabulated data enumerates the number of events and the risk time (sum of lengths of lifetrajectories) in some subsets of the Lexis diagram, usually subsets classified by age and period in equally long intervals. Individual lifelines can be shown with colouring according to states, or the diagram can just be shown to indicate what ages and periods are covered, and what subsets are used for classification of events and risk time. The AgePeriodCohort model describes the (log)rates as a sum of (nonlinear) age period and cohorteffects. The three variables age (at followup), a, period (i.e. date of followup), p, and cohort (date of birth), c, are related by a=pc – any one person’s age is calculated by subtracting the date of birth from the current date. Hence the three variables used to describe rates are linearly related, and the model can therefore be parametrized in different ways, and still produce the same estimated rates. In popular terms you can say that it is possible to move two linear effects around between the three terms, because the ageterms contains the linear effect of age, the periodterms contains the linear effect of period and the cohort effect contains the linear effect of cohort. An illustration of this phenomenon is in this little “film” of APCeffects on testis cancer rates in Denmark. All sets of estimates will yield the same set of fitted rates. CausalImpact,treatSens,MatchingFrontier 
Agent Based Model (ABM) 
An agentbased model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole. It combines elements of game theory, complex systems, emergence, computational sociology, multiagent systems, and evolutionary programming. Monte Carlo Methods are used to introduce randomness. Particularly within ecology, ABMs are also called individualbased models (IBMs), and individuals within IBMs may be simpler than than fully autonomous agents within ABMs. A review of recent literature on individualbased models, agentbased models, and multiagent systems shows that ABMs are used on noncomputing related scientific domains including biology, ecology and social science. Agentbased modeling is related to, but distinct from, the concept of multiagent systems or multiagent simulation in that the goal of ABM is to search for explanatory insight into the collective behavior of agents obeying simple rules, typically in natural systems, rather than in designing agents or solving specific practical or engineering problems. CEC 
AgentBased Computational Economics (ACE) 
Agentbased computational economics (ACE) is the area of computational economics that studies economic processes, including whole economies, as dynamic systems of interacting agents. As such, it falls in the paradigm of complex adaptive systems. In corresponding agentbased models, the ‘agents’ are ‘computational objects modeled as interacting according to rules’ over space and time, not real people. The rules are formulated to model behavior and social interactions based on incentives and information. Such rules could also be the result of optimization, realized through use of AI methods (such as Qlearning and other reinforcement learning techniques). The theoretical assumption of mathematical optimization by agents in equilibrium is replaced by the less restrictive postulate of agents with bounded rationality adapting to market forces. ACE models apply numerical methods of analysis to computerbased simulations of complex dynamic problems for which more conventional methods, such as theorem formulation, may not find ready use. Starting from initial conditions specified by the modeler, the computational economy evolves over time as its constituent agents repeatedly interact with each other, including learning from interactions. In these respects, ACE has been characterized as a bottomup culturedish approach to the study of economic systems. ACE has a similarity to, and overlap with, game theory as an agentbased method for modeling social interactions. But practitioners have also noted differences from standard methods, for example in ACE events modeled being driven solely by initial conditions, whether or not equilibria exist or are computationally tractable, and in the modeling facilitation of agent autonomy and learning. The method has benefited from continuing improvements in modeling techniques of computer science and increased computer capabilities. The ultimate scientific objective of the method is to ‘test theoretical findings against realworld data in ways that permit empirically supported theories to cumulate over time, with each researcher’s work building appropriately on the work that has gone before.’ The subject has been applied to research areas like asset pricing, competition and collaboration, transaction costs, market structure and industrial organization and dynamics, welfare economics, and mechanism design, information and uncertainty, macroeconomics, and Marxist economics. Book: Introduction to AgentBased Economics 
AgentBased Interactive Discrete Event Simulation (ABIDES) 
We introduce ABIDES, an AgentBased Interactive Discrete Event Simulation environment. ABIDES is designed from the ground up to support AI agent research in market applications. While simulations are certainly available within trading firms for their own internal use, there are no broadly available highfidelity market simulation environments. We hope that the availability of such a platform will facilitate AI research in this important area. ABIDES currently enables the simulation of tens of thousands of trading agents interacting with an exchange agent to facilitate transactions. It supports configurable pairwise network latencies between each individual agent as well as the exchange. Our simulator’s messagebased design is modeled after NASDAQ’s published equity trading protocols ITCH and OUCH. We introduce the design of the simulator and illustrate its use and configuration with sample code, validating the environment with example trading scenarios. The utility of ABIDES is illustrated through experiments to develop a market impact model. We close with discussion of future experimental problems it can be used to explore, such as the development of MLbased trading algorithms. 
AgentBased Model (ABM) 
An agentbased model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole. It combines elements of game theory, complex systems, emergence, computational sociology, multiagent systems, and evolutionary programming. Monte Carlo methods are used to introduce randomness. Particularly within ecology, ABMs are also called individualbased models (IBMs), and individuals within IBMs may be simpler than fully autonomous agents within ABMs. A review of recent literature on individualbased models, agentbased models, and multiagent systems shows that ABMs are used on noncomputing related scientific domains including biology, ecology and social science. Agentbased modeling is related to, but distinct from, the concept of multiagent systems or multiagent simulation in that the goal of ABM is to search for explanatory insight into the collective behavior of agents obeying simple rules, typically in natural systems, rather than in designing agents or solving specific practical or engineering problems. Agentbased models are a kind of microscale model that simulate the simultaneous operations and interactions of multiple agents in an attempt to recreate and predict the appearance of complex phenomena. The process is one of emergence from the lower (micro) level of systems to a higher (macro) level. As such, a key notion is that simple behavioral rules generate complex behavior. This principle, known as K.I.S.S. (‘Keep it simple, stupid’), is extensively adopted in the modeling community. Another central tenet is that the whole is greater than the sum of the parts. Individual agents are typically characterized as boundedly rational, presumed to be acting in what they perceive as their own interests, such as reproduction, economic benefit, or social status, using heuristics or simple decisionmaking rules. ABM agents may experience ‘learning’, adaptation, and reproduction. Most agentbased models are composed of: (1) numerous agents specified at various scales (typically referred to as agentgranularity); (2) decisionmaking heuristics; (3) learning rules or adaptive processes; (4) an interaction topology; and (5) an environment. ABMs are typically implemented as computer simulations, either as custom software, or via ABM toolkits, and this software can be then used to test how changes in individual behaviors will affect the system’s emerging overall behavior. 
AgentBased Modeling and Simulation (ABMS) 
➘ “Agent Based Model” AgentBased Modeling and Simulation rrepast 
AgeofInformation (AoI) 
➘ “Age of Information” 
Agglomerative Contextual Decomposition (ACD) 
Deep neural networks (DNNs) have achieved impressive predictive performance due to their ability to learn complex, nonlinear relationships between variables. However, the inability to effectively visualize these relationships has led to DNNs being characterized as black boxes and consequently limited their applications. To ameliorate this problem, we introduce the use of hierarchical interpretations to explain DNN predictions through our proposed method, agglomerative contextual decomposition (ACD). Given a prediction from a trained DNN, ACD produces a hierarchical clustering of the input features, along with the contribution of each cluster to the final prediction. This hierarchy is optimized to identify clusters of features that the DNN learned are predictive. Using examples from Stanford Sentiment Treebank and ImageNet, we show that ACD is effective at diagnosing incorrect predictions and identifying dataset bias. Through human experiments, we demonstrate that ACD enables users both to identify the more accurate of two DNNs and to better trust a DNN’s outputs. We also find that ACD’s hierarchy is largely robust to adversarial perturbations, implying that it captures fundamental aspects of the input and ignores spurious noise. 
Agglomerative Hierarchical Clustering (AHC) 
Hierarchical clustering algorithms are either topdown or bottomup. Bottomup algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottomup hierarchical clustering is therefore called hierarchical agglomerative clustering or HAC. Topdown clustering requires a method for splitting a cluster. It proceeds by splitting clusters recursively until individual documents are reached. http://…tive_Hierarchical_Clustering_Overview.htm cents 
Agglomerative InfoClustering  An agglomerative clustering of random variables is proposed, where clusters of random variables sharing the maximum amount of multivariate mutual information are merged successively to form larger clusters. Compared to the previous infoclustering algorithms, the agglomerative approach allows the computation to stop earlier when clusters of desired size and accuracy are obtained. An efficient algorithm is also derived based on the submodularity of entropy and the duality between the principal sequence of partitions and the principal sequence for submodular functions. 
Aggregate Valuation of Antecedents (AVA) 
Current approaches for explaining machine learning models fall into two distinct classes: antecedent event influence and value attribution. The former leverages training instances to describe how much influence a training point exerts on a test point, while the latter attempts to attribute value to the features most pertinent to a given prediction. In this work, we discuss an algorithm, AVA: Aggregate Valuation of Antecedents, that fuses these two explanation classes to form a new approach to feature attribution that not only retrieves local explanations but also captures global patterns learned by a model. Our experimentation convincingly favors weighting and aggregating feature attributions via AVA. 
Aggregated Learning  ➘ “AgrLearn” 
Aggregated Lexical Table (ALT) 
Correspondence Analysis on Generalised Aggregated Lexical Tables CGP 
Aggregated Wasserstein  We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time spot follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture modelHMM (GMMHMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semimetric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of the states. This distance quantifies the dissimilarity of GMMHMMs by measuring both the difference between the two marginal GMMs and the difference between the two transition matrices. Our new distance is tested on the tasks of retrieval and classification of time series. Experiments on both synthetic data and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the KullbackLeibler divergence. 
Aggregation CrossEntropy (ACE) 
In this paper, we propose a novel method, aggregation crossentropy (ACE), for sequence recognition from a brand new perspective. The ACE loss function exhibits competitive performance to CTC and the attention mechanism, with much quicker implementation (as it involves only four fundamental formulas), faster inference\backpropagation (approximately O(1) in parallel), less storage requirement (no parameter and negligible runtime memory), and convenient employment (by replacing CTC with ACE). Furthermore, the proposed ACE loss function exhibits two noteworthy properties: (1) it can be directly applied for 2D prediction by flattening the 2D prediction into 1D prediction as the input and (2) it requires only characters and their numbers in the sequence annotation for supervision, which allows it to advance beyond sequence recognition, e.g., counting problem. The code is publicly available at https://…/AggregationCrossEntropy. 
Aggregation Operator  Aggregation operators are mathematical functions that are used to combine information. That is, they are used to combine N data (for example, N numerical values) in a single datum. The arithmetic mean and the weighted mean are the most wellknown aggregation operators. The median and the mode can also been as aggregation operators. The main difference between the arithmetic mean and the weighted mean is that the latter permits us to weight the different data according to their relevance. There exist a large number of different aggregation operators that differ on the assumptions on the data (data types) and about the type of information that we can incorporate in the model. For example, fuzzy integrals permits us to assign relevance to sets of information sources and not only to individual sources as for the weighted mean. chopthin 
AgileNet  The success of deep learning models is heavily tied to the use of massive amount of labeled data and excessively long training time. With the emergence of intelligent edge applications that use these models, the critical challenge is to obtain the same inference capability on a resourceconstrained device while providing adaptability to cope with the dynamic changes in the data. We propose AgileNet, a novel lightweight dictionarybased fewshot learning methodology which provides reduced complexity deep neural network for efficient execution at the edge while enabling lowcost updates to capture the dynamics of the new data. Evaluations of stateoftheart fewshot learning benchmarks demonstrate the superior accuracy of AgileNet compared to prior arts. Additionally, AgileNet is the first fewshot learning approach that prevents model updates by eliminating the knowledge obtained from the primary training. This property is ensured through the dictionaries learned by our novel endtoend structured decomposition, which also reduces the memory footprint and computation complexity to match the edge device constraints. 
Agnostic Disambiguation of Named Entities Using Linked Open Data (AGDISTIS) 
AGDISTIS is an Open Source Named Entity Disambiguation Framework able to link entities against every Linked Data Knowledge Base. The ongoing transition from the current Web of unstructured data to the Data Web yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework). One of the key steps towards extracting RDF from naturallanguage corpora is the disambiguation of named entities. AGDISTIS combines the HITS algorithm with label expansion strategies and string similarity measures. Based on this combination, it can efficiently detect the correct URIs for a given set of named entities within an input text. Furthermore, AGDISTIS is agnostic of the underlying knowledge base. AGDISTIS has been evaluated on different datasets against stateoftheart named entity disambiguation frameworks. http://…/public.pdf clusterCrit 
Agnostic Federated Learning  A key learning scenario in largescale applications is that of federated learning, where a centralized model is trained based on data originating from a large number of clients. We argue that, with the existing training and inference, federated models can be biased towards different clients. Instead, we propose a new framework of agnostic federated learning, where the centralized model is optimized for any target distribution formed by a mixture of the client distributions. We further show that this framework naturally yields a notion of fairness. We present datadependent Rademacher complexity guarantees for learning with this objective, which guide the definition of an algorithm for agnostic federated learning. We also give a fast stochastic optimization algorithm for solving the corresponding optimization problem, for which we prove convergence bounds, assuming a convex loss function and hypothesis set. We further empirically demonstrate the benefits of our approach in several datasets. Beyond federated learning, our framework and algorithm can be of interest to other learning scenarios such as cloud computing, domain adaptation, drifting, and other contexts where the training and test distributions do not coincide. 
AgreementBased Learning  Model selection is a problem that has occupied machine learning researchers for a long time. Recently, its importance has become evident through applications in deep learning. We propose an agreementbased learning framework that prevents many of the pitfalls associated with model selection. It relies on coupling the training of multiple models by encouraging them to agree on their predictions while training. In contrast with other model selection and combination approaches used in machine learning, the proposed framework is inspired by human learning. We also propose a learning algorithm defined within this framework which manages to significantly outperform alternatives in practice, and whose performance improves further with the availability of unlabeled data. Finally, we describe a number of potential directions for developing more flexible agreementbased learning algorithms. 
AgrLearn  We establish an equivalence between information bottleneck (IB) learning and an unconventional quantization problem, `IB quantization’. Under this equivalence, standard neural network models correspond to scalar IB quantizers. We prove a coding theorem for IB quantization, which implies that scalar IB quantizers are in general inferior to vector IB quantizers. This inspires us to develop a learning framework for neural networks, AgrLearn, that corresponds to vector IB quantizers. We experimentally verify that AgrLearn applied to some deep network models of current art improves upon them, while requiring less training data. With a heuristic smoothing, AgrLearn further improves its performance, resulting in new state of the art in image classification on Cifar10. 
AI Alignment  The challenge we face controlling a machine which thinks in an entirely different way to us and may well be far more intelligent than we are. Even if we have a perfect solution to the control problem we are left with a second problem, what should we ask the AI to do, think and value? This problem is AI alignment. 
AI Fairness 360 (AIF360) 
Fairness is an increasingly important concern as machine learning models are used to support decision making in highstakes applications such as mortgage lending, hiring, and prison sentencing. This paper introduces a new open source Python toolkit for algorithmic fairness, AI Fairness 360 (AIF360), released under an Apache v2.0 license {https://…/aif360 ). The main objectives of this toolkit are to help facilitate the transition of fairness research algorithms to use in an industrial setting and to provide a common framework for fairness researchers to share and evaluate algorithms. The package includes a comprehensive set of fairness metrics for datasets and models, explanations for these metrics, and algorithms to mitigate bias in datasets and models. It also includes an interactive Web experience (https://aif360.mybluemix.net ) that provides a gentle introduction to the concepts and capabilities for lineofbusiness users, as well as extensive documentation, usage guidance, and industryspecific tutorials to enable data scientists and practitioners to incorporate the most appropriate tool for their problem into their work products. The architecture of the package has been engineered to conform to a standard paradigm used in data science, thereby further improving usability for practitioners. Such architectural design and abstractions enable researchers and developers to extend the toolkit with their new algorithms and improvements, and to use it for performance benchmarking. A builtin testing infrastructure maintains code quality. 
AI Pipeline  Next generation of embedded Information and Communication Technology (ICT) systems are interconnected collaborative intelligent systems able to perform autonomous tasks. Training and deployment of such systems on Edge devices however require a finegrained integration of data and tools to achieve high accuracy and overcome functional and nonfunctional requirements. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms and deployment tools together. By these means, we are able to interconnect the different entities or stages of particular systems and provide an endtoend development of AI products. We demonstrate the effectiveness of the AI pipeline by solving an Automatic Speech Recognition challenge and we show that all the steps leading to an endtoend development for Keyword Spotting tasks: importing, partitioning and preprocessing of speech data, training of different neural network architectures and their deployment on heterogeneous embedded platforms. 
AI Planning  The planning problem in Artificial Intelligence is about the decision making performed by intelligent creatures like robots, humans, or computer programs when trying to achieve some goal. It involves choosing a sequence of actions that will (with a high likelihood) transform the state of the world, step by step, so that it will satisfy the goal. The world is typically viewed to consist of atomic facts (state variables), and actions make some facts true and some facts false. In the following we discuss a number of ways of formalizing planning, and show how the planning problem can be solved automatically. ➘ “Automated Planning and Scheduling” 
AIDE  In this paper, we present two new communicationefficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be viewed as a robustification strategy since the method is substantially better behaved than DANE on data partition arising in practice. It is well known that DANE algorithm does not match the communication complexity lower bounds. To bridge this gap, we propose an accelerated variant of the first method, called AIDE, that not only matches the communication lower bounds but can also be implemented using a purely firstorder oracle. Our empirical results show that AIDE is superior to other communication efficient algorithms in settings that naturally arise in machine learning applications. 
Aika  Aika is an artificial neural network designed specifically for the processing of natural language texts.A key feature of the Aika algorithm is the ability to evaluate and process various interpretations of the individual sections of a text.Aika combines several ideas and approaches from the field of AI such as artificial neural networks, frequent pattern mining andlogic based expert systems. It can be applied to a broad spectrum of text analysis task such as word sense disambiguation,entity resolution, named entity recognition, text classification and information extraction. 
AIOps  AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (realtime and deep) technologies, and presentation technologies. 
AIQ  Focusing on Business AI, this article introduces the AIQ quadrant that enables us to measure AI for business applications in a relative comparative manner, i.e. to judge that software A has more or less intelligence than software B. Recognizing that the goal of Business software is to maximize value in terms of business results, the dimensions of the quadrant are the key factors that determine the business value of AI software: Level of Output Quality (Smartness) and Level of Automation. The use of the quadrant is illustrated by several software solutions to support the real life business challenge of field service scheduling. The role of machine learning and conversational digital assistants in increasing the business value are also discussed and illustrated with a recent integration of existing intelligent digital assistants for factory floor decision making with the new version of Google Glass. Such hands free AI solutions elevate the AIQ level to its ultimate position. 
AIXIjs  Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the subfield of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn’t explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropyseeking agents (Orseau, 2011), knowledgeseeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents. 
Akaike Information Criterion (AIC) 
The Akaike information criterion (AIC) is a measure of the relative quality of a statistical model for a given set of data. As such, AIC provides a means for model selection. AIC deals with the tradeoff between the goodness of fit of the model and the complexity of the model. It is founded on information theory: it offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. AIC does not provide a test of a model in the sense of testing a null hypothesis; i.e. AIC can tell nothing about the quality of the model in an absolute sense. If all the candidate models fit poorly, AIC will not give any warning of that. cna 
Akid  Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production as small as possibly. On the other hand, differing from traditional machine learning models, neural network is not just yet another statistic model, but a model for the natural processing engine — the brain. In this work, we describe a neural network library named {\texttt akid}. It provides higher level of abstraction for entities (abstracted as blocks) in nature upon the abstraction done on signals (abstracted as tensors) by Tensorflow, characterizing the dataism observation that all entities in nature processes input and emit out in some ways. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top application stack, it provides outofbox tools for neural network applications. Lower down, akid provides a programming paradigm that lets user easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed to a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment with the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) could be separated. Please refer to http://…/latest for documentation. 
Akismet  Akismet or Automattic Kismet is a spam filtering service. It attempts to filter link spam from blog comments and spam TrackBack pings. The filter works by combining information about spam captured on all participating blogs, and then using those spam rules to block future spam. Akismet is offered by Automattic, the company behind WordPress.com. Launched on October 25, 2005, Akismet is said to have captured over 100 billion spam comments and pings as of October 2013. cometExactTest 
akka  Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant eventdriven applications on the JVM. CompareCausalNetworks 
ALAMO  ALAMO is a computational methodology for leaning algebraic functions from data. Given a data set, the approach begins by building a lowcomplexity, linear model composed of explicit nonlinear transformations of the independent variables. Linear combinations of these nonlinear transformations allow a linear model to better approximate complex behavior observed in real processes. The model is refined, as additional data are obtained in an adaptive fashion through error maximization sampling using derivativefree optimization. Models built using ALAMO can enforce constraints on the response variables to incorporate firstprinciples knowledge. The ability of ALAMO to generate simple and accurate models for a number of reaction problems is demonstrated. The error maximization sampling is compared with Latin hypercube designs to demonstrate its sampling efficiency. ALAMO’s constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs. 
Alchemist  The interest and demand for training deep neural networks have been experiencing rapid growth, spanning a wide range of applications in both academia and industry. However, training them distributed and at scale remains difficult due to the complex ecosystem of tools and hardware involved. One consequence is that the responsibility of orchestrating these complex components is often left to oneoff scripts and glue code customized for specific problems. To address these restrictions, we introduce \emph{Alchemist} – an internal service built at Apple from the ground up for \emph{easy}, \emph{fast}, and \emph{scalable} distributed training. We discuss its design, implementation, and examples of running different flavors of distributed training. We also present case studies of its internal adoption in the development of autonomous systems, where training times have been reduced by 10x to keep up with the evergrowing data collection. 
Aleph  In this paper we propose Aleph, a leaderless, fully asynchronous, Byzantine fault tolerant consensus protocol for ordering messages exchanged among processes. It is based on a distributed construction of a partially ordered set and the algorithm for reaching a consensus on its extension to a total order. To achieve the consensus, the processes perform computations based only on a local copy of the data structure, however, they are bound to end with the same results. Our algorithm uses a dualthreshold cointossing scheme as a randomization strategy and establishes the agreement in an expected constant number of rounds. In addition, we introduce a fast way of validating messages that can occur prior to determining the total ordering. 
AlephStar  A* is a popular pathfinding algorithm, but it can only be applied to those domains where a good heuristic function is known. Inspired by recent methods combining Deep Neural Networks (DNNs) and trees, this study demonstrates how to train a heuristic represented by a DNN and combine it with A*. This new algorithm which we call alephstar can be used efficiently in domains where the input to the heuristic could be processed by a neural network. We compare alephstar to NStep Deep QLearning (DQN Mnih et al. 2013) in a driving simulation with pixelbased input, and demonstrate significantly better performance in this scenario. 
Algebraic Machine Learning  Machine learning algorithms use error function minimization to fit a large set of parameters in a preexisting model. However, error minimization eventually leads to a memorization of the training dataset, losing the ability to generalize to other datasets. To achieve generalization something else is needed, for example a regularization method or stopping the training when error in a validation dataset is minimal. Here we propose a different approach to learning and generalization that is parameterfree, fully discrete and that does not use function minimization. We use the training data to find an algebraic representation with minimal size and maximal freedom, explicitly expressed as a product of irreducible components. This algebraic representation is shown to directly generalize, giving high accuracy in test data, more so the smaller the representation. We prove that the number of generalizing representations can be very large and the algebra only needs to find one. We also derive and test a relationship between compression and error rate. We give results for a simple problem solved step by step, handwritten character recognition, and the Queens Completion problem as an example of unsupervised learning. As an alternative to statistical learning, \enquote{algebraic learning} may offer advantages in combining bottomup and topdown information, formal concept derivation from data and largescale parallelization. 
Algebraic Statistics  Algebraic statistics is the use of algebra to advance statistics. Algebra has been useful for experimental design, parameter estimation, and hypothesis testing. Traditionally, algebraic statistics has been associated with the design of experiments and multivariate analysis (especially time series). In recent years, the term “algebraic statistics” has been sometimes restricted, sometimes being used to label the use of algebraic geometry and commutative algebra in statistics. Compind 
Algebraic Subspace Clustering (ASC) 
Algebraic Subspace Clustering (ASC) is a simple and elegant method based on polynomial fitting and differentiation for clustering noiseless data drawn from an arbitrary union of subspaces. In practice, however, ASC is limited to equidimensional subspaces because the estimation of the subspace dimension via algebraic methods is sensitive to noise. This paper proposes a new ASC algorithm that can handle noisy data drawn from subspaces of arbitrary dimensions. The key ideas are (1) to construct, at each point, a decreasing sequence of subspaces containing the subspace passing through that point; (2) to use the distances from any other point to each subspace in the sequence to construct a subspace clustering affinity, which is superior to alternative affinities both in theory and in practice. Experiments on the Hopkins 155 dataset demonstrate the superiority of the proposed method with respect to sparse and low rank subspace clustering methods. compLasso 
Algojammer  Algojammer is an experimental, proofofconcept code editor for writing algorithms in Python. It was mainly written to assist with solving the kind of algorithm problems that feature in competitions like Google Code Jam, Topcoder and HackerRank. 
Algorithm QuasiOptimal Learning (AQ) 
The algorithm quasioptimal (AQ) is a powerful machine learning methodology aimed at learning symbolic decision rules from a set of examples and counterexamples. It was first proposed in the late 1960s to solve the Boolean function satisfiability problem and further refined over the following decade to solve the general covering problem. In its newest implementations, it is a powerful but yet little explored methodology for symbolic machine learning classification. It has been applied to solve several problems from different domains, including the generation of individuals within an evolutionary computation framework. The current article introduces the main concepts of the AQ methodology and describes AQ for source detection(AQ4SD), a tailored implementation of the AQ methodology to solve the problem of finding the sources of atmospheric releases using distributed sensor measurements. compositions 
Algorithmia  An Open Marketplace For Algorithms: We’re building a community around stateoftheart algorithm development. Users can create, share, and build on other algorithms and then instantly make them available as a web service. http://…ldingwebsiteexplorer5easysteps.html conformal 
Algorithmic Complexity (AC) 
The information content or complexity of an object can be measured by the length of its shortest description. For instance the string “01010101010101010101010101010101” has the short description “16 repetitions of 01”, while “11001000011000011101111011101100” presumably has no simpler description other than writing down the string itself. More formally, the Algorithmic “Kolmogorov” Complexity (AC) of a string x is defined as the length of the shortest program that computes or outputs x , where the program is run on some fixed reference universal computer. conover.test 
Algorithmic Neural Network (AlgoNet) 
Artificial neural networks revolutionized many areas of computer science in recent years since they provide solutions to a number of previously unsolved problems. On the other hand, for many problems, classic algorithms exist, which typically exceed the accuracy and stability of neural networks. To combine these two concepts, we present a new kind of neural networks—algorithmic neural networks (AlgoNets). These networks integrate smooth versions of classic algorithms and data structures into the topology of neural networks. A forward AlgoNet includes algorithmic layers into existing architectures while a backward AlgoNet can solve inverse problems without or with only weak supervision. In addition, we present the \texttt{algonet} package, a PyTorch based library that includes, inter alia, a smooth evaluated programming language, a smooth 3D mesh renderer, and smooth sorting algorithms. 
Algorithmic Polynomial  The approximate degree of a Boolean function $f(x_{1},x_{2},\ldots,x_{n})$ is the minimum degree of a real polynomial that approximates $f$ pointwise within $1/3$. Upper bounds on approximate degree have a variety of applications in learning theory, differential privacy, and algorithm design in general. Nearly all known upper bounds on approximate degree arise in an existential manner from bounds on quantum query complexity. We develop a firstprinciples, classical approach to the polynomial approximation of Boolean functions. We use it to give the first constructive upper bounds on the approximate degree of several fundamental problems: – $O\bigl(n^{\frac{3}{4}\frac{1}{4(2^{k}1)}}\bigr)$ for the $k$element distinctness problem; – $O(n^{1\frac{1}{k+1}})$ for the $k$subset sum problem; – $O(n^{1\frac{1}{k+1}})$ for any $k$DNF or $k$CNF formula; – $O(n^{3/4})$ for the surjectivity problem. In all cases, we obtain explicit, closedform approximating polynomials that are unrelated to the quantum arguments from previous work. Our first three results match the bounds from quantum query complexity. Our fourth result improves polynomially on the $\Theta(n)$ quantum query complexity of the problem and refutes the conjecture by several experts that surjectivity has approximate degree $\Omega(n)$. In particular, we exhibit the first natural problem with a polynomial gap between approximate degree and quantum query complexity. 
Algorithmic Social Intervention  Social and behavioral interventions are a critical tool for governments and communities to tackle deeprooted societal challenges such as homelessness, disease, and poverty. However, realworld interventions are almost always plagued by limited resources and limited data, which creates a computational challenge: how can we use algorithmic techniques to enhance the targeting and delivery of social and behavioral interventions? The goal of my thesis is to provide a unified study of such questions, collectively considered under the name ‘algorithmic social intervention’. This proposal introduces algorithmic social intervention as a distinct area with characteristic technical challenges, presents my published research in the context of these challenges, and outlines open problems for future work. A common technical theme is decision making under uncertainty: how can we find actions which will impact a social system in desirable ways under limitations of knowledge and resources? The primary application area for my work thus far is public health, e.g. HIV or tuberculosis prevention. For instance, I have developed a series of algorithms which optimize social network interventions for HIV prevention. Two of these algorithms have been pilottested in collaboration with LAarea service providers for homeless youth, with preliminary results showing substantial improvement over statusquo approaches. My work also spans other topics in infectious disease prevention and underlying algorithmic questions in robust and riskaware submodular optimization. 
Algorithmic Transparency  What information is your blackbox analytics system really relying on, and should it? copCAR 
Aligned Rank Transform (ART) 
Nonparametric data from multifactor experiments arise often in humancomputer interaction (HCI). Examples may include error counts, Likert responses, and preference tallies. But because multiple factors are involved, common nonparametric tests (e.g., Friedman) are inadequate, as they are unable to examine interaction effects. While some statistical techniques exist to handle such data, these techniques are not widely available and are complex. To address these concerns, we present the Aligned Rank Transform (ART) for nonparametric factorial data analysis in HCI. The ART relies on a preprocessing step that ‘aligns’ data before applying averaged ranks, after which point common ANOVA procedures can be used, making the ART accessible to anyone familiar with the Ftest. Unlike most articles on the ART, which only address two factors, we generalize the ART to N factors. We also provide ARTool and ARTweb, desktop and Webbased programs for aligning and ranking data. Our reexamination of some published HCI results exhibits advantages of the ART. cord 
Alignment of Manifold Structures via Semantic Feature Expansion (AMSSFE) 
Zeroshot learning (ZSL) aims at recognizing unseen classes with knowledge transferred from seen classes. This is typically achieved by exploiting a semantic feature space (FS) shared by both seen and unseen classes, i.e., attributes or word vectors, as the bridge. However, due to the mutually disjoint of training (seen) and testing (unseen) data, existing ZSL methods easily and commonly suffer from the domain shift problem. To address this issue, we propose a novel model called AMSSFE. It considers the Alignment of Manifold Structures by Semantic Feature Expansion. Specifically, we build up an autoencoder based model to expand the semantic features and joint with an alignment to an embedded manifold extracted from the visual FS of data. It is the first attempt to align these two FSs by way of expanding semantic features. Extensive experiments show the remarkable performance improvement of our model compared with other existing methods. 
AliGraph  An increasing number of machine learning tasks require dealing with large graph datasets, which capture rich and complex relationship among potentially billions of elements. Graph Neural Network (GNN) becomes an effective way to address the graph learning problem by converting the graph data into a low dimensional space while keeping both the structural and property information to the maximum extent and constructing a neural network for training and referencing. However, it is challenging to provide an efficient graph storage and computation capabilities to facilitate GNN training and enable development of new GNN algorithms. In this paper, we present a comprehensive graph neural network system, namely AliGraph, which consists of distributed graph storage, optimized sampling operators and runtime to efficiently support not only existing popular GNNs but also a series of inhouse developed ones for different scenarios. The system is currently deployed at Alibaba to support a variety of business scenarios, including product recommendation and personalized search at Alibaba’s ECommerce platform. By conducting extensive experiments on a realworld dataset with 492.90 million vertices, 6.82 billion edges and rich attributes, AliGraph performs an order of magnitude faster in terms of graph building (5 minutes vs hours reported from the stateoftheart PowerGraph platform). At training, AliGraph runs 40%50% faster with the novel caching strategy and demonstrates around 12 times speed up with the improved runtime. In addition, our inhouse developed GNN models all showcase their statistically significant superiorities in terms of both effectiveness and efficiency (e.g., 4.12%17.19% lift by F1 scores). 
ALINE Algorithm (ALINE) 
The ALINE algorithm (Kondrak, 2000) assigns a similarity score to pairs of phoneticallytranscribed words on the basis of the decomposition of phonemes into elementary phonetic features. The algorithm was originally designed to identify and align cognates in vocabularies of related languages. Nevertheless, thanks to its grounding in universal phonetic principles, the algorithm can be used for estimating the similarity of any pair of words. The principal component of ALINE is a function that calculates the similarity of two phonemes that are expressed in terms of about a dozen multivalued phonetic features (Place, Manner, Voice, etc.). The phonetic features are assigned salience weights that express their relative importance. Feature values are encoded as oatingpoint numbers in the range [0;1]. For example, the feature Manner can take any of the following seven values: stop = 1.0, affricate = 0.9, fricative = 0.8, approximant = 0.6, high vowel = 0.4, mid vowel = 0.2, and low vowel = 0.0. The numerical values re ect the distances between vocal organs during speech production. The overall similarity score is the sum of individual similarity scores between pairs of phonemes in an optimal alignment of two words, which is computed by a dynamic programming algorithm (Wagner and Fischer, 1974). A constant insertion/deletion penalty is applied for each unaligned phoneme. Another constant penalty is set to reduce relative importance of vowel as opposed to consonant phoneme matches. The similarity value is normalized by the length of the longer word. ALINE’s behavior is controlled by a number of parameters: the maximum phonemic score, the insertion/ deletion penalty, the vowel penalty, and the feature salience weights. The parameters have default settings for the cognate matching task, but these settings can be optimized (tuned) on a development set that includes both positive and negative examples of similar words. Greg Kondrak Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification Coxnet 
ALink Inference Module  Action recognition with skeleton data has recently attracted much attention in computer vision. Previous studies are mostly based on fixed skeleton graphs, only capturing local physical dependencies among joints, which may miss implicit joint correlations. To capture richer dependencies, we introduce an encoderdecoder structure, called Alink inference module, to capture actionspecific latent dependencies, i.e. actional links, directly from actions. We also extend the existing skeleton graphs to represent higherorder dependencies, i.e. structural links. Combing the two types of links into a generalized skeleton graph, we further propose the actionalstructural graph convolution network (ASGCN), which stacks actionalstructural graph convolution and temporal convolution as a basic building block, to learn both spatial and temporal features for action recognition. A future pose prediction head is added in parallel to the recognition head to help capture more detailed action patterns through selfsupervision. We validate ASGCN in action recognition using two skeleton data sets, NTURGB+D and Kinetics. The proposed ASGCN achieves consistently large improvement compared to the stateoftheart methods. As a side product, ASGCN also shows promising results for future pose prediction. 
All Learning Rates At Once (Alrao) 
Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the ‘All Learning Rates At Once’ (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems. Alrao could save time when testing deep learning models: a range of models could be quickly assessed with Alrao, and the most promising models could then be trained more extensively. This text comes with a PyTorch implementation of the method, which can be plugged on an existing PyTorch model. 
Allan Variance (AVAR) 
The Allan variance (AVAR), also known as twosample variance, is a measure of frequency stability in clocks, oscillators and amplifiers. It is named after David W. Allan. The Allan variance is intended to estimate stability due to noise processes and not that of systematic errors or imperfections such as frequency drift or temperature effects. The Allan variance and Allan deviation describe frequency stability, i.e. the stability in frequency. See also the section entitled ‘Interpretation of value’ below. There are also different adaptations or alterations of Allan variance, notably the modified Allan variance MAVAR or MVAR, the total variance, and the Hadamard variance. There also exist time stability variants such as time deviation TDEV or time variance TVAR. Allan variance and its variants have proven useful outside the scope of timekeeping and are a set of improved statistical tools to use whenever the noise processes are not unconditionally stable, thus a derivative exists. The general Msample variance remains important since it allows dead time in measurements and bias functions allows conversion into Allan variance values. Nevertheless, for most applications the special case of 2sample, or ‘Allan variance’ with T = tau is of greatest interest. CP 
AllenNLP  This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily. It is built on top of PyTorch, allowing for dynamic computation graphs, and provides (1) a flexible data API that handles intelligent batching and padding, (2) highlevel abstractions for common operations in working with text, and (3) a modular and extensible experiment framework that makes doing good science easy. It also includes reference implementations of high quality approaches for both core semantic problems (e.g. semantic role labeling (Palmer et al., 2005)) and language understanding applications (e.g. machine comprehension (Rajpurkar et al., 2016)). AllenNLP is an ongoing opensource effort maintained by engineers and researchers at the Allen Institute for Artificial Intelligence. 
AllPairs Testing  In computer science, allpairs testing or pairwise testing is a combinatorial method of software testing that, for each pair of input parameters to a system (typically, a software algorithm), tests all possible discrete combinations of those parameters. Using carefully chosen test vectors, this can be done much faster than an exhaustive search of all combinations of all parameters, by “parallelizing” the tests of parameter pairs. cqrReg 
Alluvial Diagram  Alluvial diagrams are a type of flow diagram originally developed to represent changes in network structure over time. In allusion to both their visual appearance and their emphasis on flow, alluvial diagrams are named after alluvial fans that are naturally formed by the soil deposited from streaming water. cqrReg,ggalluvial 
ALOJA  This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a longterm collaboration between BSC and Microsoft to automate the characterization of costeffectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex runtime environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendorneutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a testbed and tools to deploy and evaluate the costeffectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resourceconstrained search spaces. The predictive analytics extension, ALOJAML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables modelbased anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA datasets and framework to improve the design and deployment of Big Data applications. 
Alpha ModelAgnostic MetaLearning (Alpha MAML) 
Modelagnostic metalearning (MAML) is a metalearning technique to train a model on a multitude of learning tasks in a way that primes the model for fewshot learning of new tasks. The MAML algorithm performs well on fewshot learning problems in classification, regression, and finetuning of policy gradients in reinforcement learning, but comes with the need for costly hyperparameter tuning for training stability. We address this shortcoming by introducing an extension to MAML, called Alpha Modelagnostic metalearning, to incorporate an online hyperparameter adaptation scheme that eliminates the need to tune metalearning and learning rates. Our results with the Omniglot database demonstrate a substantial reduction in the need to tune MAML training hyperparameters and improvement to training stability with less sensitivity to hyperparameter choice. 
alphaAlgorithm  The alphaalgorithm is an algorithm used in process mining, aimed at reconstructing causality from a set of sequences of events. It was first put forward by van der Aalst, Weijters and Maruster. Several extensions or modifications of it have since been presented, which will be listed below. It constructs P/T nets with special properties (workflow nets) from event logs (as might be collected by an ERP system). Each transition in the net corresponds to an observed task. cqrReg,SpatPCA 
AlphaClean  The analyst effort in data cleaning is gradually shifting away from the design of handwritten scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyperparameter tuning for data cleaning is very different than hyperparameter tuning for machine learning since the pipeline components and objective functions have structure that tuning algorithms can exploit. This paper proposes a framework, called AlphaClean, that rethinks parameter tuning for data cleaning pipelines. AlphaClean provides users with a rich library to define data quality measures with weighted sums of SQL aggregate queries. AlphaClean applies generatethensearch framework where each pipelined cleaning operator contributes candidate transformations to a shared pool. Asynchronously, in separate threads, a search algorithm sequences them into cleaning pipelines that maximize the userdefined quality measures. This architecture allows AlphaClean to apply a number of optimizations including incremental evaluation of the quality measures and learning dynamic pruning rules to reduce the search space. Our experiments on real and synthetic benchmarks suggest that AlphaClean finds solutions of upto 9x higher quality than naively applying stateoftheart parameter tuning methods, is significantly more robust to straggling data cleaning methods and redundancy in the data cleaning library, and can incorporate stateoftheart cleaning systems such as HoloClean as cleaning operators. 
AlphaOutliers  Crossover,crossdes 
AlphaPooling  Convolutional neural networks (CNNs) have achieved remarkable performance in many applications, especially image recognition. As a crucial component of CNNs, subsampling plays an important role, and max pooling and arithmetic average pooling are commonly used subsampling methods. In addition to the two pooling methods, however, there could be many other pooling types, such as geometric average, harmonic average, and so on. Since it is not easy for algorithms to find the best pooling method, human experts choose types of pooling, which might not be optimal for different tasks. Following deep learning philosophy, the type of pooling can be driven by data for a given task. In this paper, we propose {\em alphapooling}, which has a trainable parameter $\alpha$ to decide the type of pooling. Alphapooling is a general pooling method including max pooling and arithmetic average pooling as a special case, depending on the parameter $\alpha$. In experiments, alphapooling improves the accuracy of image recognition tasks, and we found that max pooling is not the optimal pooling scheme. Moreover each layer has different optimal pooling types. 
alphaRank  We introduce {\alpha}Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in largescale multiagent interactions, grounded in a novel dynamical gametheoretic solution concept called MarkovConley chains (MCCs). The approach leverages continuoustime and discretetime evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired gametheoretic solution concept (typically the Nash equilibrium). {\alpha}Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and longterm dynamics in terms of basins of attraction and sink components. This is a direct consequence of our new model’s direct correspondence to the dynamical MCC solution concept when its rankingintensity parameter, {\alpha}, is chosen to be large, which exactly forms the basis of {\alpha}Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our {\alpha}Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a generalsum game is known to be intractable. We introduce mathematical proofs that reveal the formal underpinnings of the {\alpha}Rank methodology. We illustrate the method in canonical games and in AlphaGo, AlphaZero, MuJoCo Soccer, and Poker. 
AlphaSeq  Sequences play an important role in many applications and systems. Discovering sequences with desired properties has long been an interesting intellectual pursuit. This paper puts forth a new paradigm, AlphaSeq, to discover desired sequences algorithmically using deep reinforcement learning (DRL) techniques. AlphaSeq treats the sequence discovery problem as an episodic symbolfilling game, in which a player fills symbols in the vacant positions of a sequence set sequentially during an episode of the game. Each episode ends with a completelyfilled sequence set, upon which a reward is given based on the desirability of the sequence set. AlphaSeq models the game as a Markov Decision Process (MDP), and adapts the DRL framework of AlphaGo to solve the MDP. Sequences discovered improve progressively as AlphaSeq, starting as a novice, learns to become an expert game player through many episodes of game playing. Compared with traditional sequence construction by mathematical tools, AlphaSeq is particularly suitable for problems with complex objectives intractable to mathematical analysis. We demonstrate the searching capabilities of AlphaSeq in two applications: 1) AlphaSeq successfully rediscovers a set of ideal complementary codes that can zeroforce all potential interferences in multicarrier CDMA systems. 2) AlphaSeq discovers new sequences that triple the signaltointerference ratio — benchmarked against the wellknown Legendre sequence — of a mismatched filter estimator in pulse compression radar systems. 
AlphaSutte Indicator  ➘ “Sutte Indicator” RcmdrPlugin.sutteForecastR,sutteForecastR 
AlphaX  We present AlphaX, a fully automated agent that designs complex neural architectures from scratch. AlphaX explores the exponentially grown search space with a distributed Monte Carlo Tree Search (MCTS) and a MetaDeep Neural Network (DNN). MCTS intrinsically improves the search efficiency by dynamically balancing the exploration and exploitation at finegrained states, while MetaDNN predicts the network accuracy to guide the search, and to provide an estimated reward to speed up the rollout. As the search progresses, AlphaX also generates the training data for MetaDNN. So, the learning of MetaDNN is endtoend. In 14 days with only 16 GPUs (1832 samples), AlphaX found an architecture that reaches the stateoftheart accuracies on both CIFAR10(97.18%) and ImageNet(75.5% top1 and 92.2% top5). This demonstrates up to 10x speedup over the original searching for NASNet that used 500 GPUs in 4 days (20000 samples). On NASBench101, AlphaX demonstrates 3x and 2.8x speedup over Random Search and Regularized Evolution. Finally, we show the searched architecture improves a variety of vision applications from Neural Style Transfer, to Image Captioning and Object Detection. Our implementation is available at https://…/AlphaXNASBench101. 
AlteregoNet  A person dependent network, called an AlterEgo net, is proposed for development. The networks are created per person. It receives at input an object descriptions and outputs a simulation of the internal person’s representation of the objects. The network generates a textual stream resembling the narrative stream of consciousness depicting multitudinous thoughts and feelings related to a perceived object. In this way, the object is described not by a ‘static’ set of its properties, like a dictionary, but by the stream of words and word combinations referring to the object. The network simulates a person’s dialogue with a representation of the object. It is based on an introduced algorithmic scheme, where perception is modeled by two interacting iterative cycles, reminding one respectively the forward and backward propagation executed at training convolution neural networks. The ‘forward’ iterations generate a stream representing the ‘internal world’ of a human. The ‘backward’ iterations generate a stream representing an internal representation of the object. People perceive the world differently. Tuning AlterEgo nets to a specific person or group of persons, will allow simulation of their thoughts and feelings. Thereby these nets is potentially a new human augmentation technology for various applications. 
Alternating Direction Method of Multipliers (ADMM) 
This paper considers the problem of recovering signals from compressed measurements contaminated with sparse outliers, which has arisen in many applications. In this paper, we propose a generative model neural network approach for reconstructing the ground truth signals under sparse outliers. We propose an iterative alternating direction method of multipliers (ADMM) algorithm for solving the outlier detection problem via $\ell_1$ norm minimization, and a gradient descent algorithm for solving the outlier detection problem via squared $\ell_1$ norm minimization. We establish the recovery guarantees for reconstruction of signals using generative models in the presence of outliers, and give an upper bound on the number of outliers allowed for recovery. Our results are applicable to both the linear generator neural network and the nonlinear generator neural network with an arbitrary number of layers. We conduct extensive experiments using variational autoencoder and deep convolutional generative adversarial networks, and the experimental results show that the signals can be successfully reconstructed under outliers using our approach. Our approach outperforms the traditional Lasso and $\ell_2$ minimization approach. crrp 
Alternating Direction Method of Multipliers Neural Network (ADMMNN) 
To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques: weight pruning and weight quantization are investigated. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in bit representation of weights. However, there lacks a systematic framework of joint weight pruning and quantization of DNNs, thereby limiting the available model compression ratio. Moreover, the computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for besides simply model size reduction. To address these limitations, we present ADMMNN, the first algorithmhardware cooptimization framework of DNNs using Alternating Direction Method of Multipliers (ADMM), a powerful technique to deal with nonconvex optimization problems with possibly combinatorial constraints. The first part of ADMMNN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique with regularization target dynamically updated in each ADMM iteration, thereby resulting in higher performance in model compression than prior work. The second part is hardwareaware DNN optimizations to facilitate hardwarelevel implementations. Without accuracy loss, we can achieve 85$\times$ and 24$\times$ pruning on LeNet5 and AlexNet models, respectively, significantly higher than prior work. The improvement becomes more significant when focusing on computation reductions. Combining weight pruning and quantization, we achieve 1,910$\times$ and 231$\times$ reductions in overall model size on these two benchmarks, when focusing on data storage. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet50. 
Alternating Directions Dual Decomposition (AD3) 
We present AD3, a new algorithm for approximate maximum a posteriori (MAP) inference on factor graphs, based on the alternating directions method of multipliers. Like other dual decomposition algorithms, AD3 has a modular architecture, where local subproblems are solved independently, and their solutions are gathered to compute a global update. The key characteristic of AD3 is that each local subproblem has a quadratic regularizer, leading to faster convergence, both theoretically and in practice. We provide closedform solutions for these AD3 subproblems for binary pairwise factors and factors imposing rstorder logic constraints. For arbitrary factors (large or combinatorial), we introduce an active set method which requires only an oracle for computing a local MAP con guration, making AD3 applicable to a wide range of problems. Experiments on synthetic and realworld problems show that AD3 compares favorably with the stateoftheart. csrplus 
Alternating Least Squares (ALS) 
In ALS you’re minimizing the entire loss function at once, but, only twiddling half the parameters. That’s because the optimization has an easy algebraic solution – if half your parameters are fixed. So you fix half, recompute the other half, and repeat. There is no gradient in the optimization step since each optimization problem is convex and doesn’t need an approximate approach. But, each problem you’re solving is not the “real” optimization problem – you fixed half the parameters. http://…/netflix_aaim08%28submitted%29.pdf CUSUMdesign 
Alternating Minimization Induced Active Set Algorithms (AMIAS) 
AMIAS 
Alternative Updating with Lagrange Multipliers (AULM) 
The success of convolutional neural networks (CNNs) in computer vision applications has been accompanied by a significant increase of computation and memory costs, which prohibits its usage on resourcelimited environments such as mobile or embedded devices. To this end, the research of CNN compression has recently become emerging. In this paper, we propose a novel filter pruning scheme, termed structured sparsity regularization (SSR), to simultaneously speedup the computation and reduce the memory overhead of CNNs, which can be well supported by various offtheshelf deep learning libraries. Concretely, the proposed scheme incorporates two different regularizers of structured sparsity into the original objective function of filter pruning, which fully coordinates the global outputs and local pruning operations to adaptively prune filters. We further propose an Alternative Updating with Lagrange Multipliers (AULM) scheme to efficiently solve its optimization. AULM follows the principle of ADMM and alternates between promoting the structured sparsity of CNNs and optimizing the recognition loss, which leads to a very efficient solver (2.5x to the most recent work that directly solves the group sparsitybased regularization). Moreover, by imposing the structured sparsity, the online inference is extremely memorylight, since the number of filters and the output feature maps are simultaneously reduced. The proposed scheme has been deployed to a variety of stateoftheart CNN structures including LeNet, AlexNet, VGG, ResNet and GoogLeNet over different datasets. Quantitative results demonstrate that the proposed scheme achieves superior performance over the stateoftheart methods. We further demonstrate the proposed compression scheme for the task of transfer learning, including domain adaptation and object detection, which also show exciting performance gains over the stateofthearts. 
AlthamPoisson Distribution  The multiplicative binomial model was introduced as a generalization of the binomial distribution for modelling correlated binomial data. This distribution has not been extensively explored and is revisited in the present study. Some properties of the multiplicative binomial distribution, such as, expressions for the factorial moments and the information matrix, are investigated. The distribution is also extended to accommodate data arising from a Wadley’s problem setting which is frequently encountered in dosemortality studies and is one in which the number of organisms initially treated with a drug is unobserved. The AlthamPoisson distribution is introduced by modelling the unobserved initial number of organisms, as specified by Formula in the multiplicative binomial model, with a Poisson distribution and its suitability for overdispersed data from a Wadley’s problem setting is explored. cvxbiclustr 
Amazon Machine Learning  Amazon Machine Learning is a new service that makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. Once your models are ready, Amazon Machine Learning makes it easy to get predictions for your application using simple APIs, without having to implement custom prediction generation code, or manage any infrastructure. Amazon Machine Learning is based on the same proven, highly scalable, ML technology used for years by Amazon’s internal data scientist community. The service uses powerful algorithms to create ML models by finding patterns in your existing data. Then, Amazon Machine Learning uses these models to process new data and generate predictions for your application. Amazon Machine Learning is highly scalable and can generate billions of predictions daily, and serve those predictions in realtime and at high throughput. With Amazon Machine Learning, there is no upfront hardware or software investment, and you pay as you go, so you can start small and scale as your application grows. https://…/introducingamazonmachinelearning https://…rningmakedatadrivendecisionsatscale EffectStars 
Amazon SageMaker Ground Truth  In 1959, Arthur Samuel defined machine learning as a ‘field of study that gives computers the ability to learn without being explicitly programmed’. However, there is no deus ex machina: the learning process requires an algorithm (‘how to learn’) and a training dataset (‘what to learn from’). Today, most machine learning tasks use a technique called supervised learning: an algorithm learns patterns or behaviours from a labeled dataset. A labeled dataset containing data samples as well as the correct answer for each one of them, aka ‘ground truth’. Depending on the problem at hand, one could use labeled images (‘this is a dog’, ‘this is a cat’), labeled text (‘this is spam’, ‘this isn’t’), etc. Fortunately, developers and data scientists can now rely on a vast collection of offtheshelf algorithms (as illustrated by the builtin algorithms in Amazon SageMaker) and of reference datasets. Deep learning has popularized image datasets such as MNIST, CIFAR10 or ImageNet, and more are also available for tasks like machine translation or text classification. These reference datasets are extremely useful for beginners and experienced practitioners alike, but a lot of companies and organizations still need to train machine learning models on their own dataset: think about medical imaging, autonomous driving, etc. 
Amazon Textract  Amazon Textract is a service that automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables. Many companies today extract data from documents and forms through manual data entry that’s slow and expensive or through simple optical character recognition (OCR) software that is difficult to customize. Rules and workflows for each document and form often need to be hardcoded and updated with each change to the form or when dealing with multiple forms. If the form deviates from the rules, the output is often scrambled and unusable. Amazon Textract overcomes these challenges by using machine learning to instantly ‘read’ virtually any type of document to accurately extract text and data without the need for any manual effort or custom code. With Textract you can quickly automate document workflows, enabling you to process millions of document pages in hours. Once the information is captured, you can take action on it within your business applications to initiate next steps for a loan application or medical claims processing. Additionally, you can create smart search indexes, build automated approval workflows, and better maintain compliance with document archival rules by flagging data that may require redaction. 
Ambient Diagnostics  People can usually sense troubles in a car from noises, vibrations, or smells. An experienced driver can even tell where the problem is. We call this kind of skill ‘Ambient Diagnostics’. Ambient Diagnostics is an emerging field that is aimed at detecting abnormities from seemly disconnected ambient data that we take for granted. For example, the human body is a rich ambient data source: temperature, pulses, gestures, sound, forces, moisture, et al. Also, many electronic devices provide pervasive ambient data streams, such as mobile phones, surveillance cameras, satellite images, personal data assistants, wireless networks and so on. efflog 
Ambient Intelligence  In computing, ambient intelligence (AmI) refers to electronic environments that are sensitive and responsive to the presence of people. Ambient intelligence is a vision on the future of consumer electronics, telecommunications and computing that was originally developed in the late 1990s by Eli Zelkha and his team at Palo Alto Ventures for the time frame 20102020. In an ambient intelligence world, devices work in concert to support people in carrying out their everyday life activities, tasks and rituals in an easy, natural way using information and intelligence that is hidden in the network connecting these devices (for example: The Internet of Things). As these devices grow smaller, more connected and more integrated into our environment, the technology disappears into our surroundings until only the user interface remains perceivable by users. 
AmbientGAN  Generative models provide a way to model structure in complex distributions and have been shown to be useful for many tasks of practical interest. However, current techniques for training generative models require access to fullyobserved samples. In many settings, it is expensive or even impossible to obtain fullyobserved samples, but economical to obtain partial, noisy observations. We consider the task of learning an implicit generative model given only lossy measurements of samples from the distribution of interest. We show that the true underlying distribution can be provably recovered even in the presence of persample information loss for a class of measurement models. Based on this, we propose a new method of training Generative Adversarial Networks (GANs) which we call AmbientGAN. On three benchmark datasets, and for various measurement models, we demonstrate substantial qualitative and quantitative improvements. Generative models trained with our method can obtain $2$$4$x higher inception scores than the baselines. 
Amortized Inference Regularization (AIR) 
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overlyexpressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs. 
AmplifiedDifferential Compression DGD (ADCDGD) 
Network consensus optimization has received increasing attention in recent years and has found important applications in many scientific and engineering fields. To solve network consensus optimization problems, one of the most wellknown approaches is the distributed gradient descent method (DGD). However, in networks with slow communication rates, DGD’s performance is unsatisfactory for solving highdimensional network consensus problems due to the communication bottleneck. This motivates us to design a communicationefficient DGDtype algorithm based on compressed information exchanges. Our contributions in this paper are threefold: i) We develop a communicationefficient algorithm called amplifieddifferential compression DGD (ADCDGD) and show that it converges under {\em any} unbiased compression operator; ii) We rigorously prove the convergence performances of ADCDGD and show that they match with those of DGD without compression; iii) We reveal an interesting phase transition phenomenon in the convergence speed of ADCDGD. Collectively, our findings advance the stateoftheart of network consensus optimization theory. 
AmpliGraph  Open source Python library that predicts links between concepts in a knowledge graph. AmpliGraph is a suite of neural machine learning models for relational Learning, a branch of machine learning that deals with supervised learning on knowledge graphs. Use AmpliGraph if you need to: • Discover new knowledge from an existing knowledge graph. • Complete large knowledge graphs with missing statements. • Generate standalone knowledge graph embeddings. • Develop and evaluate a new relational model. 
AN2VEC  The creation of social ties is largely determined by the entangled effects of people’s similarities in terms of individual characters and friends. However, feature and structural characters of people usually appear to be correlated, making it difficult to determine which has greater responsibility in the formation of the emergent network structure. We propose \emph{AN2VEC}, a node embedding method which ultimately aims at disentangling the information shared by the structure of a network and the features of its nodes. Building on the recent developments of Graph Convolutional Networks (GCN), we develop a multitask GCN Variational Autoencoder where different dimensions of the generated embeddings can be dedicated to encoding feature information, network structure, and shared featurenetwork information. We explore the interaction between these disentangled characters by comparing the embedding reconstruction performance to a baseline case where no shared information is extracted. We use synthetic datasets with different levels of interdependency between feature and network characters and show (i) that shallow embeddings relying on shared information perform better than the corresponding reference with unshared information, (ii) that this performance gap increases with the correlation between network and feature structure, and (iii) that our embedding is able to capture joint information of structure and features. Our method can be relevant for the analysis and prediction of any featured network structure ranging from online social systems to network medicine. 
Anaconda  We investigate distribution testing with access to nonadaptive conditional samples. In the conditional sampling model, the algorithm is given the following access to a distribution: it submits a query set $S$ to an oracle, which returns a sample from the distribution conditioned on being from $S$. In the nonadaptive setting, all query sets must be specified in advance of viewing the outcomes. Our main result is the first polylogarithmicquery algorithm for equivalence testing, deciding whether two unknown distributions are equal to or far from each other. This is an exponential improvement over the previous best upper bound, and demonstrates that the complexity of the problem in this model is intermediate to the the complexity of the problem in the standard sampling model and the adaptive conditional sampling model. We also significantly improve the sample complexity for the easier problems of uniformity and identity testing. For the former, our algorithm requires only $\tilde O(\log n)$ queries, matching the informationtheoretic lower bound up to a $O(\log \log n)$factor. Our algorithm works by reducing the problem from $\ell_1$testing to $\ell_\infty$testing, which enjoys a much cheaper sample complexity. Necessitated by the limited power of the nonadaptive model, our algorithm is very simple to state. However, there are significant challenges in the analysis, due to the complex structure of how two arbitrary distributions may differ. 
Analysis of Covariance (ANCOVA) 
Covariance is a measure of how much two variables change together and how strong the relationship is between them. Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV). Therefore, when performing ANCOVA, the DV means are adjusted to what they would be if all groups were equal on the CV. FactoMineR 
Analysis of Means (ANOM) 
The analysis of means (ANOM) is a graphical procedure for comparing a collection of means, rates, or proportions to see if any of them differ significantly from the overall mean, rate, or proportion. The ANOM is a type of multiple comparison procedure. The results of the analysis are summarized in an ANOM decision chart. This chart is similar in appearance to a control chart. It has a centerline, located at the overall mean (rate, or proportion), and upper and lower decision limits. Group means (or rates, or proportions) are plotted on this chart and if one falls beyond a decision limit then that group is said to be statistically different from the overal mean (rate or proportion). In many situations, you need to examine differences among more than two groups. When you are interested in comparing multiple group means, you can use the analysis of means (ANOM) as an alternative to the oneway analysis of variance F test. The ANOM provides a ‘confidence interval type of approach’ that allows you to determine which, if any, of the c groups has a mean significantly different from the overall average of all the group means combined. Instead of looking at the lower and upper limits of a confidence interval, you will be studying which of the c group means are not contained in an interval formed between a lower decision line and an upper decision line. Any individual group mean not contained in this interval is deemed significantly higher than the overall average of all groups if it lies above the upper decision line. Similarly, any group mean that falls below the lower decision line is declared significantly lower than the overall group average. FactoMineR,BiplotGUI 
AnalysisofMarginalTailMeans (ATM) 
This paper presents a novel method, called AnalysisofmarginalTailMeans (ATM), for parameter optimization over a large, discrete design space. The key advantage of ATM is that it offers effective and robust optimization performance for both smooth and rugged response surfaces, using only a small number of function evaluations. This method can therefore tackle a wide range of engineering problems, particularly in applications where the performance metric to optimize is ‘blackbox’ and expensive to evaluate. The ATM framework unifies two parameter optimization methods in the literature: the AnalysisofmarginalMeans (AM) approach (Taguchi, 1986), and the PicktheWinner (PW) approach (Wu et al., 1990). In this paper, we show that by providing a continuum between AM and PW via the novel idea of marginal tail means, the proposed method offers a balance between three fundamental tradeoffs. By adaptively tuning these tradeoffs, ATM can then provide excellent optimization performance over a broad class of response surfaces using limited data. We illustrate the effectiveness of ATM using several numerical examples, and demonstrate how such a method can be used to solve two realworld engineering design problems. 
Analytic Hierarchy Process (AHP) 
The analytic hierarchy process (AHP) is a structured technique for organizing and analyzing complex decisions, based on mathematics and psychology. It was developed by Thomas L. Saaty in the 1970s and has been extensively studied and refined since then. It has particular application in group decision making, and is used around the world in a wide variety of decision situations, in fields such as government, business, industry, healthcare, shipbuilding and education. Rather than prescribing a ‘correct’ decision, the AHP helps decision makers find one that best suits their goal and their understanding of the problem. It provides a comprehensive and rational framework for structuring a decision problem, for representing and quantifying its elements, for relating those elements to overall goals, and for evaluating alternative solutions. Users of the AHP first decompose their decision problem into a hierarchy of more easily comprehended subproblems, each of which can be analyzed independently. The elements of the hierarchy can relate to any aspect of the decision problemtangible or intangible, carefully measured or roughly estimated, well or poorly understoodanything at all that applies to the decision at hand. Once the hierarchy is built, the decision makers systematically evaluate its various elements by comparing them to each other two at a time, with respect to their impact on an element above them in the hierarchy. In making the comparisons, the decision makers can use concrete data about the elements, but they typically use their judgments about the elements’ relative meaning and importance. It is the essence of the AHP that human judgments, and not just the underlying information, can be used in performing the evaluations. The AHP converts these evaluations to numerical values that can be processed and compared over the entire range of the problem. A numerical weight or priority is derived for each element of the hierarchy, allowing diverse and often incommensurable elements to be compared to one another in a rational and consistent way. This capability distinguishes the AHP from other decision making techniques. In the final step of the process, numerical priorities are calculated for each of the decision alternatives. These numbers represent the alternatives’ relative ability to achieve the decision goal, so they allow a straightforward consideration of the various courses of action. FeaLect 
Analytic Network Learning  Based on the property that solving the system of linear matrix equations via the column space and the row space projections boils down to an approximation in the least squares error sense, a formulation for learning the weight matrices of the multilayer network can be derived. By exploiting into the vast number of feasible solutions of these interdependent weight matrices, the learning can be performed analytically layer by layer without needing of gradient computation after an initialization. Possible initialization schemes include utilizing the data matrix as initial weights and random initialization. The study is followed by an investigation into the representation capability and the output variance of the learning scheme. An extensive experimentation on synthetic and realworld data sets validates its numerical feasibility. 
Analytical Data Mart  Data Mart is a table or a collection of tables containing only the information which the analysts require to do their job. This data is pulled from multiple sources, processed in a uniform manner, documented and optimized. Data Marts frequently contain data aggregated on the customer level, such as the average number of transactions in last 6 months, number of cash loans drawn by the client during the last 12 months, etc. When the computed aggregate values are available, it is much easier to prepare reports. Data Marts are built only once, at the start of the analytical process, but they are cyclically and automatically updated, so as to contain all the relevant information pertaining to customers/products/transactions in a given period of time. ➘ “Analytics Data Store” GameTheory 
Analytics  Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight. hierband 
Analytics Data Store (ADS) 
The ADS (which can be a distributed data store on a Cloud somewhere) supports the data requirements of statistical analysis. Here the data is organised, structured, integrated and enriched to meet the ongoing and occasionally volatile needs of the statisticians and data scientists focusing on data mining. Data in the ADS can be accumulative or completely refreshed. It can have a short life span or have a significantly long lifetime. The ADS is the logistics centre for analytics data. Not only can it be used to provide data into the statistical analysis process, but it can also be used to provide persistent long term storage for analysis outcomes and scenarios, and for future analysis, hence the ability to ‘write back’. The data and information in the ADS may also be augmented with data derived from data stored in the data warehouse, it may also benefit from having its own dedicated Data Mart specifically designed for this purpose. Results of statistical analysis on the ADS data may also result in feedback being used to tune the data reduction, filtering and enrichment rules further downstream, either in smart data analytics, complex event and discrimination adapters or in ET(AL) job streams. hiertest 
Anchored Bayesian Gaussian Mixture Model  Finite Gaussian mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We present an alternative to the exchangeable prior: by assuming that a small number of latent class labels are known a priori, we can make inference on the component features without postprocessing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a datadependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well specified probability model rather than a post hoc manipulation. We provide practical guidelines for model selection that are motivated by maximizing prior information about the class labels and we demonstrate our method on real and simulated data. 
AnchorNet  Despite significant progress of deep learning in recent years, stateoftheart semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we propose a deep network, termed AnchorNet, that produces image representations that are wellsuited for semantic matching. It relies on a set of filters whose response is geometrically consistent across different object instances, even in the presence of strong intraclass, scale, or viewpoint variations. Trained only with weak imagelevel labels, the final representation successfully captures information about the object structure and improves results of stateoftheart semantic matching methods such as the deformable spatial pyramid or the proposal flow methods. We show positive results on the crossinstance matching task where different instances of the same object category are matched as well as on a new crosscategory semantic matching task aligning pairs of instances each from a different object class. 
Anchors  We introduce a novel modelagnostic system that explains the behavior of complex models with highprecision rules called anchors, representing local, ‘sufficient’ conditions for predictions. We propose an algorithm to efficiently compute these explanations for any blackbox model with highprobability guarantees. We demonstrate the flexibility of anchors by explaining a myriad of different models for different domains and tasks. In a user study, we show that anchors enable users to predict how a model would behave on unseen instances with less effort and higher precision, as compared to existing linear explanations or no explanations. 
Ancillary Statistic  In statistics, an ancillary statistic is a statistic whose sampling distribution does not depend on the parameters of the model. An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. This concept was introduced by the statistical geneticist Sir Ronald Fisher. https://… Lesson of the Day – Ancillary Statistics HighDimOut 
AndOr Graph Model (AOG) 
This paper presents an explainable AI (XAI) system that provides explanations for its predictions. The system consists of two key components — namely, the prediction AndOr graph (AOG) model for recognizing and localizing concepts of interest in input data, and the XAI model for providing explanations to the user about the AOG’s predictions. In this work, we focus on the XAI model specified to interact with the user in natural language, whereas the AOG’s predictions are considered given and represented by the corresponding parse graphs (pg’s) of the AOG. Our XAI model takes pg’s as input and provides answers to the user’s questions using the following types of reasoning: direct evidence (e.g., detection scores), partbased inference (e.g., detected parts provide evidence for the concept asked), and other evidences from spatiotemporal context (e.g., constraints from the spatiotemporal surround). We identify several correlations between user’s questions and the XAI answers using Youtube Action dataset. 
AngleBased Outlier Detection (ABOD) 
Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All existing approaches, however, are based on an assessment of distances (sometimes indirectly by assuming certain distributions) in the fulldimensional Euclidean data space. In highdimensional data, these approaches are bound to deteriorate due to the notorious ‘curse of dimensionality’. In this paper, we propose a novel approach named ABOD (AngleBased Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points. This way, the effects of the ‘curse of dimensionality’ are alleviated compared to purely distancebased approaches. A main advantage of our new approach is that our method does not rely on any parameter selection influencing the quality of the achieved ranking. In a thorough experimental evaluation, we compare ABOD to the wellestablished distancebased method LOF for various artificial and a real world data set and show ABOD to perform especially well on highdimensional data. HMM 
Annabell  A cognitive neural architecture able to learn and communicate through natural language 
ANNBenchmarks  This paper describes ANNBenchmarks, a tool for evaluating the performance of inmemory approximate nearest neighbor algorithms. It provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets. It supports several different ways of integrating $k$NN algorithms, and its configuration system automatically tests a range of parameter settings for each algorithm. Algorithms are compared with respect to many different (approximate) quality measures, and adding more is easy and fast; the included plotting frontends can visualise these as images, $\LaTeX$ plots, and websites with interactive plots. ANNBenchmarks aims to provide a constantly updated overview of the current state of the art of $k$NN algorithms. In the short term, this overview allows users to choose the correct $k$NN algorithm and parameters for their similarity search task; in the longer term, algorithm designers will be able to use this overview to test and refine automatic parameter tuning. The paper gives an overview of the system, evaluates the results of the benchmark, and points out directions for future work. Interestingly, very different approaches to $k$NN search yield comparable qualityperformance tradeoffs. The system is available at http://annbenchmarks.com. 
Annealed Generative Adversarial Networks  We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed {\ss}GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, {\ss}GAN with a fixed annealing schedule was stable and did not suffer from mode collapse. 
ANNETTO  Deep learning models, while effective and versatile, are becoming increasingly complex, often including multiple overlapping networks of arbitrary depths, multiple objectives and nonintuitive training methodologies. This makes it increasingly difficult for researchers and practitioners to design, train and understand them. In this paper we present ANNETTO, a muchneeded, generic and computeractionable vocabulary for researchers and practitioners to describe their deep learning configurations, training procedures and experiments. The proposed ontology focuses on topological, training and evaluation aspects of complex deep neural configurations, while keeping peripheral entities more succinct. Knowledge bases implementing ANNETTO can support a wide variety of queries, providing relevant insights to users. In addition to a detailed description of the ontology, we demonstrate its suitability to the task via a number of hypothetical usecases of increasing complexity. 
Annotation Chart  Annotation charts are interactive time series line charts that support annotations. kappalab 
Annotation Query Language (AQL) 
Annotation Query Language (AQL) is the language for developing text analytics extractors in the InfoSphere BigInsights Text Analytics system. An extractor is a program written in AQL that extracts structured information from unstructured or semistructured text. AQL is a declarative language. The syntax of AQL is similar to that of Structured Query Language (SQL), but with several important differences. lba 
Annoy  Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large readonly filebased data structures that are mmapped into memory so that many processes may share the same data. RcppAnnoy 
Anomaly Detection  In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or finding errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions. multimark,dga 
Anomaly Detection Based Power Saving (ADEPOS) 
In industry 4.0, predictive maintenance(PM) is one of the most important applications pertaining to the Internet of Things(IoT). Machine learning is used to predict the possible failure of a machine before the actual event occurs. However, the main challenges in PM are (a) lack of enough data from failing machines, and (b) paucity of power and bandwidth to transmit sensor data to cloud throughout the lifetime of the machine. Alternatively, edge computing approaches reduce data transmission and consume low energy. In this paper, we propose Anomaly Detection based Power Saving(ADEPOS) scheme using approximate computing through the lifetime of the machine. In the beginning of the machines life, low accuracy computations are used when the machine is healthy. However, on the detection of anomalies, as time progresses, the system is switched to higher accuracy modes. We show using the NASA bearing dataset that using ADEPOS, we need 8.8X less neurons on average and based on postlayout results, the resultant energy savings are 6.4 to 6.65X 
Anonymous Information Delivery (AID) 
We introduce the problem of anonymous information delivery (AID), comprised of $K$ messages, a user, and $N$ servers (each holds $M$ messages) that wish to deliver one out of $K$ messages to the user anonymously, i.e., without revealing the delivered message index to the user. This AID problem may be viewed as the dual of the private information retrieval problem. The information theoretic capacity of AID, $C$, is defined as the maximum number of bits of the desired message that can be anonymously delivered per bit of total communication to the user. For the AID problem with $K$ messages, $N$ servers, $M$ messages stored per server, and $N \geq \lceil \frac{K}{M} \rceil$, we provide an achievable scheme of rate $1/\lceil \frac{K}{M} \rceil$ and an information theoretic converse of rate $M/K$, i.e., the AID capacity satisfies $1/\lceil \frac{K}{M} \rceil \leq C \leq M/K$. This settles the capacity of AID when $\frac{K}{M}$ is an integer. When $\frac{K}{M}$ is not an integer, we show that the converse rate of $M/K$ is achievable if $N \geq \frac{K}{\gcd(K,M)} – (\frac{M}{\gcd(K,M)}1)(\lfloor \frac{K}{M} \rfloor 1)$, and the achievable rate of $1/\lceil \frac{K}{M} \rceil$ is optimal if $N = \lceil \frac{K}{M} \rceil$. Otherwise if $\lceil \frac{K}{M} \rceil < N < \frac{K}{\gcd(K,M)} – (\frac{M}{\gcd(K,M)}1)(\lfloor \frac{K}{M} \rfloor 1)$, we give an improved achievable scheme and prove its optimality for several small settings. 
AnonymousNet  With billions of personal images being generated from social media and cameras of all sorts on a daily basis, security and privacy are unprecedentedly challenged. Although extensive attempts have been made, existing face image deidentification techniques are either insufficient in photoreality or incapable of balancing privacy and usability qualitatively and quantitatively, i.e., they fail to answer counterfactual questions such as ‘is it private now?’, ‘how private is it?’, and ‘can it be more private?’ In this paper, we propose a novel framework called AnonymousNet, with an effort to address these issues systematically, balance usability, and enhance privacy in a natural and measurable manner. The framework encompasses four stages: facial attribute estimation, privacymetricoriented face obfuscation, directed natural image synthesis, and adversarial perturbation. Not only do we achieve the stateofthearts in terms of image quality and attribute prediction accuracy, we are also the first to show that facial privacy is measurable, can be factorized, and accordingly be manipulated in a photorealistic fashion to fulfill different requirements and application scenarios. Experiments further demonstrate the effectiveness of the proposed framework. 
ANOVA Kernel Regression  Highorder parametric models that include terms for feature interactions are applied to various data min ing tasks, where ground truth depends on interactions of features. However, with sparse data, the high dimensional parameters for feature interactions often face three issues: expensive computation, difficulty in parameter estimation and lack of structure. Previous work has proposed approaches which can partially re solve the three issues. In particular, models with fac torized parameters (e.g. Factorization Machines) and sparse learning algorithms (e.g. FTRLProximal) can tackle the first two issues but fail to address the third. Regarding to unstructured parameters, constraints or complicated regularization terms are applied such that hierarchical structures can be imposed. However, these methods make the optimization problem more challeng ing. In this work, we propose Strongly Hierarchical Factorization Machines and ANOVA kernel regression where all the three issues can be addressed without making the optimization problem more difficult. Ex perimental results show the proposed models signifi cantly outperform the stateoftheart in two data min ing tasks: coldstart user response time prediction and stock volatility prediction. 
Anscombe’s Quartet  Anscombe’s quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. ➘ “Linear Regression” multispatialCCM 
Answer Set Programming (ASP) 
Answer set programming (ASP) is a form of declarative programming oriented towards difficult (primarily NPhard) search problems. It is based on the stable model (answer set) semantics of logic programming. In ASP, search problems are reduced to computing stable models, and answer set solvers – programs for generating stable models – are used to perform search. The computational process employed in the design of many answer set solvers is an enhancement of the DPLL algorithm and, in principle, it always terminates (unlike Prolog query evaluation, which may lead to an infinite loop). In a more general sense, ASP includes all applications of answer sets to knowledge representation and the use of Prologstyle query evaluation for solving problems arising in these applications. The Seventh Answer Set Programming Competition: Design and Results 
Answer Set Programming – Reinforcement Learning (ASP(RL)) 
Nonstationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing nonstationary domains. 
Ant Colony Optimization (ACO) 
In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. This algorithm is a member of the ant colony algorithms family, in swarm intelligence methods, and it constitutes some metaheuristic optimizations. Initially proposed by Marco Dorigo in 1992 in his PhD thesis, the first algorithm was aiming to search for an optimal path in a graph, based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of numerical problems, and as a result, several problems have emerged, drawing on various aspects of the behavior of ants. nettools 
Anticipatory Analytics  In fact, the application of predictive analytics to intelligence often focuses on what is missing in the data or how certain observations diverge from the expected norm. Some experts insist on calling this process not predictive analytics but anticipatory analytics, in order to distinguish it from similar processes in other realms. ngspatial 
ANTIQUE  Considering the widespread use of mobile and voice search, answer passage retrieval for nonfactoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of largescale nonfactoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 opendomain nonfactoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34,011 manual relevance annotations. The questions were asked by real users in a community question answering service, i.e., Yahoo! Answers. Relevance judgments for all the answers to each question were collected through crowdsourcing. To facilitate further research, we also include a brief analysis of the data as well as baseline results on both classical and recently developed neural IR models. 
AntiSpoofing with SqueezeExcitation and Residual neTwork (ASSERT) 
We present JHU’s system submission to the ASVspoof 2019 Challenge: AntiSpoofing with SqueezeExcitation and Residual neTworks (ASSERT). Antispoofing has gathered more and more attention since the inauguration of the ASVspoof Challenges, and ASVspoof 2019 dedicates to address attacks from all three major types: texttospeech, voice conversion, and replay. Built upon previous research work on Deep Neural Network (DNN), ASSERT is a pipeline for DNNbased approach to antispoofing. ASSERT has four components: feature engineering, DNN models, network optimization and system combination, where the DNN models are variants of squeezeexcitation and residual networks. We conducted an ablation study of the effectiveness of each component on the ASVspoof 2019 corpus, and experimental results showed that ASSERT obtained more than 93% and 17% relative improvements over the baseline systems in the two subchallenges in ASVspooof 2019, ranking ASSERT one of the top performing systems. Code and pretrained models will be made publicly available. 
Antisymmetrical Initialization (ASI) 
How different initializations and loss functions affect the learning of a deep neural network (DNN), specifically its generalization error, is an important problem in practice. In this work, focusing on regression problems, we develop a kernelnorm minimization framework for the analysis of DNNs in the kernel regime in which the number of neurons in each hidden layer is sufficiently large (Jacot et al. 2018, Lee et al. 2019). We find that, in the kernel regime, for any loss in a general class of functions, e.g., any Lp loss for $1 < p < \infty$, the DNN finds the same global minimathe one that is nearest to the initial value in the parameter space, or equivalently, the one that is closest to the initial DNN output in the corresponding reproducing kernel Hilbert space. With this framework, we prove that a nonzero initial output increases the generalization error of DNN. We further propose an antisymmetrical initialization (ASI) trick that eliminates this type of error and accelerates the training. We also demonstrate experimentally that even for DNNs in the nonkernel regime, our theoretical analysis and the ASI trick remain effective. Overall, our work provides insight into how initialization and loss function quantitatively affect the generalization of DNNs, and also provides guidance for the training of DNNs. 
AntisymmetricRNN  Recurrent neural networks have gained widespread use in modeling sequential data. Learning longterm dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical framework, which is able to capture longterm dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computation overhead. In comparison, AntisymmetricRNN achieves the same goal by design. We showcase the advantage of this new architecture through extensive simulations and experiments. AntisymmetricRNN exhibits much more predictable dynamics. It outperforms regular LSTM models on tasks requiring longterm memory and matches the performance on tasks where shortterm dependencies dominate despite being much simpler. 
ANTNet  Deep convolutional neural networks have achieved remarkable success in computer vision. However, deep neural networks require large computing resources to achieve high performance. Although depthwise separable convolution can be an efficient module to approximate a standard convolution, it often leads to reduced representational power of networks. In this paper, under budget constraints such as computational cost (MAdds) and the parameter count, we propose a novel basic architectural block, ANTBlock. It boosts the representational power by modeling, in a high dimensional space, interdependency of channels between a depthwise convolution layer and a projection layer in the ANTBlocks. Our experiments show that ANTNet built by a sequence of ANTBlocks, consistently outperforms stateoftheart lowcost mobile convolutional neural networks across multiple datasets. On CIFAR100, our model achieves 75.7% top1 accuracy, which is 1.5% higher than MobileNetV2 with 8.3% fewer parameters and 19.6% less computational cost. On ImageNet, our model achieves 72.8% top1 accuracy, which is 0.8% improvement, with 157.7ms (20% faster) on iPhone 5s over MobileNetV2. 
Anyk Ranking  Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called ‘heterogeneous information networks’ or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to nd the topk matches ac cording to a ranking function over edge and node weights. For users, it is di cult to select value k . We therefore propose the novel notion of an anyk ranking algorithm: for a given time budget, re turn as many of the topranked results as possible. Then, given additional time, produce the next lowerranked results quickly as well. It can be stopped anytime, but may have to continues until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) that the users often explore only a fraction of the topranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong nontrivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times in the order of milliseconds for tree patterns on large networks with millions of nodes and edges. 
Anytime Stochastic Gradient Descent  In this paper, we focus on approaches to parallelizing stochastic gradient descent (SGD) wherein data is farmed out to a set of workers, the results of which, after a number of updates, are then combined at a central master node. Although such synchronized SGD approaches parallelize well in idealized computing environments, they often fail to realize their promised computational acceleration in practical settings. One cause is slow workers, termed stragglers, who can cause the fusion step at the master node to stall, which greatly slowing convergence. In many straggler mitigation approaches work completed by these nodes, while only partial, is discarded completely. In this paper, we propose an approach to parallelizing synchronous SGD that exploits the work completed by all workers. The central idea is to fix the computation time of each worker and then to combine distinct contributions of all workers. We provide a convergence analysis and optimize the combination function. Our numerical results demonstrate an improvement of several factors of magnitude in comparison to existing methods. 
Anytime Universal Intelligence Test (AUIT) 
Collective intelligence is manifested when multiple agents coherently work in observation, interaction, decisionmaking and action. In this paper, we define and quantify the intelligence level of heterogeneous agents group with the improved Anytime Universal Intelligence Test(AUIT), based on an extension of the existing evaluation of homogeneous agents group. The relationship of intelligence level with agents composition, group size, spatial complexity and testing time is analyzed. The intelligence level of heterogeneous agents groups is compared with the homogeneous ones to analyze the effects of heterogeneity on collective intelligence. Our work will help to understand the essence of collective intelligence more deeply and reveal the effect of various key factors on group intelligence level. 
AOGParsing Operator  This paper presents a method of learning qualitatively interpretable models in object detection using popular twostage regionbased ConvNet detection systems (i.e., RCNN). RCNN consists of a region proposal network and a RoI (RegionofInterest) prediction network.By interpretable models, we focus on weaklysupervised extractive rationale generation, that is learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. We utilize a topdown hierarchical and compositional grammar model embedded in a directed acyclic ANDOR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in RCNN, so the proposed method is applicable to many stateoftheart ConvNet based detection systems. The AOGParsing operator aims to harness both the explainable rigor of topdown hierarchical and compositional grammar models and the discriminative power of bottomup deep neural networks through endtoend training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG onthefly, which is treated as the extractive rationale generated for interpreting detection. In learning, we propose a foldingunfolding method to train the AOG and ConvNet endtoend. In experiments, we build on top of the RFCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets with performance comparable to stateoftheart methods. 
Apache Accumulo  The Apache Accumulo sorted, distributed key/value store is based on Google’s BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of celllevel access labels and a serverside programming mechanism that can modify key/value pairs at various points in the data management process. oaxaca,GeneralOaxaca 
Apache Airflow  Airflow is a platform to programmatically author, schedule and monitor workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative. 
Apache Ambari  Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple. pcalg 
Apache Apex  1. Enterprise Grade: Apex is a Hadoop YARN native platform that unifies stream and batch processing. It processes big data inmotion in a way that is highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and easily operable. 2. Low BarriertoEntry: Write your business logic and leave all operability to the platform. It provides a simple API that enables developers to write or reuse generic Java code, thereby lowering the expertise needed to write big data applications. 3. Modular: The Apex platform comes with Malhar, a library of operators (units of functionality) that can be leveraged to quickly create nontrivial applications. Includes many connectors for messaging systems, databases, files etc. 
Apache Atlas  Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem. Apache Atlas provides open metadata management and governance capabilities for organizations to build a catalog of their data assets, classify and govern these assets and provide collaboration capabilities around these data assets for data scientists, analysts and the data governance team. 
Apache Avro  Apache Avro is a data serialization system. 
Apache Beam  Apache Beam provides an advanced unified programming model, allowing you to implement batch and streaming data processing jobs that can run on any execution engine. Apache Beam is: · UNIFIED – Use a single programming model for both batch and streaming use cases. · PORTABLE – Execute pipelines on multiple execution environments, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. · EXTENSIBLE – Write and share new SDKs, IO connectors, and transformation libraries. 
Apache Bigtop  Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadooprelated projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc…) developed by a community with a focus on the system as a whole, rather than individual projects. In short we strive to be for Hadoop what Debian is to Linux. QFRM 
Apache Calcite  Apache Calcite is a Dynamic Data Management Framework. It Contains Many of the Pieces That Comprise a Typical Database Management System, but Omits Some key Functions: Storage of Data, Algorithms to Process Data, and a Repository for Storing Metadata. Calcite Intentionally Stays out of the Business of Storing and Processing Data. As we Shall See, This Makes it an Excellent Choice for Mediating Between Applications and one or More Data Storage Locations and Data Processing Engines. It is Also a Perfect Foundation for Building a Database: Just add Data. Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources 
Apache Cassandra / Cassandra Query Language (CQL) 
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra also places a high value on performance. In 2012, University of Toronto researchers studying NoSQL systems concluded that “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments.” QuACN 
Apache Chukwa  Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data. rbokeh 
Apache Clerezza  Clerezza allows to easily develop semantic web applications by providing tools to manipulate RDF data, create RESTful Web Services and Renderlets using ScalaServerPages. Contents are stored as triples based on W3C RDF specification. These triples are stored via Clerezza’s Smart Content Binding (SCB). SCB defines a technologyagnostic layer to access and modify triple stores. It provides a java implementation of the graph data model specified by W3C RDF and functionalities to operate on that data model. SCB offers a service interface to access multiple named graphs and it can use various providers to manage RDF graphs in a technology specific manner, e.g., using Jena or Sesame. It also provides for adaptors that allow an application to use various APIs (including the Jena api) to process RDF graphs. Furthermore, SCB offers a serialization and a parsing service to convert a graph into a certain representation (format) and vice versa. RcppAnnoy 
Apache CloudStack  Apache CloudStack is open source software designed to deploy and manage large networks of virtual machines, as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide an onpremises (private) cloud offering, or as part of a hybrid cloud solution. CloudStack is a turnkey solution that includes the entire “stack” of features most organizations want with an IaaS cloud: compute orchestration, NetworkasaService, user and account management, a full and open native API, resource accounting, and a firstclass User Interface (UI). CloudStack currently supports the most popular hypervisors: VMware, KVM, XenServer and Xen Cloud Platform (XCP). Users can manage their cloud with an easy to use Web interface, command line tools, and / or a fullfeatured RESTful API. In addition, CloudStack provides an API that’s compatible with AWS EC2 and S3 for organizations that wish to deploy hybrid clouds. SACOBRA 
Apache Commons Mathematics Library  Commons Math is a library of lightweight, selfcontained mathematics and statistics components addressing the most common problems not available in the Java programming language or Commons Lang. commonsMath 
Apache Crunch  The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many userdefined functions simple to write, easy to test, and efficient to run. Running on top of Hadoop MapReduce and Apache Spark, the Apache Crunch library is a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce. The APIs are especially useful when processing data that does not fit naturally into relational model, such as time series, serialized object formats like protocol buffers or Avro records, and HBase rows and columns. For Scala users, there is the Scrunch API, which is built on top of the Java APIs and includes a REPL (readevalprint loop) for creating MapReduce pipelines. smacpod 
Apache Curator  New users of ZooKeeper are surprised to learn that a significant amount of connection management must be done manually. For example, when the ZooKeeper client connects to the ensemble it must negotiate a new session, etc. This takes some time. If you use a ZooKeeper client API before the connection process has completed, ZooKeeper will throw an exception. These types of exceptions are referred to as “recoverable” errors. Curator automatically handles connection management, greatly simplifying client code. Instead of directly using the ZooKeeper APIs you use Curator APIs that internally check for connection completion and wrap each ZooKeeper API in a retry loop. Curator uses a retry mechanism to handle recoverable errors and automatically retry operations. The method of retry is customizable. Curator comes bundled with several implementations (ExponentialBackoffRetry, etc.) or custom implementations can be written. smbinning 
Apache DataFu  Apache DataFu consists of two libraries: Apache DataFu Pig is a collection of useful userdefined functions for data analysis in Apache Pig. Apache DataFu Hourglass is a library for incrementally processing data using Apache Hadoop MapReduce. This library was inspired by the prevelance of sliding window computations over daily tracking data. Computations such as these typically happen at regular intervals (e.g. daily, weekly), and therefore the sliding nature of the computations means that much of the work is unnecessarily repeated. DataFu’s Hourglass was created to make these computations more efficient, yielding sometimes 5095% reductions in computational resources. smoof 
Apache Drill  Apache Drill is an open source, low latency SQL query engine for Hadoop and NoSQL. It is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google’s Dremel. snht 
Apache Druid  Druid is primarily used to store, query, and analyze large event streams. Examples of event streams include user generated data such as clickstreams, application generated data such as performance metrics, and machine generated data such as network flows and server metrics. Druid is optimized for subsecond queries to sliceanddice, drill down, search, filter, and aggregate this data. Druid is commonly used to power interactive applications where performance, concurrency, and uptime are important. 
Apache Falcon  Apache Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters. stats 
Apache Flink  Apache Flink is an open source platform for scalable batch and stream data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs for creating applications that use the Flink engine: 1. DataSet API for static data embedded in Java, Scala, and Python, 2. DataStream API for unbounded streams embedded in Java and Scala, and 3. Table API with a SQLlike expression language embedded in Java and Scala. Flink also bundles libraries for domainspecific use cases: 1. Machine Learning library, and 2. Gelly, a graph processing API and library. You can integrate Flink easily with other wellknown open source systems both for data input and output as well as deployment. ➘ “Gelly” support.BWS 
Apache Flume  Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store VIFCP 
Apache Forrest  Apache Forrest software is a publishing framework that transforms input from various sources into a unified presentation in one or more output formats. The modular and extensible plugin architecture of Apache Forrest is based on Apache Cocoon and the relevant industry standards that separate presentation from content. Forrest can generate static documents, or be used as a dynamic server, or be deployed by its automated facility. WCE,joint.Cox 
Apache Gearpump  Apache Gearpump is a realtime big data streaming engine. The name Gearpump is a reference to the engineering term ‘gear pump’ which is a super simple pump that consists of only two gears, but is very powerful at streaming water. Different to other streaming engines, Gearpump’s engine is event/message based. Per initial benchmarks we are able to process 18 million messages per second (message length is 100 bytes) with a 8ms latency on a 4node cluster. 
Apache Giraph  Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. zoib 
Apache Hadoop  Apache Hadoop is an opensource software framework for storage and largescale processing of datasets on clusters of commodity hardware. Hadoop is an Apache toplevel project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0. The Apache Hadoop framework is composed of the following modules: · Hadoop Common – contains libraries and utilities needed by other Hadoop modules · Hadoop Distributed File System (HDFS) – a distributed filesystem that stores data on commodity machines, providing very high aggregate bandwidth across the cluster. · Hadoop YARN – a resourcemanagement platform responsible for managing compute resources in clusters and using them for scheduling of users’ applications. · Hadoop MapReduce – a programming model for large scale data processing. Hadoop is being regarded as one of the best platforms for storing and managing big data. It owes its success to its high data storage and processing scalability, low price/performance ratio, high performance, high availability, high schema flexibility, and its capability to handle all types of data. 
Apache Hama  The Apache Hama is an efficient and scalable generalpurpose BSP computing engine which can be used to speed up a large variety of computeintensive analytics applications. 
Apache HBase  Use Apache HBase software when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. HBase is an opensource, distributed, versioned, columnoriented store modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtablelike capabilities on top of Hadoop and HDFS. 
Apache Hive  The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides * tools to enable easy data extract/transform/load (ETL) * a mechanism to impose structure on a variety of data formats * access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * query execution via MapReduce Hive defines a simple SQLlike query language, called HiveQL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the builtin capabilities of the language. HiveQL can also be extended with custom scalar functions (UDF’s), aggregations (UDAF’s), and table functions (UDTF’s). 
Apache Ignite  Apache Ignite InMemory Data Fabric is a highperformance, integrated and distributed inmemory platform for computing and transacting on largescale data sets in realtime, orders of magnitude faster than possible with traditional diskbased or flash technologies. Apache Ignite InMemory Data Fabric is designed to deliver uncompromised performance for a wide set of inmemory computing use cases from high performance computing, to the industry most advanced data grid, highly available service grid, and streaming. 
Apache Jena  Apache Jena provides a complete framework for building Semantic Web and Linked Data applications in Java, and provides: parsers for RDF/XML, Turtle and Ntriples; a Java programming API; a complete implementation of the SPARQL query language; a rulebased inference engine for RDFS and OWL entailments; TDB (a nonSQL persistent triple store); SDB (a persistent triples store built on a relational store) and Fuseki, an RDF server using web protocols. Jena complies with all relevant recommendations for RDF and related technologies from the W3C. 
Apache Kafka  Apache Kafka is an opensource message broker project developed by the Apache Software Foundation written in Scala. The project aims to provide a unified, highthroughput, lowlatency platform for handling realtime data feeds. The design is heavily influenced by transactions logs. 
Apache Knox Gateway  The Apache Knox Gateway is an Application Gateway for interacting with the REST APIs and UIs of Apache Hadoop deployments. The Knox Gateway provides a single access point for all REST and HTTP interactions with Apache Hadoop clusters. 
Apache Lucene  The goal of Apache Lucene and Solr is to provide world class search capabilities. The Apache Lucene project develops opensource search software, including: · Lucene Core, our flagship subproject, provides Javabased indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. · Solr is a high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface. · Open Relevance Project is a subproject with the aim of collecting and distributing free materials for relevance testing and performance. · PyLucene is a Python port of the Core project. http://…/structureapachelucene 
Apache Lucy  The Apache Lucy search engine library provides fulltext search for dynamic programming languages. 
Apache Mahout  Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering, clustering and classification. Many of the implementations use the Apache Hadoop platform. Mahout also provides Java libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but various algorithms are still missing. 
Apache NiFi  Put simply NiFi was built to automate the flow of data between systems. While the term ‘dataflow’ is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems. This problem space has been around ever since enterprises had more than one system, where some of the systems created data and some of the systems consumed data. The problems and solution patterns that emerged have been discussed and articulated extensively. A comprehensive and readily consumed form is found in the Enterprise Integration Patterns [eip]. 
Apache Object Oriented Data Technology (OODT) 
Metadata for middleware (and vice versa): · Transparent access to distributed resources · Data discovery and query optimization · Distributed processing and virtual archives But it’s not just for science! It’s also a software architecture: · Models for information representation · Solutions to knowledge capture problems · Unification of technology, data, and metadata 
Apache Oozie  Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java mapreduce, Streaming mapreduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts). 
Apache OpenNLP  Apache OpenNLP software supports the most common NLP tasks, such as tokenization, sentence segmentation, partofspeech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.. 
Apache Phoenix  Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the noSQL store enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL. Phoenix compiles queries and other statements into native noSQL store APIs rather than using MapReduce enabling the building of low latency applications on top of noSQL stores. 
Apache Pig  Pig is a highlevel platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMS systems. Pig Latin can be extended using UDF (User Defined Functions) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language. Pig was originally developed at Yahoo Research around 2006 for researchers to have an adhoc way of creating and executing mapreduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation. 
Apache Pulsar  Pulsar is a distributed pubsub messaging platform with a very flexible messaging model and an intuitive client API. 
Apache REEF  REEF, (Retainable Evaluator Execution Framework), is our approach to simplify and unify the lower layers of big data systems on modern resource managers. For managers like Apache YARN, Apache Mesos, Google Omega, and Facebook Corona, REEF provides a centralized control plane abstraction that can be used to build a decentralized data plane for supporting big data systems. Special consideration is given to graph computation and machine learning applications, both of which require data retention on allocated resources to execute multiple passes over the data. More broadly, applications that run on YARN will have the need for a variety of dataprocessing tasks e.g., data shuffle, group communication, aggregation, checkpointing, and many more. Rather than reimplement these for each application, REEF aims to provide them in a library form, so that they can be reused by higherlevel applications and tuned for a specific domain problem e.g., Machine Learning. In that sense, our longterm vision is that REEF will mature into a Big Data Application Server, that will host a variety of tool kits and applications, on modern resource managers. 
Apache S4 (S4) 
S4 is a generalpurpose, distributed, scalable, faulttolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. S4 fills the gap between complex proprietary systems and batchoriented open source computing platforms. We aim to develop a high performance computing platform that hides the complexity inherent in parallel processing system from the application programmer. The core platform is written in Java. The implementation is modular and pluggable, and S4 applications can be easily and dynamically combined for creating more sophisticated stream processing systems. 
Apache SAMOA (SAMOA) 
Apache SAMOA (Scalable Advanced Massive Online Analysis) is an opensource platform for mining big data streams. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, due to the time and memory complexity. Apache SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Apache Flink, Apache Storm, and Apache Samza. Apache SAMOA is written in Java and is available at https://samoa.incubator.apache.org under the Apache Software License version 2.0. LargeScale Learning from Data Streams with Apache SAMOA 
Apache Samza  Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management. 
Apache SINGA  SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feedforward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN). Many builtin layers are provided for users. SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning. 
Apache Spark (Spark) 
Apache Spark is an opensource data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop opensource community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the twostage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce, for certain applications. Spark provides primitives for inmemory cluster computing that allows user programs to load data into a cluster’s memory and query it repeatedly, making it well suited to machine learning algorithms. 
Apache Spatial Information System  Apache SIS provides data structures for geographic data and associated metadata along with methods to manipulate those data structures. The library is an implementation of GeoAPI interfaces and can be used for desktop or server applications. 
Apache Sqoop  Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. 
Apache Stanbol  Apache Stanbol provides a set of reusable components for semantic content management. Apache Stanbol’s intended use is to extend traditional content management systems with semantic services. Other feasible use cases include: direct usage from web applications (e.g. for tag extraction/suggestion; or text completion in search fields), ‘smart’ content workflows or email routing based on extracted entities, topics, etc Apache Stanbol – the Semantic Engine. In order to be used as a semantic engine via its services, all components offer their functionalities in terms of a RESTful web service API. Apache Stanbol’s main features are: · Content Enhancement: Services that add semantic information to ‘nonsemantic’ pieces of content. · Reasoning: Services that are able to retrieve additional semantic information about the content based on the semantic information retrieved via content enhancement. · Knowledge Models: Services that are used to define and manipulate the data models (e.g. ontologies) that are used to store the semantic information. · Persistence: Services that store (or cache) semantic information, i.e. enhanced content, entities, facts, and make it searchable. http://…/Apache_Stanbol 
Apache Storm (Storm) 
Storm is a distributed computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and team at BackType, the project was open sourced after being acquired by Twitter. It uses custom created “spouts” and “bolts” to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on September 17, 2011. A Storm application is designed as a topology of interfaces which create a “stream” of transformations. It provides similar functionality as a MapReduce job with the exception that the topology will theoretically run indefinitely until it is manually terminated. In 2013 the Apache Software Foundation has accepted Storm into its incubator program. 
Apache Tajo  A big data warehouse system on Hadoop. Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for lowlatency and scalable adhoc queries, online aggregation, and ETL (extracttransformload process) on largedata sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities. 
Apache Texen  Texen is a general purpose text generating utility. It is capable of producing almost any sort of text output. Driven by Ant, essentially an Ant Task, Texen uses a control template, an optional set of worker templates, and control context to govern the generated output. Although TexenTask can be used directly, it is usually subclassed to initialize your control context before generating any output. 
Apache Tez  Apache Tez is an extensible framework for building high performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third party data access applications developed for the broader Hadoop ecosystem. 
Apache Thrift  The Apache Thrift software framework, for scalable crosslanguage services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages. thriftr 
Apache Tika  The Apache Tika toolkit detects and extracts metadata and text content from various documents – from PPT to CSV to PDF – using existing parser libraries. Tika unifies these parsers under a single interface to allow you to easily parse over a thousand different file types. Tika is useful for search engine indexing, content analysis, translation, and much more. 
Apache Twill  Apache Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their application logic. Apache Twill allows you to use YARN’s distributed capabilities with a programming model that is similar to running threads. 
Apache UIMA  Unstructured Information Management Applications (UIMA) are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as worksfor or locatedat. UIMA is made of many things UIMA enables applications to be decomposed into components, for example ‘language identification’ => ‘language specific segmentation’ => ‘sentence boundary detection’ => ‘entity detection (person/place names etc.)’. Each component implements interfaces defined by the framework and provides selfdescribing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes. Apache UIMA is an Apachelicensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS , a standards organization). 
Apache Yarn (Yarn) 
We present the next generation of Hadoop compute platform known as YARN, which departs from its familiar, monolithic architecture. By separating resource management functions from the programming model, YARN delegates many schedulingrelated functions to perjob components. In this new context, MapReduce is just one of the applications running on top of YARN. This separation provides a great deal of flexibility in the choice of programming framework. Examples of alternative programming models that are becoming available on YARN are: Dryad, Giraph, Hoya, REEF, Spark, Storm and Tez. 
Apache Zeppelin  A webbased notebook that enables interactive data analytics. You can make beautiful datadriven, interactive and collaborative documents with SQL, Scala and more. 
Apache ZooKeeper  Apache ZooKeeper is an effort to develop and maintain an opensource server which enables highly reliable distributed coordination. 
APES  Assisted by neural networks, reinforcement learning agents have been able to solve increasingly complex tasks over the last years. The simulation environment in which the agents interact is an essential component in any reinforcement learning problem. The environment simulates the dynamics of the agents’ world and hence provides feedback to their actions in terms of state observations and external rewards. To ease the design and simulation of such environments this work introduces $\texttt{APES}$, a highly customizable and open source package in Python to create 2D gridworld environments for reinforcement learning problems. $\texttt{APES}$ equips agents with algorithms to simulate any field of vision, it allows the creation and positioning of items and rewards according to userdefined rules, and supports the interaction of multiple agents. 
apk2vec  Building behavior profiles of Android applications (apps) with holistic, rich and multiview information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would help catering downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semisupervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the three following unique characteristics which make it an excellent choice for largescale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semisupervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semisupervised multiview hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec’s app profiles could significantly outperform stateoftheart techniques in four app analytics tasks namely, malware detection, familial clustering, app clone detection and app recommendation. 
APPG  A popular asynchronous protocol for decentralized optimization is randomized gossip where a pair of neighbors concurrently update via pairwise averaging. In practice, this creates deadlocks and is vulnerable to information delays. It can also be problematic if a node is unable to response or has only access to its privatepreserved local dataset. To address these issues simultaneously, this paper proposes an asynchronous decentralized algorithm, i.e. APPG, with {\em directed} communication where each node updates {\em asynchronously} and independently of any other node. If local functions are stronglyconvex with Lipschitzcontinuous gradients, each node of APPG converges to the same optimal solution at a rate of $O(\lambda^k)$, where $\lambda\in(0,1)$ and the virtual counter $k$ increases by 1 no matter on which node updates. The superior performance of APPG is validated on a logistic regression problem against stateoftheart methods in terms of linear speedup and system implementations. 
Application Function Library (AFL) 
SAP HANA Library References: The SAP HANA Business Function Library (BFL) Reference describes the Business Function Library (BFL) delivered with SAP HANA. This application function library (AFL) contains prebuilt financial functions implemented in C++. The SAP HANA Predictive Analysis Library (PAL) Reference describes the Predictive Analysis Library (PAL) delivered with SAP HANA. This application function library (AFL) defines functions that can be called from within SAP HANA SQLScript procedures to perform analytic algorithms. 
Application Function Modeler (AFM) 
The Application Function Modeler 2.0 (AFM 2) is a graphical editor for complex data analysis pipelines in the HANA Studio. This tool is based on the HANA Data Scientist prototype developed at the HANA Platform Innovation Center in Potsdam, Germany. It is planned to be the next generation of the existing HANA Studio Application Function Modeler which was developed at the TIP CE&SP Algorithm Labs in Shanghai, China. The AFM 2 team consists of original and new developers from both locations. 
Applied Mathematics  Applied mathematics is a branch of mathematics that deals with mathematical methods that find use in science, engineering, business, computer science, and industry. Thus, ‘applied mathematics’ is a mathematical science with specialized knowledge. The term ‘applied mathematics’ also describes the professional specialty in which mathematicians work on practical problems by formulating and studying mathematical models. In the past, practical applications have motivated the development of mathematical theories, which then became the subject of study in pure mathematics where abstract concepts are studied for their own sake. The activity of applied mathematics is thus intimately connected with research in pure mathematics. 
Applied Singular Spectrum Analysis (ASSA) 
Functions to model and decompose time series into principal components using singular spectrum analysis (de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>; de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>). ASSA 
Approximate Bayesian Computation (ABC) 
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all modelbased statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically wellfounded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences, e.g. in population genetics, ecology, epidemiology, and systems biology. Overview of Approximate Bayesian Computation 
Approximate Common Variance (AComVar) 
We consider nonregular fractions of factorial experiments for a class of linear models. These models have a common general mean and main effects, however they may have different 2factor interactions. Here we assume for simplicity that 3factor and higher order interactions are negligible. In the absence of a priori knowledge about which interactions are important, it is reasonable to prefer a design that results in equal variance for the estimates of all interaction effects to aid in model discrimination. Such designs are called common variance designs and can be quite challenging to identify without performing an exhaustive search of possible designs. In this work, we introduce an extension of common variance designs called approximate common variance, or AComVar designs. We develop a numerical approach to finding AComVar designs that is much more efficient than an exhaustive search. We present the types of AComVar designs that can be found for different number of factors, runs, and interactions. We further demonstrate the competitive performance of both common variance and AComVar designs with PlackettBurman designs for model selection using simulation. 
Approximate Computing  Approximate computing is a computation technique which returns a possibly inaccurate result rather than a guaranteed accurate result, and can be used for applications where an approximate result is sufficient for its purpose. One example of such situation is for a search engine where no exact answer may exist for a certain search query and hence, many answers may be acceptable. Similarly, occasional dropping of some frames in a video application can go undetected due to perceptual limitations of humans. Approximate computing is based on the observation that in many scenarios, although performing exact computation requires large amount of resources, allowing bounded approximation can provide disproportionate gains in performance and energy, while still achieving acceptable result accuracy. For example, in kmeans clustering algorithm, allowing only 5% loss in classification accuracy can provide 50 times energy saving compared to the fully accurate classification. The key requirement in approximate computing is that approximation can be introduced only in noncritical data, since approximating critical data (e.g., control operations) can lead to disastrous consequences, such as program crash or erroneous output. autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components 
Approximate Entropy  In statistics, an approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations over timeseries data. 
Approximate Exploration  Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call \emph{approximate exploration}. We first provide results when the approximation is explicit, quantifying the performance of an exploration algorithm, MBIEEB \citep{strehl2008analysis}, when combined with state aggregation. In particular, we show that this allows the agent to trade off between learning speed and quality of the policy learned. We then turn to a successful exploration scheme in practical, pseudocount based exploration bonuses \citep{bellemare2016unifying}. We show that choosing a density model implicitly defines an abstraction and that the pseudocount bonus incentivizes the agent to explore using this abstraction. We find, however, that implicit exploration may result in a mismatch between the approximated value function and exploration bonus, leading to either under or overexploration. 
Approximate LeaveOneOut Cross Validation (ALOOCV) 
We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Employing empirical risk minimization with possibly regularization, the inferred parameter vector will be biased toward the training samples. Such bias is measured by the cross validation procedure in practice where the data set is partitioned into a training set used for training and a validation set, which is not used in training and is left to measure the outofsample performance. A classical cross validation strategy is the leaveoneout cross validation (LOOCV) where one sample is left out for validation and training is done on the rest of the samples that are presented to the learner, and this process is repeated on all of the samples. LOOCV is rarely used in practice due to the high computational complexity. In this paper, we first develop a computationally efficient approximate LOOCV (ALOOCV) and provide theoretical guarantees for its performance. Then we use ALOOCV to provide an optimization algorithm for finding the regularizer in the empirical risk minimization framework. In our numerical experiments, we illustrate the accuracy and efficiency of ALOOCV as well as our proposed framework for the optimization of the regularizer. 
Approximate Message Passing (AMP) 
The standard linear regression (SLR) problem is to recover a vector $\mathbf{x}^0$ from noisy linear observations $\mathbf{y}=\mathbf{Ax}^0+\mathbf{w}$. The approximate message passing (AMP) algorithm recently proposed by Donoho, Maleki, and Montanari is a computationally efficient iterative approach to SLR that has a remarkable property: for large i.i.d.\ subGaussian matrices $\mathbf{A}$, its periteration behavior is rigorously characterized by a scalar stateevolution whose fixed points, when unique, are Bayes optimal. AMP, however, is fragile in that even small deviations from the i.i.d.\ subGaussian model can cause the algorithm to diverge. This paper considers a ‘vector AMP’ (VAMP) algorithm and shows that VAMP has a rigorous scalar stateevolution that holds under a much broader class of large random matrices $\mathbf{A}$: those that are rightrotationally invariant. After performing an initial singular value decomposition (SVD) of $\mathbf{A}$, the periteration complexity of VAMP can be made similar to that of AMP. In addition, the fixed points of VAMP’s state evolution are consistent with the replica prediction of the minimum meansquared error recently derived by Tulino, Caire, Verd\’u, and Shamai. The effectiveness and state evolution predictions of VAMP are confirmed in numerical experiments. 
Approximate Nearest Neighbor (ANN) 
In some applications it may be acceptable to retrieve a ‘good guess’ of the nearest neighbor. In those cases, we can use an algorithm which doesn’t guarantee to return the actual nearest neighbor in every case, in return for improved speed or memory savings. Often such an algorithm will find the nearest neighbor in a majority of cases, but this depends strongly on the dataset being queried. Algorithms that support the approximate nearest neighbor search include localitysensitive hashing, best bin first and balanced boxdecomposition tree based search. Approximate Nearest Neighbor Search in High Dimensions 
Approximate Query Processing (AQP) 
While Big Data opens the possibility of gaining unprecedented insights, it comes at the price of increased need for computational resources (or risk of higher latency) for answering queries over voluminous data. The ability to provide approximate answers to queries at a fraction of the cost of executing the query in the traditional way, has the disruptive potential of allowing us to explore large datasets efficiently. Specifically, such techniques could prove effective in helping data scientists identify the subset of data that needs further drilldown, discovering trends, and enabling fast visualization. In this article, we will focus on approximate query processing schemes that are based on sampling techniques. An approximate query processing (AQP) scheme can be characterized by the generality of the query language it supports, its error model and accuracy guarantee, the amount of work saved at runtime, and the amount of additional resources it requires in precomputation. These dimensions of an AQP scheme are not independent and much of the past work makes specific choices along the above four dimensions. Specifically, it seems impossible to have an AQP system that supports the richness of SQL with significant saving of work while providing an accuracy guarantee that is acceptable to a broad set of application workloads. Put another way, you cannot have it all. MISS: Finding Optimal Sample Sizes for Approximate Analytics Approximate Query Processing using Deep Generative Models 
Approximate Random Dropout  The training phases of Deep neural network (DNN) consume enormous processing time and energy. Compression techniques for inference acceleration leveraging the sparsity of DNNs, however, can be hardly used in the training phase. Because the training involves dense matrixmultiplication using GPGPU, which endorse regular and structural data layout. In this paper, we exploit the sparsity of DNN resulting from the random dropout technique to eliminate the unnecessary computation and data access for those dropped neurons or synapses in the training phase. Experiments results on MLP and LSTM on standard benchmarks show that the proposed Approximate Random Dropout can reduce the training time by half on average with ignorable accuracy loss. 
Approximate Semantic Matching  Eventbased systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, eventbased systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain eventbased systems. In order to address semantic coupling within eventbased systems, we propose vocabulary free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and the consequences it implies for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauribased and distributional semanticsbased similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precisionrecall F1 score of 75.89% on average in all experiments with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach. Towards an Inexact Semantic Complex Event Processing Framework 
Approximate Survey Propagation (ASP) 
Approximate message passing algorithm enjoyed considerable attention in the last decade. In this paper we introduce a variant of the AMP algorithm that takes into account glassy nature of the system under consideration. We coin this algorithm as the approximate survey propagation (ASP) and derive it for a class of lowrank matrix estimation problems. We derive the state evolution for the ASP algorithm and prove that it reproduces the onestep replica symmetry breaking (1RSB) fixedpoint equations, wellknown in physics of disordered systems. Our derivation thus gives a concrete algorithmic meaning to the 1RSB equations that is of independent interest. We characterize the performance of ASP in terms of convergence and meansquared error as a function of the free Parisi parameter s. We conclude that when there is a model mismatch between the true generative model and the inference model, the performance of AMP rapidly degrades both in terms of MSE and of convergence, while ASP converges in a larger regime and can reach lower errors. Among other results, our analysis leads us to a striking hypothesis that whenever s (or other parameters) can be set in such a way that the Nishimori condition $M=Q>0$ is restored, then the corresponding algorithm is able to reach meansquared error as low as the Bayesoptimal error obtained when the model and its parameters are known and exactly matched in the inference procedure. 
Approximately Binary Clamping (ABC) 
Although traditionally binary visual representations are mainly designed to reduce computational and storage costs in the image retrieval research, this paper argues that binary visual representations can be applied to large scale recognition and detection problems in addition to hashing in retrieval. Furthermore, the binary nature may make it generalize better than its realvalued counterparts. Existing binary hashing methods are either twostage or hinging on loss term regularization or saturated functions, hence converge slowly and only emit soft binary values. This paper proposes Approximately Binary Clamping (ABC), which is nonsaturating, endtoend trainable, with fast convergence and can output true binary visual representations. ABC achieves comparable accuracy in ImageNet classification as its realvalued counterpart, and even generalizes better in object detection. On benchmark image retrieval datasets, ABC also outperforms existing hashing methods. 
Approximation Tree  This paper examines the stability of learned explanations for blackbox predictions via model distillation with decision trees. One approach to intelligibility in machine learning is to use an understandable `student’ model to mimic the output of an accurate `teacher’. Here, we consider the use of regression trees as a student model, in which nodes of the tree can be used as `explanations’ for particular predictions, and the whole structure of the tree can be used as a global representation of the resulting function. However, individual trees are sensitive to the particular data sets used to train them, and an interpretation of a student model may be suspect if small changes in the training data have a large effect on it. In this context, access to outcomes from a teacher helps to stabilize the greedy splitting strategy by generating a much larger corpus of training examples than was originally available. We develop tests to ensure that enough examples are generated at each split so that the same splitting rule would be chosen with high probability were the tree to be re trained. Further, we develop a stopping rule to indicate how deep the tree should be built based on recent results on the variability of Random Forests when these are used as the teacher. We provide concrete examples of these procedures on the CADMDD and COMPAS data sets. 
APRIL  We propose a method to perform automatic document summarisation without using reference summaries. Instead, our method interactively learns from users’ preferences. The merit of preferencebased interactive summarisation is that preferences are easier for users to provide than reference summaries. Existing preferencebased interactive learning methods suffer from high sample complexity, i.e. they need to interact with the oracle for many rounds in order to converge. In this work, we propose a new objective function, which enables us to leverage active learning, preference learning and reinforcement learning techniques in order to reduce the sample complexity. Both simulation and realuser experiments suggest that our method significantly advances the state of the art. Our source code is freely available at https://…/emnlp2018april. 
Apriori Algorithm  Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis. 
AprioriGraph  Web Usage Mining is an application of Data Mining Techniques to discover interesting usage patterns from web data in order to understand and better serve the needs of webbased applications. The paper proposes an algorithm for finding these usage patterns using a modified version of Apriori Algorithm called AprioriGraph. These rules will help service providers to predict, which web pages, the user is likely to visit next. This will optimize the website in terms of efficiency, bandwidth and will have positive economic benefits for them. The proposed Apriori Graph Algorithm O((V)(E)) works faster compared to the existing Apriori Algorithm and is well suitable for realtime application. 
AQuA  Commonsense reasoning is a critical AI capability, but it is difficult to construct challenging datasets that test common sense. Recent neural questionanswering systems, based on large pretrained models of language, have already achieved nearhumanlevel performance on commonsense knowledge benchmarks. These systems do not possess humanlevel common sense, but are able to exploit limitations of the datasets to achieve humanlevel scores. We introduce the AQuA dataset, an adversariallyconstructed evaluation dataset for testing common sense. AQuA forms a challenging extension to the recentlyproposed SWAG dataset, which tests commonsense knowledge using sentencecompletion questions that describe situations observed in video. To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of stateoftheart neural question answering systems. Workers are rewarded for submissions that models fail to answer correctly both before and after finetuning (in crossvalidation). We create 2.8k questions via this procedure and evaluate the performance of multiple stateoftheart question answering systems on our dataset. We observe a significant gap between human performance, which is 95.3%, and the performance of the best baseline accuracy of 65.3% by the OpenAI GPT model. 
ARAnnotator  An increasing number of scientific publications are created in open and transparent peer review models: a submission is published first, and then reviewers are invited, or a submission is reviewed in a closed environment but then these reviews are published with the final article, or combinations of these. Reasons for open peer review include giving better credit to reviewers and enabling readers to better appraise the quality of a publication. In most cases, the full, unstructured text of an open review is published next to the full, unstructured text of the article reviewed. This approach prevents human readers from getting a quick impression of the quality of parts of an article, and it does not easily support secondary exploitation, e.g., for scientometrics on reviews. While document formats have been proposed for publishing structured articles including reviews, integrated tool support for entire open peer review workflows resulting in such documents is still scarce. We present ARAnnotator, the Automatic Article and Review Annotator which employs a semantic information model of an article and its reviews, using semantic markup and unique identifiers for all entities of interest. The finegrained article structure is not only exposed to authors and reviewers but also preserved in the published version. We publish articles and their reviews in a Linked Data representation and thus maximize their reusability by thirdparty applications. We demonstrate this reusability by running qualityrelated queries against the structured representation of articles and their reviews. 
Arc Diagram  In graph drawing, an arc diagram is a style of graph drawing, in which the vertices of a graph are placed along a line in the Euclidean plane, with edges being drawn as semicircles in one of the two halfplanes bounded by the line, or as smooth curves formed by sequences of semicircles. In some cases, line segments of the line itself are also allowed as edges, as long as they connect only vertices that are consecutive along the line. 
ArcGIS  Esri’s ArcGIS is a geographic information system (GIS) for working with maps and geographic information. It is used for: creating and using maps; compiling geographic data; analyzing mapped information; sharing and discovering geographic information; using maps and geographic information in a range of applications; and managing geographic information in a database. The system provides an infrastructure for making maps and geographic information available throughout an organization, across a community, and openly on the Web. 
ARCHANGEL  We present ARCHANGEL; a novel distributed ledger based system for assuring the longterm integrity of digital video archives. First, we describe a novel deep network architecture for computing compact temporal content hashes (TCHs) from audiovisual streams with durations of minutes or hours. Our TCHs are sensitive to accidental or malicious content modification (tampering) but invariant to the codec used to encode the video. This is necessary due to the curatorial requirement for archives to format shift video over time to ensure future accessibility. Second, we describe how the TCHs (and the models used to derive them) are secured via a proofofauthority blockchain distributed across multiple independent archives. We report on the efficacy of ARCHANGEL within the context of a trial deployment in which the national government archives of the United Kingdom, Estonia and Norway participated. 
Archetypal Analysis  Archetypal analysis (Cutler and Breiman, 1994) has the aim to find a few, not necessarily observed, extremal observations (the archetypes) in a multivariate data set such that: 1. all the observations are approximated by convex combinations of the archetypes, and 2. all the archetypes are convex combinations of the observations. 
Architecture Compression  In this paper we propose a novel approach to model compression termed Architecture Compression. Instead of operating on the weight or filter space of the network like classical model compression methods, our approach operates on the architecture space. A 1D CNN encoderdecoder is trained to learn a mapping from discrete architecture space to a continuous embedding and back. Additionally, this embedding is jointly trained to regress accuracy and parameter count in order to incorporate information about the architecture’s effectiveness on the dataset. During the compression phase, we first encode the network and then perform gradient descent in continuous space to optimize a compression objective function that maximizes accuracy and minimizes parameter count. The final continuous feature is then mapped to a discrete architecture using the decoder. We demonstrate the merits of this approach on visual recognition tasks such as CIFAR10, CIFAR100, FashionMNIST and SVHN and achieve a greater than 20x compression on CIFAR10. 
Architecture Search, Anneal and Prune (ASAP) 
Automatic methods for Neural Architecture Search (NAS) have been shown to produce stateoftheart network models, yet, their main drawback is the computational complexity of the search process. As some primal methods optimized over a discrete search space, thousands of days of GPU were required for convergence. A recent approach is based on constructing a differentiable search space that enables gradientbased optimization, thus reducing the search time to a few days. While successful, such methods still include some incontinuous steps, e.g., the pruning of many weak connections at once. In this paper, we propose a differentiable search space that allows the annealing of architecture weights, while gradually pruning inferior operations, thus the search converges to a single output network in a continuous manner. Experiments on several vision datasets demonstrate the effectiveness of our method with respect to the search cost, accuracy and the memory footprint of the achieved model. 
Arcsine Law  In probability theory, the arcsine laws are a collection of results for onedimensional random walks and Brownian motion (the Wiener process). The best known of these is attributed to Paul Lévy (1939). All three laws relate path properties of the Wiener process to the arcsine distribution. The Lévy Arcsine Law states that the proportion of time that the onedimensional Wiener process is positive follows an arcsine distribution. http://…/randomwalksandarcsinelaw 
Area Attention  Existing attention mechanisms, are mostly itembased in that a model is designed to attend to a single item in a collection of items (the memory). Intuitively, an area in the memory that may contain multiple items can be worth attending to as a whole. We propose area attention: a way to attend to an area of the memory, where each area contains a group of items that are either spatially adjacent when the memory has a 2dimensional structure, such as images, or temporally adjacent for 1dimensional memory, such as natural language sentences. Importantly, the size of an area, i.e., the number of items in an area, can vary depending on the learned coherence of the adjacent items. By giving the model the option to attend to an area of items, instead of only a single item, we hope attention mechanisms can better capture the nature of the task. Area attention can work along multihead attention for attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation and image captioning, and improve upon strong (stateoftheart) baselines in both cases. These improvements are obtainable with a basic form of area attention that is parameter free. In addition to proposing the novel concept of area attention, we contribute an efficient way for computing it by leveraging the technique of summed area tables. 
Area under Curve (AUC) 
Sometimes, the ROC is used to generate a summary statistic. A common version is the area under the ROC curve, or “AUC” (“Area Under Curve”), or A’ (pronounced “aprime”), or “cstatistic”. 
Arena Model  The authors propose a parametric model called the arena model for prediction in paired competitions, i.e. paired comparisons with eliminations and bifurcations. The arena model has a number of appealing advantages. First, it predicts the results of competitions without rating many individuals. Second, it takes full advantage of the structure of competitions. Third, the model provides an easy method to quantify the uncertainty in competitions. Fourth, some of our methods can be directly generalized for comparisons among three or more individuals. Furthermore, the authors identify an invariant Bayes estimator with regard to the prior distribution and prove the consistency of the estimations of uncertainty. Currently, the arena model is not effective in tracking the change of strengths of individuals, but its basic framework provides a solid foundation for future study of such cases. 
AresDB  Released in November 2018, AresDB is an open source, realtime analytics engine that leverages an unconventional power source, graphics processing units (GPUs), to enable our analytics to grow at scale. An emerging tool for realtime analytics, GPU technology has advanced significantly over the years, making it a perfect fit for realtime computation and data processing in parallel. 
Argument Component Detection (ACD) 
Argument component detection (ACD) is an important subtask in argumentation mining. ACD aims at detecting and classifying different argument components in natural language texts. 
Argument Harvesting  Much research in computational argumentation assumes that arguments and counterarguments can be obtained in some way. Yet, to improve and apply models of argument, we need methods for acquiring them. Current approaches include argument mining from text, hand coding of arguments by researchers, or generating arguments from knowledge bases. In this paper, we propose a new approach, which we call argument harvesting, that uses a chatbot to enter into a dialogue with a participant to get arguments and counterarguments from him or her. Because it is automated, the chatbot can be used repeatedly in many dialogues, and thereby it can generate a large corpus. We describe the architecture of the chatbot, provide methods for managing a corpus of arguments and counterarguments, and an evaluation of our approach in a case study concerning attitudes of women to participation in sport. 
Argument unit Recognition and Classification (ARC) 
Argument mining is generally performed on the sentencelevel — it is assumed that an entire sentence (not parts of it) corresponds to an argument. In this paper, we introduce the new task of Argument unit Recognition and Classification (ARC). In ARC, an argument is generally a part of a sentence — a more realistic assumption since several different arguments can occur in one sentence and longer sentences often contain a mix of argumentative and nonargumentative parts. Recognizing and classifying the spans that correspond to arguments makes ARC harder than previously defined argument mining tasks. We release ARC8, a new benchmark for evaluating the ARC task. We show that tokenlevel annotations for argument units can be gathered using scalable methods. ARC8 contains 25\% more arguments than a dataset annotated on the sentencelevel would. We cast ARC as a sequence labeling task, develop a number of methods for ARC sequence tagging and establish the state of the art for ARC8. A focus of our work is robustness: both robustness against errors in sentence identification (which are frequent for noisy text) and robustness against divergence in training and test data. 
Argumentation Framework With Recursive Attacks (AFRA) 
The issue of representing attacks to attacks in argumentation is receiving an increasing attention as a useful conceptual modelling tool in several contexts. In this paper we present AFRA, a formalism encompassing unlimited recursive attacks within argumentation frameworks. AFRA satisfies the basic requirements of definition simplicity and rigorous compatibility with Dung’s theory of argumentation. This paper provides a complete development of the AFRA formalism complemented by illustrative examples and a detailed comparison with other recursive attack formalizations. 
Argumentation Mining  The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people’s argumentation. 
ARMA Point Process  We introduce the ARMA (autoregressivemovingaverage) point process, which is a Hawkes process driven by a NeymanScott process with Poisson immigration. It contains both the Hawkes and NeymanScott process as special cases and naturally combines selfexciting and shotnoise cluster mechanisms, useful in a variety of applications. The name ARMA is used because the ARMA point process is an appropriate analogue of the ARMA time series model for integervalued series. As such, the ARMA point process framework accommodates a flexible family of models sharing methodological and mathematical similarities with ARMA time series. We derive an estimation procedure for ARMA point processes, as well as the integer ARMA models, based on an MCEM (Monte Carlo Expectation Maximization) algorithm. This powerful framework for estimation accommodates trends in immigration, multiple parametric specifications of excitement functions, as well as cases where marks and immigrants are not observed. 
ARMDN  Accurate demand forecasts can help online retail organizations better plan their supplychain processes. The challenge, however, is the large number of associative factors that result in large, nonstationary shifts in demand, which traditional time series and regression approaches fail to model. In this paper, we propose a Neural Network architecture called ARMDN, that simultaneously models associative factors, timeseries trends and the variance in the demand. We first identify several causal features and use a combination of feature embeddings, MLP and LSTM to represent them. We then model the output density as a learned mixture of Gaussian distributions. The ARMDN can be trained endtoend without the need for additional supervision. We experiment on a dataset of an year’s worth of data over tensofthousands of products from Flipkart. The proposed architecture yields a significant improvement in forecasting accuracy when compared with existing alternatives. 
ArrayFire Library  The ArrayFire accelerated computing library is a free, generalpurpose, opensource library that simplifies the process of developing software that targets parallel and massivelyparallel architectures including CPUs, GPUs, and other hardware acceleration devices. ArrayFire is used on devices from lowpowered mobile phones to highpowered GPUenabled supercomputers including CPUs from all major vendors (Intel, AMD, Arm), GPUs from the dominant manufacturers (NVIDIA, AMD, and Qualcomm), as well as a variety of other accelerator devices on Windows, Mac, and Linux. RcppArrayFire 
ART  Transfer learning aims to solve the data sparsity for a target domain by applying information of the source domain. Given a sequence (e.g. a natural language sentence), the transfer learning, usually enabled by recurrent neural network (RNN), represents the sequential information transfer. RNN uses a chain of repeating cells to model the sequence data. However, previous studies of neural network based transfer learning simply represents the whole sentence by a single vector, which is unfeasible for seq2seq and sequence labeling. Meanwhile, such layerwise transfer learning mechanisms lose the finegrained celllevel information from the source domain. In this paper, we proposed the aligned recurrent transfer, ART, to achieve celllevel information transfer. ART is under the pretraining framework. Each cell attentively accepts transferred information from a set of positions in the source domain. Therefore, ART learns the crossdomain word collocations in a more flexible way. We conducted extensive experiments on both sequence labeling tasks (POS tagging, NER) and sentence classification (sentiment analysis). ART outperforms the stateofthearts over all experiments. 
artfima (ARTFIMA) 

Articulate  Articulate is a platform for building conversational interfaces with intelligent agents. Articulate is an open source project that will allow you to take control of you conversational interfaces, without being worried where and how your data is stored. Also, Articulate is built with an usercentered design where the main goal is to make experts and beginners feel comfortable when building their intelligent agents. The main features of Articulate are: • Open source project • Based on Rasa NLU • Docker and dockercompose based (Easy to set up locally and in the cloud) • Awesome UI/UX • Webhook connection • Response formatting • Handlebars.js for template responses • Community support on Gitter and Github Articulates makes it super easy to get up and running with Rasa NLU. You´ll be guided as you build and train your custom agent using our friendly and intuitive interface. 
Artificial Bee Colony (ABC) 
Characterizing the Social Interactions in the Artificial Bee Colony Algorithm 
Artificial Continuous Prediction Market (ACPM) 
We propose the Artificial Continuous Prediction Market (ACPM) as a means to predict a continuous real value, by integrating a range of data sources and aggregating the results of different machine learning (ML) algorithms. ACPM adapts the concept of the (physical) prediction market to address the prediction of real values instead of discrete events. Each ACPM participant has a data source, a ML algorithm and a local decisionmaking procedure that determines what to bid on what value. The contributions of ACPM are: (i) adaptation to changes in data quality by the use of learning in: (a) the market, which weights each market participant to adjust the influence of each on the market prediction and (b) the participants, which use a Qlearning based trading strategy to incorporate the market prediction into their subsequent predictions, (ii) resilience to a changing population of low and highperforming participants. We demonstrate the effectiveness of ACPM by application to an influenzalike illnesses data set, showing ACPM outperforms a range of wellknown regression models and is resilient to variation in data source quality. 
Artificial Emotion Intelligence  ➘ “Artificial Emotional Intelligence” 
Artificial Emotional Intelligence  ➚ “Affective Computing” 
Artificial General Intelligence (AGI) 
Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can. It is a primary goal of artificial intelligence research and an important topic for science fiction writers and futurists. Artificial general intelligence is also referred to as “strong AI”, “full AI” or as the ability to perform “general intelligent action”. AGI is associated with traits such as consciousness, sentience, sapience, and selfawareness observed in living beings. 
Artificial Intelligence (AI) 
Artificial intelligence (AI) is the humanlike intelligence exhibited by machines or software. It is also an academic field of study. Major AI researchers and textbooks define the field as “the study and design of intelligent agents”, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1955, defines it as “the science and engineering of making intelligent machines”. 
Artificial Life (ALife) 
Artificial life (often abbreviated ALife or ALife) is a field of study wherein researchers examine systems related to natural life, its processes, and its evolution, through the use of simulations with computer models, robotics, and biochemistry. The discipline was named by Christopher Langton, an American theoretical biologist, in 1986. There are three main kinds of alife, named for their approaches: soft, from software; hard, from hardware; and wet, from biochemistry. Artificial life researchers study traditional biology by trying to recreate aspects of biological phenomena. Artificial life studies the fundamental processes of living systems in artificial environments in order to gain a deeper understanding of the complex information processing that define such systems. These topics are broad, but often include evolutionary dynamics, emergent properties of collective systems, biomimicry, as well as related issues about the philosophy of the nature of life and the use of lifelike properties in artistic works. 
Artificial Quadratic Neural Network (AQNN) 
In machine learning, the use of an artificial neural network is the mainstream approach. Such a network consists of layers of neurons. These neurons are of the same type characterized by the two features: (1) an inner product of an input vector and a matching weighting vector of trainable parameters and (2) a nonlinear excitation function. Here we investigate the possibility of replacing the inner product with a quadratic function of the input vector, thereby upgrading the 1st order neuron to the 2nd order neuron, empowering individual neurons, and facilitating the optimization of neural networks. Also, numerical examples are provided to illustrate the feasibility and merits of the 2nd order neurons. Finally, further topics are discussed. 
Artificial Stupidity  And artificial stupidity’s most valid usage relates to examples of the obvious faultiness of AI technologies and systems. Finally within the field of computer science, artificial stupidity has one other significant application: it refers to a technique of deliberately dumbing down computer programs in order to introduce errors in their responses. Building Safer AGI by introducing Artificial Stupidity 
Aspect Based Sentiment Analysis (ABSA) 
Targeted sentiment analysis (TSA), also known as aspect based sentiment analysis (ABSA), aims at detecting finegrained sentiment polarity towards targets in a given opinion document. Due to the lack of labeled datasets and effective technology, TSA had been intractable for many years. The newly released datasets and the rapid development of deep learning technologies are key enablers for the recent significant progress made in this area. However, the TSA tasks have been defined in various ways with different understandings towards basic concepts like `target’ and `aspect’. In this paper, we categorize the different tasks and highlight the differences in the available datasets and their specific tasks. We then further discuss the challenges related to data collection and data annotation which are overlooked in many previous studies. 
AspectAware LSTM (AALSTM) 
Aspectbased sentiment analysis (ABSA) aims to predict finegrained sentiments of comments with respect to given aspect terms or categories. In previous ABSA methods, the importance of aspect has been realized and verified. Most existing LSTMbased models take aspect into account via the attention mechanism, where the attention weights are calculated after the context is modeled in the form of contextual vectors. However, aspectrelated information may be already discarded and aspectirrelevant information may be retained in classic LSTM cells in the context modeling process, which can be improved to generate more effective context representations. This paper proposes a novel variant of LSTM, termed as aspectaware LSTM (AALSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism. Therefore, our AALSTM can dynamically produce aspectaware contextual representations. We experiment with several representative LSTMbased models by replacing the classic LSTM cells with the AALSTM cells. Experimental results on SemEval2014 Datasets demonstrate the effectiveness of AALSTM. 
AspectBased Rating Prediction Model (AspeRa) 
We propose a novel endtoend Aspectbased Rating Prediction model (AspeRa) that estimates user rating based on review texts for the items and at the same time discovers coherent aspects of reviews that can be used to explain predictions or profile users. The AspeRa model uses maxmargin losses for joint item and user embedding learning and a dualheaded architecture; it significantly outperforms recently proposed stateoftheart models such as DeepCoNN, HFT, NARRE, and TransRev on two real world data sets of user reviews. With qualitative examination of the aspects and quantitative evaluation of rating prediction models based on these aspects, we show how aspect embeddings can be used in a recommender system. 
AspectOriented Programming (AOP) 
In computing, aspectoriented programming (AOP) is a programming paradigm that aims to increase modularity by allowing the separation of crosscutting concerns. It does so by adding additional behavior to existing code (an advice) without modifying the code itself, instead separately specifying which code is modified via a ‘pointcut’ specification, such as ‘log all function calls when the function’s name begins with ‘set”. This allows behaviors that are not central to the business logic (such as logging) to be added to a program without cluttering the code, core to the functionality. AOP forms a basis for aspectoriented software development. AOP includes programming methods and tools that support the modularization of concerns at the level of the source code, while ‘aspectoriented software development’ refers to a whole engineering discipline. 
AspectOriented Software Development (AOSD) 
In computing, Aspectoriented software development (AOSD) is an emerging software development technology that seeks new modularizations of software systems in order to isolate secondary or supporting functions from the main program’s business logic. AOSD allows multiple concerns to be expressed separately and automatically unified into working systems. AspectOriented Software Development focuses on the identification, specification and representation of crosscutting concerns and their modularization into separate functional units as well as their automated composition into a working system. 
ASReml  ASReml is a statistical software package for fitting linear mixed models using restricted maximum likelihood, a technique commonly used in plant and animal breeding and quantitative genetics as well as other fields. It is notable for its ability to fit very large and complex data sets efficiently, due to its use of the average information algorithm and sparse matrix methods. It was originally developed by Arthur Gilmour. ASREML can be used in Windows, Linux, and as an addon to SPLUS and R. ASReml 
Assistive MultiArmed Bandit  Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multiarmed bandit, where a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through the rewards received from arm pulls; the robot only observes which arms the human pulls but not the reward associated with each pull. We offer sufficient and necessary conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proofofconcept experiments that support these results. We see this work as contributing towards a theory behind algorithms for humanrobot interaction. 
Association Analysis  ➘ “Association Rule Learning” 
Association for Uncertainty in Artificial Intelligence (AUAI) 
The Association for Uncertainty in Artificial Intelligence is a nonprofit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty. Principles and applications developed within the UAI community have been at the forefront of research in Artificial Intelligence. The UAI community and annual meeting have been primary sources of advances in graphical models for representing and reasoning with uncertainty. 
Association Mapping  Association mapping (genetics), also known as “linkage disequilibrium mapping”, is a method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage disequilibrium to link phenotypes (observable characteristics) to genotypes (the genetic constitution of organisms). 
Association Rule Classification (ARC) 
arc 
Association Rule Learning  Association rule learning is a popular and well researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al. introduced association rules for discovering regularities between products in largescale transaction data recorded by pointofsale (POS) systems in supermarkets. For example, the rule found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements. In addition to the above example from market basket analysis association rules are employed today in many application areas including Web usage mining, intrusion detection, Continuous production, and bioinformatics. As opposed to sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions. 
Association Rule Mining  ➘ “Association Rule Learning” 
Associative Domain Adaptation  We propose associative domain adaptation, a novel technique for endtoend domain adaptation with neural networks, the task of inferring class labels for an unlabeled target domain based on the statistical properties of a labeled source domain. Our training scheme follows the paradigm that in order to effectively derive class labels for the target domain, a network should produce statistically domain invariant embeddings, while minimizing the classification error on the labeled source domain. We accomplish this by reinforcing associations between source and target data directly in embedding space. Our method can easily be added to any existing classification network with no structural and almost no computational overhead. We demonstrate the effectiveness of our approach on various benchmarks and achieve stateoftheart results across the board with a generic convolutional neural network architecture not specifically tuned to the respective tasks. Finally, we show that the proposed association loss produces embeddings that are more effective for domain adaptation compared to methods employing maximum mean discrepancy as a similarity measure in embedding space. 
Assume, Augment and Learn (AAL) 
The field of fewshot learning has been laboriously explored in the supervised setting, where perclass labels are available. On the other hand, the unsupervised fewshot learning setting, where no labels of any kind are required, has seen little investigation. We propose a method, named Assume, Augment and Learn or AAL, for generating fewshot tasks using unlabeled data. We randomly label a random subset of images from an unlabeled dataset to generate a support set. Then by applying data augmentation on the support set’s images, and reusing the support set’s labels, we obtain a target set. The resulting fewshot tasks can be used to train any standard metalearning framework. Once trained, such a model, can be directly applied on small reallabeled datasets without any changes or finetuning required. In our experiments, the learned models achieve good generalization performance in a variety of established fewshot learning tasks on Omniglot and MiniImagenet. 
Assumed Density Filtering Updating Beliefs on ActionValues (ADFQ) 
While offpolicy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation brings nonlinearity and inconsistent distributions over value function. In this paper, we introduce a new Bayesian approach to offpolicy TD methods using Assumed Density Filtering, called ADFQ, which updates beliefs on actionvalues (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs not only are used in exploration but they provide a natural regularization in the belief updates. We also present a connection between ADFQ and Qlearning. Our empirical results show the proposed ADFQ algorithms outperform comparing algorithms in several task domains. Moreover, our algorithms improve general drawbacks in BRL such as computational complexity, usage of uncertainty, and nonlinearity. 
AsyBProxSGD  Large models are prevalent in modern machine learning scenarios, including deep learning, recommender systems, etc., which can have millions or even billions of parameters. Parallel algorithms have become an essential solution technique to many largescale machine learning jobs. In this paper, we propose a model parallel proximal stochastic gradient algorithm, AsyBProxSGD, to deal with large models using model parallel blockwise updates while in the meantime handling a large amount of training data using proximal stochastic gradient descent (ProxSGD). In our algorithm, worker nodes communicate with the parameter servers asynchronously, and each worker performs proximal stochastic gradient for only one block of model parameters during each iteration. Our proposed algorithm generalizes ProxSGD to the asynchronous and model parallel setting. We prove that AsyBProxSGD achieves a convergence rate of $O(1/\sqrt{K})$ to stationary points for nonconvex problems under \emph{constant} minibatch sizes, where $K$ is the total number of block updates. This rate matches the bestknown rates of convergence for a wide range of gradientlike algorithms. Furthermore, we show that when the number of workers is bounded by $O(K^{1/4})$, we can expect AsyBProxSGD to achieve linear speedup as the number of workers increases. We implement the proposed algorithm on MXNet and demonstrate its convergence behavior and nearlinear speedup on a realworld dataset involving both a large model size and large amounts of data. 
Asymmetric Convolution Bidirectional Long ShortTerm Memory Networks (ACBLSTM) 
Recently deeplearning models have been shown to be capable of making remarkable performance in sentences and documents classification tasks. In this work, we propose a novel framework called ACBLSTM for modeling setences and documents, which combines the asymmetric convolution neural network (ACNN) with the Bidirectional Long ShortTerm Memory network (BLSTM). Experiment results demonstrate that our model achieves stateoftheart results on all six tasks, including sentiment analysis, question type classification, and subjectivity classification. 
Asymmetric Deep Semantic Quantization (ADSQ) 
Due to its fast retrieval and storage efficiency capabilities, hashing has been widely used in nearest neighbor retrieval tasks. By using deep learning based techniques, hashing can outperform nonlearning based hashing in many applications. However, there are some limitations to previous learning based hashing methods (e.g., the learned hash codes are not discriminative due to the hashing methods being unable to discover rich semantic information and the training strategy having difficulty optimizing the discrete binary codes). In this paper, we propose a novel learning based hashing method, named \textbf{\underline{A}}symmetric \textbf{\underline{D}}eep \textbf{\underline{S}}emantic \textbf{\underline{Q}}uantization (\textbf{ADSQ}). \textbf{ADSQ} is implemented using three stream frameworks, which consists of one \emph{LabelNet} and two \emph{ImgNets}. The \emph{LabelNet} leverages three fullyconnected layers, which is used to capture rich semantic information between image pairs. For the two \emph{ImgNets}, they each adopt the same convolutional neural network structure, but with different weights (i.e., asymmetric convolutional neural networks). The two \emph{ImgNets} are used to generate discriminative compact hash codes. Specifically, the function of the \emph{LabelNet} is to capture rich semantic information that is used to guide the two \emph{ImgNets} in minimizing the gap between the realcontinuous features and discrete binary codes. By doing this, \textbf{ADSQ} can make full use of the most critical semantic information to guide the feature learning process and consider the consistency of the common semantic space and Hamming space. Results from our experiments demonstrate that \textbf{ADSQ} can generate high discriminative compact hash codes and it outperforms current stateoftheart methods on three benchmark datasets, CIFAR10, NUSWIDE, and ImageNet. 
Asymptotically Exact Data Augmentation (AXDA) 
Data augmentation, by the introduction of auxiliary variables, has become an ubiquitous technique to improve mixing/convergence properties, simplify the implementation or reduce the computational time of inference methods such as Markov chain Monte Carlo. Nonetheless, introducing appropriate auxiliary variables while preserving the initial target probability distribution cannot be conducted in a systematic way but highly depends on the considered problem. To deal with such issues, this paper draws a unified framework, namely asymptotically exact data augmentation (AXDA), which encompasses several wellestablished but also more recent approximate augmented models. Benefiting from a much more general perspective, it delivers some additional qualitative and quantitative insights concerning these schemes. In particular, general properties of AXDA along with nonasymptotic theoretical results on the approximation that is made are stated. Close connections to existing Bayesian methods (e.g. mixture modeling, robust Bayesian models and approximate Bayesian computation) are also drawn. All the results are illustrated with examples and applied to standard statistical learning problems. 
Asynchronous Advantage ActorCritic (A3C) 

Asynchronous Decentralized Accelerated Stochastic Gradient Descent  In this work, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of method for decentralized stochastic optimization, considering communication and synchronization are the major bottlenecks. We establish $\mathcal{O}(1/\epsilon)$ (resp., $\mathcal{O}(1/\sqrt{\epsilon})$) communication complexity and $\mathcal{O}(1/\epsilon^2)$ (resp., $\mathcal{O}(1/\epsilon)$) sampling complexity for solving general convex (resp., strongly convex) problems. 
Asynchronous Distributed Gibbs (ADG) 
Gibbs sampling is a widely used Markov Chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. It is widely believed that MCMC methods do not extend easily to parallel implementations, as their inherently sequential nature incurs a large synchronization cost. This means that new solutions are needed to bring Bayesian analysis fully into the era of largescale computation. In this paper, we present a novel scheme – Asynchronous Distributed Gibbs (ADG) sampling – that allows us to perform MCMC in a parallel fashion with no synchronization or locking, avoiding the typical performance bottlenecks of parallel algorithms. Our method is especially attractive in settings, such as hierarchical randomeffects modeling in which each observation has its own random effect, where the problem dimension grows with the sample size. We prove convergence under some basic regularity conditions, and discuss the proof for similar parallelization schemes for other iterative algorithms. We provide three examples that illustrate some of the algorithm’s properties with respect to scaling. Because our hardware resources are bounded, we have not yet found a limit to the algorithm’s scaling, and thus its true capabilities remain unknown. 
Asynchronous Episodic Deep Deterministic Policy Gradient (AEDDPG) 
Deep Deterministic Policy Gradient (DDPG) has been proved to be a successful reinforcement learning (RL) algorithm for continuous control tasks. However, DDPG still suffers from data insufficiency and training inefficiency, especially in computationally complex environments. In this paper, we propose Asynchronous Episodic Deep Deterministic Policy Gradient (AEDDPG), as an expansion of DDPG, which can achieve more effective learning with less training time required. First, we design a modified scheme for data collection in an asynchronous fashion. Generally, for asynchronous RL algorithms, sample efficiency or/and training stability diminish as the degree of parallelism increases. We consider this problem from the perspectives of both data generation and data utilization. In detail, we redesign experience replay by introducing the idea of episodic control so that the agent can latch on good trajectories rapidly. In addition, we also inject a new type of noise in action space to enrich the exploration behaviors. Experiments demonstrate that our AEDDPG achieves higher rewards and requires less time consuming than most popular RL algorithms in Learning to Run task which has a computationally complex environment. Not limited to the control tasks in computationally complex environments, AEDDPG also achieves higher rewards and 2 to 4fold improvement in sample efficiency on average compared to other variants of DDPG in MuJoCo environments. Furthermore, we verify the effectiveness of each proposed technique component through abundant ablation study. 
Asynchronous Federated Optimization  Federated learning enables training on a massive number of edge devices. To improve flexibility and scalability, we propose a new asynchronous federated optimization algorithm. We prove that the proposed approach has nearlinear convergence to a global optimum, for both strongly and nonstrongly convex problems, as well as a restricted family of nonconvex problems. Empirical results show that the proposed algorithm converges fast and tolerates staleness. 
Asynchronous Global Weight Update (AGWU) 
Benefitting from largescale training datasets and the complex training network, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very timeconsuming, where large amounts of training samples and iterative operations are required to obtain highquality weight parameters. In this paper, we focus on the timeconsuming training process of largescale CNNs and propose a Bilayered Parallel Training (BPTCNN) architecture in distributed computing environments. BPTCNN consists of two main components: (a) an outerlayer parallel training for multiple CNN subnetworks on separate data subsets, and (b) an innerlayer parallel training for each subnetwork. In the outerlayer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneousaware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, where largescale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize the synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the innerlayer parallelism, we further accelerate the training process for each CNN subnetwork on each computer, where computation steps of convolutional layer and the local weight training are parallelized based on taskparallelism. We introduce task decomposition and scheduling strategies with the objectives of threadlevel load balancing and minimum waiting time for critical paths. Extensive experimental results indicate that the proposed BPTCNN effectively improves the training performance of CNNs while maintaining the accuracy. 
Asynchronous Parallel SAGA (ASAGA) 
We describe Asaga, an asynchronous parallel version of the incremental gradient algorithm Saga that enjoys fast linear convergence rates. We highlight a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently proposed ‘perturbed iterate’ framework that resolves it. We thereby prove that Asaga can obtain a theoretical linear speedup on multicore systems even without sparsity assumptions. We present results of an implementation on a 40core architecture illustrating the practical speedup as well as the hardware overhead. 
Asynchronous Parallel Sampling Gradient Boosting Decision Tree (asynchSGBDT) 
With the development of big data technology, Gradient Boosting Decision Tree, i.e. GBDT, becomes one of the most important machine learning algorithms for its accurate output. However, the training process of GBDT needs a lot of computational resources and time. In order to accelerate the training process of GBDT, the asynchronous parallel sampling gradient boosting decision tree, abbr. asynchSGBDT is proposed in this paper. Via introducing sampling, we adapt the numerical optimization process of traditional GBDT training process into stochastic optimization process and use asynchronous parallel stochastic gradient descent to accelerate the GBDT training process. Meanwhile, the theoretical analysis of asynchSGBDT is provided by us in this paper. Experimental results show that GBDT training process could be accelerated by asynchSGBDT. Our asynchronous parallel strategy achieves an almost linear speedup, especially for highdimensional sparse datasets. 
Asynchronous Parallel Stochastic Coordinate Descent (ASYCD) 
An asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate (1=K) on general convex functions. Nearlinear speedup on a multicore system can be expected if the number of processors is O(n1=2) in unconstrained optimization and O(n1=4) in the separableconstrained case, where n is the number of variables. 
Asynchronous Stochastic Gradient Descent  Distributed machine learning has been widely studied in the literature to scale up machine learning model training in the presence of an everincreasing amount of data. We study distributed machine learning from another perspective, where the information about the training same samples are inherently decentralized and located on different parities. We propose an asynchronous stochastic gradient descent (SGD) algorithm for such a feature distributed machine learning (FDML) problem, to jointly learn from decentralized features, with theoretical convergence guarantees under bounded asynchrony. Our algorithm does not require sharing the original feature data or even local model parameters between parties, thus preserving a high level of data confidentiality. We implement our algorithm for FDML in a parameter server architecture. We compare our system with fully centralized training (which violates data locality requirements) and training only based on local features, through extensive experiments performed on a large amount of data from a realworld application, involving 5 million samples and $8700$ features in total. Experimental results have demonstrated the effectiveness and efficiency of the proposed FDML system. 
Asynchronous Stochastic Variational Inference  Stochastic variational inference (SVI) employs stochastic optimization to scale up Bayesian computation to massive data. Since SVI is at its core a stochastic gradientbased algorithm, horizontal parallelism can be harnessed to allow larger scale inference. We propose a lockfree parallel implementation for SVI which allows distributed computations over multiple slaves in an asynchronous style. We show that our implementation leads to linear speedup while guaranteeing an asymptotic ergodic convergence rate $O(1/\sqrt(T)$ ) given that the number of slaves is bounded by $\sqrt(T)$ ($T$ is the total number of iterations). The implementation is done in a highperformance computing (HPC) environment using message passing interface (MPI) for python (MPI4py). The extensive empirical evaluation shows that our parallel SVI is lossless, performing comparably well to its counterpart serial SVI with linear speedup. 
Asynchronous SubgradientPush Algorithm (AsySPA) 
This paper proposes a novel exact asynchronous subgradientpush algorithm (AsySPA) to solve an additive cost optimization problem over digraphs where each node only has access to a local convex function and updates asynchronously with an arbitrary rate. Specifically, each node of a strongly connected digraph does not wait for updates from other nodes but simply starts a new update within any bounded time interval by using local information available from its inneighbors. ‘Exact’ means that every node of the AsySPA can asymptotically converge to the same optimal solution, even under different update rates among nodes and bounded communication delays. To compensate uneven update rates, we design a simple mechanism to adaptively adjust stepsizes per update in each node, which is substantially different from the existing works. Then, we construct a delayfree augmented system to address asynchrony and delays, and perform the convergence analysis by proposing a generalized subgradient algorithm, which clearly has its own significance and helps us to explicitly evaluate the convergence speed of the AsySPA. Finally, we demonstrate advantages of the AsySPA over the celebrated synchronous SPA in both theory and simulation. 
ATMSeer  To relieve the pain of manually selecting machine learning algorithms and tuning hyperparameters, automated machine learning (AutoML) methods have been developed to automatically search for good models. Due to the huge model search space, it is impossible to try all models. Users tend to distrust automatic results and increase the search budget as much as they can, thereby undermining the efficiency of AutoML. To address these issues, we design and implement ATMSeer, an interactive visualization tool that supports users in refining the search space of AutoML and analyzing the results. To guide the design of ATMSeer, we derive a workflow of using AutoML based on interviews with machine learning experts. A multigranularity visualization is proposed to enable users to monitor the AutoML process, analyze the searched models, and refine the search space in real time. We demonstrate the utility and usability of ATMSeer through two case studies, expert interviews, and a user study with 13 end users. 
Atom  A hackable text editor for the 21st Century. AtomIDE is a set of optional packages to bring IDElike functionality to Atom and improve language integrations. Index ide screenshot Get smarter contextaware autocompletion, code navigation features such as an outline view, go to definition and find all references. You can also hovertoreveal information, diagnostics (errors and warnings) and document formatting. To get all these IDE features, open Atom IDE UI in Atom and install the package. 
Atom Responding Machine (ARM) 
Recently, improving the relevance and diversity of dialogue system has attracted wide attention. For a post x, the corresponding response y is usually diverse in the realworld corpus, while the conventional encoderdecoder model tends to output the highfrequency (safe but trivial) responses and thus is difficult to handle the large number of responding styles. To address these issues, we propose the Atom Responding Machine (ARM), which is based on a proposed encodercomposerdecoder network trained by a teacherstudent framework.To enrich the generated responses, ARM introduces a large number of moleculemechanisms as various responding styles, which are conducted by taking different combinations from a few atommechanisms. In other words, even a little of atommechanisms can make a mickle of moleculemechanisms. The experiments demonstrate diversity and quality of the responses generated by ARM. We also present generating process to show underlying interpretability for the result. 
ATOMIC  We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 300k textual descriptions. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed ifthen relations with variables (e.g., ‘if X pays Y a compliment, then Y will likely return the compliment’). We propose nine ifthen relation types to distinguish causes v.s. effects, agents v.s. themes, voluntary v.s. involuntary events, and actions v.s. mental states. By generatively training on the rich inferential knowledge described in ATOMIC, we show that neural models can acquire simple commonsense capabilities and reason about previously unseen events. Experimental results demonstrate that multitask models that incorporate the hierarchical structure of ifthen relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation. 
Atomic Information Resource Data Model (AIR) 
Entities are simply formed by grouping Attributes together. One or more Attributes are shared between two or more Entities. According to the AtomicDB terminology, shared attributes are called bridge concepts and this is the equivalent of the relationship that is implemented with primary and foreign keys on two tables but here we are completely independent to mix and match them. For example PartColor, could be merged with another attribute from a different table in another relational database. 
Atomic Triangular Matrix  An atomic (upper or lower) triangular matrix is a special form of unitriangular matrix, where all of the offdiagonal entries are zero, except for the entries in a single column. Such a matrix is also called a Gauss matrix or a Gauss transformation matrix. 
Atomistic Structure Learning Algorithm (ASLA) 
One endeavour of modern physical chemistry is to use bottomup approaches to design materials and drugs with desired properties. Here we introduce an atomistic structure learning algorithm (ASLA) that utilizes a convolutional neural network to build 2D compounds and layered structures atom by atom. The algorithm takes no prior data or knowledge on atomic interactions but inquires a firstprinciples quantum mechanical program for physical properties. Using reinforcement learning, the algorithm accumulates knowledge of chemical compound space for a given number and type of atoms and stores this in the neural network, ultimately learning the blueprint for the optimal structural arrangement of the atoms for a given target property. ASLA is demonstrated to work on diverse problems, including grain boundaries in graphene sheets, organic compound formation and a surface oxide structure. This approach to structure prediction is a first step toward direct manipulation of atoms with artificially intelligent first principles computer codes. 
ATOMO  Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include elementwise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that methods such as QSGD and TernGrad are special cases of ATOMO and show that sparsifiying gradients in their singular value decomposition (SVD), rather than the coordinatewise one, can lead to significantly faster distributed training. 
ATPboost  ATPboost is a system for solving sets of largetheory problems by interleaving ATP runs with stateoftheart machine learning of premise selection from the proofs. Unlike many previous approaches that use multilabel setting, the learning is implemented as binary classification that estimates the pairwiserelevance of (theorem, premise) pairs. ATPboost uses for this the XGBoost gradient boosting algorithm, which is fast and has stateoftheart performance on many tasks. Learning in the binary setting however requires negative examples, which is nontrivial due to many alternative proofs. We discuss and implement several solutions in the context of the ATP/ML feedback loop, and show that ATPboost with such methods significantly outperforms the knearest neighbors multilabel classifier. 
Atrain Distributed System (ADS) 
A special type of distributed system called by ‘Atrain Distributed System’ (ADS) which is very suitable for processing big data using the heterogeneous data structures ratrain or the homogeneous data structure rtrain. A simple ‘Atrain Distributed System’ is called an unitier ADS. The ‘Multitier Atrain Distributed System’ is an extension of the unitier ADS. The ADS is scalable upto any extent as many times as required. Two new type of network topologies are defined for ADS called by ‘multihorse cart’ topology and ‘cycle’ topology which can support increasing volume of big data. Where ratrain and rtrain data structures are introduced for the processing of big data, the data structures ‘heterogeneous data structure MA’ and ‘homogeneous data structure MT’ are introduced for the processing of big data including temporal big data too. Both MA and MT can be well implemented in multitier ADS. We define cyclic train and cyclic atrain, and then doubly linked train/atrain. A method is proposed on how to implement Solid Matrices, ndimensional arrays, ndimensional larrays etc. in a computer memory using the data structures MT and MA. http://…/biswasIJCO1420142.pdf 
ATRank  A user can be represented as what he/she does along the history. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to limited human instinct. Recent works usually use RNNbased methods to give an overall embedding of a behavior sequence, which then could be exploited by the downstream applications. However, this can only preserve very limited information, or aggregated memories of a person. When a downstream application requires to facilitate the modeled user features, it may lose the integrity of the specific highly correlated behavior of the user, and introduce noises derived from unrelated behaviors. This paper proposes an attention based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Heterogeneous user behaviors are considered in our model that we project all types of behaviors into multiple latent semantic spaces, where influence can be made among the behaviors via selfattention. Downstream applications then can use the user behavior vectors via vanilla attention. Experiments show that ATRank can achieve better performance and faster training process. We further explore ATRank to use one unified model to predict different types of user behaviors at the same time, showing a comparable performance with the highly optimized individual models. 
ATree  Index structures are one of the most important tools that DBAs leverage in order to improve the performance of analytics and transactional workloads. However, with the explosion of data that is constantly being generated in a wide variety of domains including autonomous vehicles, Internet of Things (IoT) devices, and Ecommerce sites, building several indexes can often become prohibitive and consume valuable system resources. In fact, a recent study has shown that indexes created as part of the TPCC benchmark can account for 55% of the total memory available in a stateoftheart inmemory DBMS. This overhead consumes valuable and expensive main memory, and limits the amount of space that a database has available to store new data or process existing data. In this paper, we present a novel approximate index structure called ATree. At the core of our index is a tunable error parameter that allows a DBA to balance lookup performance and space consumption. To navigate this tradeoff, we provide a cost model that helps the DBA choose an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using a variety of realworld datasets, we show that our index structure is able to provide performance that is comparable to full index structures while reducing the storage footprint by orders of magnitude. 
Attack Bundling  This technical report describes a new feature of the CleverHans library called ‘attack bundling’. Many papers about adversarial examples present lists of error rates corresponding to different attack algorithms. A common approach is to take the maximum across this list and compare defenses against that error rate. We argue that a better approach is to use attack bundling: the max should be taken across many examples at the level of individual examples, then the error rate should be calculated by averaging after this maximization operation. Reporting the bundled attacker error rate provides a lower bound on the true worstcase error rate. The traditional approach of reporting the maximum error rate across attacks can underestimate the true worstcase error rate by an amount approaching 100\% as the number of attacks approaches infinity. Attack bundling can be used with different prioritization schemes to optimize quantities such as error rate on adversarial examples, perturbation size needed to cause misclassification, or failure rate when using a specific confidence threshold. 
Attack Graph Obfuscation  Before executing an attack, adversaries usually explore the victim’s network in an attempt to infer the network topology and identify vulnerabilities in the victim’s servers and personal computers. Falsifying the information collected by the adversary post penetration may significantly slower lateral movement and increase the amount of noise generated within the victim’s network. We investigate the effect of fake vulnerabilities within a real enterprise network on the attacker performance. We use the attack graphs to model the path of an attacker making its way towards a target in a given network. We use combinatorial optimization in order to find the optimal assignments of fake vulnerabilities. We demonstrate the feasibility of our deceptionbased defense by presenting results of experiments with a large scale real network. We show that adding fake vulnerabilities forces the adversary to invest a significant amount of effort, in terms of time and exploitability cost. 
Attack Investigation Query Language (AIQL) 
The need for countering Advanced Persistent Threat (APT) attacks has led to the solutions that ubiquitously monitor system activities in each host, and perform timely attack investigation over the monitoring data for analyzing attack provenance. However, existing query systems based on relational databases and graph databases lack language constructs to express key properties of major attack behaviors, and often execute queries inefficiently since their semanticsagnostic design cannot exploit the properties of system monitoring data to speed up query execution. To address this problem, we propose a novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation. Our system provides (1) domainspecific data model and storage for scaling the storage, (2) a domainspecific query language, Attack Investigation Query Language (AIQL) that integrates critical primitives for attack investigation, and (3) an optimized query engine based on the characteristics of the data and the semantics of the queries to efficiently schedule the query execution. We deployed our system in NEC Labs America comprising 150 hosts and evaluated it using 857 GB of real system monitoring data (containing 2.5 billion events). Our evaluations on a realworld APT attack and a broad set of attack behaviors show that our system surpasses existing systems in both efficiency (124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum) and conciseness (SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than AIQL). 
Attend, Copy, Parse Architecture  Document information extraction tasks performed by humans create data consisting of a PDF or document image input, and extracted string outputs. This endtoend data is naturally consumed and produced when performing the task because it is valuable in and of itself. It is naturally available, at no additional cost. Unfortunately, stateoftheart word classification methods for information extraction cannot use this data, instead requiring wordlevel labels which are expensive to create and consequently not available for many real life tasks. In this paper we propose the Attend, Copy, Parse architecture, a deep neural network model that can be trained directly on endtoend data, bypassing the need for wordlevel labels. We evaluate the proposed architecture on a large diverse set of invoices, and outperform a stateoftheart production system based on word classification. We believe our proposed architecture can be used on many real life information extraction tasks where word classification cannot be used due to a lack of the required wordlevel labels. 
Attention Attractor Network  Machine learning classifiers are often trained to recognize a set of predefined classes. However, in many real applications, it is often desirable to have the flexibility of learning additional concepts, without retraining on the full training set. This paper addresses this problem, incremental fewshot learning, where a regular classification network has already been trained to recognize a set of base classes; and several extra novel classes are being considered, each with only a few labeled examples. After learning the novel classes, the model is then evaluated on the overall performance of both base and novel classes. To this end, we propose a metalearning model, the Attention Attractor Network, which regularizes the learning of novel classes. In each episode, we train a set of new weights to recognize novel classes until they converge, and we show that the technique of recurrent backpropagation can backpropagate through the optimization process and facilitate the learning of the attractor network regularizer. We demonstrate that the learned attractor network can recognize novel classes while remembering old classes without the need to review the original training set, outperforming baselines that do not rely on an iterative optimization process. 
Attention Branch Network (ABN) 
Visual explanation enables human to understand the decision making of Deep Convolutional Neural Network (CNN), but it is insufficient to contribute the performance improvement. In this paper, we focus on the attention map for visual explanation, which represents high response value as the important region in image recognition. This region significantly improves the performance of CNN by introducing an attention mechanism that focuses on a specific region in an image. In this work, we propose Attention Branch Network (ABN), which extends the topdown visual explanation model by introducing a branch structure with an attention mechanism. ABN can be applicable to several image recognition tasks by introducing a branch for attention mechanism and is trainable for the visual explanation and image recognition in endtoend manner. We evaluate ABN on several image recognition tasks such as image classification, finegrained recognition, and multiple facial attributes recognition. Experimental results show that ABN can outperform the accuracy of baseline models on these image recognition tasks while generating an attention map for visual explanation. Embedding Human Knowledge in Deep Neural Network via Attention Map 
Attention Distillation Loss  ➘ “Learning without Memorizing” 
Attention Economy  Attention economics is an approach to the management of information that treats human attention as a scarce commodity, and applies economic theory to solve various information management problems. 
Attention Enhanced Graph Convolutional LSTM Network (AGCLSTM) 
Skeletonbased action recognition is an important task that requires the adequate understanding of movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGCLSTM) for human action recognition from skeleton data. The proposed AGCLSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the cooccurrence relationship between spatial and temporal domains. We also present a temporal hierarchical architecture to increases temporal receptive fields of the top AGCLSTM layer, which boosts the ability to learn the highlevel semantic representation and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, the attention mechanism is employed to enhance information of key joints in each AGCLSTM layer. Experimental results on two datasets are provided: NTU RGB+D dataset and NorthwesternUCLA dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the stateoftheart methods on both datasets. 
Attention Fusion Network  Customer support is a central objective at Square as it helps us build and maintain great relationships with our sellers. In order to provide the best experience, we strive to deliver the most accurate and quasiinstantaneous responses to questions regarding our products. In this work, we introduce the Attention Fusion Network model which combines signals extracted from seller interactions on the Square product ecosystem, along with submitted email questions, to predict the most relevant solution to a seller’s inquiry. We show that the innovative combination of two very different data sources that are rarely used together, using stateoftheart deep learning systems outperforms, candidate models that are trained only on a single source. 
Attention Gated Network  We propose a novel attention gate (AG) model for medical image analysis that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules when using convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN models such as VGG or UNet architectures with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed AG models are evaluated on a variety of tasks, including medical image classification and segmentation. For classification, we demonstrate the use case of AGs in scan plane detection for fetal ultrasound screening. We show that the proposed attention mechanism can provide efficient object localisation while improving the overall prediction performance by reducing false positives. For segmentation, the proposed architecture is evaluated on two large 3D CT abdominal datasets with manual annotations for multiple organs. Experimental results show that AG models consistently improve the prediction performance of the base architectures across different datasets and training sizes while preserving computational efficiency. Moreover, AGs guide the model activations to be focused around salient regions, which provides better insights into how model predictions are made. The source code for the proposed AG models is publicly available. 
Attention Incorporate Network (AIN) 
In traditional neural networks for image processing, the inputs of the neural networks should be the same size such as 224*224*3. But how can we train the neural net model with different input size A common way to do is image deformation which accompany a problem of information loss (e.g. image crop or wrap). Sequence model(RNN, LSTM, etc.) can accept different size of input like text and audio. But one disadvantage for sequence model is that the previous information will become more fragmentary during the transfer in time step, it will make the network hard to train especially for long sequential data. In this paper we propose a new network structure called Attention Incorporate Network(AIN). It solve the problem of different size of inputs including: images, text, audio, and extract the key features of the inputs by attention mechanism, pay different attention depends on the importance of the features not rely on the data size. Experimentally, AIN achieve a higher accuracy, better convergence comparing to the same size of other network structure 
Attention Map Generator (AMG) 
We propose an attentioninjective deformable convolutional network called ADCrowdNet for crowd understanding that can address the accuracy degradation problem of highly congested noisy scenes. ADCrowdNet contains two concatenated networks. An attentionaware network called Attention Map Generator (AMG) first detects crowd regions in images and computes the congestion degree of these regions. Based on detected crowd regions and congestion priors, a multiscale deformable network called Density Map Estimator (DME) then generates highquality density maps. With the attentionaware training scheme and multiscale deformable convolutional scheme, the proposed ADCrowdNet achieves the capability of being more effective to capture the crowd features and more resistant to various noises. We have evaluated our method on four popular crowd counting datasets (ShanghaiTech, UCF_CC_50, WorldEXPO’10, and UCSD) and an extra vehicle counting dataset TRANCOS, our approach overwhelmingly beats existing approaches on all of these datasets. 
Attentional Encoder Network (AEN) 
Targeted sentiment classification aims at determining the sentimental tendency towards specific targets. Most of the previous approaches model context and target words using recurrent neural networks such as LSTM in conjunction with attention mechanisms. However, LSTM networks are difficult to parallelize because of their sequential nature. Moreover, since full backpropagation over the sequence requires large amounts of memory, essentially every implementation of backpropagation through time is the truncated version, which brings difficulty in remembering longterm patterns. To address these issues, this paper propose an Attentional Encoder Network (AEN) for targeted sentiment classification. Contrary to previous LSTM based works, AEN eschews complex recurrent neural networks and employs attention based encoders for the modeling between context and target, which can excavate the rich introspective and interactive semantic information from the word embeddings without considering the distance between words. This paper also raise the label unreliability issue and introduce label smoothing regularization term to the loss function for encouraging the model to be less confident with the training labels. Experimental results on three benchmark datasets demonstrate that our model achieves comparable or superior performances with a lightweight model size. 
Attentional Multiagent Predictive Modeling (VAIN) 
Multiagent predictive modeling is an essential step for understanding physical, social and teamplay systems. Recently, Interaction Networks (INs) were proposed for the task of modeling multiagent physical systems, INs scale with the number of interactions in the system (typically quadratic or higher order in the number of agents). In this paper we introduce VAIN, a novel attentional architecture for multiagent predictive modeling that scales linearly with the number of agents. We show that VAIN is effective for multiagent predictive modeling. Our method is evaluated on tasks from challenging multiagent prediction domains: chess and soccer, and outperforms competing multiagent approaches. 
AttentionAware Generative Adversarial Network (ATAGAN) 
In this work, we present a novel approach for training Generative Adversarial Networks (GANs). Using the attention maps produced by a Teacher Network we are able to improve the quality of the generated images as well as perform weakly object localization on the generated images. To this end, we generate images of HEp2 cells captured with Indirect Imunofluoresence (IIF) and study the ability of our network to perform a weakly localization of the cell. Firstly, we demonstrate that whilst GANs can learn the mapping between the input domain and the target distribution efficiently, the discriminator network is not able to detect the regions of interest. Secondly, we present a novel attention transfer mechanism which allows us to enforce the discriminator to put emphasis on the regions of interest via transfer learning. Thirdly, we show that this leads to more realistic images, as the discriminator learns to put emphasis on the area of interest. Fourthly, the proposed method allows one to generate both images as well as attention maps which can be useful for data annotation e.g in object detection. 
Attentionbased Adversarial Autoencoder Network Embedding (AAANE) 
Network embedding represents nodes in a continuous vector space and preserves structure information from the Network. Existing methods usually adopt a ‘onesizefitsall’ approach when concerning multiscale structure information, such as first and secondorder proximity of nodes, ignoring the fact that different scales play different roles in the embedding learning. In this paper, we propose an Attentionbased Adversarial Autoencoder Network Embedding(AAANE) framework, which promotes the collaboration of different scales and lets them vote for robust representations. The proposed AAANE consists of two components: 1) Attentionbased autoencoder effectively capture the highly nonlinear network structure, which can deemphasize irrelevant scales during training. 2) An adversarial regularization guides the autoencoder learn robust representations by matching the posterior distribution of the latent embeddings to given prior distribution. This is the first attempt to introduce attention mechanisms to multiscale network embedding. Experimental results on realworld networks show that our learned attention parameters are different for every network and the proposed approach outperforms existing stateoftheart approaches for network embedding. 
AttentionBased Bidirectional LSTM  Text segmentation plays an important role in various Natural Language Processing (NLP) tasks like summarization, context understanding, document indexing and document noise removal. Previous methods for this task require manual feature engineering, huge memory requirements and large execution times. To the best of our knowledge, this paper is the first one to present a novel supervised neural approach for text segmentation. Specifically, we propose an attentionbased bidirectional LSTM model where sentence embeddings are learned using CNNs and the segments are predicted based on contextual information. This model can automatically handle variable sized context information. Compared to the existing competitive baselines, the proposed model shows a performance improvement of ~7% in WinDiff score on three benchmark datasets. 
AttentionBased Convolutional Neural NetworkBidirectional LongShort Term Memory (CNNBLSTM) 
In this paper, we present an endtoend language identification framework, the attentionbased Convolutional Neural NetworkBidirectional Longshort Term Memory (CNNBLSTM). The model is performed on the utterance level, which means the utterancelevel decision can be directly obtained from the output of the neural network. To handle speech utterances with entire arbitrary and potentially long duration, we combine CNNBLSTM model with a selfattentive pooling layer together. The frontend CNNBLSTM module plays a role as local pattern extractor for the variablelength inputs, and the following selfattentive pooling layer is built on top to get the fixeddimensional utterancelevel representation. We conducted experiments on NIST LRE07 closedset task, and the results reveal that the proposed attentionbased CNNBLSTM model achieves comparable error reduction with other stateoftheart utterancelevel neural network approaches for all 3 seconds, 10 seconds, 30 seconds duration tasks. 
AttentionBased Feature Selection (AFS) 
As an effective data preprocessing step, feature selection has shown its effectiveness to prepare highdimensional data for many machine learning tasks. The proliferation of high dimension and huge volume big data, however, has brought major challenges, e.g. computation complexity and stability on noisy data, upon existing featureselection techniques. This paper introduces a novel neural networkbased feature selection architecture, dubbed Attentionbased Feature Selection (AFS). AFS consists of two detachable modules: an attention module for feature weight generation and a learning module for the problem modeling. The attention module formulates correlation problem among features and supervision target into a binary classification problem, supported by a shallow attention net for each feature. Feature weights are generated based on the distribution of respective feature selection patterns adjusted by backpropagation during the training process. The detachable structure allows existing offtheshelf models to be directly reused, which allows for much less training time, demands for the training data and requirements for expertise. A hybrid initialization method is also introduced to boost the selection accuracy for datasets without enough samples for feature weight generation. Experimental results show that AFS achieves the best accuracy and stability in comparison to several stateofart feature selection algorithms upon both MNIST, noisy MNIST and several datasets with small samples. 
AttentionGuided Generative Adversarial Network (AGGAN) 
The stateoftheart approaches in Generative Adversarial Networks (GANs) are able to learn a mapping function from one image domain to another with unpaired image data. However, these methods often produce artifacts and can only be able to convert lowlevel information, but fail to transfer highlevel semantic part of images. The reason is mainly that generators do not have the ability to detect the most discriminative semantic part of images, which thus makes the generated images with lowquality. To handle the limitation, in this paper we propose a novel AttentionGuided Generative Adversarial Network (AGGAN), which can detect the most discriminative semantic object and minimize changes of unwanted part for semantic manipulation problems without using extra data and models. The attentionguided generators in AGGAN are able to produce attention masks via a builtin attention mechanism, and then fuse the input image with the attention mask to obtain a target image with highquality. Moreover, we propose a novel attentionguided discriminator which only considers attended regions. The proposed AGGAN is trained by an endtoend fashion with an adversarial loss, cycleconsistency loss, pixel loss and attention loss. Both qualitative and quantitative results demonstrate that our approach is effective to generate sharper and more accurate images than existing models. 
AttentionRNN  Visual attention mechanisms have proven to be integrally important constituent components of many modern deep neural architectures. They provide an efficient and effective way to utilize visual information selectively, which has shown to be especially valuable in multimodal learning tasks. However, all prior attention frameworks lack the ability to explicitly model structural dependencies among attention variables, making it difficult to predict consistent attention masks. In this paper we develop a novel structured spatial attention mechanism which is endtoend trainable and can be integrated with any feedforward convolutional neural network. This proposed AttentionRNN layer explicitly enforces structure over the spatial attention variables by sequentially predicting attention values in the spatial mask in a bidirectional rasterscan and inverse rasterscan order. As a result, each attention value depends not only on local image or contextual information, but also on the previously predicted attention values. Our experiments show consistent quantitative and qualitative improvements on a variety of recognition tasks and datasets; including image categorization, question answering and image generation. 
AttentionXML  Extreme multilabel text classification (XMTC) is a task for tagging each given text with the most relevant multiple labels from an extremely largescale label set. This task can be found in many applications, such as product categorization,web page tagging, news annotation and so on. Many methods have been proposed so far for solving XMTC, while most of the existing methods use traditional bagofwords (BOW) representation, ignoring word context as well as deep semantic information. XMLCNN, a stateoftheart deep learningbased method, uses convolutional neural network (CNN) with dynamic pooling to process the text, going beyond the BOWbased appraoches but failing to capture 1) the longdistance dependency among words and 2) different levels of importance of a word for each label. We propose a new deep learningbased method, AttentionXML, which uses bidirectional long shortterm memory (LSTM) and a multilabel attention mechanism for solving the above 1st and 2nd problems, respectively. We empirically compared AttentionXML with other six stateoftheart methods over five benchmark datasets. AttentionXML outperformed all competing methods under all experimental settings except only a couple of cases. In addition, a consensus ensemble of AttentionXML with the second best method, Parabel, could further improve the performance over all five benchmark datasets. 
Attentive Adversarial DomainInvariant Training (AADIT) 
Adversarial domaininvariant training (ADIT) proves to be effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR). In ADIT, an auxiliary domain classifier takes in equallyweighted deep features from a deep neural network (DNN) acoustic model and is trained to improve their domaininvariance by optimizing an adversarial loss function. In this work, we propose an attentive Adversarial domaininvariant training (AADIT) in which we advance the domain classifier with an attention mechanism to automatically weight the input deep features according to their importance in domain classification. With this attentive reweighting, AADIT can focus on the domain normalization of phonetic components that are more susceptible to domain variability and generates deep features with improved domaininvariance and senonediscriminativity over ADIT. Most importantly, the attention block serves only as an external component to the DNN acoustic model and is not involved in ASR, so AADIT can be used to improve the acoustic modeling with any DNN architectures. More generally, the same methodology can improve any adversarial learning system with an auxiliary discriminator. Evaluated on CHiME3 dataset, the AADIT achieves 13.6% and 9.3% relative WER improvements, respectively, over a multiconditional model and a strong ADIT baseline. 
Attentive Convolution Network (AttentiveConvNet) 
In NLP, convolution neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higherlevel representation of a word by taking into account its local fixedsize context in input text $t^x$. In this work, we propose an attentive convolution network, AttentiveConvNet. It extends the context scope of the convolution operation, deriving higherlevel features for a word not only from local context, but also from information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text $t^x$ that are distant or (ii) from a second input text, the context text $t^y$. In an evaluation on sentence relation classification (textual entailment and answer sentence selection) and text classification, experiments demonstrate that AttentiveConvNet has stateoftheart performance and outperforms RNN/CNN variants with and without attention. 
Attentive Crowd Flow Machine (ACFM) 
Traffic flow prediction is crucial for urban traffic management and public safety. Its key challenges lie in how to adaptively integrate the various factors that affect the flow changes. In this paper, we propose a unified neural network module to address this problem, called Attentive Crowd Flow Machine~(ACFM), which is able to infer the evolution of the crowd flow by learning dynamic representations of temporallyvarying data with an attention mechanism. Specifically, the ACFM is composed of two progressive ConvLSTM units connected with a convolutional layer for spatial weight prediction. The first LSTM takes the sequential flow density representation as input and generates a hidden state at each timestep for attention map inference, while the second LSTM aims at learning the effective spatialtemporal feature expression from attentionally weighted crowd flow features. Based on the ACFM, we further build a deep architecture with the application to citywide crowd flow prediction, which naturally incorporates the sequential and periodic data as well as other external influences. Extensive experiments on two standard benchmarks (i.e., crowd flow in Beijing and New York City) show that the proposed method achieves significant improvements over the stateoftheart methods. 
Attentive Dense Graph Propagation Module (ADGPM) 
The potential of graph convolutional neural networks for the task of zeroshot learning has been demonstrated recently. These models are highly sample efficient as related concepts in the graph structure share statistical strength allowing generalization to new classes when faced with a lack of data. However, knowledge from distant nodes can get diluted when propagating through intermediate nodes, because current approaches to zeroshot learning use graph propagation schemes that perform Laplacian smoothing at each layer. We show that extensive smoothing does not help the task of regressing classifier weights in zeroshot learning. In order to still incorporate information from distant nodes and utilize the graph structure, we propose an Attentive Dense Graph Propagation Module (ADGPM). ADGPM allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node’s relationship to its ancestors and descendants and an attention scheme is further used to weigh their contribution depending on the distance to the node. Finally, we illustrate that finetuning of the feature representation after training the ADGPM leads to considerable improvements. Our method achieves competitive results, outperforming previous zeroshot learning approaches. 
Attentive Dynamics Model (ADM) 
This paper investigates whether learning contingencyawareness and controllable aspects of an environment can lead to better exploration in reinforcement learning. To investigate this question, we consider an instantiation of this hypothesis evaluated on the Arcade Learning Element (ALE). In this study, we develop an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games. The ADM is trained in a selfsupervised fashion to predict the actions taken by the agent. The learned contingency information is used as a part of the state representation for exploration purposes. We demonstrate that combining A2C with countbased exploration using our representation achieves impressive results on a set of notoriously challenging Atari games due to sparse rewards. For example, we report a stateoftheart score of >6600 points on Montezuma’s Revenge without using expert demonstrations, explicit highlevel information (e.g., RAM states), or supervised data. Our experiments confirm that indeed contingencyawareness is an extremely powerful concept for tackling exploration problems in reinforcement learning and opens up interesting research questions for further investigations. 
Attentive Long ShortTerm Preference (ALSTP) 
Ecommerce users may expect different products even for the same query, due to their diverse personal preferences. It is wellknown that there are two types of preferences: longterm ones and shortterm ones. The former refers to user’ inherent purchasing bias and evolves slowly. By contrast, the latter reflects users’ purchasing inclination in a relatively short period. They both affect users’ current purchasing intentions. However, few research efforts have been dedicated to jointly model them for the personalized product search. To this end, we propose a novel Attentive Long ShortTerm Preference model, dubbed as ALSTP, for personalized product search. Our model adopts the neural networks approach to learn and integrate the long and shortterm user preferences with the current query for the personalized product search. In particular, two attention networks are designed to distinguish which factors in the shortterm as well as longterm user preferences are more relevant to the current query. This unique design enables our model to capture users’ current search intentions more accurately. Our work is the first to apply attention mechanisms to integrate both long and shortterm user preferences with the given query for the personalized search. Extensive experiments over four Amazon product datasets show that our model significantly outperforms several stateoftheart product search methods in terms of different evaluation metrics. 
Attentive Memory Network  Recent advances in conversational systems have changed the search paradigm. Traditionally, a user poses a query to a search engine that returns an answer based on its index, possibly leveraging external knowledge bases and conditioning the response on earlier interactions in the search session. In a natural conversation, there is an additional source of information to take into account: utterances produced earlier in a conversation can also be referred to and a conversational IR system has to keep track of information conveyed by the user during the conversation, even if it is implicit. We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions. The questions should be answered solely by referring to the statements provided, without consulting external knowledge. The time is right for the information retrieval community to embrace this task, both as a standalone task and integrated in a broader conversational search setting. In this paper, we focus on machine reading as a standalone task and present the Attentive Memory Network (AMN), an endtoend trainable machine reading algorithm. Its key contribution is in efficiency, achieved by having an hierarchical input encoder, iterating over the input only once. Speed is an important requirement in the setting of conversational search, as gaps between conversational turns have a detrimental effect on naturalness. On 20 datasets commonly used for evaluating machine reading algorithms we show that the AMN achieves performance comparable to the stateoftheart models, while using considerably fewer computations. 
Attentive Neural Process  Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed inputoutput pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context inputoutput pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled. 
Attentive Regularization (AR) 
We propose Attentive Regularization (AR), a method to constrain the activation maps of kernels in Convolutional Neural Networks (CNNs) to specific regions of interest (ROIs). Each kernel learns a location of specialization along with its weights through standard backpropagation. A differentiable attention mechanism requiring no additional supervision is used to optimize the ROIs. Traditional CNNs of different types and structures can be modified with this idea into equivalent Targeted Kernel Networks (TKNs), while keeping the network size nearly identical. By restricting kernel ROIs, we reduce the number of sliding convolutional operations performed throughout the network in its forward pass, speeding up both training and inference. We evaluate our proposed architecture on both synthetic and natural tasks across multiple domains. TKNs obtain significant improvements over baselines, requiring less computation (around an order of magnitude) while achieving superior performance. 
AttoNet  While deep neural networks have achieved stateoftheart performance across a large number of complex tasks, it remains a big challenge to deploy such networks for practical, ondevice edge scenarios such as on mobile devices, consumer devices, drones, and vehicles. In this study, we take a deeper exploration into a humanmachine collaborative design approach for creating highly efficient deep neural networks through a synergy between principled network design prototyping and machinedriven design exploration. The efficacy of humanmachine collaborative design is demonstrated through the creation of AttoNets, a family of highly efficient deep neural networks for ondevice edge deep learning. Each AttoNet possesses a humanspecified networklevel macroarchitecture comprising of custom modules with unique machinedesigned modulelevel macroarchitecture and microarchitecture designs, all driven by humanspecified design requirements. Experimental results for the task of object recognition showed that the AttoNets created via humanmachine collaborative design has significantly fewer parameters and computational costs than stateoftheart networks designed for efficiency while achieving noticeably higher accuracy (with the smallest AttoNet achieving ~1.8% higher accuracy while requiring ~10x fewer multiplyadd operations and parameters than MobileNetV1). Furthermore, the efficacy of the AttoNets is demonstrated for the task of instancelevel object segmentation and object detection, where an AttoNetbased Mask RCNN network was constructed with significantly fewer parameters and computational costs (~5x fewer multiplyadd operations and ~2x fewer parameters) than a ResNet50 based Mask RCNN network. 
Attracting Random Walk  This paper introduces the Attracting Random Walks model, which describes the dynamics of a system of particles on a graph with certain attraction properties. In the model, particles move between adjacent vertices of a graph $\mathcal{G}$, with transition probabilities that depend positively on particle counts at neighboring vertices. From an applied standpoint, the model captures the rich get richer phenomenon. We show that the Markov chain underlying the dynamics exhibits a phase transition in mixing time, as the parameter governing the attraction is varied. Namely, mixing is fast in the hightemperature regime, and slow in the lowtemperature regime. When $\mathcal{G}$ is the complete graph, the model is a projection of the Potts model, whose phase transition is known. On the other hand, when the graph is incomplete, the model is nonreversible, and the stationary distribution is unknown. We demonstrate the existence of phase transition in mixing time for general graphs. 
Attribute Manifold Encoding GAN (AMEGAN) 
Image attribute transfer aims to change an input image to a target one with expected attributes, which has received significant attention in recent years. However, most of the existing methods lack the ability to decorrelate the target attributes and irrelevant information, i.e., the other attributes and background information, thus often suffering from blurs and artifacts. To address these issues, we propose a novel Attribute Manifold Encoding GAN (AMEGAN) for fullyfeatured attribute transfer, which can modify and adjust every detail in the images. Specifically, our method divides the input image into image attribute part and image background part on manifolds, which are controlled by attribute latent variables and background latent variables respectively. Through enforcing attribute latent variables to Gaussian distributions and background latent variables to uniform distributions respectively, the attribute transfer procedure becomes controllable and image generation is more photorealistic. Furthermore, we adopt a conditional multiscale discriminator to render accurate and highquality target attribute images. Experimental results on three popular datasets demonstrate the superiority of our proposed method in both performances of the attribute transfer and image generation quality. 
Attribution Modeling  Attribution modelling, in essence, means reporting on the impact of communication activity using metrics like: · Turnover · Profit · Customer retention · Volume of sales Instead of metrics like: · Share of voice · Web visits · Click through rate (CTR) · Impressions There’s a big difference between these two lists. The second list contains important metrics, but businesses could survive without ever increasing them. The business metrics in the first list, however, are essential for all companies that want to survive and thrive. Understanding the impact of communications on business metrics is – rightly – more important to senior executives. This is the primary objective of attribution modelling; to provide holistic, accurate information about the financial return activities are delivering so you can refine them, adjust what you’re doing, and use the same budget to deliver more value to your business and your customers. 
AttriGuard  Users in various web and mobile applications are vulnerable to attribute inference attacks, in which an attacker leverages a machine learning classifier to infer a target user’s private attributes (e.g., location, sexual orientation, political view) from its public data (e.g., rating scores, page likes). Existing defenses leverage game theory or heuristics based on correlations between the public data and attributes. These defenses are not practical. Specifically, gametheoretic defenses require solving intractable optimization problems, while correlationbased defenses incur large utility loss of users’ public data. In this paper, we present AttriGuard, a practical defense against attribute inference attacks. AttriGuard is computationally tractable and has small utility loss. Our AttriGuard works in two phases. Suppose we aim to protect a user’s private attribute. In Phase I, for each value of the attribute, we find a minimum noise such that if we add the noise to the user’s public data, then the attacker’s classifier is very likely to infer the attribute value for the user. We find the minimum noise via adapting existing evasion attacks in adversarial machine learning. In Phase II, we sample one attribute value according to a certain probability distribution and add the corresponding noise found in Phase I to the user’s public data. We formulate finding the probability distribution as solving a constrained convex optimization problem. We extensively evaluate AttriGuard and compare it with existing methods using a realworld dataset. Our results show that AttriGuard substantially outperforms existing methods. Our work is the first one that shows evasion attacks can be used as defensive techniques for privacy protection. 
Auction Theory  Auction theory is an applied branch of economics which deals with how people act in auction markets and researches the properties of auction markets. There are many possible designs (or sets of rules) for an auction and typical issues studied by auction theorists include the efficiency of a given auction design, optimal and equilibrium bidding strategies, and revenue comparison. Auction theory is also used as a tool to inform the design of realworld auctions; most notably auctions for the privatization of publicsector companies or the sale of licenses for use of the electromagnetic spectrum. 
AudioVisual SequencetoSequence Dual Network (AVSDN) 
Audiovisual event localization requires one to identify the event which is both visible and audible in a video (eitherat a frame or video level). To address this task, we propose a deep neural network named AudioVisual sequencetosequence dual network (AVSDN). By jointly taking bothaudio and visual features at each time segment as inputs, ourproposed model learns global and local event information ina sequence to sequence manner, which can be realized in either fully supervised or weakly supervised settings. Empirical results confirm that our proposed method performs favorably against recent deep learning approaches in both settings. 
AuerGervini Graphical Bayesian Approach  This article approaches the problem of selecting significant principal components from a Bayesian model selection perspective. The resulting Bayes rule provides a simple graphical technique that can be used instead of (or together with) the popular scree plot to determine the number of significant components to retain. We study the theoretical properties of the new method and show, by examples and simulation, that it provides more clearcut answers than the scree plot in many interesting situations. PCDimension 
Augmented Backward Elimination (ABE) 
Augmented backward elimination combines significance or information based criteria with the change in estimate to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) <doi:10.1371/journal.pone.0113677>. abe 
Augmented Concurrent Experience Replay (ACER) 
➘ “Hierarchical Deep Multiagent Reinforcement Learning” 
Augmented DickeyFuller Test (ADF) 
In statistics and econometrics, an augmented DickeyFuller test (ADF) is a test for a unit root in a time series sample. It is an augmented version of the DickeyFuller test for a larger and more complicated set of time series models. The augmented DickeyFuller (ADF) statistic, used in the test, is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence. 
Augmented Intelligence  
Augmented Interval Markov Chains (AIMC) 
In this paper we propose augmented interval Markov chains (AIMCs): a generalisation of the familiar interval Markov chains (IMCs) where uncertain transition probabilities are in addition allowed to depend on one another. This new model preserves the flexibility afforded by IMCs for describing stochastic systems where the parameters are unclear, for example due to measurement error, but also allows us to specify transitions with probabilities known to be identical, thereby lending further expressivity. The focus of this paper is reachability in AIMCs. We study the qualitative, exact quantitative and approximate reachability problem, as well as natural subproblems thereof, and establish several upper and lower bounds for their complexity. We prove the exact reachability problem is at least as hard as the famous squareroot sum problem, but, encouragingly, the approximate version lies in $\mathbf{NP}$ if the underlying graph is known, whilst the restriction of the exact problem to a constant number of uncertain edges is in $\mathbf{P}$. Finally, we show that uncertainty in the graph structure affects complexity by proving $\mathbf{NP}$completeness for the qualitative subproblem, in contrast with an easilyobtained upper bound of $\mathbf{P}$ for the same subproblem with known graph structure. 
Augmented Inverse Probability Weighting (AIPWT) 
In this paper, we discuss an estimator for average treatment effects (ATEs) known as the augmented inverse propensity weighted (AIPW) estimator. This estimator has attractive theoretical properties and only requires practitioners to do two things they are already comfortable with: (1) specify a binary regression model for the propensity score, and (2) specify a regression model for the outcome variable. Perhaps the most interesting property of this estimator is its socalled ‘‘double robustness.” Put simply, the estimator remains consistent for the ATE if either the propensity score model or the outcome regression is misspecified but the other is properly specified. After explaining the AIPW estimator, we conduct a Monte Carlo experiment that compares the finite sample performance of the AIPW estimator to three common competitors: a regression estimator, an inverse propensity weighted (IPW) estimator, and a propensity score matching estimator. The Monte Carlo results show that the AIPW estimator has comparable or lower mean square error than the competing estimators when the propensity score and outcome models are both properly specified and, when one of the models is misspecified, the AIPW estimator is superior. ‘Robustsquared’ Imputation Models Using BART 
Augmented kmeans  Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, kmeans is a very important algorithm for this. In this technical report, we develop a new kmeans variant called Augmented kmeans, which is a hybrid of kmeans and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster belonging probabilities are used to control the subsequent reestimation of cluster means. Observations which can’t be firmly identified into clusters are excluded from the reestimation step. This can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, nonsphericity, substantial overlap, and high scatter. Augmented kmeans frequently outperforms kmeans by more accurately classifying observations into known clusters and / or converging in fewer iterations. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be available with this report. 
Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) 
Distributed Optimal Power Flow using ALADIN 
Augmented Lagrangian Method (ALM) 
Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems; the difference is that the augmented Lagrangian method adds an additional term to the unconstrained objective. This additional term is designed to mimic a Lagrange multiplier. The augmented Lagrangian is not the same as the method of Lagrange multipliers. Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation). 
Augmented Neural ODE  We show that Neural Ordinary Differential Equations (ODEs) learn representations that preserve the topology of the input space and prove that this implies the existence of functions Neural ODEs cannot represent. To address these limitations, we introduce Augmented Neural ODEs which, in addition to being more expressive models, are empirically more stable, generalize better and have a lower computational cost than Neural ODEs. 
Augmented RNN (ARNN) 
The recent adoption of recurrent neural networks (RNNs) for session modeling has yielded substantial performance gains compared to previous approaches. In terms of contextaware session modeling, however, the existing RNNbased models are limited in that they are not designed to explicitly model rich static userside contexts (e.g., age, gender, location). Therefore, in this paper, we explore the utility of explicit userside context modeling for RNN session models. Specifically, we propose an augmented RNN (ARNN) model that extracts highorder usercontextual preference using the productbased neural network (PNN) in order to augment any existing RNN session model. Evaluation results show that our proposed model outperforms the baseline RNN session model by a large margin when rich userside contexts are available. 
Augmented Utilitarianism  In the light of ongoing progresses of research on artificial intelligent systems exhibiting a steadily increasing problemsolving ability, the identification of practicable solutions to the value alignment problem in AGI Safety is becoming a matter of urgency. In this context, one preeminent challenge that has been addressed by multiple researchers is the adequate formulation of utility functions or equivalents reliably capturing human ethical conceptions. However, the specification of suitable utility functions harbors the risk of ‘perverse instantiation’ for which no final consensus on responsible proactive countermeasures has been achieved so far. Amidst this background, we propose a novel sociotechnological ethical framework denoted Augmented Utilitarianism which directly alleviates the perverse instantiation problem. We elaborate on how augmented by AI and more generally science and technology, it might allow a society to craft and update ethical utility functions while jointly undergoing a dynamical ethical enhancement. Further, we elucidate the need to consider embodied simulations in the design of utility functions for AGIs aligned with human values. Finally, we discuss future prospects regarding the usage of the presented scientifically grounded ethical framework and mention possible challenges. 
Augmenting experienCe via TeacheR’s adviCE (ACTRCE) 
Sparse reward is one of the most challenging problems in reinforcement learning (RL). Hindsight Experience Replay (HER) attempts to address this issue by converting a failed experience to a successful one by relabeling the goals. Despite its effectiveness, HER has limited applicability because it lacks a compact and universal goal representation. We present Augmenting experienCe via TeacheR’s adviCE (ACTRCE), an efficient reinforcement learning technique that extends the HER framework using natural language as the goal representation. We first analyze the differences among goal representation, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with nonlanguage goal representation failed to learn. We also show that with language goal representations, the agent can generalize to unseen instructions, and even generalize to instructions with unseen lexicons. We further demonstrate it is crucial to use hindsight advice to solve challenging tasks, and even small amount of advice is sufficient for the agent to achieve good performance. 
Author Rank  Author Rank is a new aspect of Google’s search algorithm that will score online content creators. Similar to SEO rankings for sites and pages, authors will now have an associated ranking based on a few contributing factors, including, but not limited to: · Social sharing of your Google+ posts · Quality of backlinks to your content · Interactions with your content (comments and shares) · Timely and topical content · Reputation and authority on other social networks · PageRank In short, Google will be assessing your reputation, authority, and the general reception of your content to determine just how valuable you are as an author. This ranking methodology provides writers with a greater incentive to not only build out their Google Plus profiles (smart move, Google), but also to ensure their online presence is streamlined and connected across all social networks and blogs to which they contribute. A user who is wellconnected, wellinformed, produces great content, and is seen by the larger community as valuable, will without question reap the rewards of a high author ranking. A Guide on Google’s Author Rank 
Auto Mutual Information  Information theoretic measures (entropies, entropy rates, mutual information) are nowadays commonly used in statistical signal processing for realworld data analysis. The present work proposes the use of Auto Mutual Information (Mutual Information between subsets of the same signal) and entropy rate as powerful tools to assess refined dependencies of any order in signal temporal dynamics. Notably, it is shown how twopoint Auto Mutual Information and entropy rate unveil information conveyed by higher order statistic and thus capture details of temporal dynamics that are overlooked by the (twopoint) correlation function. Statistical performance of relevant estimators for Auto Mutual Information and entropy rate are studied numerically, by means of Monte Carlo simulations, as functions of sample size, dependence structures and hyper parameters that enter their definition. Further, it is shown how Auto Mutual Information permits to discriminate between several different non Gaussian processes, having exactly the same marginal distribution and covariance function. Assessing higher order statistics via multipoint Auto Mutual Information is also shown to unveil the global dependence structure fo these processes, indicating that one of the non Gaussian actually has temporal dynamics that ressembles that of a Gaussian process with same covariance while the other does not. 
Auto Regressive TIme VArying (ARTIVA) 
Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying generegulation networks from biological timecourse expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provide a natural starting point to learn and investigate their dynamics in greater detail. 
AutoAssist  Deep neural networks have yielded superior performance in many applications; however, the gradient computation in a deep model with millions of instances lead to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network. Typically, as the training procedure evolves, the amount of improvement in the current model by a stochastic gradient update on each instance varies dynamically. In AutoAssist, we utilize this fact and design a simple instance shrinking operation, which is used to filter out instances with relatively low marginal improvement to the current model; thus the computationally intensive gradient computations are performed on informative instances as much as possible. We prove that the proposed technique outperforms vanilla SGD with existing importance sampling approaches for linear SVM problems, and establish an O(1/k) convergence for strongly convex problems. In order to apply the proposed techniques to accelerate training of deep models, we propose to jointly train a very lightweight Assistant network in addition to the original deep network referred to as Boss. The Assistant network is designed to gauge the importance of a given instance with respect to the current Boss such that a shrinking operation can be applied in the batch generator. With careful design, we train the Boss and Assistant in a nonblocking and asynchronous fashion such that overhead is minimal. We demonstrate that AutoAssist reduces the number of epochs by 40% for training a ResNet to reach the same test accuracy on an image classification data set and saves 30% training time needed for a transformer model to yield the same BLEU scores on a translation dataset. 
AutoAugment  Previous attempts for data augmentation are designed manually, and the augmentation policies are datasetspecific. Recently, an automatic data augmentation approach, named AutoAugment, is proposed using reinforcement learning. AutoAugment searches for the augmentation polices in the discrete search space, which may lead to a suboptimal solution. In this paper, we employ the Augmented Random Search method (ARS) to improve the performance of AutoAugment. Our key contribution is to change the discrete search space to continuous space, which will improve the searching performance and maintain the diversities between subpolicies. With the proposed method, stateoftheart accuracies are achieved on CIFAR10, CIFAR100, and ImageNet (without additional data). Our code is available at https://…/ARSAug. 
autoAx  Approximate computing is an emerging paradigm for developing highly energyefficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic operation it is intractable to find an optimal combination of approximate circuits in the library even for an application consisting of a few operations. An open problem is ‘how to effectively combine circuits from these libraries to construct complex approximate accelerators’. This paper proposes a novel methodology for searching, selecting and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology utilizes machine learning techniques to create computational models estimating the overall quality of processing and hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant tradeoffs between the quality of processing and hardware cost and identify a corresponding Paretofrontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately $10^3$ highly important implementations from $10^{23}$ possible solutions in a few hours, while the exhaustive search would take four months on a highend processor. 
Autocorrelation  Autocorrelation, also known as serial correlation or crossautocorrelation, is the crosscorrelation of a signal with itself at different points in time (that is what the cross stands for). Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals. 
AutoCorrelation  ➘ “Autocorrelation” 
AutoCorrelation Function (ACF) 
The autocorrelation function measures the correlation of a signal x(t) with itself shifted by some time delay tau. The autocorrelation function can be used to detect repeats or periodicity in a signal. E.g. to use the autocorrelation to assess the effect of fluctuations (noise) on a periodic signal. 
AutoCorrelational Neural Network (ACNN) 
In recent years, the natural language processing community has moved away from taskspecific feature engineering, i.e., researchers discovering adhoc feature representations for various tasks, in favor of generalpurpose methods that learn the input representation by themselves. However, stateoftheart approaches to disfluency detection in spontaneous speech transcripts currently still depend on an array of handcrafted features, and other representations derived from the output of preexisting systems such as language models or dependency parsers. As an alternative, this paper proposes a simple yet effective model for automatic disfluency detection, called an autocorrelational neural network (ACNN). The model uses a convolutional neural network (CNN) and augments it with a new autocorrelation operator at the lowest layer that can capture the kinds of ‘rough copy’ dependencies that are characteristic of repair disfluencies in speech. In experiments, the ACNN model outperforms the baseline CNN on a disfluency detection task with a 5% increase in fscore, which is close to the previous best result on this task. 
Autocorrelogram  
AutoCross  Feature crossing captures interactions among categorical features and is useful to enhance learning from tabular data in realworld businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its customers, ranging from banks, hospitals, to Internet corporations. By performing beam search in a treestructured space, AutoCross enables efficient generation of highorder cross features, which is not yet visited by existing works. Additionally, we propose successive minibatch gradient descent and multigranularity discretization to further improve efficiency and effectiveness, while ensuring simplicity so that no machine learning expertise or tedious hyperparameter tuning is required. Furthermore, the algorithms are designed to reduce the computational, transmitting, and storage costs involved in distributed computing. Experimental results on both benchmark and realworld business datasets demonstrate the effectiveness and efficiency of AutoCross. It is shown that AutoCross can significantly enhance the performance of both linear and deep models. 
Autodependogram  In this paper the serial dependences between the observed time series and the lagged series, taken into account onebyone, are graphically analyzed by what we have chosen to call the ‘autodependogram’. This tool, is a sort of natural nonlinear counterpart of the wellknown autocorrelogram used in the linear context. The simple idea, instead of using autocorrelations at varying time lags, exploits the c2test statistics applied to convenient contingency tables. The usefulness of this graphical device is confirmed by simulations from certain classes of wellknown models, characterized by randomness and also by different kinds of linear and nonlinear dependences. The autodependogram is also applied to both environmental and economic real data. In this way its ability to detect nonlinear features is highlighted. http://…/00463531472f604d52000000.pdf 
AutoDispNet  Much research work in computer vision is being spent on optimizing existing network architectures to obtain a few more percentage points on benchmarks. Recent AutoML approaches promise to relieve us from this effort. However, they are mainly designed for comparatively smallscale classification tasks. In this work, we show how to use and extend existing AutoML techniques to efficiently optimize largescale UNetlike encoderdecoder architectures. In particular, we leverage gradientbased neural architecture search and Bayesian optimization for hyperparameter search. The resulting optimization does not require a large companyscale compute cluster. We show results on disparity estimation that clearly outperform the manually optimized baseline and reach stateoftheart performance. 
AutoDistance Covariance Function (ADCV) 

Autoencoded Variational Inference For Topic Model (AVITM) 
Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is autoencoding variational Bayes (AEVB), but it has proven difficult to apply to topic models in practice. We present what is to our knowledge the first effective AEVB based inference method for latent Dirichlet allocation (LDA), which we call Autoencoded Variational Inference For Topic Model (AVITM). This model tackles the problems caused for AEVB by the Dirichlet prior and by component collapsing. We find that AVITM matches traditional methods in accuracy with much better inference time. Indeed, because of the inference network, we find that it is unnecessary to pay the computational cost of running variational optimization on test data. Because AVITM is black box, it is readily applied to new topic models. As a dramatic illustration of this, we present a new topic model called ProdLDA, that replaces the mixture model in LDA with a product of experts. By changing only one line of code from LDA, we find that ProdLDA yields much more interpretable topics, even if LDA is trained via collapsed Gibbs sampling. 
Autoencoder  An autoencoder, autoassociator or Diabolo network is an artificial neural network used for learning efficient codings. The aim of an autoencoder is to learn a compressed, distributed representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. 
AutoEncoder Feature Selector (AEFS) 
Highdimensional data in many areas such as computer vision and machine learning brings in computational and analytical difficulty. Feature selection which select a subset of features from original ones has been proven to be effective and efficient to deal with highdimensional data. In this paper, we propose a novel AutoEncoder Feature Selector (AEFS) for unsupervised feature selection. AEFS is based on the autoencoder and the group lasso regularization. Compared to traditional feature selection methods, AEFS can select the most important features in spite of nonlinear and complex correlation among features. It can be viewed as a nonlinear extension of the linear method regularized selfrepresentation (RSR) for unsupervised feature selection. In order to deal with noise and corruption, we also propose robust AEFS. An efficient iterative algorithm is designed for model optimization and experimental results verify the effectiveness and superiority of the proposed method. 
Autoencoding Binary Classifier (ABC) 
We propose the Autoencoding Binary Classifiers (ABC), a novel supervised anomaly detector based on the Autoencoder (AE). There are two main approaches in anomaly detection: supervised and unsupervised. The supervised approach accurately detects the known anomalies included in training data, but it cannot detect the unknown anomalies. Meanwhile, the unsupervised approach can detect both known and unknown anomalies that are located away from normal data points. However, it does not detect known anomalies as accurately as the supervised approach. Furthermore, even if we have labeled normal data points and anomalies, the unsupervised approach cannot utilize these labels. The ABC is a probabilistic binary classifier that effectively exploits the label information, where normal data points are modeled using the AE as a component. By maximizing the likelihood, the AE in the proposed ABC is trained to minimize the reconstruction error for normal data points, and to maximize it for known anomalies. Since our approach becomes able to reconstruct the normal data points accurately and fails to reconstruct the known and unknown anomalies, it can accurately discriminate both known and unknown anomalies from normal data points. Experimental results show that the ABC achieves higher detection performance than existing supervised and unsupervised methods. 
AutoEncoding Variational Bayes  How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions is twofold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results. GitXiv 
Autoencoding Variational Transformation (AVT) 
The learning of TransformationEquivariant Representations (TERs), which is introduced by Hinton et al. \cite{hinton2011transforming}, has been considered as a principle to reveal visual structures under various transformations. It contains the celebrated Convolutional Neural Networks (CNNs) as a special case that only equivary to the translations. In contrast, we seek to train TERs for a generic class of transformations and train them in an {\em unsupervised} fashion. To this end, we present a novel principled method by Autoencoding Variational Transformations (AVT), compared with the conventional approach to autoencoding data. Formally, given transformed images, the AVT seeks to train the networks by maximizing the mutual information between the transformations and representations. This ensures the resultant TERs of individual images contain the {\em intrinsic} information about their visual structures that would equivary {\em extricably} under various transformations. Technically, we show that the resultant optimization problem can be efficiently solved by maximizing a variational lowerbound of the mutual information. This variational approach introduces a transformation decoder to approximate the intractable posterior of transformations, resulting in an autoencoding architecture with a pair of the representation encoder and the transformation decoder. Experiments demonstrate the proposed AVT model sets a new record for the performances on unsupervised tasks, greatly closing the performance gap to the supervised models. 
AutoFD  We study the problem of discovering functional dependencies (FD) from a noisy dataset. We focus on FDs that correspond to statistical dependencies in a dataset and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy dataset is equivalent to learning the structure of a graphical model over binary random variables, where each random variable corresponds to a functional of the dataset attributes. We build upon this observation to introduce AutoFD a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that our methods can recover true functional dependencies across a diverse array of realworld and synthetic datasets, even in the presence of noisy or missing data. We find that AutoFD scales to large data instances with millions of tuples and hundreds of attributes while it yields an average F1 improvement of 2 times against stateoftheart FD discovery methods. 
autofeat  This paper describes the autofeat Python library, which provides a scikitlearn style linear regression model with automatic feature engineering and selection capabilities. Complex nonlinear machine learning models such as neural networks are in practice often difficult to train and even harder to explain to nonstatisticians, who require transparent analysis results as a basis for important business decisions. While linear models are efficient and intuitive, they generally provide lower prediction accuracies. Our library provides a multistep feature engineering and selection process, where first a large pool of nonlinear features is generated, from which then a small and robust set of meaningful features is selected, which improve the prediction accuracy of a linear model while retaining its interpretability. 
AutoGAN  Classifiers fail to classify correctly input images that have been purposefully and imperceptibly perturbed to cause misclassification. This susceptability has been shown to be consistent across classifiers, regardless of their type, architecture or parameters. Common defenses against adversarial attacks modify the classifer boundary by training on additional adversarial examples created in various ways. In this paper, we introduce AutoGAN, which counters adversarial attacks by enhancing the lowerdimensional manifold defined by the training data and by projecting perturbed data points onto it. AutoGAN mitigates the need for knowing the attack type and magnitude as well as the need for having adversarial samples of the attack. Our approach uses a Generative Adversarial Network (GAN) with an autoencoder generator and a discriminator that also serves as a classifier. We test AutoGAN against adversarial samples generated with stateoftheart Fast Gradient Sign Method (FGSM) as well as samples generated with random Gaussian noise, both using the MNIST dataset. For different magnitudes of perturbation in training and testing, AutoGAN can surpass the accuracy of FGSM method by up to 25\% points on samples perturbed using FGSM. Without an augmented training dataset, AutoGAN achieves an accuracy of 89\% compared to 1\% achieved by FGSM method on FGSM testing adversarial samples. 
AutoKeras  ➚ “AutoKeras” 
AutoKeras  AutoKeras is an open source software library for automated machine learning (AutoML). It is developed by DATA Lab at Texas A&M University and community contributors. The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. AutoKeras provides functions to automatically search for architecture and hyperparameters of deep learning models. 
AutoKGE  Knowledge graph embedding (KGE) aims to find low dimensional vector representations of entities and relations so that their similarities can be quantized. Scoring functions (SFs), which are used to build a model to measure the similarity between entities based on a given relation, have developed as the crux of KGE. Humans have designed lots of SFs in the literature, and the evolving of SF has become the primary power source of boosting KGE’s performance. However, such improvements gradually get marginal. Besides, with so many SFs, how to make a proper choice among existing SFs already becomes a nontrivial problem. Inspired by the recent success of automated machine learning (AutoML), in this paper, we propose automated KGE (AutoKGE), to design and discover distinct SFs for KGE automatically. We first identify a unified representation over popularly used SFs, which helps to set up a search space for AutoKGE. Then, we propose a greedy algorithm, which is enhanced by a predictor to estimate the final performance without model training, to search through the space. Extensive experiments on benchmark datasets demonstrate the effectiveness and efficiency of our AutoKGE. Finally, the SFs, searched by our method, are KG dependent, new to the literature, and outperform existing stateofthearts SFs designed by humans. 
AutoLoss  Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters. Appropriately scheduling the optimization of a task objective or a set of parameters is usually crucial to the quality of convergence. In this paper, we present AutoLoss, a metalearning framework that automatically learns and determines the optimization schedule. AutoLoss provides a generic way to represent and learn the discrete optimization schedule from metadata, allows for a dynamic and datadriven schedule in ML problems that involve alternating updates of different parameters or from different loss objectives. We apply AutoLoss on four ML tasks: dary quadratic regression, classification using a multilayer perceptron (MLP), image generation using GANs, and multitask neural machine translation (NMT). We show that the AutoLoss controller is able to capture the distribution of better optimization schedules that result in higher quality of convergence on all four tasks. The trained AutoLoss controller is generalizable — it can guide and improve the learning of a new task model with different specifications, or on different datasets. 
Automata Network  An Automata Network is a map ${f:Q^n\rightarrow Q^n}$ where $Q$ is a finite alphabet. It can be viewed as a network of $n$ entities, each holding a state from $Q$, and evolving according to a deterministic synchronous update rule in such a way that each entity only depends on its neighbors in the network’s graph, called interaction graph. A major trend in automata network theory is to understand how the interaction graph affects dynamical properties of $f$. 
Automated Canary Analysis (ACA) 

automated CLAUse DETectEr (Claudette) 
Machine Learning Powered Analysis of Consumer Contracts and Privacy Policies. CLAUDETTE – ‘automated CLAUse DETectEr’ – is an interdisciplinary research project hosted at the Law Department of the European University Institute, led by professors Giovanni Sartor and HansW. Micklitz, in cooperation with engineers from University of Bologna and University of Modena and Reggio Emilia. The research objective is to test to what extent is it possible to automate reading and legal assessment of online consumer contracts and privacy policies, to evaluate their compliance with EU´s unfair contractual terms law and personal data protection law (GDPR), using machine learning and grammarbased approaches. The idea arose out of bewilderment. Having read dozens of terms of service and of privacy policies of online platforms, we came to conclusion that despite substantive law in place, and despite enforcers´ competence for abstract control, providers of online services still tend to use unfair and unlawful clauses in these documents. Hence, the idea to automate parts of enforcement process by delegating certain tasks to machines. On one hand, we believe that relying on automation can increase quality and effectiveness of legal work of enforcers. On the other, we want to empower consumers themselves, by giving them tools to quickly assess whether what they agree to online is fair and/or lawful. 
Automated Exploratory Data Analysis (autoEDA) 
The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to the increasing interest in the automation of common tasks for data analysis. The most timeconsuming part of this process is the Exploratory Data Analysis, crucial for better domain understanding, data cleaning, data validation, and feature engineering. There is a growing number of libraries that attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of twelve popular R packages to identify the parts of analysis that can be effectively automated with the current tools and to point out new directions for further autoEDA development. 
Automated Lyric Annotation (ALA) 
Comprehending lyrics, as found in songs and poems, can pose a challenge to human and machine readers alike. This motivates the need for systems that can understand the ambiguity and jargon found in such creative texts, and provide commentary to aid readers in reaching the correct interpretation. We introduce the task of automated lyric annotation (ALA). Like text simplification, a goal of ALA is to rephrase the original text in a more easily understandable manner. However, in ALA the system must often include additional information to clarify niche terminology and abstract concepts. To stimulate research on this task, we release a large collection of crowdsourced annotations for song lyrics. We analyze the performance of translation and retrieval models on this task, measuring performance with both automated and human evaluation. We find that each model captures a unique type of information important to the task. 
Automated Machine Learning (AutoML) 
Automated machine learning (AutoML) is the process of automating the endtoend process of applying machine learning to realworld problems. In a typical machine learning application, practitioners must apply the appropriate data preprocessing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of nonexperts, AutoML was proposed as an artificial intelligencebased solution to the evergrowing challenge of applying machine learning. Automating the endtoend process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand. The Current State of Automated Machine Learning 
Automated Model Order Selection Algorithm (AMOS) 
One of the longstanding problems in spectral graph clustering (SGC) is the socalled model order selection problem: automated selection of the correct number of clusters. This is equivalent to the problem of finding the number of connected components or communities in an undirected graph. In this paper, we propose AMOS, an automated model order selection algorithm for SGC. Based on a recent analysis of clustering reliability for SGC under the random interconnection model, AMOS works by incrementally increasing the number of clusters, estimating the quality of identified clusters, and providing a series of clustering reliability tests. Consequently, AMOS outputs clusters of minimal model order with statistical clustering reliability guarantees. Comparing to three other automated graph clustering methods on realworld datasets, AMOS shows superior performance in terms of multiple external and internal clustering metrics. 
Automated Planning and Scheduling  Automated planning and scheduling, sometimes denoted as simply AI Planning, is a branch of artificial intelligence that concerns the realization of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. Unlike classical control and classification problems, the solutions are complex and must be discovered and optimized in multidimensional space. Planning is also related to decision theory. In known environments with available models, planning can be done offline. Solutions can be found and evaluated prior to execution. In dynamically unknown environments, the strategy often needs to be revised online. Models and policies must be adapted. Solutions usually resort to iterative trial and error processes commonly seen in artificial intelligence. These include dynamic programming, reinforcement learning and combinatorial optimization. Languages used to describe planning and scheduling are often called action languages. 
Automated Predictive Library (APL) 
Automated Predictive Library (APL) is a HANA Application Function Library (AFL) which is meant to expose the features of InfiniteInsight (aka Kxen) inside HANA. 
Automatic Algorithm Discoverer (AAD) 
This paper presents Automatic Algorithm Discoverer (AAD), an evolutionary framework for synthesizing programs of high complexity. To guide evolution, prior evolutionary algorithms have depended on fitness (objective) functions, which are challenging to design. To make evolutionary progress, instead, AAD employs Problem Guided Evolution (PGE), which requires introduction of a group of problems together. With PGE, solutions discovered for simpler problems are used to solve more complex problems in the same group. PGE also enables several new evolutionary strategies, and naturally yields to HighPerformance Computing (HPC) techniques. We find that PGE and related evolutionary strategies enable AAD to discover algorithms of similar or higher complexity relative to the stateoftheart. Specifically, AAD produces Python code for 29 array/vector problems ranging from min, max, reverse, to more challenging problems like sorting and matrixvector multiplication. Additionally, we find that AAD shows adaptability to constrained environments/inputs and demonstrates outsideofthebox problem solving abilities. 
Automatic BAyesian Changepoints Under Sparsity (ABACUS) 
Change detection involves segmenting sequential data such that observations in the same segment share some desired properties. Multivariate change detection continues to be a challenging problem due to the variety of ways change points can be correlated across channels and the potentially poor signaltonoise ratio on individual channels. In this paper, we are interested in locating additive outliers (AO) and level shifts (LS) in the unsupervised setting. We propose ABACUS, Automatic BAyesian Changepoints Under Sparsity, a Bayesian source separation technique to recover latent signals while also detecting changes in model parameters. Multilevel sparsity achieves both dimension reduction and modeling of signal changes. We show ABACUS has competitive or superior performance in simulation studies against stateoftheart change detection methods and established latent variable models. We also illustrate ABACUS on two real application, modeling genomic profiles and analyzing household electricity consumption. 
Automatic Bayesian Density Analysis (ABDA) 
Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for density estimation, even when taking into account mixtures of probabilistic models, are not flexible enough to deal with the uncertainty inherent to realworld data: they are generally restricted to a priori fixed homogeneous likelihood model and to latent variable structures where expressiveness comes at the price of tractability. We propose Automatic Bayesian Density Analysis (ABDA) to go beyond classical mixture model density estimation, casting uncertainty estimation on both the underlying structure in the data, as well as the selection of adequate likelihood models for the data—thus statistical data types of the variable in the data—into a joint inference problem. Specifically, ABDA relies on a hierarchical model explicitly incorporating arbitrarily rich collections of likelihood models at a local level, while capturing global variable interactions by an expressive deep structure built on a sumproduct network. Extensive empirical evidence shows that ABDA is more accurate than density estimators in the literature at dealing with both kinds of uncertainties, at modeling and predicting realworld (mixed continuous and discrete) data in both transductive and inductive scenarios, and at recovering the statistical data types. 
Automatic Derivation Machine  This paper presents an artificial intelligence algorithm that can be used to derive formulas from various scientific disciplines called automatic derivation machine. First, the formula is abstractly expressed as a multiway tree model, and then each step of the formula derivation transformation is abstracted as a mapping of multiway trees. Derivation steps similar can be expressed as a reusable formula template by a multiway tree map. After that, the formula multiway tree is eigenencoded to feature vectors construct the feature space of formulas, the Qlearning model using in this feature space can achieve the derivation by making training data from derivation process. Finally, an automatic formula derivation machine is made to choose the next derivation step based on the current state and object. We also make an example about the nuclear reactor physics problem to show how the automatic derivation machine works. 
Automatic Gradient Boosting  Automatic machine learning performs predictive modeling with high performing machine learning tools without human interference. This is achieved by making machine learning applications parameterfree, i.e. only a dataset is provided while the complete model selection and model building process is handled internally through (often meta) optimization. Projects like AutoWEKA and autosklearn aim to solve the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem resulting in huge configuration spaces. However, for most realworld applications, the optimization over only a few different key learning algorithms can not only be sufficient, but also potentially beneficial. The latter becomes apparent when one considers that models have to be validated, explained, deployed and maintained. Here, less complex model are often preferred, for validation or efficiency reasons, or even a strict requirement. Automatic gradient boosting simplifies this idea one step further, using only gradient boosting as a single learning algorithm in combination with modelbased hyperparameter tuning, threshold optimization and encoding of categorical features. We introduce this general framework as well as a concrete implementation called autoxgboost. It is compared to current AutoML projects on 16 datasets and despite its simplicity is able to achieve comparable results on about half of the datasets as well as performing best on two. 
Automatic Interactive Data Exploration (AIDE) 
In this paper, we argue that database systems be augmented with an automated data exploration service that methodically steers users through the data in a meaningful way. Such an automated system is crucial for deriving insights from complex datasets found in many big data applications such as scientific and healthcare applications as well as for reducing the human effort of data exploration. Towards this end, we present AIDE, an Automatic Interactive Data Exploration framework that assists users in discovering new interesting data patterns and eliminate expensive adhoc exploratory queries. AIDE relies on a seamless integration of classification algorithms and data management optimization techniques that collectively strive to accurately learn the user interests based on his relevance feedback on strategically collected samples. We present a number of exploration techniques as well as optimizations that minimize the number of samples presented to the user while offering interactive performance. AIDE can deliver highly accurate query predictions for very common conjunctive queries with small user effort while, given a reasonable number of samples, it can predict with high accuracy complex disjunctive queries. It provides interactive performance as it limits the user wait time per iteration of exploration to less than a few seconds. 
Automatic License Plate Recognition (ALPR) 
Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in realworld situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the stateoftheart YOLO object detection. The Convolutional Neural Networks (CNNs) are trained and finetuned for each ALPR stage so that they are robust under different conditions (e.g., variations in camera, lighting, and background). Specially for character segmentation and recognition, we design a twostage approach employing simple data augmentation tricks such as inverted License Plates (LPs) and flipped characters. The resulting ALPR approach achieved impressive results in two datasets. First, in the SSIG dataset, composed of 2,000 frames from 101 vehicle videos, our system achieved a recognition rate of 93.53% and 47 Frames Per Second (FPS), performing better than both Sighthound and OpenALPR commercial systems (89.80% and 93.03%, respectively) and considerably outperforming previous results (81.80%). Second, targeting a more realistic scenario, we introduce a larger public dataset, called UFPRALPR dataset, designed to ALPR. This dataset contains 150 videos and 4,500 frames captured when both camera and vehicles are moving and also contains different types of vehicles (cars, motorcycles, buses and trucks). In our proposed dataset, the trial versions of commercial systems achieved recognition rates below 70%. On the other hand, our system performed better, with recognition rate of 78.33% and 35 FPS. 
Automatic Machine Learning (AutoML) 
The availability of large data sets and computational resources have encouraged the development of machine learning and datadriven models which pose an interesting alternative to explicit and fully structured models of behaviour. A battery of tools are now available which can automatically learn interesting mappings and simulate complex phenomena. These include techniques such as Hidden Markov Models, Neural Networks, Support Vector Machines, Mixture Models, Decision Trees and Bayesian Network Inference . In general, these tools have fallen into two classes: discriminative and generative models. The first attempts to optimize the learning for a particular task while the second models a phenomenon in its entirety. This difference between the two approaches will be addressed in this thesis in particular detail and the above probabilistic formalisms will be employed in deriving a machine learning system for our purposes. ChaLearn Automatic Machine Learning Challenge <a href="'Automate” target=”top”>https://…/ 
Automatic Model Selection  Neural networks and deep learning are changing the way that artificial intelligence is being done. Efficiently choosing a suitable network architecture and finetune its hyperparameters for a specific dataset is a timeconsuming task given the staggering number of possible alternatives. In this paper, we address the problem of model selection by means of a fully automated framework for efficiently selecting a neural network model for a given task: classification or regression. The algorithm, named Automatic Model Selection, is a modified microgenetic algorithm that automatically and efficiently finds the most suitable neural network model for a given dataset. The main contributions of this method are a simple list based encoding for neural networks as genotypes in an evolutionary algorithm, new crossover, and mutation operators, the introduction of a fitness function that considers both, the accuracy of the model and its complexity and a method to measure the similarity between two neural networks. AMS is evaluated on two different datasets. By comparing some models obtained with AMS to stateoftheart models for each dataset we show that AMS can automatically find efficient neural network models. Furthermore, AMS is computationally efficient and can make use of distributed computing paradigms to further boost its performance. 
Automatic MultiscaleBased Peak Detection (AMPD) 
A method for automatic detection of peaks in noisy periodic and quasiperiodic signals. This method, called automatic multiscalebased peak detection (AMPD), is based on the calculation and analysis of the local maxima scalogram, a matrix comprising the scaledependent occurrences of local maxima. ampd 
Automatic Query Expansion (AQE) 
Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. In the context of search engines, query expansion involves evaluating a user’s input (what words were typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as: · Finding synonyms of words, and searching for the synonyms as well · Finding all the various morphological forms of words by stemming each word in the search query · Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results · Reweighting the terms in the original query Query expansion is a methodology studied in the field of computer science, particularly within the realm of natural language processing and information retrieval. 
Automatic Short Answer Grading (ASAG) 
Automatic short answer grading (ASAG) techniques are designed to automatically assess short answers to questions in natural language, having a length of a few words to a few sentences. Supervised ASAG techniques have been demonstrated to be effective but suffer from a couple of key practical limitations. They are greatly reliant on instructor provided model answers and need labeled training data in the form of graded student answers for every assessment task. To overcome these, in this paper, we introduce an ASAG technique with two novel features. We propose an iterative technique on an ensemble of (a) a text classifier of student answers and (b) a classifier using numeric features derived from various similarity measures with respect to model answers. Second, we employ canonical correlation analysis based transfer learning on a common feature representation to build the classifier ensemble for questions having no labelled data. The proposed technique handsomely beats all winning supervised entries on the SCIENTSBANK dataset from the Student Response Analysis task of SemEval 2013. Additionally, we demonstrate generalizability and benefits of the proposed technique through evaluation on multiple ASAG datasets from different subject topics and standards. 
Automatic Speech Recognition (ASR) 
In computer science and electrical engineering, speech recognition (SR) is the translation of spoken words into text. It is also known as ‘automatic speech recognition’ (ASR), ‘computer speech recognition’, or just ‘speech to text’ (STT). Some SR systems use ‘training’ (also called ‘enrolment’) where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person’s specific voice and uses it to finetune the recognition of that person’s speech, resulting in increased accuracy. Systems that do not use training are called ‘speaker independent’ systems. Systems that use training are called ‘speaker dependent’. Speech recognition applications include voice user interfaces such as voice dialling (e.g. ‘Call home’), call routing (e.g. ‘I would like to make a collect call’), domotic appliance control, search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g. a radiology report), speechtotext processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input). The term voice recognition or speaker identification refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person’s voice or it can be used to authenticate or verify the identity of a speaker as part of a security process. From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data. The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the worldwide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems. These speech industry players include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, IflyTek (China), many of which have publicized the core technology in their speech recognition systems being based on deep learning. GitXiv A Brief History of ASR: Automatic Speech Recognition 
Automatic Summarization  Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. As the problem of information overload has grown, and as the quantity of data has increased, so has interest in automatic summarization. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. An example of the use of summarization technology is search engines such as Google. Document summarization is another. Generally, there are two approaches to automatic summarization: extraction and abstraction. Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary. In contrast, abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might generate. Such a summary might contain words not explicitly present in the original. Research into abstractive methods is an increasingly important and active research area, however due to complexity constraints, research to date has focused primarily on extractive methods. 
Automatic Task Selection and Mixing (AutoSeM) 
Multitask learning (MTL) has achieved success over a wide range of problems, where the goal is to improve the performance of a primary task using a set of relevant auxiliary tasks. However, when the usefulness of the auxiliary tasks w.r.t. the primary task is not known a priori, the success of MTL models depends on the correct choice of these auxiliary tasks and also a balanced mixing ratio of these tasks during alternate training. These two problems could be resolved via manual intuition or hyperparameter tuning over all combinatorial task choices, but this introduces inductive bias or is not scalable when the number of candidate auxiliary tasks is very large. To address these issues, we present AutoSeM, a twostage MTL pipeline, where the first stage automatically selects the most useful auxiliary tasks via a BetaBernoulli multiarmed bandit with Thompson Sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian Process based Bayesian optimization framework. We conduct several MTL experiments on the GLUE language understanding tasks, and show that our AutoSeM framework can successfully find relevant auxiliary tasks and automatically learn their mixing ratio, achieving significant performance boosts on several primary tasks. Finally, we present ablations for each stage of AutoSeM and analyze the learned auxiliary task choices. 
AutoML for Loss Function Search (AMLFS) 
Designing an effective loss function plays an important role in visual analysis. Most existing loss function designs rely on handcrafted heuristics that require domain experts to explore the large design space, which is usually suboptimal and timeconsuming. In this paper, we propose AutoML for Loss Function Search (AMLFS) which leverages REINFORCE to search loss functions during the training process. The key contribution of this work is the design of search space which can guarantee the generalization and transferability on different vision tasks by including a bunch of existing prevailing loss functions in a unified formulation. We also propose an efficient optimization framework which can dynamically optimize the parameters of loss function’s distribution during training. Extensive experimental results on four benchmark datasets show that, without any tricks, our method outperforms existing handcrafted loss functions in various computer vision tasks. 
Autonomous Deep Learning (ADL) 
The feasibility of deep neural networks (DNNs) to address data stream problems still requires intensive study because of the static and offline nature of conventional deep learning approaches. A deep continual learning algorithm, namely autonomous deep learning (ADL), is proposed in this paper. Unlike traditional deep learning methods, ADL features a flexible structure where its network structure can be constructed from scratch with the absence of initial network structure via the selfconstructing network structure. ADL specifically addresses catastrophic forgetting by having a differentdepth structure which is capable of achieving a tradeoff between plasticity and stability. Network significance (NS) formula is proposed to drive the hidden nodes growing and pruning mechanism. Drift detection scenario (DDS) is put forward to signal distributional changes in data streams which induce the creation of a new hidden layer. Maximum information compression index (MICI) method plays an important role as a complexity reduction module eliminating redundant layers. The efficacy of ADL is numerically validated under the prequential testthentrain procedure in lifelong environments using nine popular data stream problems. The numerical results demonstrate that ADL consistently outperforms recent continual learning methods while characterizing the automatic construction of network structures. 
Autonomous Predictive Modeler via Reinforcement Learning (APRL) 
Building a good predictive model requires an array of activities such as data imputation, feature transformations, estimator selection, hyperparameter search and ensemble construction. Given the large, complex and heterogenous space of options, offtheshelf optimization methods are infeasible for realistic response times. In practice, much of the predictive modeling process is conducted by experienced data scientists, who selectively make use of available tools. Over time, they develop an understanding of the behavior of operators, and perform serial decision making under uncertainty, colloquially referred to as educated guesswork. With an unprecedented demand for application of supervised machine learning, there is a call for solutions that automatically search for a good combination of parameters across these tasks to minimize the modeling error. We introduce a novel system called APRL (Autonomous Predictive modeler via Reinforcement Learning), that uses past experience through reinforcement learning to optimize such sequential decision making from within a set of diverse actions under a time constraint on a previously unseen predictive learning problem. APRL actions are taken to optimize the performance of a final ensemble. This is in contrast to other systems, which maximize individual model accuracy first and create ensembles as a disconnected postprocessing step. As a result, APRL is able to reduce up to 71\% of classification error on average over a wide variety of problems. 
AutoParallel  The last improvements in programming languages, programming models, and frameworks have focused on abstracting the users from many programming issues. Among others, recent programming frameworks include simpler syntax, automatic memory management and garbage collection, which simplifies code reusage through library packages, and easily configurable tools for deployment. For instance, Python has risen to the top of the list of the programming languages due to the simplicity of its syntax, while still achieving a good performance even being an interpreted language. Moreover, the community has helped to develop a large number of libraries and modules, tuning them to obtain great performance. However, there is still room for improvement when preventing users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module to automatically find an appropriate taskbased parallelization of affine loop nests to execute them in parallel in a distributed computing infrastructure. This parallelization can also include the building of data blocks to increase task granularity in order to achieve a good execution performance. Moreover, AutoParallel is based on sequential programming and only contains a small annotation in the form of a Python decorator so that anyone with little programming skills can scale up an application to hundreds of cores. 
AutoPruner  Channel pruning is an important family of methods to speedup deep model’s inference. Previous filter pruning algorithms regard channel pruning and model finetuning as two independent steps. This paper argues that combining them in a single endtoend trainable system will lead to better results. We propose an efficient channel selection layer, namely AutoPruner, to find less important filters automatically in a joint training manner. AutoPruner takes previous activation responses as input and generates a true binary index code for pruning. Hence, all the filters corresponding to zero index values can be removed safely after training. We empirically demonstrate that the gradient information of this channel selection layer is also helpful for the whole model training. Compared with previous stateoftheart pruning algorithms, AutoPruner achieves significantly better performance. Furthermore, ablation experiments show that the proposed novel minibatch pooling and binarization operations are vital for the success of filter pruning. 
AutoQB  In this paper, we propose a hierarchical deep reinforcement learning (DRL)based AutoML framework, AutoQB, to automatically explore the design space of channellevel network quantization and binarization for hardwarefriendly deep learning on mobile devices. Compared to prior DDPGbased quantization techniques, on the various CNN models, AutoQB automatically achieves the same inference accuracy by $\sim79\%$ less computing overhead, or improves the inference accuracy by $\sim2\%$ with the same computing cost. 
Autoregressive Conditional Duration (ACD) 
In financial econometrics, an autoregressive conditional duration (ACD, Engle and Russell (1998)) model considers irregularly spaced and autocorrelated intertrade durations. ACD is analogous to GARCH. Indeed, in a continuous double auction (a common trading mechanism in many financial markets) waiting times between two consecutive trades vary at random. ➘ “Generalized Autoregressive Conditional Heteroscedasticity” 
Autoregressive Conditional Heteroskedasticity (ARCH) 
In econometrics, autoregressive conditional heteroskedasticity (ARCH) models are used to characterize and model observed time series. They are used whenever there is reason to believe that, at any point in a series, the error terms will have a characteristic size or variance. In particular ARCH models assume the variance of the current error term or innovation to be a function of the actual sizes of the previous time periods’ error terms: often the variance is related to the squares of the previous innovations. Such models are often called ARCH models (Engle, 1982), although a variety of other acronyms are applied to particular structures of model which have a similar basis. ARCH models are employed commonly in modeling financial time series that exhibit timevarying volatility clustering, i.e. periods of swings followed by periods of relative calm. ARCHtype models are sometimes considered to be part of the family of stochastic volatility models but strictly this is incorrect since at time t the volatility is completely predetermined (deterministic) given previous values. 
Autoregressive Conditional Poisson (ACP) 
When modelling count data that comes in the form of a time series, the static Poisson regression and standard time series models are often not appropriate. A current study therefore involves the evaluation of several observationdriven and parameterdriven time series models for count data. In the observationdriven class of models, a fairly simple model is the Autoregressive Conditional Poisson (ACP) model. 
Autoregressive Distributed Lag Model (ARDL) 
A distributedlag model is a dynamic model in which the effect of a regressor x on y occurs over time rather than all at once. In the simple case of one explanatory variable and a linear relationship. This form is very similar to the infinitemovingaverage representation of an ARMA process, except that the lag polynomial on the righthand side is applied to the explanatory variable x rather than to a whitenoise process e. The individual coefficients ßs are called lag weights and the collectively comprise the lag distribution. They define the pattern of how x affects y over time. 
Autoregressive Energy Machine  Neural density estimators are flexible families of parametric models which have seen widespread use in unsupervised machine learning in recent years. Maximumlikelihood training typically dictates that these models be constrained to specify an explicit density. However, this limitation can be overcome by instead using a neural network to specify an energy function, or unnormalized density, which can subsequently be normalized to obtain a valid distribution. The challenge with this approach lies in accurately estimating the normalizing constant of the highdimensional energy function. We propose the Autoregressive Energy Machine, an energybased model which simultaneously learns an unnormalized density and computes an importancesampling estimate of the normalizing constant for each conditional in an autoregressive decomposition. The Autoregressive Energy Machine achieves stateoftheart performance on a suite of densityestimation tasks. 
Autoregressive Fractionally Integrated Moving Average (ARFIMA,FARIMA) 
In statistics, autoregressive fractionally integrated moving average models are time series models that generalize ARIMA (autoregressive integrated moving average) models by allowing noninteger values of the differencing parameter. These models are useful in modeling time series with long memory – that is, in which deviations from the longrun mean decay more slowly than an exponential decay. The acronyms “ARFIMA” or “FARIMA” are often used, although it is also conventional to simply extend the “ARIMA(p,d,q)” notation for models, by simply allowing the order of differencing, d, to take fractional values. 
Autoregressive Integrated Moving Average (ARIMA) 
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). They are applied in some cases where data show evidence of nonstationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied to remove the nonstationarity. The model is generally referred to as an ARIMA(p,d,q) model where parameters p, d, and q are nonnegative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively. ARIMA models form an important part of the BoxJenkins approach to timeseries modelling. When one of the three terms is zero, it is usual to drop “AR”, “I” or “MA” from the acronym describing the model. For example, ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1). 
Autoregressive Integrated Moving Average With Explanatory Variable (ARIMAX) 
Matlab: ARIMAX Model Specifications ARIMA vs. ARIMAX – which approach is better to analyze and forecast macroeconomic time series? 
Autoregressive Novelty Detector (AND) 
We propose an unsupervised model for novelty detection. The subject is treated as a density estimation problem, in which a deep neural network is employed to learn a parametric function that maximizes probabilities of training samples. This is achieved by equipping an autoencoder with a novel module, responsible for the maximization of compressed codes’ likelihood by means of autoregression. We illustrate design choices and proper layers to perform autoregressive density estimation when dealing with both image and video inputs. Despite a very general formulation, our model shows promising results in diverse oneclass novelty detection and video anomaly detection benchmarks. 
AutoRL  Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious handtuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four Mujoco continuous control tasks over two RL algorithms, shows improvements over baselines, with the the biggest uplift for more complex tasks. The video can be found at: \url{https://youtu.be/svdaOFfQyC8}. 
AutoSense  Word sense induction (WSI), or the task of automatically discovering multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. While current latent variable models are known to solve the first two challenges, they are not flexible to different word sense granularities, which differ very much among words, from aardvark with one sense, to play with over 50 senses. Current models either require hyperparameter tuning or nonparametric induction of the number of senses, which we find both to be ineffective. Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring word. These observations alleviate the problem by (a) throwing garbage senses and (b) additionally inducing finegrained word senses. Results show great improvements over the stateoftheart models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task where the sense granularity problem is more evident and show that AutoSense is evidently better than competing models. We share our data and code here: https://…/AutoSense. 
AutoSklearn  autosklearn is an automated machine learning toolkit and a dropin replacement for a scikitlearn estimator. autosklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advantages in Bayesian optimization, metalearning and ensemble construction. Learn more about the technology behind autosklearn by reading this paper published at the NIPS 2015 . 
AutoSlim  We study how to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size). A simple and oneshot solution, named AutoSlim, is presented. Instead of training many network samples and searching with reinforcement learning, we train a single slimmable network to approximate the network accuracy of different channel configurations. We then iteratively evaluate the trained slimmable model and greedily slim the layer with minimal accuracy drop. By this single pass, we can obtain the optimized channel configurations under different resource constraints. We present experiments with MobileNet v1, MobileNet v2, ResNet50 and RLsearched MNasNet on ImageNet classification. We show significant improvements over their default channel configurations. We also achieve better accuracy than recent channel pruning methods and neural architecture search methods. Notably, by setting optimized channel numbers, our AutoSlimMobileNetv2 at 305M FLOPs achieves 74.2% top1 accuracy, 2.4% better than default MobileNetv2 (301M FLOPs), and even 0.2% better than RLsearched MNasNet (317M FLOPs). Our AutoSlimResNet50 at 570M FLOPs, without depthwise convolutions, achieves 1.3% better accuracy than MobileNetv1 (569M FLOPs). Code and models will be available at: https://…/slimmable_networks 
AutoSpearman  The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated to defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonlyused feature selection techniques. Through a case study of 13 publiclyavailable defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 12%pts. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend future studies use AutoSpearman in lieu of commonlyused feature selection techniques. 
Autostacker  We introduce an automatic machine learning (AutoML) modeling architecture called Autostacker, which combines an innovative hierarchical stacking architecture and an Evolutionary Algorithm (EA) to perform efficient parameter search. Neither prior domain knowledge about the data nor feature preprocessing is needed. Using EA, Autostacker quickly evolves candidate pipelines with high predictive accuracy. These pipelines can be used as is or as a starting point for human experts to build on. Autostacker finds innovative combinations and structures of machine learning models, rather than selecting a single model and optimizing its hyperparameters. Compared with other AutoML systems on fifteen datasets, Autostacker achieves stateofart or competitive performance both in terms of test accuracy and time cost. ➚ “Automatic Machine Learning” 
AutoTune  Big data analytics frameworks (BDAFs) have been widely used for data processing applications. These frameworks provide a large number of configuration parameters to users, which leads to a tuning issue that overwhelms users. To address this issue, many automatic tuning approaches have been proposed. However, it remains a critical challenge to generate enough samples in a highdimensional parameter space within a time constraint. In this paper, we present AutoTune–an automatic parameter tuning system that aims to optimize application execution time on BDAFs. AutoTune first constructs a smallerscale testbed from the production system so that it can generate more samples, and thus train a better prediction model, under a given time constraint. Furthermore, the AutoTune algorithm produces a set of samples that can provide a wide coverage over the highdimensional parameter space, and searches for more promising configurations using the trained prediction model. AutoTune is implemented and evaluated using the Spark framework and HiBench benchmark deployed on a public cloud. Extensive experimental results illustrate that AutoTune improves on default configurations by 63.70% on average, and on the five stateoftheart tuning algorithms by 6%23%. 
Autowarp  Measuring similarities between unlabeled time series trajectories is an important problem in domains as diverse as medicine, astronomy, finance, and computer vision. It is often unclear what is the appropriate metric to use because of the complex nature of noise in the trajectories (e.g. different sampling rates or outliers). Domain experts typically handcraft or manually select a specific metric, such as dynamic time warping (DTW), to apply on their data. In this paper, we propose Autowarp, an endtoend algorithm that optimizes and learns a good metric given unlabeled trajectories. We define a flexible and differentiable family of warping metrics, which encompasses common metrics such as DTW, Euclidean, and edit distance. Autowarp then leverages the representation power of sequence autoencoders to optimize for a member of this warping distance family. The output is a metric which is easy to interpret and can be robustly learned from relatively few trajectories. In systematic experiments across different domains, we show that Autowarp often outperforms handcrafted trajectory similarity metrics. 
Auxiliary Inference Divergence Estimator (AIDE) 
Approximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific dataset. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the output distributions of two inference algorithms. The paper illustrates application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics. 
Ava  Enterprises increasingly employ a wide array of tools and processes to make datadriven decisions. However, there are large ine ciencies in the enterprisewide work ow that stem from the fact that business work ows are expressed in natural language but the actual computational work ow has to be manually translated into computational programs. In this paper, we present an initial approach to bridge this gap by targeting the data science component of enterprise workflows. In many cases, this component is the slowest part of the overall enterprise process, and focusing on it allows us to take an initial step in solving the larger enterprisewide productivity problem. In this initial approach, we propose using a chatbot to allow a data scientist to assemble data analytics pipelines. A crucial insight is that while precise interpretation of general natural language continues to be challenging, controlled natural language methods are starting to become practical as natural interfaces in complex decisionmaking domains. In addition, we recognize that data science workflow components are often templatized. Putting these two insights together, we develop a practical system, called Ava, that uses (controlled) natural language to program data science work ows. We have an initial proofofconcept that demonstrates the potential of our approach. 
Average Causal Effect (ACE) 
The average causal effect (ACE) is defined as the average increase or decrease in value caused by the intervention. 
Average Hazard Ratio (AHR) 
➘ “Hazard Ratio” Estimation of the Average Hazard Ratio 
Average Nearest Neighbor Degree (ANND) 

Average Nearest Neighbor Rank (ANNR) 
The average nearest neighbor degree (ANND) of a node of degree $k$, as a function of $k$, is often used to characterize dependencies between degrees of a node and its neighbors in a network. We study the limiting behavior of the ANND in undirected random graphs with general i.i.d. degree sequences and arbitrary joint degree distribution of neighbor nodes, when the graph size tends to infinity. When the degree distribution has finite variance, the ANND converges to a deterministic function and we prove that for the configuration model, where nodes are connected at random, this, naturally, is a constant. For degree distributions with infinite variance, the ANND in the configuration model scales with the size of the graph and we prove a central limit theorem that characterizes this behavior. As a result, the ANND is uninformative for graphs with infinite variance degree distributions. We propose an alternative measure, the average nearest neighbor rank (ANNR) and prove its convergence to a deterministic function whenever the degree distribution has finite mean. In addition to our theoretical results we provide numerical experiments to show the convergence of both functions in the configuration model and the erased configuration model, where selfloops and multiple edges are removed. These experiments also shed new light on the wellknown `structural negative correlations’, or `finitesize effects’, that arise in simple graphs, because large nodes can only have a limited number of large neighbors. In particular we show that the majority of such effects for regularly varying distributions are due to a sampling bias. ➘ “Average Nearest Neighbor Degree” 
Average Sensitivity  In modern applications of graphs algorithms, where the graphs of interest are large and dynamic, it is unrealistic to assume that an input representation contains the full information of a graph being studied. Hence, it is desirable to use algorithms that, even when only a (large) subgraph is available, output solutions that are close to the solutions output when the whole graph is available. We formalize this idea by introducing the notion of average sensitivity of graph algorithms, which is the average earth mover’s distance between the output distributions of an algorithm on a graph and its subgraph obtained by removing an edge, where the average is over the edges removed and the distance between two outputs is the Hamming distance. In this work, we initiate a systematic study of average sensitivity. After deriving basic properties of average sensitivity such as composability, we provide efficient approximation algorithms with low average sensitivities for concrete graph problems, including the minimum spanning forest problem, the global minimum cut problem, the maximum matching problem, and the minimum vertex cover problem. We also show that every algorithm for the 2coloring problem has average sensitivity linear in the number of vertices. To show our algorithmic results, we establish and utilize the following fact; if the presence of a vertex or an edge in the solution output by an algorithm can be decided locally, then the algorithm has a low average sensitivity, allowing us to reuse the analyses of known sublineartime algorithms. 
Average Shifted Histogram (ASH) 
A simple device has been proposed for eliminating the bin edge problem of the frequency polygon while retaining many of the computational advantages of a density estimate based on bin counts. Scott (1983, 1985b) considered the problem of choosing among the collection of multivariate frequency polygons, each with the same smoothing parameter but di ering bin origins. Rather than choosing the \smoothest’ such curve or surface, he proposed averaging several of the shifted frequency polygons. As the average of piecewise linear curves is also piecewise linear, the resulting curve appears to be a frequency polygon as well. If the weights are nonnegative and sum to 1, the resulting \averaged shifted frequency polygon’ (ASFP) is nonnegative and integrates to 1. A nearly equivalent device is to average several shifted histograms, which is just as general but simpler to describe and analyze. The result is the \averaged shifted histogram’ (ASH). Since the average of piecewise constant functions such as the histogram is also piecewise constant, the ASH appears to be a histogram as well. In practice, the ASH is made continuous using either of the linear interpolation schemes described for the frequency polygon in Chapter 4 and will be referred to as the frequency polygon of the averaged shifted histogram (FPASH). The ASH is the practical choice for computationally and statistically e cient density estimation. 
Average Shifted Visual Predictive Checks (asVPC) 
➘ “Visual Predictive Checks” ➘ “Average Shifted Histogram” 
Average Topk Loss  In this work, we introduce the average top$k$ (AT$_k$) loss as a new ensemble loss for supervised learning, which is the average over the $k$ largest individual losses over a training dataset. We show that the AT$_k$ loss is a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss, but can combines their advantages and mitigate their drawbacks to better adapt to different data distributions. Furthermore, it remains a convex function over all individual losses, which can lead to convex optimization problems that can be solved effectively with conventional gradientbased method. We provide an intuitive interpretation of the AT$_k$ loss based on its equivalent effect on the continuous individual loss functions, suggesting that it can reduce the penalty on correctly classified data. We further give a learning theory analysis of MAT$_k$ learning on the classification calibration of the AT$_k$ loss and the error bounds of AT$_k$SVM. We demonstrate the applicability of minimum average top$k$ learning for binary classification and regression using synthetic and real datasets. 
Average Treatment Effect (ATE) 
The average treatment effect (ATE) is a measure used to compare treatments (or interventions) in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control. In a randomized trial (i.e., an experimental study), the average treatment effect can be estimated from a sample using a comparison in mean outcomes for treated and untreated units. However, the ATE is generally understood as a causal parameter (i.e., an estimand or property of a population) that a researcher desires to know, defined without reference to the study design or estimation procedure. Both observational and experimental study designs may enable one to estimate an ATE in a variety of ways. 
Averaged Gradient Episodic Memory (AGEM) 
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a datadriven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. Towards this end, we first introduce a new and a more realistic evaluation protocol, whereby learners observe each example only once and hyperparameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (LopezPaz & Ranzato, 2017), dubbed Averaged GEM (AGEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularizationbased methods. Finally, we show that all algorithms including AGEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that AGEM has the best tradeoff between accuracy and efficiency. ➘ “Gradient Episodic Memory” 
Averaged OneDependence Estimators (AODE) 
Averaged onedependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attributeindependence problem of the popular naive Bayes classifier. It frequently develops substantially more accurate classifiers than naive Bayes at the cost of a modest increase in the amount of computation. 
Averaged Shifted Frequency Polygon (ASFP) 
http://…/Scott2015.pdf 
AVIATOR  This paper presents a visual tool, AVIATOR, that integrates the progressive visual analytics paradigm in the IR evaluation process. This tool serves to speedup and facilitate the performance assessment of retrieval models enabling a result analysis through visual facilities. AVIATOR goes one step beyond the common ‘compute wait visualize’ analytics paradigm, introducing a continuous evaluation mechanism that minimizes human and computational resource consumption. 
Avro2TF  Deep learning has been successfully applied to multiple AI systems at LinkedIn that are related to recommendation and search. One of the important lessons that we have learned during this journey is to provide good deep learning platforms that help our modeling engineers become more efficient and productive. Avro2TF is part of this effort to reduce the complexity of data processing and improving velocity of advanced modeling. In addition to advanced deep learning techniques, LinkedIn has been at the forefront of Machine Learning innovation for years now. We have many different ML approaches that consume large amount of data everyday. Efficiency and accuracy are the most important measurements for these approaches. To effectively support deep learning at LinkedIn, we need to first address the data processing issues. Most of the datasets used by our ML algorithms (e.g., LinkedIn’s large scale personalization engine PhotonML) are in Avro format. Each record in a Avro dataset is essentially a sparse vector, and can be easily consumed by most of the modern classifiers. However, the format cannot be directly used by TensorFlow — the leading deep learning package. The main blocker is that the sparse vector is not in the same format as Tensor. We believe that this is not only a LinkedIn problem, many companies have vast amount of ML data in similar sparse vector format, and Tensor format is still relatively new to many companies. Avro2TF bridges this gap by providing scalable Spark based transformation and extensions mechanism to efficiently convert the data into TF records that can be readily consumed by TensorFlow. With this technology, developers can improve their productivity by focusing on model building rather than data conversion. 
Awk  AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unixlike operating systems. AWK was created at Bell Labs in the 1970s, and its name is derived from the family names of its authors – Alfred Aho, Peter Weinberger, and Brian Kernighan. The acronym is pronounced the same as the name of the bird, auk (which acts as an emblem of the language such as on The AWK Programming Language book cover – the book is often referred to by the abbreviation TAPL). When written in all lowercase letters, as awk, it refers to the Unix or Plan 9 program that runs scripts written in the AWK programming language. The AWK language is a datadriven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline – for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. While AWK has a limited intended application domain, and was especially designed to support oneliner programs, the language is Turingcomplete, and even the early Bell Labs users of AWK often wrote wellstructured large AWK programs. http://ferd.ca/awkin20minutes.html 
Ax  Developers and researchers alike face problems where they are confronted with a large space of possible ways to configure something — whether those are ‘magic numbers’ used for infrastructure or compiler flags, learning rates or other hyperparameters in machine learning, or images and callstoaction used in marketing promotions. Selecting and tuning these configurations can often take time, resources, and quality of user experiences. Ax is a machine learning system to help automate this process, and help researchers and developers get the most out of their software in an optimally efficient way. Ax is a platform for optimizing any kind of experiment, including machine learning experiments, A/B tests, and simulations. Ax can optimize discrete configurations (e.g., variants of an A/B test) using multiarmed bandit optimization, and continuous (e.g., integer or floating point)valued configurations using Bayesian optimization. This makes it suitable for a wide range of applications. Ax has been successfully applied to a variety of product, infrastructure, ML, and research applications at Facebook. 
AxTrain  The intrinsic error tolerance of neural network (NN) makes approximate computing a promising technique to improve the energy efficiency of NN inference. Conventional approximate computing focuses on balancing the efficiencyaccuracy tradeoff for existing pretrained networks, which can lead to suboptimal solutions. In this paper, we propose AxTrain, a hardwareoriented training framework to facilitate approximate computing for NN inference. Specifically, AxTrain leverages the synergy between two orthogonal methods—one actively searches for a network parameters distribution with high error tolerance, and the other passively learns resilient weights by numerically incorporating the noise distributions of the approximate hardware in the forward pass during the training phase. Experimental results from various datasets with nearthreshold computing and approximation multiplication strategies demonstrate AxTrain’s ability to obtain resilient neural network parameters and system energy efficiency improvement. 
Azure Machine Learning  Azure Machine Learning, a fullymanaged service in the cloud that enables you to easily build, deploy and share advanced analytics solutions. No software to download, no servers to manage – all you need to start is a browser and internet connectivity. 
AzureKusto  AzureKusto provides an interface (including DBI compliant methods for connecting to Kusto clusters and submitting Kusto Query Language (KQL) statements, as well as a dbplyr style backend that translates dplyr queries into KQL statements. On the administrator side, it extends the AzureRMR framework to allow for creating clusters and managing database principals. 