A/B Testing In marketing, A/B testing is a simple randomized experiment with two variants, A and B, which are the control and treatment in the controlled experiment. It is a form of statistical hypothesis testing. Other names include randomized controlled experiments, online controlled experiments, and split testing. In online settings, such as web design (especially user experience design), the goal is to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement).
Why does designing a simple A/B test seem so complicated
ABC Analysis In materials management, the ABC analysis (or Selective Inventory Control) is an inventory categorization technique. ABC analysis divides an inventory into three categories- “A items” with very tight control and accurate records, “B items” with less tightly controlled and good records, and “C items” with the simplest controls possible and minimal records. The ABC analysis provides a mechanism for identifying items that will have a significant impact on overall inventory cost, while also providing a mechanism for identifying different categories of stock that will require different management and controls. The ABC analysis suggests that inventories of an organization are not of equal value. Thus, the inventory is grouped into three categories (A, B, and C) in order of their estimated importance.
‘A’ items are very important for an organization. Because of the high value of these ‘A’ items, frequent value analysis is required. In addition to that, an organization needs to choose an appropriate order pattern (e.g. ‘Just- in- time’) to avoid excess capacity. ‘B’ items are important, but of course less important than ‘A’ items and more important than ‘C’ items. Therefore ‘B’ items are intergroup items. ‘C’ items are marginally important.
ABC Shadow This paper presents an original ABC algorithm, ABC Shadow, that can be applied to sample posterior densities that are continuously differentiable. The proposed algorithm solves the main condition to be fulfilled by any ABC algorithm, in order to be useful in practice. This condition requires enough samples in the parameter space region, induced by the observed statistics. The algorithm is tuned on the posterior of a Gaussian model which is entirely known, and then, it is applied for the statistical analysis of several spatial patterns. These patterns are issued or assumed to be outcomes of point processes. The considered models are: Strauss, Candy and area-interaction.
A simulated annealing procedure based on the ABC Shadow algorithm for statistical inference of point processes
Abductive Reasoning Abductive reasoning (also called abduction, abductive inference, or retroduction) is a form of logical inference which starts with an observation or set of observations then seeks to find the simplest and most likely explanation. In abductive reasoning, unlike in deductive reasoning, the premises do not guarantee the conclusion. One can understand abductive reasoning as inference to the best explanation, although not all uses of the terms abduction and inference to the best explanation are exactly equivalent. In the 1990s, as computing power grew, the fields of law, computer science, and artificial intelligence research spurred renewed interest in the subject of abduction. Diagnostic expert systems frequently employ abduction.
Abstaining Classification Abstaining classification aims to reject to classify the easily misclassified examples, so it is an effective approach to increase the clasificaiton reliability and reduce the misclassification risk in the cost-sensitive applications.
“Bounded-Abstention Method With two Constraints of Reject Rates”
Abstract Dialectical Framework
Abstract Dialectical Frameworks (ADFs) generalize Dung’s argumentation frameworks allowing various relationships among arguments to be expressed in a systematic way. We further generalize ADFs so as to accommodate arbitrary acceptance degrees for the arguments. This makes ADFs applicable in domains where both the initial status of arguments and their relationship are only insufficiently specified by Boolean functions. We define all standard ADF semantics for the weighted case, including grounded, preferred and stable semantics. We illustrate our approach using acceptance degrees from the unit interval and show how other valuation structures can be integrated. In each case it is sufficient to specify how the generalized acceptance conditions are represented by formulas, and to specify the information ordering underlying the characteristic ADF operator. We also present complexity results for problems related to weighted ADFs.
Abstract Meaning Representation
We describe AbstractMeaning Representation (AMR), a semantic representation language in which we are writing down the meanings of thousands of English sentences. We hope that a sembank of simple, whole-sentence semantic structures will spur new work in statistical natural language understanding and generation, like the Penn Treebank encouraged work on statistical parsing. This paper gives an overview of AMR and tools associated with it.
Abstract Meaning Representation (AMR) Annotation Release 1.0, Linguistic Data Consortium (LDC) Catalog Number LDC2014T12 and ISBN 1-58563-677-0, was developed by LDC, SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 13,000 English natural language sentences from newswire, weblogs and web discussion forums.
AMR captures ‘who is doing what to whom’ in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.
AMR Specification
Abstract Meaning Representation: A survey
ABtree An Algorithm for Subgroup-Based Treatment Assignment. Given two possible treatments, there may exist subgroups who benefit greater from one treatment than the other. This problem is relevant to the field of marketing, where treatments may correspond to different ways of selling a product. It is similarly relevant to the field of public policy, where treatments may correspond to specific government programs
Accelerated Alternating Projections We study robust PCA for the fully observed setting, which is about separating a low rank matrix $\boldsymbol{L}$ and a sparse matrix $\boldsymbol{S}$ from their sum $\boldsymbol{D}=\boldsymbol{L}+\boldsymbol{S}$. In this paper, a new algorithm, termed accelerated alternating projections, is introduced for robust PCA which accelerates existing alternating projections proposed in [Netrapalli, Praneeth, et al., 2014]. Let $\boldsymbol{L}_k$ and $\boldsymbol{S}_k$ be the current estimates of the low rank matrix and the sparse matrix, respectively. The algorithm achieves significant acceleration by first projecting $\boldsymbol{D}-\boldsymbol{S}_k$ onto a low dimensional subspace before obtaining the new estimate of $\boldsymbol{L}$ via truncated SVD. Exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other state-of-the-art algorithms for robust PCA.
Accelerated Destructive Degradation Test
Degradation data analysis is a powerful tool for reliability assessment. Useful reliability information is available from degradation data when there are few or even no failures. For some applications the degradation measurement process destroys or changes the physical/mechanical characteristics of test units. In such applications, only one meaningful measurement can be can be taken on each test unit. This is known as ‘destructive degradation’. Degradation tests are often accelerated by testing at higher than usual levels of accelerating variables like temperature.
Accelerated Distributed Nesterov Gradient Descent
This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an Accelerated Distributed Nesterov Gradient Descent (Acc-DNGD) method. When the objective function is convex and $L$-smooth, we show that it achieves a $O(\frac{1}{t^{1.4-\epsilon}})$ convergence rate for all $\epsilon\in(0,1.4)$. We also show the convergence rate can be improved to $O(\frac{1}{t^2})$ if the objective function is a composition of a linear map and a strongly-convex and smooth function. When the objective function is $\mu$-strongly convex and $L$-smooth, we show that it achieves a linear convergence rate of $O([ 1 – O( (\frac{\mu}{L})^{5/7} )]^t)$, where $\frac{L}{\mu}$ is the condition number of the objective.
Accelerated Failure Time Model
In the statistical area of survival analysis, an accelerated failure time model (AFT model) is a parametric model that provides an alternative to the commonly used proportional hazards models. Whereas a proportional hazards model assumes that the effect of a covariate is to multiply the hazard by some constant, an AFT model assumes that the effect of a covariate is to accelerate or decelerate the life course of a disease by some constant. This is especially appealing in a technical context where the ‘disease’ is a result of some mechanical process with a known sequence of intermediary stages.
Accelerated Gradient Boosting
Gradient tree boosting is a prediction algorithm that sequentially produces a model in the form of linear combinations of decision trees, by solving an infinite-dimensional optimization problem. We combine gradient boosting and Nesterov’s accelerated descent to design a new algorithm, which we call AGB (for Accelerated Gradient Boosting). Substantial numerical evidence is provided on both synthetic and real-life data sets to assess the excellent performance of the method in a large variety of prediction problems. It is empirically shown that AGB is much less sensitive to the shrinkage parameter and outputs predictors that are considerably more sparse in the number of trees, while retaining the exceptional performance of gradient boosting.
Accelerated Gradient Descent
In the world of optimization, we have a space and a convex objective function f we wish to minimize. We have seen that gradient descent is a simple greedy algorithm that works to minimize the objective function at some convergence rate (in this post we shall remain in discrete time). But the world is always stranger than we think. Indeed, there is a phenomenon of acceleration in convex optimization, in which we can boost the performance of some gradient-based algorithms by subtly modifying their implementation. In particular, we will discuss accelerated gradient descent, proposed by Yurii Nesterov in 1983, which achieves a faster – and optimal – convergence rate under the same assumption as gradient descent. Acceleration has received renewed research interests in recent years, leading to many proposed interpretations and further generalizations. Nevertheless, there is still a sense of mystery to what acceleration is doing and why it works; these are the questions that we want to understand better.
Accelerated Hierarchical Density Clustering We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult to tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density based clustering. Library available at: https://…/hdbscan
Accelerated Mini-Batch k-Means We propose an accelerated Mini-Batch k-means algorithm which combines three key improvements. The first is a modified center update which results in convergence to a local minimum in fewer iterations. The second is an adaptive increase of batchsize to meet an increasing requirement for centroid accuracy. The third is the inclusion of distance bounds based on the triangle inequality, which are used to eliminate distance calculations along the same lines as Elkan’s algorithm. The combination of the two latter constitutes a very powerful scheme to reuse computation already done over samples until statistical accuracy requires the use of additional data points.
Accelerated Proximal Boosting Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics a gradient descent on a functional variable. This paper proposes to build upon the proximal point algorithm when the empirical risk to minimize is not differentiable. In addition, the novel boosting approach, called accelerated proximal boosting, benefits from Nesterov’s acceleration in the same way as gradient boosting [Biau et al., 2018]. Advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, we exhibit a favorable comparison over gradient boosting regarding convergence rate and prediction accuracy.
Accelerated Proximal Gradient
More Efficient Accelerated Proximal Algorithm for Nonconvex Problems
Accelerated Sparse Discriminant Analysis accSDA
Accelerated Sparse Subspace Clustering State-of-the-art algorithms for sparse subspace clustering perform spectral clustering on a similarity matrix typically obtained by representing each data point as a sparse combination of other points using either basis pursuit (BP) or orthogonal matching pursuit (OMP). BP-based methods are often prohibitive in practice while the performance of OMP-based schemes are unsatisfactory, especially in settings where data points are highly similar. In this paper, we propose a novel algorithm that exploits an accelerated variant of orthogonal least-squares to efficiently find the underlying subspaces. We show that under certain conditions the proposed algorithm returns a subspace-preserving solution. Simulation results illustrate that the proposed method compares favorably with BP-based method in terms of running time while being significantly more accurate than OMP-based schemes.
Acceptance-Rejection Method “Rejection Sampling”
Accumulated Gradient Normalization This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller compared to an accumulated gradient, and provides a better direction towards a minimum compared to first-order gradients, which in turn also forces possible implicit momentum fluctuations to be more aligned since we make the assumption that all workers contribute towards a single minima. As a result, our approach mitigates the parameter staleness problem more effectively since staleness in asynchrony induces (implicit) momentum, and achieves a better convergence rate compared to other optimizers such as asynchronous EASGD and DynSGD, which we show empirically.
Accumulated Local Effects
Accumulated Total Derivative Effects Plot Interpreting a nonparametric regression model with many predictors is known to be a challenging problem. There has been renewed interest in this topic due to the extensive use of machine learning algorithms and the difficulty in understanding and explaining their input-output relationships. This paper develops a unified framework using a derivative-based approach for existing tools in the literature, including the partial-dependence plots, marginal plots and accumulated effects plots. It proposes a new interpretation technique called the accumulated total derivative effects plot and demonstrates how its components can be used to develop extensive insights in complex regression models with correlated predictors. The techniques are illustrated through simulation results.
Accuracy In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value.
Accuracy and Precision Accuracy and precision are defined in terms of systematic and random errors. The more common definition associates accuracy with systematic errors and precision with random errors. Another definition, advanced by ISO, associates trueness with systematic errors and precision with random errors, and defines accuracy as the combination of both trueness and precision.
Accuracy Paradox The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. It may be better to avoid the accuracy metric in favor of other metrics such as precision and recall.
Accuracy is often the starting point for analyzing the quality of a predictive model, as well as an obvious criterion for prediction. Accuracy measures the ratio of correct predictions to the total number of cases evaluated. It may seem obvious that the ratio of correct predictions to cases should be a key metric. A predictive model may have high accuracy, but be useless.
AccurateML The growing demands of processing massive datasets have promoted irresistible trends of running machine learning applications on MapReduce. When processing large input data, it is often of greater values to produce fast and accurate enough approximate results than slow exact results. Existing techniques produce approximate results by processing parts of the input data, thus incurring large accuracy losses when using short job execution times, because all the skipped input data potentially contributes to result accuracy. We address this limitation by proposing AccurateML that aggregates information of input data in each map task to create small aggregated data points. These aggregated points enable all map tasks producing initial outputs quickly to save computation times and decrease the outputs’ size to reduce communication times. Our approach further identifies the parts of input data most related to result accuracy, thus first using these parts to improve the produced outputs to minimize accuracy losses. We evaluated AccurateML using real machine learning applications and datasets. The results show: (i) it reduces execution times by 30 times with small accuracy losses compared to exact results; (ii) when using the same execution times, it achieves 2.71 times reductions in accuracy losses compared to existing approximate processing techniques.
AceKG Most existing knowledge graphs (KGs) in academic domains suffer from problems of insufficient multi-relational information, name ambiguity and improper data format for large-scale machine pro- cessing. In this paper, we present AceKG, a new large-scale KG in academic domain. AceKG not only provides clean academic information, but also offers a large-scale benchmark dataset for researchers to conduct challenging data mining projects including link prediction, community detection and scholar classification. Specifically, AceKG describes 3.13 billion triples of academic facts based on a consistent ontology, including necessary properties of papers, authors, fields of study, venues and institutes, as well as the relations among them. To enrich the proposed knowledge graph, we also perform entity alignment with existing databases and rule-based inference. Based on AceKG, we conduct experiments of three typical academic data mining tasks and evaluate several state-of- the-art knowledge embedding and network representation learning approaches on the benchmark datasets built from AceKG. Finally, we discuss several promising research directions that benefit from AceKG.
Action Schema Network
In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight-sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet’s learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains.
Action-Attending Graphic Neural Network
The motion analysis of human skeletons is crucial for human action recognition, which is one of the most active topics in computer vision. In this paper, we propose a fully end-to-end action-attending graphic neural network (A$^2$GNN) for skeleton-based action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract high-level semantic representation from skeletons, we perform the local spectral graph filtering on the constructed attribute graphs like the standard image convolution operation. Considering not all joints are informative for action analysis, we design an action-attending layer to detect those salient action units (AUs) by adaptively weighting skeletal joints. Herein the filtering responses are parameterized into a weighting function irrelevant to the order of input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, action-attending and recurrent temporal encoding are integrated together to jointly train for the sake of robust action recognition as well as the intelligibility of human actions. To evaluate our A$^2$GNN, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves the state-of-the-art performances.
Action-Convolution Deep Q-Network “Deep Curiosity Loop”
Action-Specific Deep Recurrent Q-Network
Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.
Activation Ensemble Many activation functions have been proposed in the past, but selecting an adequate one requires trial and error. We propose a new methodology of designing activation functions within a neural network at each layer. We call this technique an ‘activation ensemble’ because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, $\alpha$, at each activation layer of a network to allow for multiple activation functions to be active at each neuron. By design, activations with larger $\alpha$ values at a neuron is equivalent to having the largest magnitude. Hence, those higher magnitude activations are ‘chosen’ by the network. We implement the activation ensembles on a variety of datasets using an array of Feed Forward and Convolutional Neural Networks. By using the activation ensemble, we achieve superior results compared to traditional techniques. In addition, because of the flexibility of this methodology, we more deeply explore activation functions and the features that they capture.
Active and Adaptive Sequential Learning A framework is introduced for actively and adaptively solving a sequence of machine learning problems, which are changing in bounded manner from one time step to the next. An algorithm is developed that actively queries the labels of the most informative samples from an unlabeled data pool, and that adapts to the change by utilizing the information acquired in the previous steps. Our analysis shows that the proposed active learning algorithm based on stochastic gradient descent achieves a near-optimal excess risk performance for maximum likelihood estimation. Furthermore, an estimator of the change in the learning problems using the active learning samples is constructed, which provides an adaptive sample size selection rule that guarantees the excess risk is bounded for sufficiently large number of time steps. Experiments with synthetic and real data are presented to validate our algorithm and theoretical results.
Active Betweenness Cardinality Centrality rankings such as degree, closeness, betweenness, Katz, PageRank, etc. are commonly used to identify critical nodes in a graph. These methods are based on two assumptions that restrict their wider applicability. First, they assume the exact topology of the network is available. Secondly, they do not take into account the activity over the network and only rely on its topology. However, in many applications, the network is autonomous, vast, and distributed, and it is hard to collect the exact topology. At the same time, the underlying pairwise activity between node pairs is not uniform and node criticality strongly depends on the activity on the underlying network. In this paper, we propose active betweenness cardinality, as a new measure, where the node criticalities are based on not the static structure, but the activity of the network. We show how this metric can be computed efficiently by using only local information for a given node and how we can find the most critical nodes starting from only a few nodes. We also show how this metric can be used to monitor a network and identify failed nodes.We present experimental results to show effectiveness by demonstrating how the failed nodes can be identified by measuring active betweenness cardinality of a few nodes in the system.
Active Function Cross-Entropy Clustering
Active function cross-entropy clustering partitions the n-dimensional data into the clusters by finding the parameters of the mixed generalized multivariate normal distribution, that optimally approximates the scattering of the data in the n-dimensional space, whose density function is of the form: p_1*N(mi_1,^sigma_1,sigma_1,f_1)+…+p_k*N(mi_k,^sigma_k,sigma_k,f_k). The above-mentioned generalization is performed by introducing so called ‘f-adapted Gaussian densities’ (i.e. the ordinary Gaussian densities adapted by the ‘active function’). Additionally, the active function cross-entropy clustering performs the automatic reduction of the unnecessary clusters. For more information please refer to P. Spurek, J. Tabor, K.Byrski, ‘Active function Cross-Entropy Clustering’ (2017) <doi:10.1016/j.eswa.2016.12.011>.
Active Information Store
The goal of the AIS project is to provide a scalable information repository to support data-intensive information worker and decision support applications. AIS can help businesses to enhance, structure and navigate data to discover information and relationships that are in existing data stores but not readily available or obvious. (see also HANA Graph Engine)
Active Learning Active learning is a special case of semi-supervised machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points. In statistics literature it is sometimes also called optimal experimental design. There are situations in which unlabeled data is abundant but manually labeling is expensive. In such a scenario, learning algorithms can actively query the user/teacher for labels. This type of iterative supervised learning is called active learning.
Active Learning with Partial Feedback In the large-scale multiclass setting, assigning labels often consists of answering multiple questions to drill down through a hierarchy of classes. Here, the labor required per annotation scales with the number of questions asked. We propose active learning with partial feedback. In this setup, the learner asks the annotator if a chosen example belongs to a (possibly composite) chosen class. The answer eliminates some classes, leaving the agent with a partial label. Success requires (i) a sampling strategy to choose (example, class) pairs, and (ii) learning from partial labels. Experiments on the TinyImageNet dataset demonstrate that our most effective method achieves a 21% relative improvement in accuracy for a 200k binary question budget. Experiments on the TinyImageNet dataset demonstrate that our most effective method achieves a 26% relative improvement (8.1% absolute) in top1 classification accuracy for a 250k (or 30%) binary question budget, compared to a naive baseline. Our work may also impact traditional data annotation. For example, our best method fully annotates TinyImageNet with only 482k (with EDC though, ERC is 491) binary questions (vs 827k for naive method).
Active Long Term Memory Network
Continual Learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially. This paper introduces the Active Long Term Memory Networks (A-LTM), a model of sequential multi-task deep learning that is able to maintain previously learned association between sensory input and behavioral output while acquiring knew knowledge. A-LTM exploits the non-convex nature of deep neural networks and actively maintains knowledge of previously learned, inactive tasks using a distillation loss. Distortions of the learned input-output map are penalized but hidden layers are free to transverse towards new local optima that are more favorable for the multi-task objective. We re-frame the McClelland’s seminal Hippocampal theory with respect to Catastrophic Inference (CI) behavior exhibited by modern deep architectures trained with back-propagation and inhomogeneous sampling of latent factors across epochs. We present empirical results of non-trivial CI during continual learning in Deep Linear Networks trained on the same task, in Convolutional Neural Networks when the task shifts from predicting semantic to graphical factors and during domain adaptation from simple to complex environments. We present results of the A-LTM model’s ability to maintain viewpoint recognition learned in the highly controlled iLab-20M dataset with 10 object categories and 88 camera viewpoints, while adapting to the unstructured domain of Imagenet with 1,000 object categories.
Active Neural Localizer Localization is the problem of estimating the location of an autonomous agent from an observation and a map of the environment. Traditional methods of localization, which filter the belief based on the observations, are sub-optimal in the number of steps required, as they do not decide the actions taken by the agent. We propose ‘Active Neural Localizer’, a fully differentiable neural network that learns to localize accurately and efficiently. The proposed model incorporates ideas of traditional filtering-based localization methods, by using a structured belief of the state with multiplicative interactions to propagate belief, and combines it with a policy model to localize accurately while minimizing the number of steps required for localization. Active Neural Localizer is trained end-to-end with reinforcement learning. We use a variety of simulation environments for our experiments which include random 2D mazes, random mazes in the Doom game engine and a photo-realistic environment in the Unreal game engine. The results on the 2D environments show the effectiveness of the learned policy in an idealistic setting while results on the 3D environments demonstrate the model’s capability of learning the policy and perceptual model jointly from raw-pixel based RGB observations. We also show that a model trained on random textures in the Doom environment generalizes well to a photo-realistic office space environment in the Unreal engine.
Active Rotating Filters
Deep Convolution Neural Networks (DCNNs) are capable of learning unprecedentedly effective image representations. However, their ability in handling significant local and global image rotations remains limited. In this paper, we propose Active Rotating Filters (ARFs) that actively rotate during convolution and produce feature maps with location and orientation explicitly encoded. An ARF acts as a virtual filter bank containing the filter itself and its multiple unmaterialised rotated versions. During back-propagation, an ARF is collectively updated using errors from all its rotated versions. DCNNs using ARFs, referred to as Oriented Response Networks (ORNs), can produce within-class rotation-invariant deep features while maintaining inter-class discrimination for classification tasks. The oriented response produced by ORNs can also be used for image and object orientation estimation tasks. Over multiple state-of-the-art DCNN architectures, such as VGG, ResNet, and STN, we consistently observe that replacing regular filters with the proposed ARFs leads to significant reduction in the number of network parameters and improvement in classification performance. We report the best results on several commonly used benchmarks.
Active Sampler Recent years have witnessed amazing outcomes from ‘Big Models’ trained by ‘Big Data’. Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more ‘learning value’ to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a light-weight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality by 1.6-2.2x.
Active Semi-supervised Transfer Learning
Single-trial classification of event-related potentials in electroencephalogram (EEG) signals is a very important paradigm of brain-computer interface (BCI). Because of individual differences, usually some subject-specific calibration data are required to tailor the classifier for each subject. Transfer learning has been extensively used to reduce such calibration data requirement, by making use of auxiliary data from similar/relevant subjects/tasks. However, all previous research assumes that all auxiliary data have been labeled. This paper considers a more general scenario, in which part of the auxiliary data could be unlabeled. We propose active semi-supervised transfer learning (ASTL) for offline BCI calibration, which integrates active learning, semi-supervised learning, and transfer learning. Using a visual evoked potential oddball task and three different EEG headsets, we demonstrate that ASTL can achieve consistently good performance across subjects and headsets, and it outperforms some state-of-the-art approaches in the literature.
Active Testing Much recent work on visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling-up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are found through manual inspection of results. In this work, we re-formulate the problem as one of active testing, and examine strategies for efficiently querying a user so as to obtain an accurate performance estimate with minimal vetting. We demonstrate the effectiveness of our proposed active testing framework on estimating two performance metrics, Precision@K and mean Average Precision, for two popular computer vision tasks, multi-label classification and instance segmentation. We further show that our approach is able to save significant human annotation effort and is more robust than alternative evaluation protocols.
ActiveClean Data cleaning is often an important step to ensure that predictive models, such as regression and classification, are not affected by systematic errors such as inconsistent, out-of-date, or outlier data. Identifying dirty data is often a manual and iterative process, and can be challenging on large datasets. However, many data cleaning workflows can introduce subtle biases into the training processes due to violation of independence assumptions. We propose ActiveClean, a progressive cleaning approach where the model is updated incrementally instead of re-training and can guarantee accuracy on partially cleaned data. ActiveClean supports a popular class of models called convex loss models (e.g., linear regression and SVMs). ActiveClean also leverages the structure of a user’s model to prioritize cleaning those records likely to affect the results. We evaluate ActiveClean on five real-world datasets UCI Adult, UCI EEG, MNIST, Dollars For Docs, and WorldBank with both real and synthetic errors. Our results suggest that our proposed optimizations can improve model accuracy by up-to 2.5x for the same amount of data cleaned. Furthermore for a fixed cleaning budget and on all real dirty datasets, ActiveClean returns more accurate models than uniform sampling and Active Learning.
Actuarial Science Actuarial science is the discipline that applies mathematical and statistical methods to assess risk in insurance, finance and other industries and professions. Actuaries are professionals who are qualified in this field through education and experience. In many countries, actuaries must demonstrate their competence by passing a series of rigorous professional examinations. Actuarial science includes a number of interrelated subjects, including probability, mathematics, statistics, finance, economics, financial economics, and computer programming. Historically, actuarial science used deterministic models in the construction of tables and premiums. The science has gone through revolutionary changes during the last 30 years due to the proliferation of high speed computers and the union of stochastic actuarial models with modern financial theory (Frees 1990). Many universities have undergraduate and graduate degree programs in actuarial science. In 2010, a study published by job search website CareerCast ranked actuary as the #1 job in the United States (Needleman 2010). The study used five key criteria to rank jobs: environment, income, employment outlook, physical demands, and stress. A similar study by U.S. News & World Report in 2006 included actuaries among the 25 Best Professions that it expects will be in great demand in the future (Nemko 2006).
AdaDepth Supervised deep learning methods have shown promising results for the task of monocular depth estimation; but acquiring ground truth is costly, and prone to noise as well as inaccuracies. While synthetic datasets have been used to circumvent above problems, the resultant models do not generalize well to natural scenes due to the inherent domain shift. Recent adversarial approaches for domain adaption have performed well in mitigating the differences between the source and target domains. But these methods are mostly limited to a classification setup and do not scale well for fully-convolutional architectures. In this work, we propose AdaDepth – an unsupervised domain adaptation strategy for the pixel-wise regression task of monocular depth estimation. The proposed approach is devoid of above limitations through a) adversarial learning and b) explicit imposition of content consistency on the adapted target representation. Our unsupervised approach performs competitively with other established approaches on depth estimation tasks and achieves state-of-the-art results in a semi-supervised setting.
AdaGAN Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every step we add a new component into a mixture model by running a GAN algorithm on a reweighted sample. This is inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. We prove that such an incremental procedure leads to convergence to the true distribution in a finite number of steps if each step is optimal, and convergence at an exponential rate otherwise. We also illustrate experimentally that this procedure addresses the problem of missing modes.
ADAGE The ability to generalize across visual domains is crucial for the robustness of visual recognition systems in the wild. Several works have been dedicated to close the gap between a single labeled source domain and a target domain with transductive access to its data. In this paper we focus on the wider domain generalization task involving multiple sources and seamlessly extending to unsupervised domain adaptation when unlabeled target samples are available at training time. We propose a hybrid architecture that we name ADAGE: it gracefully maps different source data towards an agnostic visual domain through pixel-adaptation based on a novel incremental architecture, and closes the remaining domain gap through feature adaptation. Both the adaptive processes are guided by adversarial learning. Extensive experiments show remarkable improvements compared to the state of the art.
Adapted Increasingly Rarely Markov Chain Monte Carlo
We introduce a class of Adapted Increasingly Rarely Markov Chain Monte Carlo (AirMCMC) algorithms where the underlying Markov kernel is allowed to be changed based on the whole available chain output but only at specific time points separated by an increasing number of iterations. The main motivation is the ease of analysis of such algorithms. Under the assumption of either simultaneous or (weaker) local simultaneous geometric drift condition, or simultaneous polynomial drift we prove the $L_2-$convergence, Weak and Strong Laws of Large Numbers (WLLN, SLLN), Central Limit Theorem (CLT), and discuss how our approach extends the existing results. We argue that many of the known Adaptive MCMC algorithms may be transformed into the corresponding Air versions, and provide an empirical evidence that performance of the Air version stays virtually the same.
AdapterNet Deep neural networks have demonstrated impressive performance in various machine learning tasks. However, they are notoriously sensitive to changes in data distribution. Often, even a slight change in the distribution can lead to drastic performance reduction. Artificially augmenting the data may help to some extent, but in most cases, fails to achieve model invariance to the data distribution. Some examples where this sub-class of domain adaptation can be valuable are various imaging modalities such as thermal imaging, X-ray, ultrasound, and MRI, where changes in acquisition parameters or acquisition device manufacturer will result in different representation of the same input. Our work shows that standard finetuning fails to adapt the model in certain important cases. We propose a novel method of adapting to a new data source, and demonstrate near perfect adaptation on a customized ImageNet benchmark.
ADaPTION Toolbox Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are useful for many practical tasks in machine learning. Synaptic weights, as well as neuron activation functions within the deep network are typically stored with high-precision formats, e.g. 32 bit floating point. However, since storage capacity is limited and each memory access consumes power, both storage capacity and memory access are two crucial factors in these networks. Here we present a method and present the ADaPTION toolbox to extend the popular deep learning library Caffe to support training of deep CNNs with reduced numerical precision of weights and activations using fixed point notation. ADaPTION includes tools to measure the dynamic range of weights and activations. Using the ADaPTION tools, we quantized several CNNs including VGG16 down to 16-bit weights and activations with only 0.8% drop in Top-1 accuracy. The quantization, especially of the activations, leads to increase of up to 50% of sparsity especially in early and intermediate layers, which we exploit to skip multiplications with zero, thus performing faster and computationally cheaper inference.
ADaptive Algorithm for Betweenness via Random Approximation
We present KADABRA, a new algorithm to approximate betweenness centrality in directed and undirected graphs, which significantly outperforms all previous approaches on real-world complex networks. The efficiency of the new algorithm relies on two new theoretical contribution, of independent interest. The first contribution focuses on sampling shortest paths, a subroutine used by most algorithms that approximate betweenness centrality. We show that, on realistic random graph models, we can perform this task in time $|E|^{\frac{1}{2}+o(1)}$ with high probability, obtaining a significant speedup with respect to the $\Theta(|E|)$ worst-case performance. We experimentally show that this new technique achieves similar speedups on real-world complex networks, as well. The second contribution is a new rigorous application of the adaptive sampling technique. This approach decreases the total number of shortest paths that need to be sampled to compute all betweenness centralities with a given absolute error, and it also handles more general problems, such as computing the $k$ most central nodes. Furthermore, our analysis is general, and it might be extended to other settings, as well.
Adaptive BAll COver for Classification
Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments. Moreover, the learning system must be parameterless —traditional tuning methods are problematic in streaming settings— and avoid requiring prior knowledge of the number of distinct class labels occurring in the stream. In this paper, we introduce a new algorithmic approach for nonparametric learning in data streams. Our approach addresses all above mentioned challenges by learning a model that covers the input space using simple local classifiers. The distribution of these classifiers dynamically adapts to the local (unknown) complexity of the classification problem, thus achieving a good balance between model complexity and predictive accuracy. We design four variants of our approach of increasing adaptivity. By means of an extensive empirical evaluation against standard nonparametric baselines, we show state-of-the-art results in terms of accuracy versus model size. For the variant that imposes a strict bound on the model size, we show better performance against all other methods measured at the same model size value. Our empirical analysis is complemented by a theoretical performance guarantee which does not rely on any stochastic assumption on the source generating the stream.
Adaptive Batch Size
Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes. We analyse our approach using the standard AlexNet, ResNet, and VGG networks operating on the popular CIFAR-10, CIFAR-100, and ImageNet datasets. Our results demonstrate that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while changing accuracy by less than 1% relative to training with fixed batch sizes.
Adaptive Blending Unit
The most widely used activation functions in current deep feed-forward neural networks are rectified linear units (ReLU), and many alternatives have been successfully applied, as well. However, none of the alternatives have managed to consistently outperform the rest and there is no unified theory connecting properties of the task and network with properties of activation functions for most efficient training. A possible solution is to have the network learn its preferred activation functions. In this work, we introduce Adaptive Blending Units (ABUs), a trainable linear combination of a set of activation functions. Since ABUs learn the shape, as well as the overall scaling of the activation function, we also analyze the effects of adaptive scaling in common activation functions. We experimentally demonstrate advantages of both adaptive scaling and ABUs over common activation functions across a set of systematically varied network specifications. We further show that adaptive scaling works by mitigating covariate shifts during training, and that the observed advantages in performance of ABUs likewise rely largely on the activation function’s ability to adapt over the course of training.
Adaptive Boosting
AdaBoost, short for “Adaptive Boosting”, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire who won the prestigious “Gödel Prize” in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms (‘weak learners’) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (i.e., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner. While every learning algorithm will tend to suit some problem types better than others, and will typically have many different parameters and configurations to be adjusted before achieving optimal performance on a dataset, AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative ‘hardness’ of each training sample is fed into the tree growing algorithm such that later trees tend to focus on harder to classify examples.
Adaptive Boosting LSTM with Attention Relation extraction has been widely studied to extract new relational facts from open corpus. Previous relation extraction methods are faced with the problem of wrong labels and noisy data, which substantially decrease the performance of the model. In this paper, we propose an ensemble neural network model – Adaptive Boosting LSTMs with Attention, to more effectively perform relation extraction. Specifically, our model first employs the recursive neural network LSTMs to embed each sentence. Then we import attention into LSTMs by considering that the words in a sentence do not contribute equally to the semantic meaning of the sentence. Next via adaptive boosting, we build strategically several such neural classifiers. By ensembling multiple such LSTM classifiers with adaptive boosting, we could build a more effective and robust joint ensemble neural networks based relation extractor. Experiment results on real dataset demonstrate the superior performance of the proposed model, improving F1-score by about 8% compared to the state-of-the-art models. The code of this work is publicly available on https://…/re.
Adaptive Computation Time
“Universal Transformer”
Adaptive Coordinate Descent Adaptive coordinate descent is an extension of the coordinate descent algorithm to non-separable optimization. The adaptive coordinate descent approach gradually builds a transformation of the coordinate system such that the new coordinates are as decorrelated as possible with respect to the objective function. The adaptive coordinate descent was shown to be competitive to the state-of-the-art evolutionary algorithms and has the following invariance properties:
1. Invariance with respect to monotonous transformations of the function (scaling)
2. Invariance with respect to orthogonal transformations of the search space (rotation).
CMA-like Adaptive Encoding Update mostly based on principal component analysis is used to extend the coordinate descent method to the optimization of non-separable problems. The adaptation of an appropriate coordinate system allows adaptive coordinate descent to outperform coordinate descent on non-separable functions.
Adaptive Density Peak Detection
ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work is built and improved upon Rodriguez and Laio’s idea. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroids selection and parameter optimization algorithm, which finds the number of clusters and cluster centroids by comparing average silhouettes on a grid of testing clustering results; It also includes an user interactive algorithm that allows the user to manually selects cluster centroids from a two dimensional ‘density-distance plot’.
Adaptive Feeding
Object detection aims at high speed and accuracy simultaneously. However, fast models are usually less accurate, while accurate models cannot satisfy our need for speed. A fast model can be 10 times faster but 50\% less accurate than an accurate model. In this paper, we propose Adaptive Feeding (AF) to combine a fast (but less accurate) detector and an accurate (but slow) detector, by adaptively determining whether an image is easy or hard and choosing an appropriate detector for it. In practice, we build a cascade of detectors, including the AF classifier which make the easy vs. hard decision and the two detectors. The AF classifier can be tuned to obtain different tradeoff between speed and accuracy, which has negligible training time and requires no additional training data. Experimental results on the PASCAL VOC, MS COCO and Caltech Pedestrian datasets confirm that AF has the ability to achieve comparable speed as the fast detector and comparable accuracy as the accurate one at the same time. As an example, by combining the fast SSD300 with the accurate SSD500 detector, AF leads to 50\% speedup over SSD500 with the same precision on the VOC2007 test set.
Adaptive Generalized PCA
(adaptive gPCA)
When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results which are unstable and difficult to interpret. To mitigate these problems, we have developed a method which allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a low-dimensional representation of the samples which both describes the latent structure and which has axes which are interpretable in terms of groups of closely related variables. The method is derived by putting a prior encoding the relationships between the variables on the data and following through the analysis on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data and we also demonstrate the method on a dataset investigating the effects of antibiotics on the composition of bacteria in the human gut.
Adaptive gPCA When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results which are unstable and difficult to interpret. To mitigate these problems, we have developed a method which allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a low-dimensional representation of the samples which both describes the latent structure and which has axes which are interpretable in terms of groups of closely related variables. The method is derived by putting a prior encoding the relationships between the variables on the data and following through the analysis on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data and we also demonstrate the method on a dataset investigating the effects of antibiotics on the composition of bacteria in the human gut.
Adaptive Graph Convolutional Neural Network Graph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs to handle graph data such as molecular data, point could and social networks. Current filters in graph CNNs are built for fixed and shared graph structure. However, for most real data, the graph structures varies in both size and connectivity. The paper proposes a generalized and flexible graph CNN taking data of arbitrary graph structure as input. In that way a task-driven adaptive graph is learned for each graph data while training. To efficiently learn the graph, a distance metric learning is proposed. Extensive experiments on nine graph-structured datasets have demonstrated the superior performance improvement on both convergence speed and predictive accuracy.
“Graph Convolutional Neural Network”
Adaptive Huber Regression Big data are often contaminated by outliers and heavy-tailed errors, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our framework is able to handle heavy-tailed data with bounded $(1 \! + \! \delta)$-th moment for any $\delta\!>\!0$. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when $\delta \!\geq\! 1$, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime $0 \!<\! \delta \!<\! 1$ and the transition is smooth and optimal. Moreover, a nonasymptotic Bahadur representation for finite-sample inference is derived when the variance is finite. Numerical experiments lend further support to our obtained theory.
Adaptive Intelligence Optimizer
Particle Swarm Optimization (PSO) is an Evolutionary Algorithm (EA) that utilizes a swarm of particles to solve an optimization problem. Slow Intelligence System (SIS) is a learning framework which slowly learns the solution to a problem performing a series of operations. Moreover, Learning Automata (LA) are minuscule but effective decision making entities which are best suited to act as a controller component. In this paper, we combine two isolate populations of PSO to forge the Adaptive Intelligence Optimizer (AIO) which harnesses the advantages of a bi-population PSO to escape from the local minimum and avoid premature convergence. Furthermore, using the rich framework of SIS and the nifty control theory that LA derived from, we find the perfect matching between SIS and LA where acting slowly is the pillar of both of them. Both SIS and LA need time to converge to the optimal decision where this enables AIO to outperform standard PSO having an incomparable performance on evolutionary optimization benchmark functions.
Adaptive Learning for Multi-Agent Navigation
In multi-agent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and static obstacles, often without communication with each other. Existing methods compute motions that are optimal locally but do not account for the aggregated motions of all agents, producing inefficient global behavior especially when agents move in a crowded space. In this work, we develop methods to allow agents to dynamically adapt their behavior to their local conditions. We accomplish this by formulating the multi-agent navigation problem as an action-selection problem, and propose an approach, ALAN, that allows agents to compute time-efficient and collision-free motions. ALAN is highly scalable because each agent makes its own decisions on how to move using a set of velocities optimized for a variety of navigation tasks. Experimental results show that the agents using ALAN, in general, reach their destinations faster than using ORCA, a state-of-the-art collision avoidance framework, the Social Forces model for pedestrian navigation, and a Predictive collision avoidance model.
Adaptive Learning Rate
Automatically set learning rate for each neuron in a neural network based on it´s training history.
Adaptive Least Absolute Shrinkage and Screening Operator
“Least Absolute Shrinkage and Screening Operator”
Adaptive Linear Neuron
ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented this network. The network uses memistors. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch-Pitts neuron. It consists of a weight, a bias and a summation function. The difference between Adaline and the standard (McCulloch-Pitts) perceptron is that in the learning phase the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function’s output is used for adjusting the weights. There also exists an extension known as Madaline.
Adaptive Memory Network
We present Adaptive Memory Networks (AMN) that processes input-question pairs to dynamically construct a network architecture optimized for lower inference times for Question Answering (QA) tasks. AMN processes the input story to extract entities and stores them in memory banks. Starting from a single bank, as the number of input entities increases, AMN learns to create new banks as the entropy in a single bank becomes too high. Hence, after processing an input-question(s) pair, the resulting network represents a hierarchical structure where entities are stored in different banks, distanced by question relevance. At inference, one or few banks are used, creating a tradeoff between accuracy and performance. AMN is enabled by dynamic networks that allow input dependent network creation and efficiency in dynamic mini-batching as well as our novel bank controller that allows learning discrete decision making with high accuracy. In our results, we demonstrate that AMN learns to create variable depth networks depending on task complexity and reduces inference times for QA tasks.
Adaptive Mixture Discriminant Analysis
In supervised learning, an important issue usually not taken into account by classical methods is the possibility of having in the test set individuals belonging to a class which has not been observed during the learning phase. Classical supervised algorithms will automatically label such observations as belonging to one of the known classes in the training set and will not be able to detect new classes. This work introduces a model-based discriminant analysis method, called adaptive mixture discriminant analysis (AMDA), which is able to detect several unobserved groups of points and to adapt the learned classifier to the new situation. Two EM-based procedures are proposed for parameter estimation and model selection criteria are used for selecting the actual number of classes. Experiments on artificial and real data demonstrate the ability of the proposed method to deal with complex and real word problems. The proposed approach is also applied to the detection of unobserved communities in social network analysis.
Adaptive Moving Average
The Adaptive Moving Average (AMA) study is similar to the exponential moving average (EMA), except the AMA uses a scalable constant instead of a fixed constant for smoothing the data.
Adaptive Moving Self-organizing Map
Self-Organizing Map (SOM) is a neural network model which is used to obtain a topology-preserving mapping from the (usually high dimensional) input/feature space to an output/map space of fewer dimensions (usually two or three in order to facilitate visualization). Neurons in the output space are connected with each other but this structure remains fixed throughout training and learning is achieved through the updating of neuron reference vectors in feature space. Despite the fact that growing variants of SOM overcome the fixed structure limitation they increase computational cost and also do not allow the removal of a neuron after its introduction. In this paper, a variant of SOM is proposed called AMSOM (Adaptive Moving Self-Organizing Map) that on the one hand creates a more flexible structure where neuron positions are dynamically altered during training and on the other hand tackles the drawback of having a predefined grid by allowing neuron addition and/or removal during training. Experiments using multiple literature datasets show that the proposed method improves training performance of SOM, leads to a better visualization of the input dataset and provides a framework for determining the optimal number and structure of neurons.
Adaptive Neural Tree
Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with pre-specified architectures, while the latter is characterised by learning hierarchies over pre-specified features with data-driven architectures. We unite the two via adaptive neural trees (ANTs), a model that incorporates representation learning into edges, routing functions and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). We demonstrate that, whilst achieving over 99% and 90% accuracy on MNIST and CIFAR-10 datasets, ANTs benefit from (i) faster inference via conditional computation, (ii) increased interpretability via hierarchical clustering e.g. learning meaningful class associations, such as separating natural vs. man-made objects, and (iii) a mechanism to adapt the architecture to the size and complexity of the training dataset.
Adaptive Nonparametric Clustering “Adaptive Weights Clustering”
Adaptive p-value Thresholding
We consider the problem of multiple hypothesis testing with generic side information: for each hypothesis $H_i$ we observe both a p-value $p_i$ and some predictor $x_i$ encoding contextual information about the hypothesis. For large-scale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple testing procedures. We propose a general iterative framework for this problem, called the Adaptive p-value Thresholding (AdaPT) procedure, which adaptively estimates a Bayes-optimal p-value rejection threshold and controls the false discovery rate (FDR) in finite samples. At each iteration of the procedure, the analyst proposes a rejection threshold and observes partially censored p-values, estimates the false discovery proportion (FDP) below the threshold, and either stops to reject or proposes another threshold, until the estimated FDP is below $\alpha$. Our procedure is adaptive in an unusually strong sense, permitting the analyst to use any statistical or machine learning method she chooses to estimate the optimal threshold, and to switch between different models at each iteration as information accrues.
Adaptive Quantile Sparse Image
Inverse problems play a central role for many classical computer vision and image processing tasks. A key challenge in solving an inverse problem is to find an appropriate prior to convert an ill-posed problem into a well-posed task. Many of the existing priors, like total variation, are based on ad-hoc assumptions that have difficulties to represent the actual distribution of natural images. In this work, we propose the Adaptive Quantile Sparse Image (AQuaSI) prior. It is based on a quantile filter, can be used as a joint filter on guidance data, and be readily plugged into a wide range of numerical optimization algorithms. We demonstrate the efficacy of the proposed prior in joint RGB/depth upsampling, on RGB/NIR image restoration, and in a comparison with related regularization by denoising approaches.
Adaptive Resonance Theory
Adaptive resonance theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of neural network models which use supervised and unsupervised learning methods, and address problems such as pattern recognition and prediction.
Adaptive Robust Control In this paper we propose a new methodology for solving an uncertain stochastic Markovian control problem in discrete time. We call the proposed methodology the adaptive robust control. We demonstrate that the uncertain control problem under consideration can be solved in terms of associated adaptive robust Bellman equation. The success of our approach is to the great extend owed to the recursive methodology for construction of relevant confidence regions. We illustrate our methodology by considering an optimal portfolio allocation problem, and we compare results obtained using the adaptive robust control method with some other existing methods.
Adaptive Scaling This paper focuses on detection tasks in information extraction, where positive instances are sparsely distributed and models are usually evaluated using F-measure on positive classes. These characteristics often result in deficient performance of neural network based detection models. In this paper, we propose adaptive scaling, an algorithm which can handle the positive sparsity problem and directly optimize over F-measure via dynamic cost-sensitive learning. To this end, we borrow the idea of marginal utility from economics and propose a theoretical framework for instance importance measuring without introducing any additional hyper-parameters. Experiments show that our algorithm leads to a more effective and stable training of neural network based detection models.
Adaptive Skip Interval
We introduce a method which enables a recurrent dynamics model to be temporally abstract. Our approach, which we call Adaptive Skip Intervals (ASI), is based on the observation that in many sequential prediction tasks, the exact time at which events occur is irrelevant to the underlying objective. Moreover, in many situations, there exist prediction intervals which result in particularly easy-to-predict transitions. We show that there are prediction tasks for which we gain both computational efficiency and prediction accuracy by allowing the model to make predictions at a sampling rate which it can choose itself.
Adaptive Software http://…/Adaptive_software_development
Adaptive SVM+ Incorporating additional knowledge in the learning process can be beneficial for several computer vision and machine learning tasks. Whether privileged information originates from a source domain that is adapted to a target domain, or as additional features available at training time only, using such privileged (i.e., auxiliary) information is of high importance as it improves the recognition performance and generalization. However, both primary and privileged information are rarely derived from the same distribution, which poses an additional challenge to the recognition task. To address these challenges, we present a novel learning paradigm that leverages privileged information in a domain adaptation setup to perform visual recognition tasks. The proposed framework, named Adaptive SVM+, combines the advantages of both the learning using privileged information (LUPI) paradigm and the domain adaptation framework, which are naturally embedded in the objective function of a regular SVM. We demonstrate the effectiveness of our approach on the publicly available Animals with Attributes and INTERACT datasets and report state-of-the-art results in both of them.
Adaptive System The term adaptation is used in biology in relation to how living beings adapt to their environments, but with two different meanings. First, the continuous adaptation of an organism to its environment, so as to maintain itself in a viable state, through sensory feedback mechanisms. Second, the development (through evolutionary steps) of an adaptation (an anatomic structure, physiological process or behavior characteristic) that increases the probability of an organism reproducing itself (although sometimes not directly). Generally speaking, an adaptive system is a set of interacting or interdependent entities, real or abstract, forming an integrated whole that together are able to respond to environmental changes or changes in the interacting parts. Feedback loops represent a key feature of adaptive systems, allowing the response to changes; examples of adaptive systems include: natural ecosystems, individual organisms, human communities, human organizations, and human families. Some artificial systems can be adaptive as well; for instance, robots employ control systems that utilize feedback loops to sense new conditions in their environment and adapt accordingly.
Adaptive Thouless-Anderson-Palmer Mean Field Approach
We develop an advanced mean field method for approximating averages in probabilistic data models that is based on the TAP approach of disorder physics. In contrast to conventional TAP, where the knowledge of the distribution of couplings between the random variables is required, our method adapts to the concrete couplings. We demonstrate the validity of our approach, which is sofar restricted to models with non-glassy behaviour, by replica calculations for a wide class of models as well as by simulations for a real data set.
Adaptive Weights Clustering
This paper presents a new approach to non-parametric cluster analysis called Adaptive Weights Clustering (AWC). The idea is to identify the clustering structure by checking at different points and for different scales on departure from local homogeneity. The proposed procedure describes the clustering structure in terms of weights \( w_{ij} \) each of them measures the degree of local inhomogeneity for two neighbor local clusters using statistical tests of ‘no gap’ between them. % The procedure starts from very local scale, then the parameter of locality grows by some factor at each step. The method is fully adaptive and does not require to specify the number of clusters or their structure. The clustering results are not sensitive to noise and outliers, the procedure is able to recover different clusters with sharp edges or manifold structure. The method is scalable and computationally feasible. An intensive numerical study shows a state-of-the-art performance of the method in various artificial examples and applications to text data. Our theoretical study states optimal sensitivity of AWC to local inhomogeneity.
Adaptive Window-based Streaming Edge Partitioning
In recent years, the graph partitioning problem gained importance as a mandatory preprocessing step for distributed graph processing on very large graphs. Existing graph partitioning algorithms minimize partitioning latency by assigning individual graph edges to partitions in a streaming manner — at the cost of reduced partitioning quality. However, we argue that the mere minimization of partitioning latency is not the optimal design choice in terms of minimizing total graph analysis latency, i.e., the sum of partitioning and processing latency. Instead, for complex and long-running graph processing algorithms that run on very large graphs, it is beneficial to invest more time into graph partitioning to reach a higher partitioning quality — which drastically reduces graph processing latency. In this paper, we propose ADWISE, a novel window-based streaming partitioning algorithm that increases the partitioning quality by always choosing the best edge from a set of edges for assignment to a partition. In doing so, ADWISE controls the partitioning latency by adapting the window size dynamically at run-time. Our evaluations show that ADWISE can reach the sweet spot between graph partitioning latency and graph processing latency, reducing the total latency of partitioning plus processing by up to 23-47 percent compared to the state-of-the-art.
Adaptive-Dynamic PCA mvMonitoring
Adaptively Transforming Graph Matching
Recently, many graph matching methods that incorporate pairwise constraint and that can be formulated as a quadratic assignment problem (QAP) have been proposed. Although these methods demonstrate promising results for the graph matching problem, they have high complexity in space or time. In this paper, we introduce an adaptively transforming graph matching (ATGM) method from the perspective of functional representation. More precisely, under a transformation formulation, we aim to match two graphs by minimizing the discrepancy between the original graph and the transformed graph. With a linear representation map of the transformation, the pairwise edge attributes of graphs are explicitly represented by unary node attributes, which enables us to reduce the space and time complexity significantly. Due to an efficient Frank-Wolfe method-based optimization strategy, we can handle graphs with hundreds and thousands of nodes within an acceptable amount of time. Meanwhile, because transformation map can preserve graph structures, a domain adaptation-based strategy is proposed to remove the outliers. The experimental results demonstrate that our proposed method outperforms the state-of-the-art graph matching algorithms.
Adaptive-Miner Extraction of valuable data from extensive datasets is a standout amongst the most vital exploration issues. Association rule mining is one of the highly used methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent itemsets) is most crucial part of the association rule mining task. Many single-machine based association rule mining algorithms exist but the massive amount of data available these days is above the capacity of a single machine based algorithm. Therefore, to meet the demands of this ever-growing enormous data, there is a need for distributed association rule mining algorithm which can run on multiple machines. For these types of parallel/distributed applications, MapReduce is one of the best fault-tolerant frameworks. Hadoop is one of the most popular open-source software frameworks with MapReduce based approach for distributed storage and processing of large datasets using standalone clusters built from commodity hardware. But heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce based platforms are being developed for parallel computing in recent years. Among them, a platform, namely, Spark have attracted a lot of attention because of its inbuilt support to distributed computations. Therefore, we implemented a distributed association rule mining algorithm on Spark named as Adaptive-Miner which uses adaptive approach for finding frequent patterns with higher accuracy and efficiency. Adaptive-Miner uses an adaptive strategy based on the partial processing of datasets. Adaptive-Miner makes execution plans before every iteration and goes with the best suitable plan to minimize time and space complexity. Adpative-Miner is a dynamic association rule mining algorithm which change its approach based on the nature of dataset. Therefore, it is different and better than state-of-the-art static association rule mining algorithms. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of the Adaptive-Miner algorithm on Spark. Available: https ://githu ysing hrath i/ Adapt ive-Miner
adaQN Recurrent Neural Networks (RNNs) are powerful models that achieve unparalleled performance on several pattern recognition problems. However, training of RNNs is a computationally difficult task owing to the well-known ‘vanishing/exploding’ gradient problems. In recent years, several algorithms have been proposed for training RNNs. These algorithms either: exploit no (or limited) curvature information and have cheap per-iteration complexity; or attempt to gain significant curvature information at the cost of increased per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAM and ADAGRAD while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present an novel stochastic quasi-Newton algorithm (adaQN) for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method is judicious in storing and retaining L-BFGS curvature pairs which is indirectly used as a means of controlling the quality of the steps. We present numerical experiments on two language modeling tasks and show that adaQN performs at par, if not better, than popular RNN training algorithms. These results suggest that quasi-Newton algorithms have the potential to be a viable alternative to first- and second-order methods for training RNNs.
ADATE Deep learning and deep architectures are emerging as the best machine learning methods so far in many practical applications such as reducing the dimensionality of data, image classification, speech recognition or object segmentation. In fact, many leading technology companies such as Google, Microsoft or IBM are researching and using deep architectures in their systems to replace other traditional models. Therefore, improving the performance of these models could make a strong impact in the area of machine learning. However, deep learning is a very fast-growing research domain with many core methodologies and paradigms just discovered over the last few years. This thesis will first serve as a short summary of deep learning, which tries to include all of the most important ideas in this research area. Based on this knowledge, we suggested, and conducted some experiments to investigate the possibility of improving the deep learning based on automatic programming (ADATE). Although our experiments did produce good results, there are still many more possibilities that we could not try due to limited time as well as some limitations of the current ADATE version. I hope that this thesis can promote future work on this topic, especially when the next version of ADATE comes out. This thesis also includes a short analysis of the power of ADATE system, which could be useful for other researchers who want to know what it is capable of.
Additive Latent Effect Model
The past decade has seen a growth in the development and deployment of educational technologies for assisting college-going students in choosing majors, selecting courses and acquiring feedback based on past academic performance. Grade prediction methods seek to estimate a grade that a student may achieve in a course that she may take in the future (e.g., next term). Accurate and timely prediction of students’ academic grades is important for developing effective degree planners and early warning systems, and ultimately improving educational outcomes. Existing grade pre- diction methods mostly focus on modeling the knowledge components associated with each course and student, and often overlook other factors such as the difficulty of each knowledge component, course instructors, student interest, capabilities and effort. In this paper, we propose additive latent effect models that incorporate these factors to predict the student next-term grades. Specifically, the proposed models take into account four factors: (i) student’s academic level, (ii) course instructors, (iii) student global latent factor, and (iv) latent knowledge factors. We compared the new models with several state-of-the-art methods on students of various characteristics (e.g., whether a student transferred in or not). The experimental results demonstrate that the proposed methods significantly outperform the baselines on grade prediction problem. Moreover, we perform a thorough analysis on the importance of different factors and how these factors can practically assist students in course selection, and finally improve their academic performance.
Additive Noise Models
The core idea of these so-called Additive Noise Models (ANM) is that if X->Y, then the variability observed in Y will be either explained by X, or by some noise that is independent of X.
Additive Polynomial Design Matrix
An implementation of the additive polynomial (AP) design matrix. It constructs and appends an AP design matrix to a data frame for use with longitudinal data subject to seasonality.
Additive Principal Components
Additive principal components are a nonlinear generalization of linear principal components.
Additive Smoothing In statistics, additive smoothing, also called Laplace smoothing (not to be confused with Laplacian smoothing), or Lidstone smoothing, is a technique used to smooth categorical data.
ADef Algorithm While deep neural networks have proven to be a powerful tool for many recognition and classification tasks, their stability properties are still not well understood. In the past, image classifiers have been shown to be vulnerable to so-called adversarial attacks, which are created by additively perturbing the correctly classified image. In this paper, we propose the ADef algorithm to construct a different kind of adversarial attack created by iteratively applying small deformations to the image, found through a gradient descent step. We demonstrate our results on MNIST with a convolutional neural network and on ImageNet with Inception-v3 and ResNet-101.
Adjacency Diagram The adjacency diagram is a space-filling variant of the node-link diagram; rather than drawing a link between parent and child in the hierarchy, nodes are drawn as solid areas (either arcs or bars), and their placement relative to adjacent nodes reveals their position in the hierarchy. The icicle layout in figure 4D is similar to the first node-link diagram in that the root node appears at the top, with child nodes underneath. Because the nodes are now space-filling, however, we can use a length encoding for the size of software classes and packages. This reveals an additional dimension that would be difficult to show in a node-link diagram.
Adjacency Matrix In mathematics and computer science, an adjacency matrix is a means of representing which vertices (or nodes) of a graph are adjacent to which other vertices. Another matrix representation for a graph is the incidence matrix.
Admixture Models The core of the admixture model, proposed by Smith (1953) to deal with heterogeneous traits in genetic linkage analysis, is the hypothesis that a fraction lambda of pedigrees is linked to the marker locus being tested while a fraction 1 – lambda is unlinked. This model leads naturally to the vexing problem of testing for a mixture and has given rise to a large literature.
Adometry Adometry by Google solves the complex challenge of integrating, measuring, and optimizing marketing data across all channels – both online and offline – so you can generate actionable insights that improve ROI.
Advanced Analysis of Variance
Book: Advanced Analysis of Variance
Advanced Analytics There is an increasing use of the term advanced analytics, typically used to describe the technical aspects of analytics, especially predictive modeling, machine learning techniques, and neural networks.
Advanced LSTM
Long short-term memory (LSTM) is normally used in recurrent neural network (RNN) as basic recurrent unit. However,conventional LSTM assumes that the state at current time step depends on previous time step. This assumption constraints the time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal context modeling. We employ A-LSTM in weighted pooling RNN for emotion recognition. The A-LSTM outperforms the conventional LSTM by 5.5% relatively. The A-LSTM based weighted pooling RNN can also complement the state-of-the-art emotion classification framework. This shows the advantage of A-LSTM.
Advanced Supervised Principal Component Analysis
We present a straightforward non-iterative method for shallowing of deep Convolutional Neural Network (CNN) by combination of several layers of CNNs with Advanced Supervised Principal Component Analysis (ASPCA) of their outputs. We tested this new method on a practically important case of `friend-or-foe’ face recognition. This is the backyard dog problem: the dog should (i) distinguish the members of the family from possible strangers and (ii) identify the members of the family. Our experiments revealed that the method is capable of drastically reducing the depth of deep learning CNNs, albeit at the cost of mild performance deterioration.
AdvEntuRe We consider the problem of learning textual entailment models with limited supervision (5K-10K training examples), and present two complementary approaches for it. First, we propose knowledge-guided adversarial example generators for incorporating large lexical resources in entailment models via only a handful of rule templates. Second, to make the entailment model – a discriminator – more robust, we propose the first GAN-style approach for training it using a natural language example generator that iteratively adjusts based on the discriminator’s performance. We demonstrate effectiveness using two entailment datasets, where the proposed methods increase accuracy by 4.7% on SciTail and by 2.8% on a 1% training sub-sample of SNLI. Notably, even a single hand-written rule, negate, improves the accuracy on the negation examples in SNLI by 6.1%.
Adversarial Clustering Nowadays more and more data are gathered for detecting and preventing cyber attacks. In cyber security applications, data analytics techniques have to deal with active adversaries that try to deceive the data analytics models and avoid being detected. The existence of such adversarial behavior motivates the development of robust and resilient adversarial learning techniques for various tasks. Most of the previous work focused on adversarial classification techniques, which assumed the existence of a reasonably large amount of carefully labeled data instances. However, in practice, labeling the data instances often requires costly and time-consuming human expertise and becomes a significant bottleneck. Meanwhile, a large number of unlabeled instances can also be used to understand the adversaries’ behavior. To address the above mentioned challenges, in this paper, we develop a novel grid based adversarial clustering algorithm. Our adversarial clustering algorithm is able to identify the core normal regions, and to draw defensive walls around the centers of the normal objects utilizing game theoretic ideas. Our algorithm also identifies sub-clusters of attack objects, the overlapping areas within clusters, and outliers which may be potential anomalies.
Adversarial Complementary Learning
In this work, we propose Adversarial Complementary Learning (ACoL) to automatically localize integral objects of semantic interest with weak supervision. We first mathematically prove that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, which paves a simple way to identify object regions. We then present a simple network architecture including two parallel-classifiers for object localization. Specifically, we leverage one classification branch to dynamically localize some discriminative object regions during the forward pass. Although it is usually responsive to sparse parts of the target objects, this classifier can drive the counterpart classifier to discover new and complementary object regions by erasing its discovered regions from the feature maps. With such an adversarial learning, the two parallel-classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together. The merits of ACoL are mainly two-fold: 1) it can be trained in an end-to-end manner; 2) dynamically erasing enables the counterpart classifier to discover complementary object regions more effectively. We demonstrate the superiority of our ACoL approach in a variety of experiments. In particular, the Top-1 localization error rate on the ILSVRC dataset is 45.14%, which is the new state-of-the-art.
Adversarial Constraint Learning Constraint-based learning reduces the burden of collecting labels by having users specify general properties of structured outputs, such as constraints imposed by physical laws. We propose a novel framework for simultaneously learning these constraints and using them for supervision, bypassing the difficulty of using domain expertise to manually specify constraints. Learning requires a black-box simulator of structured outputs, which generates valid labels, but need not model their corresponding inputs or the input-label relationship. At training time, we constrain the model to produce outputs that cannot be distinguished from simulated labels by adversarial training. Providing our framework with a small number of labeled inputs gives rise to a new semi-supervised structured prediction model; we evaluate this model on multiple tasks — tracking, pose estimation and time series prediction — and find that it achieves high accuracy with only a small number of labeled inputs. In some cases, no labels are required at all.
Adversarial Contrastive Estimation Learning by contrasting positive and negative samples is a general strategy adopted by many methods. Noise contrastive estimation (NCE) for word embeddings and translating embeddings for knowledge graphs are examples in NLP employing this approach. In this work, we view contrastive learning as an abstraction of all such methods and augment the negative sampler into a mixture distribution containing an adversarially learned sampler. The resulting adaptive sampler finds harder negative examples, which forces the main model to learn a better representation of the data. We evaluate our proposal on learning word embeddings, order embeddings and knowledge graph embeddings and observe both faster convergence and improved results on multiple metrics.
Adversarial Data Programming
Paucity of large curated hand-labeled training data for every domain-of-interest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), which presents an adversarial methodology to generate data as well as a curated aggregated label has given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness, as well as showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as explore the performance of the method on more complex datasets.
Adversarial Eliminating With GAN
Although Neural networks could achieve state-of-the-art performance while recongnizing images, they often suffer a tremendous defeat from adversarial examples–inputs generated by utilizing imperceptible but intentional perturbations to samples from the datasets. How to defense against adversarial examples is an important problem which is well worth to research. So far, only two well-known methods adversarial training and defensive distillation have provided a significant defense. In contrast to existing methods mainly based on model itself, we address the problem purely based on the adversarial examples itself. In this paper, a novel idea and the first framework based Generative Adversarial Nets named AE-GAN capable of resisting adversarial examples are proposed. Extensive experiments on benchmark datasets indicate that AE-GAN is able to defense against adversarial examples effectively.
Adversarial Feature Learning
The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing generators learn to ‘linearize semantics’ in the latent space of such models. Intuitively, such latent spaces may serve as useful feature representations for auxiliary problems where semantics are relevant. However, in their existing form, GANs have no means of learning the inverse mapping — projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.
Adversarial Generalized Method of Moments We provide an approach for learning deep neural net representations of models described via conditional moment restrictions. Conditional moment restrictions are widely used, as they are the language by which social scientists describe the assumptions they make to enable causal inference. We formulate the problem of estimating the underling model as a zero-sum game between a modeler and an adversary and apply adversarial training. Our approach is similar in nature to Generative Adversarial Networks (GAN), though here the modeler is learning a representation of a function that satisfies a continuum of moment conditions and the adversary is identifying violating moments. We outline ways of constructing effective adversaries in practice, including kernels centered by k-means clustering, and random forests. We examine the practical performance of our approach in the setting of non-parametric instrumental variable regression.
Adversarial Generator-Encoder Networks We present a new autoencoder-type architecture, that is trainable in an unsupervised mode, sustains both generation and inference, and has the quality of conditional and unconditional samples boosted by adversarial learning. Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning. The game objective compares the divergences of each of the real and the generated data distributions with the canonical distribution in the latent space. We show that direct generator-vs-encoder game leads to a tight coupling of the two components, resulting in samples and reconstructions of a comparable quality to some recently-proposed more complex architectures.
Adversarial Hashing Network
As the rapid growth of multi-modal data, hashing methods for cross-modal retrieval have received considerable attention. Deep-networks-based cross-modal hashing methods are appealing as they can integrate feature learning and hash coding into end-to-end trainable frameworks. However, it is still challenging to find content similarities between different modalities of data due to the heterogeneity gap. To further address this problem, we propose an adversarial hashing network with attention mechanism to enhance the measurement of content similarities by selectively focusing on informative parts of multi-modal data. The proposed new adversarial network, HashGAN, consists of three building blocks: 1) the feature learning module to obtain feature representations, 2) the generative attention module to generate an attention mask, which is used to obtain the attended (foreground) and the unattended (background) feature representations, 3) the discriminative hash coding module to learn hash functions that preserve the similarities between different modalities. In our framework, the generative module and the discriminative module are trained in an adversarial way: the generator is learned to make the discriminator cannot preserve the similarities of multi-modal data w.r.t. the background feature representations, while the discriminator aims to preserve the similarities of multi-modal data w.r.t. both the foreground and the background feature representations. Extensive evaluations on several benchmark datasets demonstrate that the proposed HashGAN brings substantial improvements over other state-of-the-art cross-modal hashing methods.
Adversarial Label Learning We consider the task of training classifiers without labels. We propose a weakly supervised method—adversarial label learning—that trains classifiers to perform well against an adversary that chooses labels for training data. The weak supervision constrains what labels the adversary can choose. The method therefore minimizes an upper bound of the classifier’s error rate using projected primal-dual subgradient descent. Minimizing this bound protects against bias and dependencies in the weak supervision. Experiments on three real datasets show that our method can train without labels and outperforms other approaches for weakly supervised learning.
Adversarial Machine Learning
Adversarial machine learning is the formal name for studying what happens when conceding even a slightly more realistic alternative to assumptions of these types (harmlessly called ‘relaxing assumptions’ ….
Adversarial Machine Learning
ADversarial Meta-Learner
Meta-learning enables a model to learn from very limited data to undertake a new task. In this paper, we study the general meta-learning with adversarial samples. We present a meta-learning algorithm, ADML (ADversarial Meta-Learner), which leverages clean and adversarial samples to optimize the initialization of a learning model in an adversarial manner. ADML leads to the following desirable properties: 1) it turns out to be very effective even in the cases with only clean samples; 2) it is model-agnostic, i.e., it is compatible with any learning model that can be trained with gradient descent; and most importantly, 3) it is robust to adversarial samples, i.e., unlike other meta-learning methods, it only leads to a minor performance degradation when there are adversarial samples. We show via extensive experiments that ADML delivers the state-of-the-art performance on two widely-used image datasets, MiniImageNet and CIFAR100, in terms of both accuracy and robustness.
Adversarial Metric Learning
In the past decades, intensive efforts have been put to design various loss functions and metric forms for metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on the ambiguous test pairs due to the distribution bias between training set and test set. To address this problem, the Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the distribution bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e. confusion and distinguishment. In confusion stage, the ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In distinguishment stage, a metric is exhaustively learned to try its best to distinguish both the adversarial pairs and the original training pairs. Thanks to the challenges posed by the confusion stage in such competing process, the AML model is able to grasp plentiful difficult knowledge that has not been contained by the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated into optimization framework, of which the global convergence is theoretically proved. The experimental results on toy data and practical datasets clearly demonstrate the superiority of AML to the representative state-of-the-art metric learning methodologies.
Adversarial Negative Sampling In recent years, the Word2Vec model trained with the Negative Sampling loss function has shown state-of-the-art results in a number of machine learning tasks, including language modeling tasks, such as word analogy and word similarity, and in recommendation tasks, through Prod2Vec, an extension that applies to modeling user shopping activity and user preferences. Several methods that aim to improve upon the standard Negative Sampling loss have been proposed. In our paper we pursue more sophisticated Negative Sampling, by leveraging ideas from the field of Generative Adversarial Networks (GANs), and propose Adversarial Negative Sampling. We build upon the recent progress made in stabilizing the training objective of GANs in the discrete data setting, and introduce a new GAN-Word2Vec model.We evaluate our model on the task of basket completion, and show significant improvements in performance over Word2Vec trained using standard loss functions, including Noise Contrastive Estimation and Negative Sampling.
Adversarial Network Embedding Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other high-order proximities. However, except for objectives to capture network structural properties, most of them suffer from lack of additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate its contribution in learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks.
Adversarial Noise Layer
In this paper, we introduce a novel regularization method called Adversarial Noise Layer (ANL), which significantly improve the CNN’s generalization ability by adding adversarial noise in the hidden layers. ANL is easy to implement and can be integrated with most of the CNN-based models. We compared the impact of the different type of noise and visually demonstrate that adversarial noise guide CNNs to learn to extract cleaner feature maps, further reducing the risk of over-fitting. We also conclude that the model trained with ANL is more robust to FGSM and IFGSM attack. Code is available at: https://…/ANL
Adversarial Personalized Ranking
Item recommendation is a personalized ranking task. To this end, many recommender systems optimize models with pairwise ranking objectives, such as the Bayesian Personalized Ranking (BPR). Using matrix Factorization (MF) — the most widely used model in recommendation — as a demonstration, we show that optimizing it with BPR leads to a recommender model that is not robust. In particular, we find that the resultant model is highly vulnerable to adversarial perturbations on its model parameters, which implies the possibly large error in generalization. To enhance the robustness of a recommender model and thus improve its generalization performance, we propose a new optimization framework, namely Adversarial Personalized Ranking (APR). In short, our APR enhances the pairwise ranking method BPR by performing adversarial training. It can be interpreted as playing a minimax game, where the minimization of the BPR objective function meanwhile defends an adversary, which adds adversarial perturbations on model parameters to maximize the BPR objective function. To illustrate how it works, we implement APR on MF by adding adversarial perturbations on the embedding vectors of users and items. Extensive experiments on three public real-world datasets demonstrate the effectiveness of APR — by optimizing MF with APR, it outperforms BPR with a relative improvement of 11.2% on average and achieves state-of-the-art performance for item recommendation. Our implementation is available at: https://…/adversarial_personalized_ranking.
Adversarial REward Learning
Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem. Different from captions, stories have more expressive language styles and contain many imaginary concepts that do not appear in the images. Thus it poses challenges to behavioral cloning algorithms. Furthermore, due to the limitations of automatic metrics on evaluating story quality, reinforcement learning methods with hand-crafted rewards also face difficulties in gaining an overall performance boost. Therefore, we propose an Adversarial REward Learning (AREL) framework to learn an implicit reward function from human demonstrations, and then optimize policy search with the learned reward function. Though automatic evaluation indicates slight performance boost over state-of-the-art (SOTA) methods in cloning expert behaviors, human evaluation shows that our approach achieves significant improvement in generating more human-like stories than SOTA systems.
Adversarial Robustness Toolbox
Adversarial examples have become an indisputable threat to the security of modern AI systems based on deep neural networks (DNNs). The Adversarial Robustness Toolbox (ART) is a Python library designed to support researchers and developers in creating novel defence techniques, as well as in deploying practical defences of real-world AI systems. Researchers can use ART to benchmark novel defences against the state-of-the-art. For developers, the library provides interfaces which support the composition of comprehensive defence systems using individual methods as building blocks. The Adversarial Robustness Toolbox supports machine learning models (and deep neural networks (DNNs) specifically) implemented in any of the most popular deep learning frameworks (TensorFlow, Keras, PyTorch). Currently, the library is primarily intended to improve the adversarial robustness of visual recognition systems, however, future releases that will comprise adaptations to other data modes (such as speech, text or time series) are envisioned. The ART source code is released (https://…/adversarial-robustness-toolbox ) under an MIT license. The release includes code examples and extensive documentation ( ) to help researchers and developers get quickly started.
Adversarial Transformation Networks Multiple different approaches of generating adversarial examples have been proposed to attack deep neural networks. These approaches involve either directly computing gradients with respect to the image pixels, or directly solving an optimization on the image pixels. In this work, we present a fundamentally new method for generating adversarial examples that is fast to execute and provides exceptional diversity of output. We efficiently train feed-forward neural networks in a self-supervised manner to generate adversarial examples against a target network or set of networks. We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier’s outputs given the original input, while constraining the new classification to match an adversarial target class. We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as well as the latest state-of-the-art ImageNet classifier Inception ResNet v2.
Adversarially Learned Mixture Model
The Adversarially Learned Mixture Model (AMM) is a generative model for unsupervised or semi-supervised data clustering. The AMM is the first adversarially optimized method to model the conditional dependence between inferred continuous and categorical latent variables. Experiments on the MNIST and SVHN datasets show that the AMM allows for semantic separation of complex data when little or no labeled data is available. The AMM achieves a state-of-the-art unsupervised clustering error rate of 2.86% on the MNIST dataset. A semi-supervised extension of the AMM yields competitive results on the SVHN dataset.
Adversary Model In computer science, an online algorithm measures its competitiveness against different adversary models. For deterministic algorithms, the adversary is the same, the adaptive offline adversary. For randomized online algorithms competitiveness can depend upon the adversary model used.
Adviser Problem Humans have an unparalleled visual intelligence and can overcome visual ambiguities that machines currently cannot. Recent works have shown that incorporating guidance from humans during inference for real-world, challenging tasks like viewpoint-estimation and fine-grained classification, can help overcome difficult cases in which the computer-alone would have otherwise failed. These hybrid intelligence approaches are hence gaining traction. However, deciding what question to ask the human in the loop at inference time remains an unknown for these problems. We address this question by formulating it as what we call the Adviser Problem: can we learn a mapping from the input to a specific question to ask the human in the loop so as to maximize the expected positive impact to the overall task? We formulate a solution to the adviser problem using a deep network and apply it to the viewpoint estimation problem where the question asks for the location of a specific keypoint in the input image. We show that by using the keypoint guidance from the Adviser Network and the human, the model is able to outperform the previous hybrid-intelligence state-of-the-art by 3.27%, and outperform the computer-only state-of-the-art by 10.44% absolute.
AE1-WELM In this paper, we propose a novel approach based on cost-sensitive ensemble weighted extreme learning machine; we call this approach AE1-WELM. We apply this approach to text classification. AE1-WELM is an algorithm including balanced and imbalanced multiclassification for text classification. Weighted ELM assigning the different weights to the different samples improves the classification accuracy to a certain extent, but weighted ELM considers the differences between samples in the different categories only and ignores the differences between samples within the same categories. We measure the importance of the documents by the sample information entropy, and generate cost-sensitive matrix and factor based on the document importance, then embed the cost-sensitive weighted ELM into the AdaBoost.M1 framework seamlessly. Vector space model(VSM) text representation produces the high dimensions and sparse features which increase the burden of ELM. To overcome this problem, we develop a text classification framework combining the word vector and AE1-WELM. The experimental results show that our method provides an accurate, reliable and effective solution for text classification.
AEQUITAS Fairness is a critical trait in decision making. As machine-learning models are increasingly being used in sensitive application domains (e.g. education and employment) for decision making, it is crucial that the decisions computed by such models are free of unintended bias. But how can we automatically validate the fairness of arbitrary machine-learning models For a given machine-learning model and a set of sensitive input parameters, our AEQUITAS approach automatically discovers discriminatory inputs that highlight fairness violation. At the core of AEQUITAS are three novel strategies to employ probabilistic search over the input space with the objective of uncovering fairness violation. Our AEQUITAS approach leverages inherent robustness property in common machine-learning models to design and implement scalable test generation methodologies. An appealing feature of our generated test inputs is that they can be systematically added to the training set of the underlying model and improve its fairness. To this end, we design a fully automated module that guarantees to improve the fairness of the underlying model. We implemented AEQUITAS and we have evaluated it on six state-of-the-art classifiers, including a classifier that was designed with fairness constraints. We show that AEQUITAS effectively generates inputs to uncover fairness violation in all the subject classifiers and systematically improves the fairness of the respective models using the generated test inputs. In our evaluation, AEQUITAS generates up to 70% discriminatory inputs (w.r.t. the total number of inputs generated) and leverages these inputs to improve the fairness up to 94%.
AFEL-REC In this paper, we present preliminary results of AFEL-REC, a recommender system for social learning environments. AFEL-REC is build upon a scalable software architecture to provide recommendations of learning resources in near real-time. Furthermore, AFEL-REC can cope with any kind of data that is present in social learning environments such as resource metadata, user interactions or social tags. We provide a preliminary evaluation of three recommendation use cases implemented in AFEL-REC and we find that utilizing social data in form of tags is helpful for not only improving recommendation accuracy but also coverage. This paper should be valuable for both researchers and practitioners interested in providing resource recommendations in social learning environments.
Affect-LM Human verbal communication includes affective messages which are conveyed through use of emotionally colored words. There has been a lot of research in this direction but the problem of integrating state-of-the-art neural language models with affective information remains an area ripe for exploration. In this paper, we propose an extension to an LSTM (Long Short-Term Memory) language model for generating conversational text, conditioned on affect categories. Our proposed model, Affect-LM enables us to customize the degree of emotional content in generated sentences through an additional design parameter. Perception studies conducted using Amazon Mechanical Turk show that Affect-LM generates naturally looking emotional sentences without sacrificing grammatical correctness. Affect-LM also learns affect-discriminative word representations, and perplexity experiments show that additional affective information in conversational text can improve language model prediction.
Affine Forward Variance
We introduce the class of affine forward variance (AFV) models of which both the conventional Heston model and the rough Heston model are special cases. We show that AFV models can be characterized by the affine form of their cumulant generating function, which can be obtained as solution of a convolution Riccati equation. We further introduce the class of affine forward order flow intensity (AFI) models, which are structurally similar to AFV models, but driven by jump processes, and which include Hawkes-type models. We show that the cumulant generating function of an AFI model satisfies a generalized convolution Riccati equation and that a high-frequency limit of AFI models converges in distribution to the AFV model.
Affinity Analysis Affinity analysis is a data analysis and data mining technique that discovers co-occurrence relationships among activities performed by (or recorded about) specific individuals or groups. In general, this can be applied to any process where agents can be uniquely identified and information about their activities can be recorded. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling, in addition to influencing sales promotions, loyalty programs, store design, and discount plans.
“Market Basket Analysis”
agate agate is a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that helps you solve real-world problems
Age Period Cohort Model
Age-Period-Cohort models is a class of models for demographic rates (mortality/morbidity/fertility/…) observed for a broad age range over a reasonably long time period, and classified by age and date of follow-up (period) and date of birth (cohort). This type of follow-up can be shown in a Lexis-diagram; a coordinate system with data of follow-up along the x-axis, and age along the y-axis. A single persons life-trajectory is therefore a straight line with slope 1 (as calender time and age advance at the same pace). Tabulated data enumerates the number of events and the risk time (sum of lengths of life-trajectories) in some subsets of the Lexis diagram, usually subsets classified by age and period in equally long intervals. Individual life-lines can be shown with colouring according to states, or the diagram can just be shown to indicate what ages and periods are covered, and what subsets are used for classification of events and risk time. The Age-Period-Cohort model describes the (log)rates as a sum of (non-linear) age- period- and cohort-effects. The three variables age (at follow-up), a, period (i.e. date of follow-up), p, and cohort (date of birth), c, are related by a=p-c – any one person’s age is calculated by subtracting the date of birth from the current date. Hence the three variables used to describe rates are linearly related, and the model can therefore be parametrized in different ways, and still produce the same estimated rates. In popular terms you can say that it is possible to move two linear effects around between the three terms, because the age-terms contains the linear effect of age, the period-terms contains the linear effect of period and the cohort effect contains the linear effect of cohort. An illustration of this phenomenon is in this little “film” of APC-effects on testis cancer rates in Denmark. All sets of estimates will yield the same set of fitted rates.
Agent Based Model
An agent-based model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole. It combines elements of game theory, complex systems, emergence, computational sociology, multi-agent systems, and evolutionary programming. Monte Carlo Methods are used to introduce randomness. Particularly within ecology, ABMs are also called individual-based models (IBMs), and individuals within IBMs may be simpler than than fully autonomous agents within ABMs. A review of recent literature on individual-based models, agent-based models, and multiagent systems shows that ABMs are used on non-computing related scientific domains including biology, ecology and social science. Agent-based modeling is related to, but distinct from, the concept of multi-agent systems or multi-agent simulation in that the goal of ABM is to search for explanatory insight into the collective behavior of agents obeying simple rules, typically in natural systems, rather than in designing agents or solving specific practical or engineering problems.
Agent-Based Computational Economics
Agent-based computational economics (ACE) is the area of computational economics that studies economic processes, including whole economies, as dynamic systems of interacting agents. As such, it falls in the paradigm of complex adaptive systems. In corresponding agent-based models, the ‘agents’ are ‘computational objects modeled as interacting according to rules’ over space and time, not real people. The rules are formulated to model behavior and social interactions based on incentives and information. Such rules could also be the result of optimization, realized through use of AI methods (such as Q-learning and other reinforcement learning techniques). The theoretical assumption of mathematical optimization by agents in equilibrium is replaced by the less restrictive postulate of agents with bounded rationality adapting to market forces. ACE models apply numerical methods of analysis to computer-based simulations of complex dynamic problems for which more conventional methods, such as theorem formulation, may not find ready use. Starting from initial conditions specified by the modeler, the computational economy evolves over time as its constituent agents repeatedly interact with each other, including learning from interactions. In these respects, ACE has been characterized as a bottom-up culture-dish approach to the study of economic systems. ACE has a similarity to, and overlap with, game theory as an agent-based method for modeling social interactions. But practitioners have also noted differences from standard methods, for example in ACE events modeled being driven solely by initial conditions, whether or not equilibria exist or are computationally tractable, and in the modeling facilitation of agent autonomy and learning. The method has benefited from continuing improvements in modeling techniques of computer science and increased computer capabilities. The ultimate scientific objective of the method is to ‘test theoretical findings against real-world data in ways that permit empirically supported theories to cumulate over time, with each researcher’s work building appropriately on the work that has gone before.’ The subject has been applied to research areas like asset pricing, competition and collaboration, transaction costs, market structure and industrial organization and dynamics, welfare economics, and mechanism design, information and uncertainty, macroeconomics, and Marxist economics.
Book: Introduction to Agent-Based Economics
Agent-Based Model
An agent-based model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole. It combines elements of game theory, complex systems, emergence, computational sociology, multi-agent systems, and evolutionary programming. Monte Carlo methods are used to introduce randomness. Particularly within ecology, ABMs are also called individual-based models (IBMs), and individuals within IBMs may be simpler than fully autonomous agents within ABMs. A review of recent literature on individual-based models, agent-based models, and multiagent systems shows that ABMs are used on non-computing related scientific domains including biology, ecology and social science. Agent-based modeling is related to, but distinct from, the concept of multi-agent systems or multi-agent simulation in that the goal of ABM is to search for explanatory insight into the collective behavior of agents obeying simple rules, typically in natural systems, rather than in designing agents or solving specific practical or engineering problems. Agent-based models are a kind of microscale model that simulate the simultaneous operations and interactions of multiple agents in an attempt to re-create and predict the appearance of complex phenomena. The process is one of emergence from the lower (micro) level of systems to a higher (macro) level. As such, a key notion is that simple behavioral rules generate complex behavior. This principle, known as K.I.S.S. (‘Keep it simple, stupid’), is extensively adopted in the modeling community. Another central tenet is that the whole is greater than the sum of the parts. Individual agents are typically characterized as boundedly rational, presumed to be acting in what they perceive as their own interests, such as reproduction, economic benefit, or social status, using heuristics or simple decision-making rules. ABM agents may experience ‘learning’, adaptation, and reproduction. Most agent-based models are composed of: (1) numerous agents specified at various scales (typically referred to as agent-granularity); (2) decision-making heuristics; (3) learning rules or adaptive processes; (4) an interaction topology; and (5) an environment. ABMs are typically implemented as computer simulations, either as custom software, or via ABM toolkits, and this software can be then used to test how changes in individual behaviors will affect the system’s emerging overall behavior.
Agent-Based Modeling and Simulation
“Agent Based Model”
Agent-Based Modeling and Simulation
Agglomerative Contextual Decomposition
Deep neural networks (DNNs) have achieved impressive predictive performance due to their ability to learn complex, non-linear relationships between variables. However, the inability to effectively visualize these relationships has led to DNNs being characterized as black boxes and consequently limited their applications. To ameliorate this problem, we introduce the use of hierarchical interpretations to explain DNN predictions through our proposed method, agglomerative contextual decomposition (ACD). Given a prediction from a trained DNN, ACD produces a hierarchical clustering of the input features, along with the contribution of each cluster to the final prediction. This hierarchy is optimized to identify clusters of features that the DNN learned are predictive. Using examples from Stanford Sentiment Treebank and ImageNet, we show that ACD is effective at diagnosing incorrect predictions and identifying dataset bias. Through human experiments, we demonstrate that ACD enables users both to identify the more accurate of two DNNs and to better trust a DNN’s outputs. We also find that ACD’s hierarchy is largely robust to adversarial perturbations, implying that it captures fundamental aspects of the input and ignores spurious noise.
Agglomerative Hierarchical Clustering
Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering or HAC. Top-down clustering requires a method for splitting a cluster. It proceeds by splitting clusters recursively until individual documents are reached.
Agglomerative Info-Clustering An agglomerative clustering of random variables is proposed, where clusters of random variables sharing the maximum amount of multivariate mutual information are merged successively to form larger clusters. Compared to the previous info-clustering algorithms, the agglomerative approach allows the computation to stop earlier when clusters of desired size and accuracy are obtained. An efficient algorithm is also derived based on the submodularity of entropy and the duality between the principal sequence of partitions and the principal sequence for submodular functions.
Aggregated Learning “AgrLearn”
Aggregated Lexical Table
Correspondence Analysis on Generalised Aggregated Lexical Tables
Aggregated Wasserstein We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time spot follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMM (GMM-HMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of the states. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and the difference between the two transition matrices. Our new distance is tested on the tasks of retrieval and classification of time series. Experiments on both synthetic data and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.
Aggregation Operator Aggregation operators are mathematical functions that are used to combine information. That is, they are used to combine N data (for example, N numerical values) in a single datum. The arithmetic mean and the weighted mean are the most well-known aggregation operators. The median and the mode can also been as aggregation operators. The main difference between the arithmetic mean and the weighted mean is that the latter permits us to weight the different data according to their relevance. There exist a large number of different aggregation operators that differ on the assumptions on the data (data types) and about the type of information that we can incorporate in the model. For example, fuzzy integrals permits us to assign relevance to sets of information sources and not only to individual sources as for the weighted mean.
AgileNet The success of deep learning models is heavily tied to the use of massive amount of labeled data and excessively long training time. With the emergence of intelligent edge applications that use these models, the critical challenge is to obtain the same inference capability on a resource-constrained device while providing adaptability to cope with the dynamic changes in the data. We propose AgileNet, a novel lightweight dictionary-based few-shot learning methodology which provides reduced complexity deep neural network for efficient execution at the edge while enabling low-cost updates to capture the dynamics of the new data. Evaluations of state-of-the-art few-shot learning benchmarks demonstrate the superior accuracy of AgileNet compared to prior arts. Additionally, AgileNet is the first few-shot learning approach that prevents model updates by eliminating the knowledge obtained from the primary training. This property is ensured through the dictionaries learned by our novel end-to-end structured decomposition, which also reduces the memory footprint and computation complexity to match the edge device constraints.
Agnostic Disambiguation of Named Entities Using Linked Open Data
AGDISTIS is an Open Source Named Entity Disambiguation Framework able to link entities against every Linked Data Knowledge Base. The ongoing transition from the current Web of unstructured data to the Data Web yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework). One of the key steps towards extracting RDF from natural-language corpora is the disambiguation of named entities. AGDISTIS combines the HITS algorithm with label expansion strategies and string similarity measures. Based on this combination, it can efficiently detect the correct URIs for a given set of named entities within an input text. Furthermore, AGDISTIS is agnostic of the underlying knowledge base. AGDISTIS has been evaluated on different datasets against state-of-the-art named entity disambiguation frameworks.
Agreement-Based Learning Model selection is a problem that has occupied machine learning researchers for a long time. Recently, its importance has become evident through applications in deep learning. We propose an agreement-based learning framework that prevents many of the pitfalls associated with model selection. It relies on coupling the training of multiple models by encouraging them to agree on their predictions while training. In contrast with other model selection and combination approaches used in machine learning, the proposed framework is inspired by human learning. We also propose a learning algorithm defined within this framework which manages to significantly outperform alternatives in practice, and whose performance improves further with the availability of unlabeled data. Finally, we describe a number of potential directions for developing more flexible agreement-based learning algorithms.
AgrLearn We establish an equivalence between information bottleneck (IB) learning and an unconventional quantization problem, `IB quantization’. Under this equivalence, standard neural network models correspond to scalar IB quantizers. We prove a coding theorem for IB quantization, which implies that scalar IB quantizers are in general inferior to vector IB quantizers. This inspires us to develop a learning framework for neural networks, AgrLearn, that corresponds to vector IB quantizers. We experimentally verify that AgrLearn applied to some deep network models of current art improves upon them, while requiring less training data. With a heuristic smoothing, AgrLearn further improves its performance, resulting in new state of the art in image classification on Cifar10.
AIDE In this paper, we present two new communication-efficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be viewed as a robustification strategy since the method is substantially better behaved than DANE on data partition arising in practice. It is well known that DANE algorithm does not match the communication complexity lower bounds. To bridge this gap, we propose an accelerated variant of the first method, called AIDE, that not only matches the communication lower bounds but can also be implemented using a purely first-order oracle. Our empirical results show that AIDE is superior to other communication efficient algorithms in settings that naturally arise in machine learning applications.
AIOps AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies, and presentation technologies.
AIQ Focusing on Business AI, this article introduces the AIQ quadrant that enables us to measure AI for business applications in a relative comparative manner, i.e. to judge that software A has more or less intelligence than software B. Recognizing that the goal of Business software is to maximize value in terms of business results, the dimensions of the quadrant are the key factors that determine the business value of AI software: Level of Output Quality (Smartness) and Level of Automation. The use of the quadrant is illustrated by several software solutions to support the real life business challenge of field service scheduling. The role of machine learning and conversational digital assistants in increasing the business value are also discussed and illustrated with a recent integration of existing intelligent digital assistants for factory floor decision making with the new version of Google Glass. Such hands free AI solutions elevate the AIQ level to its ultimate position.
AIXIjs Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn’t explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents.
Akaike Information Criterion
The Akaike information criterion (AIC) is a measure of the relative quality of a statistical model for a given set of data. As such, AIC provides a means for model selection. AIC deals with the trade-off between the goodness of fit of the model and the complexity of the model. It is founded on information theory: it offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. AIC does not provide a test of a model in the sense of testing a null hypothesis; i.e. AIC can tell nothing about the quality of the model in an absolute sense. If all the candidate models fit poorly, AIC will not give any warning of that.
Akid Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production as small as possibly. On the other hand, differing from traditional machine learning models, neural network is not just yet another statistic model, but a model for the natural processing engine — the brain. In this work, we describe a neural network library named {\texttt akid}. It provides higher level of abstraction for entities (abstracted as blocks) in nature upon the abstraction done on signals (abstracted as tensors) by Tensorflow, characterizing the dataism observation that all entities in nature processes input and emit out in some ways. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top application stack, it provides out-of-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets user easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed to a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment with the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) could be separated. Please refer to http://…/latest for documentation.
Akismet Akismet or Automattic Kismet is a spam filtering service. It attempts to filter link spam from blog comments and spam TrackBack pings. The filter works by combining information about spam captured on all participating blogs, and then using those spam rules to block future spam. Akismet is offered by Automattic, the company behind Launched on October 25, 2005, Akismet is said to have captured over 100 billion spam comments and pings as of October 2013.
akka Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM.
ALAMO ALAMO is a computational methodology for leaning algebraic functions from data. Given a data set, the approach begins by building a low-complexity, linear model composed of explicit non-linear transformations of the independent variables. Linear combinations of these non-linear transformations allow a linear model to better approximate complex behavior observed in real processes. The model is refined, as additional data are obtained in an adaptive fashion through error maximization sampling using derivative-free optimization. Models built using ALAMO can enforce constraints on the response variables to incorporate first-principles knowledge. The ability of ALAMO to generate simple and accurate models for a number of reaction problems is demonstrated. The error maximization sampling is compared with Latin hypercube designs to demonstrate its sampling efficiency. ALAMO’s constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs.
Alchemist The need for efficient and scalable numerical linear algebra and machine-learning implementations continues to grow with the increasing importance of big data analytics. Since its introduction, Apache Spark has become an integral tool in this field, with attractive features such as ease of use, interoperability with the Hadoop ecosystem, and fault tolerance. However, it has been shown that numerical linear algebra routines implemented using MPI, a tool for parallel programming commonly used in high-performance computing, can outperform the equivalent Spark routines by an order of magnitude or more. We describe Alchemist, a system for interfacing between Spark and existing MPI libraries that is designed to address this performance gap. The libraries can be called from a Spark application with little effort, and we illustrate how the resulting system leads to efficient and scalable performance on large datasets.
Alchemist: An Apache Spark <=> MPI Interface
Algebraic Machine Learning Machine learning algorithms use error function minimization to fit a large set of parameters in a preexisting model. However, error minimization eventually leads to a memorization of the training dataset, losing the ability to generalize to other datasets. To achieve generalization something else is needed, for example a regularization method or stopping the training when error in a validation dataset is minimal. Here we propose a different approach to learning and generalization that is parameter-free, fully discrete and that does not use function minimization. We use the training data to find an algebraic representation with minimal size and maximal freedom, explicitly expressed as a product of irreducible components. This algebraic representation is shown to directly generalize, giving high accuracy in test data, more so the smaller the representation. We prove that the number of generalizing representations can be very large and the algebra only needs to find one. We also derive and test a relationship between compression and error rate. We give results for a simple problem solved step by step, hand-written character recognition, and the Queens Completion problem as an example of unsupervised learning. As an alternative to statistical learning, \enquote{algebraic learning} may offer advantages in combining bottom-up and top-down information, formal concept derivation from data and large-scale parallelization.
Algebraic Statistics Algebraic statistics is the use of algebra to advance statistics. Algebra has been useful for experimental design, parameter estimation, and hypothesis testing. Traditionally, algebraic statistics has been associated with the design of experiments and multivariate analysis (especially time series). In recent years, the term “algebraic statistics” has been sometimes restricted, sometimes being used to label the use of algebraic geometry and commutative algebra in statistics.
Algebraic Subspace Clustering
Algebraic Subspace Clustering (ASC) is a simple and elegant method based on polynomial fitting and differentiation for clustering noiseless data drawn from an arbitrary union of subspaces. In practice, however, ASC is limited to equi-dimensional subspaces because the estimation of the subspace dimension via algebraic methods is sensitive to noise. This paper proposes a new ASC algorithm that can handle noisy data drawn from subspaces of arbitrary dimensions. The key ideas are (1) to construct, at each point, a decreasing sequence of subspaces containing the subspace passing through that point; (2) to use the distances from any other point to each subspace in the sequence to construct a subspace clustering affinity, which is superior to alternative affinities both in theory and in practice. Experiments on the Hopkins 155 dataset demonstrate the superiority of the proposed method with respect to sparse and low rank subspace clustering methods.
Algorithm Quasi-Optimal Learning
The algorithm quasi-optimal (AQ) is a powerful machine learning methodology aimed at learning symbolic decision rules from a set of examples and counterexamples. It was first proposed in the late 1960s to solve the Boolean function satisfiability problem and further refined over the following decade to solve the general covering problem. In its newest implementations, it is a powerful but yet little explored methodology for symbolic machine learning classification. It has been applied to solve several problems from different domains, including the generation of individuals within an evolutionary computation framework. The current article introduces the main concepts of the AQ methodology and describes AQ for source detection(AQ4SD), a tailored implementation of the AQ methodology to solve the problem of finding the sources of atmospheric releases using distributed sensor measurements.
Algorithmia An Open Marketplace For Algorithms: We’re building a community around state-of-the-art algorithm development. Users can create, share, and build on other algorithms and then instantly make them available as a web service.
Algorithmic Complexity
The information content or complexity of an object can be measured by the length of its shortest description. For instance the string “01010101010101010101010101010101” has the short description “16 repetitions of 01”, while “11001000011000011101111011101100” presumably has no simpler description other than writing down the string itself. More formally, the Algorithmic “Kolmogorov” Complexity (AC) of a string x is defined as the length of the shortest program that computes or outputs x , where the program is run on some fixed reference universal computer.
Algorithmic Polynomial The approximate degree of a Boolean function $f(x_{1},x_{2},\ldots,x_{n})$ is the minimum degree of a real polynomial that approximates $f$ pointwise within $1/3$. Upper bounds on approximate degree have a variety of applications in learning theory, differential privacy, and algorithm design in general. Nearly all known upper bounds on approximate degree arise in an existential manner from bounds on quantum query complexity. We develop a first-principles, classical approach to the polynomial approximation of Boolean functions. We use it to give the first constructive upper bounds on the approximate degree of several fundamental problems: – $O\bigl(n^{\frac{3}{4}-\frac{1}{4(2^{k}-1)}}\bigr)$ for the $k$-element distinctness problem; – $O(n^{1-\frac{1}{k+1}})$ for the $k$-subset sum problem; – $O(n^{1-\frac{1}{k+1}})$ for any $k$-DNF or $k$-CNF formula; – $O(n^{3/4})$ for the surjectivity problem. In all cases, we obtain explicit, closed-form approximating polynomials that are unrelated to the quantum arguments from previous work. Our first three results match the bounds from quantum query complexity. Our fourth result improves polynomially on the $\Theta(n)$ quantum query complexity of the problem and refutes the conjecture by several experts that surjectivity has approximate degree $\Omega(n)$. In particular, we exhibit the first natural problem with a polynomial gap between approximate degree and quantum query complexity.
Algorithmic Social Intervention Social and behavioral interventions are a critical tool for governments and communities to tackle deep-rooted societal challenges such as homelessness, disease, and poverty. However, real-world interventions are almost always plagued by limited resources and limited data, which creates a computational challenge: how can we use algorithmic techniques to enhance the targeting and delivery of social and behavioral interventions? The goal of my thesis is to provide a unified study of such questions, collectively considered under the name ‘algorithmic social intervention’. This proposal introduces algorithmic social intervention as a distinct area with characteristic technical challenges, presents my published research in the context of these challenges, and outlines open problems for future work. A common technical theme is decision making under uncertainty: how can we find actions which will impact a social system in desirable ways under limitations of knowledge and resources? The primary application area for my work thus far is public health, e.g. HIV or tuberculosis prevention. For instance, I have developed a series of algorithms which optimize social network interventions for HIV prevention. Two of these algorithms have been pilot-tested in collaboration with LA-area service providers for homeless youth, with preliminary results showing substantial improvement over status-quo approaches. My work also spans other topics in infectious disease prevention and underlying algorithmic questions in robust and risk-aware submodular optimization.
Algorithmic Transparency What information is your black-box analytics system really relying on, and should it?
Aligned Rank Transform
Nonparametric data from multi-factor experiments arise often in human-computer interaction (HCI). Examples may include error counts, Likert responses, and preference tallies. But because multiple factors are involved, common nonparametric tests (e.g., Friedman) are inadequate, as they are unable to examine interaction effects. While some statistical techniques exist to handle such data, these techniques are not widely available and are complex. To address these concerns, we present the Aligned Rank Transform (ART) for nonparametric factorial data analysis in HCI. The ART relies on a preprocessing step that ‘aligns’ data before applying averaged ranks, after which point common ANOVA procedures can be used, making the ART accessible to anyone familiar with the F-test. Unlike most articles on the ART, which only address two factors, we generalize the ART to N factors. We also provide ARTool and ARTweb, desktop and Web-based programs for aligning and ranking data. Our re-examination of some published HCI results exhibits advantages of the ART.
ALINE Algorithm
The ALINE algorithm (Kondrak, 2000) assigns a similarity score to pairs of phonetically-transcribed words on the basis of the decomposition of phonemes into elementary phonetic features. The algorithm was originally designed to identify and align cognates in vocabularies of related languages. Nevertheless, thanks to its grounding in universal phonetic principles, the algorithm can be used for estimating the similarity of any pair of words. The principal component of ALINE is a function that calculates the similarity of two phonemes that are expressed in terms of about a dozen multi-valued phonetic features (Place, Manner, Voice, etc.). The phonetic features are assigned salience weights that express their relative importance. Feature values are encoded as oating-point numbers in the range [0;1]. For example, the feature Manner can take any of the following seven values: stop = 1.0, affricate = 0.9, fricative = 0.8, approximant = 0.6, high vowel = 0.4, mid vowel = 0.2, and low vowel = 0.0. The numerical values re ect the distances between vocal organs during speech production. The overall similarity score is the sum of individual similarity scores between pairs of phonemes in an optimal alignment of two words, which is computed by a dynamic programming algorithm (Wagner and Fischer, 1974). A constant insertion/deletion penalty is applied for each unaligned phoneme. Another constant penalty is set to reduce relative importance of vowel as opposed to consonant phoneme matches. The similarity value is normalized by the length of the longer word. ALINE’s behavior is controlled by a number of parameters: the maximum phonemic score, the insertion/ deletion penalty, the vowel penalty, and the feature salience weights. The parameters have default settings for the cognate matching task, but these settings can be optimized (tuned) on a development set that includes both positive and negative examples of similar words.
Greg Kondrak
Evaluation of Several Phonetic Similarity Algorithms on the Task of Cognate Identification
Allan Variance
The Allan variance (AVAR), also known as two-sample variance, is a measure of frequency stability in clocks, oscillators and amplifiers. It is named after David W. Allan. The Allan variance is intended to estimate stability due to noise processes and not that of systematic errors or imperfections such as frequency drift or temperature effects. The Allan variance and Allan deviation describe frequency stability, i.e. the stability in frequency. See also the section entitled ‘Interpretation of value’ below. There are also different adaptations or alterations of Allan variance, notably the modified Allan variance MAVAR or MVAR, the total variance, and the Hadamard variance. There also exist time stability variants such as time deviation TDEV or time variance TVAR. Allan variance and its variants have proven useful outside the scope of timekeeping and are a set of improved statistical tools to use whenever the noise processes are not unconditionally stable, thus a derivative exists. The general M-sample variance remains important since it allows dead time in measurements and bias functions allows conversion into Allan variance values. Nevertheless, for most applications the special case of 2-sample, or ‘Allan variance’ with T = tau is of greatest interest.
AllenNLP This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily. It is built on top of PyTorch, allowing for dynamic computation graphs, and provides (1) a flexible data API that handles intelligent batching and padding, (2) high-level abstractions for common operations in working with text, and (3) a modular and extensible experiment framework that makes doing good science easy. It also includes reference implementations of high quality approaches for both core semantic problems (e.g. semantic role labeling (Palmer et al., 2005)) and language understanding applications (e.g. machine comprehension (Rajpurkar et al., 2016)). AllenNLP is an ongoing open-source effort maintained by engineers and researchers at the Allen Institute for Artificial Intelligence.
All-Pairs Testing In computer science, all-pairs testing or pairwise testing is a combinatorial method of software testing that, for each pair of input parameters to a system (typically, a software algorithm), tests all possible discrete combinations of those parameters. Using carefully chosen test vectors, this can be done much faster than an exhaustive search of all combinations of all parameters, by “parallelizing” the tests of parameter pairs.
Alluvial Diagram Alluvial diagrams are a type of flow diagram originally developed to represent changes in network structure over time. In allusion to both their visual appearance and their emphasis on flow, alluvial diagrams are named after alluvial fans that are naturally formed by the soil deposited from streaming water.
ALOJA This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.
alpha-Algorithm The alpha-algorithm is an algorithm used in process mining, aimed at reconstructing causality from a set of sequences of events. It was first put forward by van der Aalst, Weijters and Maruster. Several extensions or modifications of it have since been presented, which will be listed below.
It constructs P/T nets with special properties (workflow nets) from event logs (as might be collected by an ERP system). Each transition in the net corresponds to an observed task.
Alpha-Outliers Crossover,crossdes
Alpha-Sutte Indicator “Sutte Indicator”
AlphaX We present AlphaX, a fully automated agent that designs complex neural architectures from scratch. AlphaX explores the exponentially exploded search space with a novel distributed Monte Carlo Tree Search (MCTS) and a Meta-Deep Neural Network (DNN). MCTS intrinsically improves the search efficiency by automatically balancing the exploration and exploitation at each state, while Meta-DNN predicts the network accuracy to guide the search, and to provide an estimated reward for the preemptive backpropagation in the distributed setup. As the search progresses, AlphaX also generates the training date for Meta-DNN. So, the learning of Meta-DNN is end-to-end. In searching for NASNet style architectures, AlphaX found several promising architectures with up to 1% higher accuracy than NASNet using only 17 GPUs for 5 days, demonstrating up to 23.5x speedup over the original searching for NASNet that used 500 GPUs in 4 days.
Alternating Direction Method of Multipliers
We propose a significant expansion of the scope of the alternating direction method of multipliers (ADMM). Specifically, we show that ADMM, when employed to solve problems with multiaffine constraints that satisfy certain easily verifiable assumptions, converges to the set of constrained stationary points if the penalty parameter in the augmented Lagrangian is sufficiently large. Our analysis applies under assumptions that we have endeavored to make as weak as possible. It applies to problems that involve nonconvex and/or nonsmooth objective terms, in addition to the multiaffine constraints that can involve multiple (three or more) blocks of variables. To illustrate the applicability of our results, we describe examples including nonnegative matrix factorization, sparse learning, risk parity portfolio selection, nonconvex formulations of convex problems, and neural network training. In each case, our ADMM approach encounters only subproblems that have closed-form solutions.
Alternating Directions Dual Decomposition
We present AD3, a new algorithm for approximate maximum a posteriori (MAP) inference on factor graphs, based on the alternating directions method of multipliers. Like other dual decomposition algorithms, AD3 has a modular architecture, where local subproblems are solved independently, and their solutions are gathered to compute a global update. The key characteristic of AD3 is that each local subproblem has a quadratic regularizer, leading to faster convergence, both theoretically and in practice. We provide closed-form solutions for these AD3 subproblems for binary pairwise factors and factors imposing rst-order logic constraints. For arbitrary factors (large or combinatorial), we introduce an active set method which requires only an oracle for computing a local MAP con guration, making AD3 applicable to a wide range of problems. Experiments on synthetic and real-world problems show that AD3 compares favorably with the state-of-the-art.
Alternating Least Squares
In ALS you’re minimizing the entire loss function at once, but, only twiddling half the parameters. That’s because the optimization has an easy algebraic solution – if half your parameters are fixed. So you fix half, recompute the other half, and repeat. There is no gradient in the optimization step since each optimization problem is convex and doesn’t need an approximate approach. But, each problem you’re solving is not the “real” optimization problem – you fixed half the parameters.
Alternating Minimization Induced Active Set Algorithms
Altham-Poisson Distribution The multiplicative binomial model was introduced as a generalization of the binomial distribution for modelling correlated binomial data. This distribution has not been extensively explored and is revisited in the present study. Some properties of the multiplicative binomial distribution, such as, expressions for the factorial moments and the information matrix, are investigated. The distribution is also extended to accommodate data arising from a Wadley’s problem setting which is frequently encountered in dose-mortality studies and is one in which the number of organisms initially treated with a drug is unobserved. The Altham-Poisson distribution is introduced by modelling the unobserved initial number of organisms, as specified by Formula in the multiplicative binomial model, with a Poisson distribution and its suitability for overdispersed data from a Wadley’s problem setting is explored.
Amazon Machine Learning Amazon Machine Learning is a new service that makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. Once your models are ready, Amazon Machine Learning makes it easy to get predictions for your application using simple APIs, without having to implement custom prediction generation code, or manage any infrastructure. Amazon Machine Learning is based on the same proven, highly scalable, ML technology used for years by Amazon’s internal data scientist community. The service uses powerful algorithms to create ML models by finding patterns in your existing data. Then, Amazon Machine Learning uses these models to process new data and generate predictions for your application. Amazon Machine Learning is highly scalable and can generate billions of predictions daily, and serve those predictions in real-time and at high throughput. With Amazon Machine Learning, there is no upfront hardware or software investment, and you pay as you go, so you can start small and scale as your application grows.
Ambient Diagnostics People can usually sense troubles in a car from noises, vibrations, or smells. An experienced driver can even tell where the problem is. We call this kind of skill ‘Ambient Diagnostics’. Ambient Diagnostics is an emerging field that is aimed at detecting abnormities from seemly disconnected ambient data that we take for granted. For example, the human body is a rich ambient data source: temperature, pulses, gestures, sound, forces, moisture, et al. Also, many electronic devices provide pervasive ambient data streams, such as mobile phones, surveillance cameras, satellite images, personal data assistants, wireless networks and so on.
AmbientGAN Generative models provide a way to model structure in complex distributions and have been shown to be useful for many tasks of practical interest. However, current techniques for training generative models require access to fully-observed samples. In many settings, it is expensive or even impossible to obtain fully-observed samples, but economical to obtain partial, noisy observations. We consider the task of learning an implicit generative model given only lossy measurements of samples from the distribution of interest. We show that the true underlying distribution can be provably recovered even in the presence of per-sample information loss for a class of measurement models. Based on this, we propose a new method of training Generative Adversarial Networks (GANs) which we call AmbientGAN. On three benchmark datasets, and for various measurement models, we demonstrate substantial qualitative and quantitative improvements. Generative models trained with our method can obtain $2$-$4$x higher inception scores than the baselines.
Amortized Inference Regularization
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.
Anaconda We investigate distribution testing with access to non-adaptive conditional samples. In the conditional sampling model, the algorithm is given the following access to a distribution: it submits a query set $S$ to an oracle, which returns a sample from the distribution conditioned on being from $S$. In the non-adaptive setting, all query sets must be specified in advance of viewing the outcomes. Our main result is the first polylogarithmic-query algorithm for equivalence testing, deciding whether two unknown distributions are equal to or far from each other. This is an exponential improvement over the previous best upper bound, and demonstrates that the complexity of the problem in this model is intermediate to the the complexity of the problem in the standard sampling model and the adaptive conditional sampling model. We also significantly improve the sample complexity for the easier problems of uniformity and identity testing. For the former, our algorithm requires only $\tilde O(\log n)$ queries, matching the information-theoretic lower bound up to a $O(\log \log n)$-factor. Our algorithm works by reducing the problem from $\ell_1$-testing to $\ell_\infty$-testing, which enjoys a much cheaper sample complexity. Necessitated by the limited power of the non-adaptive model, our algorithm is very simple to state. However, there are significant challenges in the analysis, due to the complex structure of how two arbitrary distributions may differ.
Analysis of Covariance
Covariance is a measure of how much two variables change together and how strong the relationship is between them. Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether population means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV). Therefore, when performing ANCOVA, the DV means are adjusted to what they would be if all groups were equal on the CV.
Analysis of Means
The analysis of means (ANOM) is a graphical procedure for comparing a collection of means, rates, or proportions to see if any of them differ significantly from the overall mean, rate, or proportion. The ANOM is a type of multiple comparison procedure. The results of the analysis are summarized in an ANOM decision chart. This chart is similar in appearance to a control chart. It has a centerline, located at the overall mean (rate, or proportion), and upper and lower decision limits. Group means (or rates, or proportions) are plotted on this chart and if one falls beyond a decision limit then that group is said to be statistically different from the overal mean (rate or proportion).
In many situations, you need to examine differences among more than two groups. When you are interested in comparing multiple group means, you can use the analysis of means (ANOM) as an alternative to the one-way analysis of variance F test. The ANOM provides a ‘confidence interval type of approach’ that allows you to determine which, if any, of the c groups has a mean significantly different from the overall average of all the group means combined. Instead of looking at the lower and upper limits of a confidence interval, you will be studying which of the c group means are not contained in an interval formed between a lower decision line and an upper decision line. Any individual group mean not contained in this interval is deemed significantly higher than the overall average of all groups if it lies above the upper decision line. Similarly, any group mean that falls below the lower decision line is declared significantly lower than the overall group average.
This paper presents a novel method, called Analysis-of-marginal-Tail-Means (ATM), for parameter optimization over a large, discrete design space. The key advantage of ATM is that it offers effective and robust optimization performance for both smooth and rugged response surfaces, using only a small number of function evaluations. This method can therefore tackle a wide range of engineering problems, particularly in applications where the performance metric to optimize is ‘black-box’ and expensive to evaluate. The ATM framework unifies two parameter optimization methods in the literature: the Analysis-of-marginal-Means (AM) approach (Taguchi, 1986), and the Pick-the-Winner (PW) approach (Wu et al., 1990). In this paper, we show that by providing a continuum between AM and PW via the novel idea of marginal tail means, the proposed method offers a balance between three fundamental trade-offs. By adaptively tuning these trade-offs, ATM can then provide excellent optimization performance over a broad class of response surfaces using limited data. We illustrate the effectiveness of ATM using several numerical examples, and demonstrate how such a method can be used to solve two real-world engineering design problems.
Analytic Hierarchy Process
The analytic hierarchy process (AHP) is a structured technique for organizing and analyzing complex decisions, based on mathematics and psychology. It was developed by Thomas L. Saaty in the 1970s and has been extensively studied and refined since then. It has particular application in group decision making, and is used around the world in a wide variety of decision situations, in fields such as government, business, industry, healthcare, shipbuilding and education. Rather than prescribing a ‘correct’ decision, the AHP helps decision makers find one that best suits their goal and their understanding of the problem. It provides a comprehensive and rational framework for structuring a decision problem, for representing and quantifying its elements, for relating those elements to overall goals, and for evaluating alternative solutions. Users of the AHP first decompose their decision problem into a hierarchy of more easily comprehended sub-problems, each of which can be analyzed independently. The elements of the hierarchy can relate to any aspect of the decision problem-tangible or intangible, carefully measured or roughly estimated, well or poorly understood-anything at all that applies to the decision at hand. Once the hierarchy is built, the decision makers systematically evaluate its various elements by comparing them to each other two at a time, with respect to their impact on an element above them in the hierarchy. In making the comparisons, the decision makers can use concrete data about the elements, but they typically use their judgments about the elements’ relative meaning and importance. It is the essence of the AHP that human judgments, and not just the underlying information, can be used in performing the evaluations. The AHP converts these evaluations to numerical values that can be processed and compared over the entire range of the problem. A numerical weight or priority is derived for each element of the hierarchy, allowing diverse and often incommensurable elements to be compared to one another in a rational and consistent way. This capability distinguishes the AHP from other decision making techniques. In the final step of the process, numerical priorities are calculated for each of the decision alternatives. These numbers represent the alternatives’ relative ability to achieve the decision goal, so they allow a straightforward consideration of the various courses of action.
Analytical Data Mart Data Mart is a table or a collection of tables containing only the information which the analysts require to do their job. This data is pulled from multiple sources, processed in a uniform manner, documented and optimized. Data Marts frequently contain data aggregated on the customer level, such as the average number of transactions in last 6 months, number of cash loans drawn by the client during the last 12 months, etc. When the computed aggregate values are available, it is much easier to prepare reports. Data Marts are built only once, at the start of the analytical process, but they are cyclically and automatically updated, so as to contain all the relevant information pertaining to customers/products/transactions in a given period of time.
“Analytics Data Store”
Analytics Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.
Analytics Data Store
The ADS (which can be a distributed data store on a Cloud somewhere) supports the data requirements of statistical analysis. Here the data is organised, structured, integrated and enriched to meet the ongoing and occasionally volatile needs of the statisticians and data scientists focusing on data mining. Data in the ADS can be accumulative or completely refreshed. It can have a short life span or have a significantly long life-time. The ADS is the logistics centre for analytics data. Not only can it be used to provide data into the statistical analysis process, but it can also be used to provide persistent long term storage for analysis outcomes and scenarios, and for future analysis, hence the ability to ‘write back’. The data and information in the ADS may also be augmented with data derived from data stored in the data warehouse, it may also benefit from having its own dedicated Data Mart specifically designed for this purpose. Results of statistical analysis on the ADS data may also result in feedback being used to tune the data reduction, filtering and enrichment rules further downstream, either in smart data analytics, complex event and discrimination adapters or in ET(AL) job streams.
Anchored Bayesian Gaussian Mixture Model Finite Gaussian mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We present an alternative to the exchangeable prior: by assuming that a small number of latent class labels are known a priori, we can make inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well specified probability model rather than a post hoc manipulation. We provide practical guidelines for model selection that are motivated by maximizing prior information about the class labels and we demonstrate our method on real and simulated data.
AnchorNet Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we propose a deep network, termed AnchorNet, that produces image representations that are well-suited for semantic matching. It relies on a set of filters whose response is geometrically consistent across different object instances, even in the presence of strong intra-class, scale, or viewpoint variations. Trained only with weak image-level labels, the final representation successfully captures information about the object structure and improves results of state-of-the-art semantic matching methods such as the deformable spatial pyramid or the proposal flow methods. We show positive results on the cross-instance matching task where different instances of the same object category are matched as well as on a new cross-category semantic matching task aligning pairs of instances each from a different object class.
Ancillary Statistic In statistics, an ancillary statistic is a statistic whose sampling distribution does not depend on the parameters of the model. An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. This concept was introduced by the statistical geneticist Sir Ronald Fisher.
https://… Lesson of the Day – Ancillary Statistics
Angle-Based Outlier Detection
Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All existing approaches, however, are based on an assessment of distances (sometimes indirectly by assuming certain distributions) in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious ‘curse of dimensionality’. In this paper, we propose a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points. This way, the effects of the ‘curse of dimensionality’ are alleviated compared to purely distance-based approaches. A main advantage of our new approach is that our method does not rely on any parameter selection influencing the quality of the achieved ranking. In a thorough experimental evaluation, we compare ABOD to the well-established distance-based method LOF for various artificial and a real world data set and show ABOD to perform especially well on high-dimensional data.
Annabell A cognitive neural architecture able to learn and communicate through natural language
ANN-Benchmarks This paper describes ANN-Benchmarks, a tool for evaluating the performance of in-memory approximate nearest neighbor algorithms. It provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets. It supports several different ways of integrating $k$-NN algorithms, and its configuration system automatically tests a range of parameter settings for each algorithm. Algorithms are compared with respect to many different (approximate) quality measures, and adding more is easy and fast; the included plotting front-ends can visualise these as images, $\LaTeX$ plots, and websites with interactive plots. ANN-Benchmarks aims to provide a constantly updated overview of the current state of the art of $k$-NN algorithms. In the short term, this overview allows users to choose the correct $k$-NN algorithm and parameters for their similarity search task; in the longer term, algorithm designers will be able to use this overview to test and refine automatic parameter tuning. The paper gives an overview of the system, evaluates the results of the benchmark, and points out directions for future work. Interestingly, very different approaches to $k$-NN search yield comparable quality-performance trade-offs. The system is available at
Annealed Generative Adversarial Networks We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed {\ss}-GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, {\ss}-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.
ANNETT-O Deep learning models, while effective and versatile, are becoming increasingly complex, often including multiple overlapping networks of arbitrary depths, multiple objectives and non-intuitive training methodologies. This makes it increasingly difficult for researchers and practitioners to design, train and understand them. In this paper we present ANNETT-O, a much-needed, generic and computer-actionable vocabulary for researchers and practitioners to describe their deep learning configurations, training procedures and experiments. The proposed ontology focuses on topological, training and evaluation aspects of complex deep neural configurations, while keeping peripheral entities more succinct. Knowledge bases implementing ANNETT-O can support a wide variety of queries, providing relevant insights to users. In addition to a detailed description of the ontology, we demonstrate its suitability to the task via a number of hypothetical use-cases of increasing complexity.
Annotation Chart Annotation charts are interactive time series line charts that support annotations.
Annotation Query Language
Annotation Query Language (AQL) is the language for developing text analytics extractors in the InfoSphere BigInsights Text Analytics system. An extractor is a program written in AQL that extracts structured information from unstructured or semistructured text. AQL is a declarative language. The syntax of AQL is similar to that of Structured Query Language (SQL), but with several important differences.
Annoy Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.
Anomaly Detection In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or finding errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.
Anonymous Information Delivery
We introduce the problem of anonymous information delivery (AID), comprised of $K$ messages, a user, and $N$ servers (each holds $M$ messages) that wish to deliver one out of $K$ messages to the user anonymously, i.e., without revealing the delivered message index to the user. This AID problem may be viewed as the dual of the private information retrieval problem. The information theoretic capacity of AID, $C$, is defined as the maximum number of bits of the desired message that can be anonymously delivered per bit of total communication to the user. For the AID problem with $K$ messages, $N$ servers, $M$ messages stored per server, and $N \geq \lceil \frac{K}{M} \rceil$, we provide an achievable scheme of rate $1/\lceil \frac{K}{M} \rceil$ and an information theoretic converse of rate $M/K$, i.e., the AID capacity satisfies $1/\lceil \frac{K}{M} \rceil \leq C \leq M/K$. This settles the capacity of AID when $\frac{K}{M}$ is an integer. When $\frac{K}{M}$ is not an integer, we show that the converse rate of $M/K$ is achievable if $N \geq \frac{K}{\gcd(K,M)} – (\frac{M}{\gcd(K,M)}-1)(\lfloor \frac{K}{M} \rfloor -1)$, and the achievable rate of $1/\lceil \frac{K}{M} \rceil$ is optimal if $N = \lceil \frac{K}{M} \rceil$. Otherwise if $\lceil \frac{K}{M} \rceil < N < \frac{K}{\gcd(K,M)} – (\frac{M}{\gcd(K,M)}-1)(\lfloor \frac{K}{M} \rfloor -1)$, we give an improved achievable scheme and prove its optimality for several small settings.
ANOVA Kernel Regression High-order parametric models that include terms for feature interactions are applied to various data min- ing tasks, where ground truth depends on interactions of features. However, with sparse data, the high- dimensional parameters for feature interactions often face three issues: expensive computation, difficulty in parameter estimation and lack of structure. Previous work has proposed approaches which can partially re- solve the three issues. In particular, models with fac- torized parameters (e.g. Factorization Machines) and sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues but fail to address the third. Regarding to unstructured parameters, constraints or complicated regularization terms are applied such that hierarchical structures can be imposed. However, these methods make the optimization problem more challeng- ing. In this work, we propose Strongly Hierarchical Factorization Machines and ANOVA kernel regression where all the three issues can be addressed without making the optimization problem more difficult. Ex- perimental results show the proposed models signifi- cantly outperform the state-of-the-art in two data min- ing tasks: cold-start user response time prediction and stock volatility prediction.
Anscombe’s Quartet Anscombe’s quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties.


Answer Set Programming
Answer set programming (ASP) is a form of declarative programming oriented towards difficult (primarily NP-hard) search problems. It is based on the stable model (answer set) semantics of logic programming. In ASP, search problems are reduced to computing stable models, and answer set solvers – programs for generating stable models – are used to perform search. The computational process employed in the design of many answer set solvers is an enhancement of the DPLL algorithm and, in principle, it always terminates (unlike Prolog query evaluation, which may lead to an infinite loop). In a more general sense, ASP includes all applications of answer sets to knowledge representation and the use of Prolog-style query evaluation for solving problems arising in these applications.
Answer Set Programming – Reinforcement Learning
Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.
Ant Colony Optimization
In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. This algorithm is a member of the ant colony algorithms family, in swarm intelligence methods, and it constitutes some metaheuristic optimizations. Initially proposed by Marco Dorigo in 1992 in his PhD thesis, the first algorithm was aiming to search for an optimal path in a graph, based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of numerical problems, and as a result, several problems have emerged, drawing on various aspects of the behavior of ants.
Anticipatory Analytics In fact, the application of predictive analytics to intelligence often focuses on what is missing in the data or how certain observations diverge from the expected norm. Some experts insist on calling this process not predictive analytics but anticipatory analytics, in order to distinguish it from similar processes in other realms.
Any-k Ranking Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called ‘heterogeneous information networks’ or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to nd the top-k matches ac- cording to a ranking function over edge and node weights. For users, it is di cult to select value k . We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, re- turn as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continues until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) that the users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times in the order of milliseconds for tree patterns on large networks with millions of nodes and edges.
AOGParsing Operator This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network.By interpretable models, we focus on weakly-supervised extractive rationale generation, that is learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG on-the-fly, which is treated as the extractive rationale generated for interpreting detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of the R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets with performance comparable to state-of-the-art methods.
Apache Accumulo The Apache Accumulo sorted, distributed key/value store is based on Google’s BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.
Apache Ambari Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple.
Apache Apex 1. Enterprise Grade: Apex is a Hadoop YARN native platform that unifies stream and batch processing. It processes big data in-motion in a way that is highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and easily operable.
2. Low Barrier-to-Entry: Write your business logic and leave all operability to the platform. It provides a simple API that enables developers to write or re-use generic Java code, thereby lowering the expertise needed to write big data applications.
3. Modular: The Apex platform comes with Malhar, a library of operators (units of functionality) that can be leveraged to quickly create non-trivial applications. Includes many connectors for messaging systems, databases, files etc.
Apache Avro Apache Avro is a data serialization system.
Apache Beam Apache Beam provides an advanced unified programming model, allowing you to implement batch and streaming data processing jobs that can run on any execution engine. Apache Beam is:
· UNIFIED – Use a single programming model for both batch and streaming use cases.
· PORTABLE – Execute pipelines on multiple execution environments, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
· EXTENSIBLE – Write and share new SDKs, IO connectors, and transformation libraries.
Apache Bigtop Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem. The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc…) developed by a community with a focus on the system as a whole, rather than individual projects. In short we strive to be for Hadoop what Debian is to Linux.
Apache Calcite Apache Calcite is a Dynamic Data Management Framework. It Contains Many of the Pieces That Comprise a Typical Database Management System, but Omits Some key Functions: Storage of Data, Algorithms to Process Data, and a Repository for Storing Metadata. Calcite Intentionally Stays out of the Business of Storing and Processing Data. As we Shall See, This Makes it an Excellent Choice for Mediating Between Applications and one or More Data Storage Locations and Data Processing Engines. It is Also a Perfect Foundation for Building a Database: Just add Data.
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Apache Cassandra / Cassandra Query Language
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra also places a high value on performance. In 2012, University of Toronto researchers studying NoSQL systems concluded that “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments.”
Apache Chukwa Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data.
Apache Clerezza Clerezza allows to easily develop semantic web applications by providing tools to manipulate RDF data, create RESTful Web Services and Renderlets using ScalaServerPages. Contents are stored as triples based on W3C RDF specification. These triples are stored via Clerezza’s Smart Content Binding (SCB). SCB defines a technology-agnostic layer to access and modify triple stores. It provides a java implementation of the graph data model specified by W3C RDF and functionalities to operate on that data model. SCB offers a service interface to access multiple named graphs and it can use various providers to manage RDF graphs in a technology specific manner, e.g., using Jena or Sesame. It also provides for adaptors that allow an application to use various APIs (including the Jena api) to process RDF graphs. Furthermore, SCB offers a serialization and a parsing service to convert a graph into a certain representation (format) and vice versa.
Apache CloudStack Apache CloudStack is open source software designed to deploy and manage large networks of virtual machines, as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide an on-premises (private) cloud offering, or as part of a hybrid cloud solution. CloudStack is a turnkey solution that includes the entire “stack” of features most organizations want with an IaaS cloud: compute orchestration, Network-as-a-Service, user and account management, a full and open native API, resource accounting, and a first-class User Interface (UI). CloudStack currently supports the most popular hypervisors: VMware, KVM, XenServer and Xen Cloud Platform (XCP). Users can manage their cloud with an easy to use Web interface, command line tools, and / or a full-featured RESTful API. In addition, CloudStack provides an API that’s compatible with AWS EC2 and S3 for organizations that wish to deploy hybrid clouds.
Apache Commons Mathematics Library Commons Math is a library of lightweight, self-contained mathematics and statistics components addressing the most common problems not available in the Java programming language or Commons Lang.
Apache Crunch The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run. Running on top of Hadoop MapReduce and Apache Spark, the Apache Crunch library is a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce. The APIs are especially useful when processing data that does not fit naturally into relational model, such as time series, serialized object formats like protocol buffers or Avro records, and HBase rows and columns. For Scala users, there is the Scrunch API, which is built on top of the Java APIs and includes a REPL (read-eval-print loop) for creating MapReduce pipelines.
Apache Curator New users of ZooKeeper are surprised to learn that a significant amount of connection management must be done manually. For example, when the ZooKeeper client connects to the ensemble it must negotiate a new session, etc. This takes some time. If you use a ZooKeeper client API before the connection process has completed, ZooKeeper will throw an exception. These types of exceptions are referred to as “recoverable” errors. Curator automatically handles connection management, greatly simplifying client code. Instead of directly using the ZooKeeper APIs you use Curator APIs that internally check for connection completion and wrap each ZooKeeper API in a retry loop. Curator uses a retry mechanism to handle recoverable errors and automatically retry operations. The method of retry is customizable. Curator comes bundled with several implementations (ExponentialBackoffRetry, etc.) or custom implementations can be written.
Apache DataFu Apache DataFu consists of two libraries: Apache DataFu Pig is a collection of useful user-defined functions for data analysis in Apache Pig. Apache DataFu Hourglass is a library for incrementally processing data using Apache Hadoop MapReduce. This library was inspired by the prevelance of sliding window computations over daily tracking data. Computations such as these typically happen at regular intervals (e.g. daily, weekly), and therefore the sliding nature of the computations means that much of the work is unnecessarily repeated. DataFu’s Hourglass was created to make these computations more efficient, yielding sometimes 50-95% reductions in computational resources.
Apache Drill Apache Drill is an open source, low latency SQL query engine for Hadoop and NoSQL. It is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google’s Dremel.
Apache Falcon Apache Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters.
Apache Flink Apache Flink is an open source platform for scalable batch and stream data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink includes several APIs for creating applications that use the Flink engine:
1. DataSet API for static data embedded in Java, Scala, and Python,
2. DataStream API for unbounded streams embedded in Java and Scala, and
3. Table API with a SQL-like expression language embedded in Java and Scala.
Flink also bundles libraries for domain-specific use cases:
1. Machine Learning library, and
2. Gelly, a graph processing API and library.
You can integrate Flink easily with other well-known open source systems both for data input and output as well as deployment.
Apache Flume Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store
Apache Forrest Apache Forrest software is a publishing framework that transforms input from various sources into a unified presentation in one or more output formats. The modular and extensible plug-in architecture of Apache Forrest is based on Apache Cocoon and the relevant industry standards that separate presentation from content. Forrest can generate static documents, or be used as a dynamic server, or be deployed by its automated facility.
Apache Gearpump Apache Gearpump is a real-time big data streaming engine. The name Gearpump is a reference to the engineering term ‘gear pump’ which is a super simple pump that consists of only two gears, but is very powerful at streaming water. Different to other streaming engines, Gearpump’s engine is event/message based. Per initial benchmarks we are able to process 18 million messages per second (message length is 100 bytes) with a 8ms latency on a 4-node cluster.
Apache Giraph Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.
Apache Hadoop Apache Hadoop is an open-source software framework for storage and large-scale processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.
The Apache Hadoop framework is composed of the following modules:
· Hadoop Common – contains libraries and utilities needed by other Hadoop modules
· Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
· Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users’ applications.
· Hadoop MapReduce – a programming model for large scale data processing.
Hadoop is being regarded as one of the best platforms for storing and managing big data. It owes its success to its high data storage and processing scalability, low price/performance ratio, high performance, high availability, high schema flexibility, and its capability to handle all types of data.
Apache Hama The Apache Hama is an efficient and scalable general-purpose BSP computing engine which can be used to speed up a large variety of compute-intensive analytics applications.
Apache HBase Use Apache HBase software when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Apache Hive The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides * tools to enable easy data extract/transform/load (ETL) * a mechanism to impose structure on a variety of data formats * access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * query execution via MapReduce Hive defines a simple SQL-like query language, called HiveQL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. HiveQL can also be extended with custom scalar functions (UDF’s), aggregations (UDAF’s), and table functions (UDTF’s).
Apache Ignite Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies. Apache Ignite In-Memory Data Fabric is designed to deliver uncompromised performance for a wide set of in-memory computing use cases from high performance computing, to the industry most advanced data grid, highly available service grid, and streaming.
Apache Jena Apache Jena provides a complete framework for building Semantic Web and Linked Data applications in Java, and provides: parsers for RDF/XML, Turtle and N-triples; a Java programming API; a complete implementation of the SPARQL query language; a rule-based inference engine for RDFS and OWL entailments; TDB (a non-SQL persistent triple store); SDB (a persistent triples store built on a relational store) and Fuseki, an RDF server using web protocols. Jena complies with all relevant recommendations for RDF and related technologies from the W3C.
Apache Kafka Apache Kafka is an open-source message broker project developed by the Apache Software Foundation written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. The design is heavily influenced by transactions logs.
Apache Lucene The goal of Apache Lucene and Solr is to provide world class search capabilities. The Apache Lucene project develops open-source search software, including:
· Lucene Core, our flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
· Solr is a high performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
· Open Relevance Project is a subproject with the aim of collecting and distributing free materials for relevance testing and performance.
· PyLucene is a Python port of the Core project.
Apache Lucy The Apache Lucy search engine library provides full-text search for dynamic programming languages.
Apache Mahout Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering, clustering and classification. Many of the implementations use the Apache Hadoop platform. Mahout also provides Java libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but various algorithms are still missing.
Apache NiFi Put simply NiFi was built to automate the flow of data between systems. While the term ‘dataflow’ is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems. This problem space has been around ever since enterprises had more than one system, where some of the systems created data and some of the systems consumed data. The problems and solution patterns that emerged have been discussed and articulated extensively. A comprehensive and readily consumed form is found in the Enterprise Integration Patterns [eip].
Apache Object Oriented Data Technology
Metadata for middleware (and vice versa):
· Transparent access to distributed resources
· Data discovery and query optimization
· Distributed processing and virtual archives
But it’s not just for science! It’s also a software architecture:
· Models for information representation
· Solutions to knowledge capture problems
· Unification of technology, data, and metadata
Apache Oozie Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
Apache OpenNLP Apache OpenNLP software supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning..
Apache Phoenix Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store. Phoenix provides a JDBC driver that hides the intricacies of the noSQL store enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; insert and delete rows singly and in bulk; and query data through SQL. Phoenix compiles queries and other statements into native noSQL store APIs rather than using MapReduce enabling the building of low latency applications on top of noSQL stores.
Apache Pig Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMS systems. Pig Latin can be extended using UDF (User Defined Functions) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.
Pig was originally developed at Yahoo Research around 2006 for researchers to have an ad-hoc way of creating and executing map-reduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation.
Apache Pulsar Pulsar is a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
Apache REEF REEF, (Retainable Evaluator Execution Framework), is our approach to simplify and unify the lower layers of big data systems on modern resource managers. For managers like Apache YARN, Apache Mesos, Google Omega, and Facebook Corona, REEF provides a centralized control plane abstraction that can be used to build a decentralized data plane for supporting big data systems. Special consideration is given to graph computation and machine learning applications, both of which require data retention on allocated resources to execute multiple passes over the data. More broadly, applications that run on YARN will have the need for a variety of data-processing tasks e.g., data shuffle, group communication, aggregation, checkpointing, and many more. Rather than reimplement these for each application, REEF aims to provide them in a library form, so that they can be reused by higher-level applications and tuned for a specific domain problem e.g., Machine Learning. In that sense, our long-term vision is that REEF will mature into a Big Data Application Server, that will host a variety of tool kits and applications, on modern resource managers.
Apache S4
S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. S4 fills the gap between complex proprietary systems and batch-oriented open source computing platforms. We aim to develop a high performance computing platform that hides the complexity inherent in parallel processing system from the application programmer. The core platform is written in Java. The implementation is modular and pluggable, and S4 applications can be easily and dynamically combined for creating more sophisticated stream processing systems.
Apache SAMOA
Apache SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, due to the time and memory complexity. Apache SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Apache Flink, Apache Storm, and Apache Samza. Apache SAMOA is written in Java and is available at under the Apache Software License version 2.0.
Large-Scale Learning from Data Streams with Apache SAMOA
Apache Samza Apache Samza is a distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.
Apache SINGA SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feed-forward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN). Many built-in layers are provided for users. SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning.
Apache Spark
Apache Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce, for certain applications. Spark provides primitives for in-memory cluster computing that allows user programs to load data into a cluster’s memory and query it repeatedly, making it well suited to machine learning algorithms.
Apache Spatial Information System Apache SIS provides data structures for geographic data and associated metadata along with methods to manipulate those data structures. The library is an implementation of GeoAPI interfaces and can be used for desktop or server applications.
Apache Sqoop Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Apache Stanbol Apache Stanbol provides a set of reusable components for semantic content management. Apache Stanbol’s intended use is to extend traditional content management systems with semantic services. Other feasible use cases include: direct usage from web applications (e.g. for tag extraction/suggestion; or text completion in search fields), ‘smart’ content workflows or email routing based on extracted entities, topics, etc Apache Stanbol – the Semantic Engine. In order to be used as a semantic engine via its services, all components offer their functionalities in terms of a RESTful web service API. Apache Stanbol’s main features are:
· Content Enhancement: Services that add semantic information to ‘non-semantic’ pieces of content.
· Reasoning: Services that are able to retrieve additional semantic information about the content based on the semantic information retrieved via content enhancement.
· Knowledge Models: Services that are used to define and manipulate the data models (e.g. ontologies) that are used to store the semantic information.
· Persistence: Services that store (or cache) semantic information, i.e. enhanced content, entities, facts, and make it searchable.
Apache Storm
Storm is a distributed computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and team at BackType, the project was open sourced after being acquired by Twitter. It uses custom created “spouts” and “bolts” to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on September 17, 2011.
A Storm application is designed as a topology of interfaces which create a “stream” of transformations. It provides similar functionality as a MapReduce job with the exception that the topology will theoretically run indefinitely until it is manually terminated.
In 2013 the Apache Software Foundation has accepted Storm into its incubator program.
Apache Tajo A big data warehouse system on Hadoop. Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large-data sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities.
Apache Texen Texen is a general purpose text generating utility. It is capable of producing almost any sort of text output. Driven by Ant, essentially an Ant Task, Texen uses a control template, an optional set of worker templates, and control context to govern the generated output. Although TexenTask can be used directly, it is usually subclassed to initialize your control context before generating any output.
Apache Tez Apache Tez is an extensible framework for building high performance batch and interactive data processing applications, coordinated by YARN in Apache Hadoop. Tez improves the MapReduce paradigm by dramatically improving its speed, while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and Apache Pig use Apache Tez, as do a growing number of third party data access applications developed for the broader Hadoop ecosystem.
Apache Tika The Apache Tika toolkit detects and extracts metadata and text content from various documents – from PPT to CSV to PDF – using existing parser libraries. Tika unifies these parsers under a single interface to allow you to easily parse over a thousand different file types. Tika is useful for search engine indexing, content analysis, translation, and much more.
Apache Twill Apache Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their application logic. Apache Twill allows you to use YARN’s distributed capabilities with a programming model that is similar to running threads.
Apache UIMA Unstructured Information Management Applications (UIMA) are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA is made of many things UIMA enables applications to be decomposed into components, for example ‘language identification’ => ‘language specific segmentation’ => ‘sentence boundary detection’ => ‘entity detection (person/place names etc.)’. Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes. Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS , a standards organization).
Apache Yarn
We present the next generation of Hadoop compute platform known as YARN, which departs from its familiar, monolithic architecture. By separating resource management functions from the programming model, YARN delegates many scheduling-related functions to per-job components. In this new context, MapReduce is just one of the applications running on top of YARN. This separation provides a great deal of flexibility in the choice of programming framework. Examples of alternative programming models that are becoming available on YARN are: Dryad, Giraph, Hoya, REEF, Spark, Storm and Tez.
Apache Zeppelin A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
Apache ZooKeeper Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.
Application Function Library
SAP HANA Library References: The SAP HANA Business Function Library (BFL) Reference describes the Business Function Library (BFL) delivered with SAP HANA. This application function library (AFL) contains pre-built financial functions implemented in C++. The SAP HANA Predictive Analysis Library (PAL) Reference describes the Predictive Analysis Library (PAL) delivered with SAP HANA. This application function library (AFL) defines functions that can be called from within SAP HANA SQLScript procedures to perform analytic algorithms.
Application Function Modeler
The Application Function Modeler 2.0 (AFM 2) is a graphical editor for complex data analysis pipelines in the HANA Studio. This tool is based on the HANA Data Scientist prototype developed at the HANA Platform Innovation Center in Potsdam, Germany. It is planned to be the next generation of the existing HANA Studio Application Function Modeler which was developed at the TIP CE&SP Algorithm Labs in Shanghai, China. The AFM 2 team consists of original and new developers from both locations.
Applied Mathematics Applied mathematics is a branch of mathematics that deals with mathematical methods that find use in science, engineering, business, computer science, and industry. Thus, ‘applied mathematics’ is a mathematical science with specialized knowledge. The term ‘applied mathematics’ also describes the professional specialty in which mathematicians work on practical problems by formulating and studying mathematical models. In the past, practical applications have motivated the development of mathematical theories, which then became the subject of study in pure mathematics where abstract concepts are studied for their own sake. The activity of applied mathematics is thus intimately connected with research in pure mathematics.
Applied Singular Spectrum Analysis
Functions to model and decompose time series into principal components using singular spectrum analysis (de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>; de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>).
Approximate Bayesian Computation
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences, e.g. in population genetics, ecology, epidemiology, and systems biology.
Overview of Approximate Bayesian Computation
Approximate Exploration Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call \emph{approximate exploration}. We first provide results when the approximation is explicit, quantifying the performance of an exploration algorithm, MBIE-EB \citep{strehl2008analysis}, when combined with state aggregation. In particular, we show that this allows the agent to trade off between learning speed and quality of the policy learned. We then turn to a successful exploration scheme in practical, pseudo-count based exploration bonuses \citep{bellemare2016unifying}. We show that choosing a density model implicitly defines an abstraction and that the pseudo-count bonus incentivizes the agent to explore using this abstraction. We find, however, that implicit exploration may result in a mismatch between the approximated value function and exploration bonus, leading to either under- or over-exploration.
Approximate Leave-One-Out Cross Validation
We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Employing empirical risk minimization with possibly regularization, the inferred parameter vector will be biased toward the training samples. Such bias is measured by the cross validation procedure in practice where the data set is partitioned into a training set used for training and a validation set, which is not used in training and is left to measure the out-of-sample performance. A classical cross validation strategy is the leave-one-out cross validation (LOOCV) where one sample is left out for validation and training is done on the rest of the samples that are presented to the learner, and this process is repeated on all of the samples. LOOCV is rarely used in practice due to the high computational complexity. In this paper, we first develop a computationally efficient approximate LOOCV (ALOOCV) and provide theoretical guarantees for its performance. Then we use ALOOCV to provide an optimization algorithm for finding the regularizer in the empirical risk minimization framework. In our numerical experiments, we illustrate the accuracy and efficiency of ALOOCV as well as our proposed framework for the optimization of the regularizer.
Approximate Message Passing
The standard linear regression (SLR) problem is to recover a vector $\mathbf{x}^0$ from noisy linear observations $\mathbf{y}=\mathbf{Ax}^0+\mathbf{w}$. The approximate message passing (AMP) algorithm recently proposed by Donoho, Maleki, and Montanari is a computationally efficient iterative approach to SLR that has a remarkable property: for large i.i.d.\ sub-Gaussian matrices $\mathbf{A}$, its per-iteration behavior is rigorously characterized by a scalar state-evolution whose fixed points, when unique, are Bayes optimal. AMP, however, is fragile in that even small deviations from the i.i.d.\ sub-Gaussian model can cause the algorithm to diverge. This paper considers a ‘vector AMP’ (VAMP) algorithm and shows that VAMP has a rigorous scalar state-evolution that holds under a much broader class of large random matrices $\mathbf{A}$: those that are right-rotationally invariant. After performing an initial singular value decomposition (SVD) of $\mathbf{A}$, the per-iteration complexity of VAMP can be made similar to that of AMP. In addition, the fixed points of VAMP’s state evolution are consistent with the replica prediction of the minimum mean-squared error recently derived by Tulino, Caire, Verd\’u, and Shamai. The effectiveness and state evolution predictions of VAMP are confirmed in numerical experiments.
Approximate Nearest Neighbor
In some applications it may be acceptable to retrieve a ‘good guess’ of the nearest neighbor. In those cases, we can use an algorithm which doesn’t guarantee to return the actual nearest neighbor in every case, in return for improved speed or memory savings. Often such an algorithm will find the nearest neighbor in a majority of cases, but this depends strongly on the dataset being queried. Algorithms that support the approximate nearest neighbor search include locality-sensitive hashing, best bin first and balanced box-decomposition tree based search.
Approximate Nearest Neighbor Search in High Dimensions
Approximate Query Processing
While Big Data opens the possibility of gaining unprecedented insights, it comes at the price of increased need for computational resources (or risk of higher latency) for answering queries over voluminous data. The ability to provide approximate answers to queries at a fraction of the cost of executing the query in the traditional way, has the disruptive potential of allowing us to explore large datasets efficiently. Specifically, such techniques could prove effective in helping data scientists identify the subset of data that needs further drill-down, discovering trends, and enabling fast visualization. In this article, we will focus on approximate query processing schemes that are based on sampling techniques. An approximate query processing (AQP) scheme can be characterized by the generality of the query language it supports, its error model and accuracy guarantee, the amount of work saved at runtime, and the amount of additional resources it requires in precomputation. These dimensions of an AQP scheme are not independent and much of the past work makes specific choices along the above four dimensions. Specifically, it seems impossible to have an AQP system that supports the richness of SQL with significant saving of work while providing an accuracy guarantee that is acceptable to a broad set of application workloads. Put another way, you cannot have it all.
MISS: Finding Optimal Sample Sizes for Approximate Analytics
Approximate Random Dropout The training phases of Deep neural network (DNN) consume enormous processing time and energy. Compression techniques for inference acceleration leveraging the sparsity of DNNs, however, can be hardly used in the training phase. Because the training involves dense matrix-multiplication using GPGPU, which endorse regular and structural data layout. In this paper, we exploit the sparsity of DNN resulting from the random dropout technique to eliminate the unnecessary computation and data access for those dropped neurons or synapses in the training phase. Experiments results on MLP and LSTM on standard benchmarks show that the proposed Approximate Random Dropout can reduce the training time by half on average with ignorable accuracy loss.
Approximate Semantic Matching Event-based systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. In order to address semantic coupling within event-based systems, we propose vocabulary free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and the consequences it implies for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional semantics-based similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precision-recall F1 score of 75.89% on average in all experiments with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach.
Towards an Inexact Semantic Complex Event Processing Framework
Approximate Survey Propagation
Approximate message passing algorithm enjoyed considerable attention in the last decade. In this paper we introduce a variant of the AMP algorithm that takes into account glassy nature of the system under consideration. We coin this algorithm as the approximate survey propagation (ASP) and derive it for a class of low-rank matrix estimation problems. We derive the state evolution for the ASP algorithm and prove that it reproduces the one-step replica symmetry breaking (1RSB) fixed-point equations, well-known in physics of disordered systems. Our derivation thus gives a concrete algorithmic meaning to the 1RSB equations that is of independent interest. We characterize the performance of ASP in terms of convergence and mean-squared error as a function of the free Parisi parameter s. We conclude that when there is a model mismatch between the true generative model and the inference model, the performance of AMP rapidly degrades both in terms of MSE and of convergence, while ASP converges in a larger regime and can reach lower errors. Among other results, our analysis leads us to a striking hypothesis that whenever s (or other parameters) can be set in such a way that the Nishimori condition $M=Q>0$ is restored, then the corresponding algorithm is able to reach mean-squared error as low as the Bayes-optimal error obtained when the model and its parameters are known and exactly matched in the inference procedure.
Approximately Binary Clamping
Although traditionally binary visual representations are mainly designed to reduce computational and storage costs in the image retrieval research, this paper argues that binary visual representations can be applied to large scale recognition and detection problems in addition to hashing in retrieval. Furthermore, the binary nature may make it generalize better than its real-valued counterparts. Existing binary hashing methods are either two-stage or hinging on loss term regularization or saturated functions, hence converge slowly and only emit soft binary values. This paper proposes Approximately Binary Clamping (ABC), which is non-saturating, end-to-end trainable, with fast convergence and can output true binary visual representations. ABC achieves comparable accuracy in ImageNet classification as its real-valued counterpart, and even generalizes better in object detection. On benchmark image retrieval datasets, ABC also outperforms existing hashing methods.
Approximation Tree This paper examines the stability of learned explanations for black-box predictions via model distillation with decision trees. One approach to intelligibility in machine learning is to use an understandable `student’ model to mimic the output of an accurate `teacher’. Here, we consider the use of regression trees as a student model, in which nodes of the tree can be used as `explanations’ for particular predictions, and the whole structure of the tree can be used as a global representation of the resulting function. However, individual trees are sensitive to the particular data sets used to train them, and an interpretation of a student model may be suspect if small changes in the training data have a large effect on it. In this context, access to outcomes from a teacher helps to stabilize the greedy splitting strategy by generating a much larger corpus of training examples than was originally available. We develop tests to ensure that enough examples are generated at each split so that the same splitting rule would be chosen with high probability were the tree to be re trained. Further, we develop a stopping rule to indicate how deep the tree should be built based on recent results on the variability of Random Forests when these are used as the teacher. We provide concrete examples of these procedures on the CAD-MDD and COMPAS data sets.
APRIL We propose a method to perform automatic document summarisation without using reference summaries. Instead, our method interactively learns from users’ preferences. The merit of preference-based interactive summarisation is that preferences are easier for users to provide than reference summaries. Existing preference-based interactive learning methods suffer from high sample complexity, i.e. they need to interact with the oracle for many rounds in order to converge. In this work, we propose a new objective function, which enables us to leverage active learning, preference learning and reinforcement learning techniques in order to reduce the sample complexity. Both simulation and real-user experiments suggest that our method significantly advances the state of the art. Our source code is freely available at https://…/emnlp2018-april.
Apriori Algorithm Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
Apriori-Graph Web Usage Mining is an application of Data Mining Techniques to discover interesting usage patterns from web data in order to understand and better serve the needs of web-based applications. The paper proposes an algorithm for finding these usage patterns using a modified version of Apriori Algorithm called Apriori-Graph. These rules will help service providers to predict, which web pages, the user is likely to visit next. This will optimize the website in terms of efficiency, bandwidth and will have positive economic benefits for them. The proposed Apriori Graph Algorithm O((V)(E)) works faster compared to the existing Apriori Algorithm and is well suitable for real-time application.
Arc Diagram In graph drawing, an arc diagram is a style of graph drawing, in which the vertices of a graph are placed along a line in the Euclidean plane, with edges being drawn as semicircles in one of the two halfplanes bounded by the line, or as smooth curves formed by sequences of semicircles. In some cases, line segments of the line itself are also allowed as edges, as long as they connect only vertices that are consecutive along the line.
ArcGIS Esri’s ArcGIS is a geographic information system (GIS) for working with maps and geographic information. It is used for: creating and using maps; compiling geographic data; analyzing mapped information; sharing and discovering geographic information; using maps and geographic information in a range of applications; and managing geographic information in a database. The system provides an infrastructure for making maps and geographic information available throughout an organization, across a community, and openly on the Web.
Archetypal Analysis Archetypal analysis (Cutler and Breiman, 1994) has the aim to find a few, not necessarily observed, extremal observations (the archetypes) in a multivariate data set such that:
1. all the observations are approximated by convex combinations of the archetypes, and
2. all the archetypes are convex combinations of the observations.
Arcsine Law In probability theory, the arcsine laws are a collection of results for one-dimensional random walks and Brownian motion (the Wiener process). The best known of these is attributed to Paul Lévy (1939). All three laws relate path properties of the Wiener process to the arcsine distribution. The Lévy Arcsine Law states that the proportion of time that the one-dimensional Wiener process is positive follows an arcsine distribution.
Area under Curve
Sometimes, the ROC is used to generate a summary statistic. A common version is the area under the ROC curve, or “AUC” (“Area Under Curve”), or A’ (pronounced “a-prime”), or “c-statistic”.
Argument Component Detection
Argument component detection (ACD) is an important sub-task in argumentation mining. ACD aims at detecting and classifying different argument components in natural language texts.
Argument Harvesting Much research in computational argumentation assumes that arguments and counterarguments can be obtained in some way. Yet, to improve and apply models of argument, we need methods for acquiring them. Current approaches include argument mining from text, hand coding of arguments by researchers, or generating arguments from knowledge bases. In this paper, we propose a new approach, which we call argument harvesting, that uses a chatbot to enter into a dialogue with a participant to get arguments and counterarguments from him or her. Because it is automated, the chatbot can be used repeatedly in many dialogues, and thereby it can generate a large corpus. We describe the architecture of the chatbot, provide methods for managing a corpus of arguments and counterarguments, and an evaluation of our approach in a case study concerning attitudes of women to participation in sport.
Argumentation Mining The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people’s argumentation.
ARMA Point Process We introduce the ARMA (autoregressive-moving-average) point process, which is a Hawkes process driven by a Neyman-Scott process with Poisson immigration. It contains both the Hawkes and Neyman-Scott process as special cases and naturally combines self-exciting and shot-noise cluster mechanisms, useful in a variety of applications. The name ARMA is used because the ARMA point process is an appropriate analogue of the ARMA time series model for integer-valued series. As such, the ARMA point process framework accommodates a flexible family of models sharing methodological and mathematical similarities with ARMA time series. We derive an estimation procedure for ARMA point processes, as well as the integer ARMA models, based on an MCEM (Monte Carlo Expectation Maximization) algorithm. This powerful framework for estimation accommodates trends in immigration, multiple parametric specifications of excitement functions, as well as cases where marks and immigrants are not observed.
AR-MDN Accurate demand forecasts can help on-line retail organizations better plan their supply-chain processes. The challenge, however, is the large number of associative factors that result in large, non-stationary shifts in demand, which traditional time series and regression approaches fail to model. In this paper, we propose a Neural Network architecture called AR-MDN, that simultaneously models associative factors, time-series trends and the variance in the demand. We first identify several causal features and use a combination of feature embeddings, MLP and LSTM to represent them. We then model the output density as a learned mixture of Gaussian distributions. The AR-MDN can be trained end-to-end without the need for additional supervision. We experiment on a dataset of an year’s worth of data over tens-of-thousands of products from Flipkart. The proposed architecture yields a significant improvement in forecasting accuracy when compared with existing alternatives.
ArrayFire Library The ArrayFire accelerated computing library is a free, general-purpose, open-source library that simplifies the process of developing software that targets parallel and massively-parallel architectures including CPUs, GPUs, and other hardware acceleration devices. ArrayFire is used on devices from low-powered mobile phones to high-powered GPU-enabled supercomputers including CPUs from all major vendors (Intel, AMD, Arm), GPUs from the dominant manufacturers (NVIDIA, AMD, and Qualcomm), as well as a variety of other accelerator devices on Windows, Mac, and Linux.
Articulate Articulate is a platform for building conversational interfaces with intelligent agents. Articulate is an open source project that will allow you to take control of you conversational interfaces, without being worried where and how your data is stored. Also, Articulate is built with an user-centered design where the main goal is to make experts and beginners feel comfortable when building their intelligent agents.
The main features of Articulate are:
• Open source project
• Based on Rasa NLU
• Docker and docker-compose based (Easy to set up locally and in the cloud)
• Awesome UI/UX
• Webhook connection
• Response formatting
• Handlebars.js for template responses
• Community support on Gitter and Github
Articulates makes it super easy to get up and running with Rasa NLU. You´ll be guided as you build and train your custom agent using our friendly and intuitive interface.
Artificial Continuous Prediction Market
We propose the Artificial Continuous Prediction Market (ACPM) as a means to predict a continuous real value, by integrating a range of data sources and aggregating the results of different machine learning (ML) algorithms. ACPM adapts the concept of the (physical) prediction market to address the prediction of real values instead of discrete events. Each ACPM participant has a data source, a ML algorithm and a local decision-making procedure that determines what to bid on what value. The contributions of ACPM are: (i) adaptation to changes in data quality by the use of learning in: (a) the market, which weights each market participant to adjust the influence of each on the market prediction and (b) the participants, which use a Q-learning based trading strategy to incorporate the market prediction into their subsequent predictions, (ii) resilience to a changing population of low- and high-performing participants. We demonstrate the effectiveness of ACPM by application to an influenza-like illnesses data set, showing ACPM out-performs a range of well-known regression models and is resilient to variation in data source quality.
Artificial General Intelligence
Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can. It is a primary goal of artificial intelligence research and an important topic for science fiction writers and futurists. Artificial general intelligence is also referred to as “strong AI”, “full AI” or as the ability to perform “general intelligent action”. AGI is associated with traits such as consciousness, sentience, sapience, and self-awareness observed in living beings.
Artificial Intelligence
Artificial intelligence (AI) is the human-like intelligence exhibited by machines or software. It is also an academic field of study. Major AI researchers and textbooks define the field as “the study and design of intelligent agents”, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1955, defines it as “the science and engineering of making intelligent machines”.
Artificial Quadratic Neural Network
In machine learning, the use of an artificial neural network is the mainstream approach. Such a network consists of layers of neurons. These neurons are of the same type characterized by the two features: (1) an inner product of an input vector and a matching weighting vector of trainable parameters and (2) a nonlinear excitation function. Here we investigate the possibility of replacing the inner product with a quadratic function of the input vector, thereby upgrading the 1st order neuron to the 2nd order neuron, empowering individual neurons, and facilitating the optimization of neural networks. Also, numerical examples are provided to illustrate the feasibility and merits of the 2nd order neurons. Finally, further topics are discussed.
Artificial Stupidity And artificial stupidity’s most valid usage relates to examples of the obvious faultiness of AI technologies and systems. Finally within the field of computer science, artificial stupidity has one other significant application: it refers to a technique of deliberately dumbing down computer programs in order to introduce errors in their responses.
Building Safer AGI by introducing Artificial Stupidity
Aspect Based Sentiment Analysis
Sentiment analysis is increasingly viewed as a vital task both from an academic and a commercial standpoint. The majority of current approaches, however, attempt to detect the overall polarity of a sentence, paragraph, or text span, regardless of the entities mentioned (e.g., laptops, restaurants) and their aspects (e.g., battery, screen ; food, service). By contrast, this task is concerned with aspect based sentiment analysis (ABSA), where the goal is to identify the aspects of given target entities and the sentiment expressed towards each aspect.
Aspect Based Sentiment Analysis with Gated Convolutional Networks
Aspect-Oriented Software Development
In computing, Aspect-oriented software development (AOSD) is an emerging software development technology that seeks new modularizations of software systems in order to isolate secondary or supporting functions from the main program’s business logic. AOSD allows multiple concerns to be expressed separately and automatically unified into working systems. Aspect-Oriented Software Development focuses on the identification, specification and representation of cross-cutting concerns and their modularization into separate functional units as well as their automated composition into a working system.
ASReml ASReml is a statistical software package for fitting linear mixed models using restricted maximum likelihood, a technique commonly used in plant and animal breeding and quantitative genetics as well as other fields. It is notable for its ability to fit very large and complex data sets efficiently, due to its use of the average information algorithm and sparse matrix methods. It was originally developed by Arthur Gilmour. ASREML can be used in Windows, Linux, and as an add-on to S-PLUS and R.
Association Analysis
Association for Uncertainty in Artificial Intelligence
The Association for Uncertainty in Artificial Intelligence is a non-profit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty. Principles and applications developed within the UAI community have been at the forefront of research in Artificial Intelligence. The UAI community and annual meeting have been primary sources of advances in graphical models for representing and reasoning with uncertainty.
Association Mapping Association mapping (genetics), also known as “linkage disequilibrium mapping”, is a method of mapping quantitative trait loci (QTLs) that takes advantage of historic linkage disequilibrium to link phenotypes (observable characteristics) to genotypes (the genetic constitution of organisms).
Association Rule Classification
Association Rule Learning Association rule learning is a popular and well researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Rakesh Agrawal et al. introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements. In addition to the above example from market basket analysis association rules are employed today in many application areas including Web usage mining, intrusion detection, Continuous production, and bioinformatics. As opposed to sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.
Association Rule Mining
Associative Domain Adaptation We propose associative domain adaptation, a novel technique for end-to-end domain adaptation with neural networks, the task of inferring class labels for an unlabeled target domain based on the statistical properties of a labeled source domain. Our training scheme follows the paradigm that in order to effectively derive class labels for the target domain, a network should produce statistically domain invariant embeddings, while minimizing the classification error on the labeled source domain. We accomplish this by reinforcing associations between source and target data directly in embedding space. Our method can easily be added to any existing classification network with no structural and almost no computational overhead. We demonstrate the effectiveness of our approach on various benchmarks and achieve state-of-the-art results across the board with a generic convolutional neural network architecture not specifically tuned to the respective tasks. Finally, we show that the proposed association loss produces embeddings that are more effective for domain adaptation compared to methods employing maximum mean discrepancy as a similarity measure in embedding space.
Assumed Density Filtering Updating Beliefs on Action-Values
While off-policy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation brings non-linearity and inconsistent distributions over value function. In this paper, we introduce a new Bayesian approach to off-policy TD methods using Assumed Density Filtering, called ADFQ, which updates beliefs on action-values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs not only are used in exploration but they provide a natural regularization in the belief updates. We also present a connection between ADFQ and Q-learning. Our empirical results show the proposed ADFQ algorithms outperform comparing algorithms in several task domains. Moreover, our algorithms improve general drawbacks in BRL such as computational complexity, usage of uncertainty, and nonlinearity.
Asymmetric Convolution Bidirectional Long Short-Term Memory Networks
Recently deeplearning models have been shown to be capable of making remarkable performance in sentences and documents classification tasks. In this work, we propose a novel framework called AC-BLSTM for modeling setences and documents, which combines the asymmetric convolution neural network (ACNN) with the Bidirectional Long Short-Term Memory network (BLSTM). Experiment results demonstrate that our model achieves state-of-the-art results on all six tasks, including sentiment analysis, question type classification, and subjectivity classification.
Asynchronous Advantage Actor-Critic
Asynchronous Distributed Gibbs
Gibbs sampling is a widely used Markov Chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. It is widely believed that MCMC methods do not extend easily to parallel implementations, as their inherently sequential nature incurs a large synchronization cost. This means that new solutions are needed to bring Bayesian analysis fully into the era of large-scale computation. In this paper, we present a novel scheme – Asynchronous Distributed Gibbs (ADG) sampling – that allows us to perform MCMC in a parallel fashion with no synchronization or locking, avoiding the typical performance bottlenecks of parallel algorithms. Our method is especially attractive in settings, such as hierarchical random-effects modeling in which each observation has its own random effect, where the problem dimension grows with the sample size. We prove convergence under some basic regularity conditions, and discuss the proof for similar parallelization schemes for other iterative algorithms. We provide three examples that illustrate some of the algorithm’s properties with respect to scaling. Because our hardware resources are bounded, we have not yet found a limit to the algorithm’s scaling, and thus its true capabilities remain unknown.
Asynchronous Parallel SAGA
We describe Asaga, an asynchronous parallel version of the incremental gradient algorithm Saga that enjoys fast linear convergence rates. We highlight a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently proposed ‘perturbed iterate’ framework that resolves it. We thereby prove that Asaga can obtain a theoretical linear speedup on multi-core systems even without sparsity assumptions. We present results of an implementation on a 40-core architecture illustrating the practical speedup as well as the hardware overhead.
Asynchronous Parallel Sampling Gradient Boosting Decision Tree
With the development of big data technology, Gradient Boosting Decision Tree, i.e. GBDT, becomes one of the most important machine learning algorithms for its accurate output. However, the training process of GBDT needs a lot of computational resources and time. In order to accelerate the training process of GBDT, the asynchronous parallel sampling gradient boosting decision tree, abbr. asynch-SGBDT is proposed in this paper. Via introducing sampling, we adapt the numerical optimization process of traditional GBDT training process into stochastic optimization process and use asynchronous parallel stochastic gradient descent to accelerate the GBDT training process. Meanwhile, the theoretical analysis of asynch-SGBDT is provided by us in this paper. Experimental results show that GBDT training process could be accelerated by asynch-SGBDT. Our asynchronous parallel strategy achieves an almost linear speedup, especially for high-dimensional sparse datasets.
Asynchronous Parallel Stochastic Coordinate Descent
An asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate (1=K) on general convex functions. Nearlinear speedup on a multicore system can be expected if the number of processors is O(n1=2) in unconstrained optimization and O(n1=4) in the separable-constrained case, where n is the number of variables.
Asynchronous Stochastic Variational Inference Stochastic variational inference (SVI) employs stochastic optimization to scale up Bayesian computation to massive data. Since SVI is at its core a stochastic gradient-based algorithm, horizontal parallelism can be harnessed to allow larger scale inference. We propose a lock-free parallel implementation for SVI which allows distributed computations over multiple slaves in an asynchronous style. We show that our implementation leads to linear speed-up while guaranteeing an asymptotic ergodic convergence rate $O(1/\sqrt(T)$ ) given that the number of slaves is bounded by $\sqrt(T)$ ($T$ is the total number of iterations). The implementation is done in a high-performance computing (HPC) environment using message passing interface (MPI) for python (MPI4py). The extensive empirical evaluation shows that our parallel SVI is lossless, performing comparably well to its counterpart serial SVI with linear speed-up.
Asynchronous Subgradient-Push Algorithm
This paper proposes a novel exact asynchronous subgradient-push algorithm (AsySPA) to solve an additive cost optimization problem over digraphs where each node only has access to a local convex function and updates asynchronously with an arbitrary rate. Specifically, each node of a strongly connected digraph does not wait for updates from other nodes but simply starts a new update within any bounded time interval by using local information available from its in-neighbors. ‘Exact’ means that every node of the AsySPA can asymptotically converge to the same optimal solution, even under different update rates among nodes and bounded communication delays. To compensate uneven update rates, we design a simple mechanism to adaptively adjust stepsizes per update in each node, which is substantially different from the existing works. Then, we construct a delay-free augmented system to address asynchrony and delays, and perform the convergence analysis by proposing a generalized subgradient algorithm, which clearly has its own significance and helps us to explicitly evaluate the convergence speed of the AsySPA. Finally, we demonstrate advantages of the AsySPA over the celebrated synchronous SPA in both theory and simulation.
Atomic Information Resource Data Model
Entities are simply formed by grouping Attributes together. One or more Attributes are shared between two or more Entities. According to the AtomicDB terminology, shared attributes are called bridge concepts and this is the equivalent of the relationship that is implemented with primary and foreign keys on two tables but here we are completely independent to mix and match them. For example PartColor, could be merged with another attribute from a different table in another relational database.
Atomic Triangular Matrix An atomic (upper or lower) triangular matrix is a special form of unitriangular matrix, where all of the off-diagonal entries are zero, except for the entries in a single column. Such a matrix is also called a Gauss matrix or a Gauss transformation matrix.
ATOMO Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that methods such as QSGD and TernGrad are special cases of ATOMO and show that sparsifiying gradients in their singular value decomposition (SVD), rather than the coordinate-wise one, can lead to significantly faster distributed training.
ATPboost ATPboost is a system for solving sets of large-theory problems by interleaving ATP runs with state-of-the-art machine learning of premise selection from the proofs. Unlike many previous approaches that use multi-label setting, the learning is implemented as binary classification that estimates the pairwise-relevance of (theorem, premise) pairs. ATPboost uses for this the XGBoost gradient boosting algorithm, which is fast and has state-of-the-art performance on many tasks. Learning in the binary setting however requires negative examples, which is nontrivial due to many alternative proofs. We discuss and implement several solutions in the context of the ATP/ML feedback loop, and show that ATPboost with such methods significantly outperforms the k-nearest neighbors multilabel classifier.
Atrain Distributed System
A special type of distributed system called by ‘Atrain Distributed System’ (ADS) which is very suitable for processing big data using the heterogeneous data structures r-atrain or the homogeneous data structure r-train. A simple ‘Atrain Distributed System’ is called an uni-tier ADS. The ‘Multi-tier Atrain Distributed System’ is an extension of the uni-tier ADS. The ADS is scalable upto any extent as many times as required. Two new type of network topologies are defined for ADS called by ‘multi-horse cart’ topology and ‘cycle’ topology which can support increasing volume of big data. Where r-atrain and r-train data structures are introduced for the processing of big data, the data structures ‘heterogeneous data structure MA’ and ‘homogeneous data structure MT’ are introduced for the processing of big data including temporal big data too. Both MA and MT can be well implemented in multi-tier ADS. We define cyclic train and cyclic atrain, and then doubly linked train/atrain. A method is proposed on how to implement Solid Matrices, n-dimensional arrays, n-dimensional larrays etc. in a computer memory using the data structures MT and MA.
ATRank A user can be represented as what he/she does along the history. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to limited human instinct. Recent works usually use RNN-based methods to give an overall embedding of a behavior sequence, which then could be exploited by the downstream applications. However, this can only preserve very limited information, or aggregated memories of a person. When a downstream application requires to facilitate the modeled user features, it may lose the integrity of the specific highly correlated behavior of the user, and introduce noises derived from unrelated behaviors. This paper proposes an attention based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Heterogeneous user behaviors are considered in our model that we project all types of behaviors into multiple latent semantic spaces, where influence can be made among the behaviors via self-attention. Downstream applications then can use the user behavior vectors via vanilla attention. Experiments show that ATRank can achieve better performance and faster training process. We further explore ATRank to use one unified model to predict different types of user behaviors at the same time, showing a comparable performance with the highly optimized individual models.
A-Tree Index structures are one of the most important tools that DBAs leverage in order to improve the performance of analytics and transactional workloads. However, with the explosion of data that is constantly being generated in a wide variety of domains including autonomous vehicles, Internet of Things (IoT) devices, and E-commerce sites, building several indexes can often become prohibitive and consume valuable system resources. In fact, a recent study has shown that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a state-of-the-art in-memory DBMS. This overhead consumes valuable and expensive main memory, and limits the amount of space that a database has available to store new data or process existing data. In this paper, we present a novel approximate index structure called A-Tree. At the core of our index is a tunable error parameter that allows a DBA to balance lookup performance and space consumption. To navigate this tradeoff, we provide a cost model that helps the DBA choose an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using a variety of real-world datasets, we show that our index structure is able to provide performance that is comparable to full index structures while reducing the storage footprint by orders of magnitude.
Attack Investigation Query Language
The need for countering Advanced Persistent Threat (APT) attacks has led to the solutions that ubiquitously monitor system activities in each host, and perform timely attack investigation over the monitoring data for analyzing attack provenance. However, existing query systems based on relational databases and graph databases lack language constructs to express key properties of major attack behaviors, and often execute queries inefficiently since their semantics-agnostic design cannot exploit the properties of system monitoring data to speed up query execution. To address this problem, we propose a novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation. Our system provides (1) domain-specific data model and storage for scaling the storage, (2) a domain-specific query language, Attack Investigation Query Language (AIQL) that integrates critical primitives for attack investigation, and (3) an optimized query engine based on the characteristics of the data and the semantics of the queries to efficiently schedule the query execution. We deployed our system in NEC Labs America comprising 150 hosts and evaluated it using 857 GB of real system monitoring data (containing 2.5 billion events). Our evaluations on a real-world APT attack and a broad set of attack behaviors show that our system surpasses existing systems in both efficiency (124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum) and conciseness (SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than AIQL).
Attention Economy Attention economics is an approach to the management of information that treats human attention as a scarce commodity, and applies economic theory to solve various information management problems.
Attention Gated Network We propose a novel attention gate (AG) model for medical image analysis that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules when using convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN models such as VGG or U-Net architectures with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed AG models are evaluated on a variety of tasks, including medical image classification and segmentation. For classification, we demonstrate the use case of AGs in scan plane detection for fetal ultrasound screening. We show that the proposed attention mechanism can provide efficient object localisation while improving the overall prediction performance by reducing false positives. For segmentation, the proposed architecture is evaluated on two large 3D CT abdominal datasets with manual annotations for multiple organs. Experimental results show that AG models consistently improve the prediction performance of the base architectures across different datasets and training sizes while preserving computational efficiency. Moreover, AGs guide the model activations to be focused around salient regions, which provides better insights into how model predictions are made. The source code for the proposed AG models is publicly available.
Attention Incorporate Network
In traditional neural networks for image processing, the inputs of the neural networks should be the same size such as 224*224*3. But how can we train the neural net model with different input size A common way to do is image deformation which accompany a problem of information loss (e.g. image crop or wrap). Sequence model(RNN, LSTM, etc.) can accept different size of input like text and audio. But one disadvantage for sequence model is that the previous information will become more fragmentary during the transfer in time step, it will make the network hard to train especially for long sequential data. In this paper we propose a new network structure called Attention Incorporate Network(AIN). It solve the problem of different size of inputs including: images, text, audio, and extract the key features of the inputs by attention mechanism, pay different attention depends on the importance of the features not rely on the data size. Experimentally, AIN achieve a higher accuracy, better convergence comparing to the same size of other network structure
Attentional Multi-agent Predictive Modeling
Multi-agent predictive modeling is an essential step for understanding physical, social and team-play systems. Recently, Interaction Networks (INs) were proposed for the task of modeling multi-agent physical systems, INs scale with the number of interactions in the system (typically quadratic or higher order in the number of agents). In this paper we introduce VAIN, a novel attentional architecture for multi-agent predictive modeling that scales linearly with the number of agents. We show that VAIN is effective for multi-agent predictive modeling. Our method is evaluated on tasks from challenging multi-agent prediction domains: chess and soccer, and outperforms competing multi-agent approaches.
Attention-Aware Generative Adversarial Network
In this work, we present a novel approach for training Generative Adversarial Networks (GANs). Using the attention maps produced by a Teacher- Network we are able to improve the quality of the generated images as well as perform weakly object localization on the generated images. To this end, we generate images of HEp-2 cells captured with Indirect Imunofluoresence (IIF) and study the ability of our network to perform a weakly localization of the cell. Firstly, we demonstrate that whilst GANs can learn the mapping between the input domain and the target distribution efficiently, the discriminator network is not able to detect the regions of interest. Secondly, we present a novel attention transfer mechanism which allows us to enforce the discriminator to put emphasis on the regions of interest via transfer learning. Thirdly, we show that this leads to more realistic images, as the discriminator learns to put emphasis on the area of interest. Fourthly, the proposed method allows one to generate both images as well as attention maps which can be useful for data annotation e.g in object detection.
Attention-based Adversarial Autoencoder Network Embedding
Network embedding represents nodes in a continuous vector space and preserves structure information from the Network. Existing methods usually adopt a ‘one-size-fits-all’ approach when concerning multi-scale structure information, such as first- and second-order proximity of nodes, ignoring the fact that different scales play different roles in the embedding learning. In this paper, we propose an Attention-based Adversarial Autoencoder Network Embedding(AAANE) framework, which promotes the collaboration of different scales and lets them vote for robust representations. The proposed AAANE consists of two components: 1) Attention-based autoencoder effectively capture the highly non-linear network structure, which can de-emphasize irrelevant scales during training. 2) An adversarial regularization guides the autoencoder learn robust representations by matching the posterior distribution of the latent embeddings to given prior distribution. This is the first attempt to introduce attention mechanisms to multi-scale network embedding. Experimental results on real-world networks show that our learned attention parameters are different for every network and the proposed approach outperforms existing state-of-the-art approaches for network embedding.
Attention-Based Bidirectional LSTM Text segmentation plays an important role in various Natural Language Processing (NLP) tasks like summarization, context understanding, document indexing and document noise removal. Previous methods for this task require manual feature engineering, huge memory requirements and large execution times. To the best of our knowledge, this paper is the first one to present a novel supervised neural approach for text segmentation. Specifically, we propose an attention-based bidirectional LSTM model where sentence embeddings are learned using CNNs and the segments are predicted based on contextual information. This model can automatically handle variable sized context information. Compared to the existing competitive baselines, the proposed model shows a performance improvement of ~7% in WinDiff score on three benchmark datasets.
Attentive Convolution Network
In NLP, convolution neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in input text $t^x$. In this work, we propose an attentive convolution network, AttentiveConvNet. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from local context, but also from information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text $t^x$ that are distant or (ii) from a second input text, the context text $t^y$. In an evaluation on sentence relation classification (textual entailment and answer sentence selection) and text classification, experiments demonstrate that AttentiveConvNet has state-of-the-art performance and outperforms RNN/CNN variants with and without attention.
Attentive Dense Graph Propagation Module
The potential of graph convolutional neural networks for the task of zero-shot learning has been demonstrated recently. These models are highly sample efficient as related concepts in the graph structure share statistical strength allowing generalization to new classes when faced with a lack of data. However, knowledge from distant nodes can get diluted when propagating through intermediate nodes, because current approaches to zero-shot learning use graph propagation schemes that perform Laplacian smoothing at each layer. We show that extensive smoothing does not help the task of regressing classifier weights in zero-shot learning. In order to still incorporate information from distant nodes and utilize the graph structure, we propose an Attentive Dense Graph Propagation Module (ADGPM). ADGPM allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node’s relationship to its ancestors and descendants and an attention scheme is further used to weigh their contribution depending on the distance to the node. Finally, we illustrate that finetuning of the feature representation after training the ADGPM leads to considerable improvements. Our method achieves competitive results, outperforming previous zero-shot learning approaches.
Attentive Memory Network Recent advances in conversational systems have changed the search paradigm. Traditionally, a user poses a query to a search engine that returns an answer based on its index, possibly leveraging external knowledge bases and conditioning the response on earlier interactions in the search session. In a natural conversation, there is an additional source of information to take into account: utterances produced earlier in a conversation can also be referred to and a conversational IR system has to keep track of information conveyed by the user during the conversation, even if it is implicit. We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions. The questions should be answered solely by referring to the statements provided, without consulting external knowledge. The time is right for the information retrieval community to embrace this task, both as a stand-alone task and integrated in a broader conversational search setting. In this paper, we focus on machine reading as a stand-alone task and present the Attentive Memory Network (AMN), an end-to-end trainable machine reading algorithm. Its key contribution is in efficiency, achieved by having an hierarchical input encoder, iterating over the input only once. Speed is an important requirement in the setting of conversational search, as gaps between conversational turns have a detrimental effect on naturalness. On 20 datasets commonly used for evaluating machine reading algorithms we show that the AMN achieves performance comparable to the state-of-the-art models, while using considerably fewer computations.
Attentive Regularization
We propose Attentive Regularization (AR), a method to constrain the activation maps of kernels in Convolutional Neural Networks (CNNs) to specific regions of interest (ROIs). Each kernel learns a location of specialization along with its weights through standard backpropagation. A differentiable attention mechanism requiring no additional supervision is used to optimize the ROIs. Traditional CNNs of different types and structures can be modified with this idea into equivalent Targeted Kernel Networks (TKNs), while keeping the network size nearly identical. By restricting kernel ROIs, we reduce the number of sliding convolutional operations performed throughout the network in its forward pass, speeding up both training and inference. We evaluate our proposed architecture on both synthetic and natural tasks across multiple domains. TKNs obtain significant improvements over baselines, requiring less computation (around an order of magnitude) while achieving superior performance.
Attribution Modeling Attribution modelling, in essence, means reporting on the impact of communication activity using metrics like:
· Turnover
· Profit
· Customer retention
· Volume of sales
Instead of metrics like:
· Share of voice
· Web visits
· Click through rate (CTR)
· Impressions
There’s a big difference between these two lists. The second list contains important metrics, but businesses could survive without ever increasing them. The business metrics in the first list, however, are essential for all companies that want to survive and thrive.
Understanding the impact of communications on business metrics is – rightly – more important to senior executives. This is the primary objective of attribution modelling; to provide holistic, accurate information about the financial return activities are delivering so you can refine them, adjust what you’re doing, and use the same budget to deliver more value to your business and your customers.
AttriGuard Users in various web and mobile applications are vulnerable to attribute inference attacks, in which an attacker leverages a machine learning classifier to infer a target user’s private attributes (e.g., location, sexual orientation, political view) from its public data (e.g., rating scores, page likes). Existing defenses leverage game theory or heuristics based on correlations between the public data and attributes. These defenses are not practical. Specifically, game-theoretic defenses require solving intractable optimization problems, while correlation-based defenses incur large utility loss of users’ public data. In this paper, we present AttriGuard, a practical defense against attribute inference attacks. AttriGuard is computationally tractable and has small utility loss. Our AttriGuard works in two phases. Suppose we aim to protect a user’s private attribute. In Phase I, for each value of the attribute, we find a minimum noise such that if we add the noise to the user’s public data, then the attacker’s classifier is very likely to infer the attribute value for the user. We find the minimum noise via adapting existing evasion attacks in adversarial machine learning. In Phase II, we sample one attribute value according to a certain probability distribution and add the corresponding noise found in Phase I to the user’s public data. We formulate finding the probability distribution as solving a constrained convex optimization problem. We extensively evaluate AttriGuard and compare it with existing methods using a real-world dataset. Our results show that AttriGuard substantially outperforms existing methods. Our work is the first one that shows evasion attacks can be used as defensive techniques for privacy protection.
Auer-Gervini Graphical Bayesian Approach This article approaches the problem of selecting significant principal components from a Bayesian model selection perspective. The resulting Bayes rule provides a simple graphical technique that can be used instead of (or together with) the popular scree plot to determine the number of significant components to retain. We study the theoretical properties of the new method and show, by examples and simulation, that it provides more clear-cut answers than the scree plot in many interesting situations.
Augmented Backward Elimination
Augmented backward elimination combines significance or information based criteria with the change in estimate to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) <doi:10.1371/journal.pone.0113677>.
Augmented Dickey-Fuller Test
In statistics and econometrics, an augmented Dickey-Fuller test (ADF) is a test for a unit root in a time series sample. It is an augmented version of the Dickey-Fuller test for a larger and more complicated set of time series models. The augmented Dickey-Fuller (ADF) statistic, used in the test, is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence.
Augmented Intelligence
Augmented Interval Markov Chains
In this paper we propose augmented interval Markov chains (AIMCs): a generalisation of the familiar interval Markov chains (IMCs) where uncertain transition probabilities are in addition allowed to depend on one another. This new model preserves the flexibility afforded by IMCs for describing stochastic systems where the parameters are unclear, for example due to measurement error, but also allows us to specify transitions with probabilities known to be identical, thereby lending further expressivity. The focus of this paper is reachability in AIMCs. We study the qualitative, exact quantitative and approximate reachability problem, as well as natural subproblems thereof, and establish several upper and lower bounds for their complexity. We prove the exact reachability problem is at least as hard as the famous square-root sum problem, but, encouragingly, the approximate version lies in $\mathbf{NP}$ if the underlying graph is known, whilst the restriction of the exact problem to a constant number of uncertain edges is in $\mathbf{P}$. Finally, we show that uncertainty in the graph structure affects complexity by proving $\mathbf{NP}$-completeness for the qualitative subproblem, in contrast with an easily-obtained upper bound of $\mathbf{P}$ for the same subproblem with known graph structure.
Augmented Inverse Probability Weighting
In this paper, we discuss an estimator for average treatment effects (ATEs) known as the augmented inverse propensity weighted (AIPW) estimator. This estimator has attractive theoretical properties and only requires practitioners to do two things they are already comfortable with: (1) specify a binary regression model for the propensity score, and (2) specify a regression model for the outcome variable. Perhaps the most interesting property of this estimator is its so-called ‘‘double robustness.” Put simply, the estimator remains consistent for the ATE if either the propensity score model or the outcome regression is misspecified but the other is properly specified. After explaining the AIPW estimator, we conduct a Monte Carlo experiment that compares the finite sample performance of the AIPW estimator to three common competitors: a regression estimator, an inverse propensity weighted (IPW) estimator, and a propensity score matching estimator. The Monte Carlo results show that the AIPW estimator has comparable or lower mean square error than the competing estimators when the propensity score and outcome models are both properly specified and, when one of the models is misspecified, the AIPW estimator is superior.
‘Robust-squared’ Imputation Models Using BART
Augmented k-means Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a very important algorithm for this. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster belonging probabilities are used to control the subsequent re-estimation of cluster means. Observations which can’t be firmly identified into clusters are excluded from the re-estimation step. This can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more accurately classifying observations into known clusters and / or converging in fewer iterations. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be available with this report.
Augmented Lagrangian Alternating Direction Inexact Newton
Distributed Optimal Power Flow using ALADIN
Augmented Lagrangian Method
Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems; the difference is that the augmented Lagrangian method adds an additional term to the unconstrained objective. This additional term is designed to mimic a Lagrange multiplier. The augmented Lagrangian is not the same as the method of Lagrange multipliers. Viewed differently, the unconstrained objective is the Lagrangian of the constrained problem, with an additional penalty term (the augmentation).
Augmented RNN
The recent adoption of recurrent neural networks (RNNs) for session modeling has yielded substantial performance gains compared to previous approaches. In terms of context-aware session modeling, however, the existing RNN-based models are limited in that they are not designed to explicitly model rich static user-side contexts (e.g., age, gender, location). Therefore, in this paper, we explore the utility of explicit user-side context modeling for RNN session models. Specifically, we propose an augmented RNN (ARNN) model that extracts high-order user-contextual preference using the product-based neural network (PNN) in order to augment any existing RNN session model. Evaluation results show that our proposed model outperforms the baseline RNN session model by a large margin when rich user-side contexts are available.
Author Rank Author Rank is a new aspect of Google’s search algorithm that will score online content creators. Similar to SEO rankings for sites and pages, authors will now have an associated ranking based on a few contributing factors, including, but not limited to:
· Social sharing of your Google+ posts
· Quality of backlinks to your content
· Interactions with your content (comments and shares)
· Timely and topical content
· Reputation and authority on other social networks
· PageRank
In short, Google will be assessing your reputation, authority, and the general reception of your content to determine just how valuable you are as an author. This ranking methodology provides writers with a greater incentive to not only build out their Google Plus profiles (smart move, Google), but also to ensure their online presence is streamlined and connected across all social networks and blogs to which they contribute.
A user who is well-connected, well-informed, produces great content, and is seen by the larger community as valuable, will without question reap the rewards of a high author ranking.
A Guide on Google’s Author Rank
Auto Regressive TIme VArying
Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions. To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.). ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provide a natural starting point to learn and investigate their dynamics in greater detail.
Autocorrelation Autocorrelation, also known as serial correlation or cross-autocorrelation, is the cross-correlation of a signal with itself at different points in time (that is what the cross stands for). Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.
AutoCorrelation Function
The auto-correlation function measures the correlation of a signal x(t) with itself shifted by some time delay tau. The auto-correlation function can be used to detect repeats or periodicity in a signal. E.g. to use the auto-correlation to assess the effect of fluctuations (noise) on a periodic signal.
Auto-Correlational Neural Network
In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves. However, state-of-the-art approaches to disfluency detection in spontaneous speech transcripts currently still depend on an array of hand-crafted features, and other representations derived from the output of pre-existing systems such as language models or dependency parsers. As an alternative, this paper proposes a simple yet effective model for automatic disfluency detection, called an auto-correlational neural network (ACNN). The model uses a convolutional neural network (CNN) and augments it with a new auto-correlation operator at the lowest layer that can capture the kinds of ‘rough copy’ dependencies that are characteristic of repair disfluencies in speech. In experiments, the ACNN model outperforms the baseline CNN on a disfluency detection task with a 5% increase in f-score, which is close to the previous best result on this task.
Autodependogram In this paper the serial dependences between the observed time series and the lagged series, taken into account one-by-one, are graphically analyzed by what we have chosen to call the ‘autodependogram’. This tool, is a sort of natural nonlinear counterpart of the well-known autocorrelogram used in the linear context. The simple idea, instead of using autocorrelations at varying time lags, exploits the c2-test statistics applied to convenient contingency tables. The usefulness of this graphical device is confirmed by simulations from certain classes of well-known models, characterized by randomness and also by different kinds of linear and nonlinear dependences. The autodependogram is also applied to both environmental and economic real data. In this way its ability to detect nonlinear features is highlighted.
Autoencoded Variational Inference For Topic Model
Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is autoencoding variational Bayes (AEVB), but it has proven difficult to apply to topic models in practice. We present what is to our knowledge the first effective AEVB based inference method for latent Dirichlet allocation (LDA), which we call Autoencoded Variational Inference For Topic Model (AVITM). This model tackles the problems caused for AEVB by the Dirichlet prior and by component collapsing. We find that AVITM matches traditional methods in accuracy with much better inference time. Indeed, because of the inference network, we find that it is unnecessary to pay the computational cost of running variational optimization on test data. Because AVITM is black box, it is readily applied to new topic models. As a dramatic illustration of this, we present a new topic model called ProdLDA, that replaces the mixture model in LDA with a product of experts. By changing only one line of code from LDA, we find that ProdLDA yields much more interpretable topics, even if LDA is trained via collapsed Gibbs sampling.
Autoencoder An autoencoder, autoassociator or Diabolo network is an artificial neural network used for learning efficient codings. The aim of an auto-encoder is to learn a compressed, distributed representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.
AutoEncoder Feature Selector
High-dimensional data in many areas such as computer vision and machine learning brings in computational and analytical difficulty. Feature selection which select a subset of features from original ones has been proven to be effective and efficient to deal with high-dimensional data. In this paper, we propose a novel AutoEncoder Feature Selector (AEFS) for unsupervised feature selection. AEFS is based on the autoencoder and the group lasso regularization. Compared to traditional feature selection methods, AEFS can select the most important features in spite of nonlinear and complex correlation among features. It can be viewed as a nonlinear extension of the linear method regularized self-representation (RSR) for unsupervised feature selection. In order to deal with noise and corruption, we also propose robust AEFS. An efficient iterative algorithm is designed for model optimization and experimental results verify the effectiveness and superiority of the proposed method.
Auto-Encoding Variational Bayes How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
Automated Canary Analysis
automated CLAUse DETectEr
Machine Learning Powered Analysis of Consumer Contracts and Privacy Policies. CLAUDETTE – ‘automated CLAUse DETectEr’ – is an interdisciplinary research project hosted at the Law Department of the European University Institute, led by professors Giovanni Sartor and Hans-W. Micklitz, in cooperation with engineers from University of Bologna and University of Modena and Reggio Emilia. The research objective is to test to what extent is it possible to automate reading and legal assessment of online consumer contracts and privacy policies, to evaluate their compliance with EU´s unfair contractual terms law and personal data protection law (GDPR), using machine learning and grammar-based approaches. The idea arose out of bewilderment. Having read dozens of terms of service and of privacy policies of online platforms, we came to conclusion that despite substantive law in place, and despite enforcers´ competence for abstract control, providers of online services still tend to use unfair and unlawful clauses in these documents. Hence, the idea to automate parts of enforcement process by delegating certain tasks to machines. On one hand, we believe that relying on automation can increase quality and effectiveness of legal work of enforcers. On the other, we want to empower consumers themselves, by giving them tools to quickly assess whether what they agree to online is fair and/or lawful.
Automated Lyric Annotation
Comprehending lyrics, as found in songs and poems, can pose a challenge to human and machine readers alike. This motivates the need for systems that can understand the ambiguity and jargon found in such creative texts, and provide commentary to aid readers in reaching the correct interpretation. We introduce the task of automated lyric annotation (ALA). Like text simplification, a goal of ALA is to rephrase the original text in a more easily understandable manner. However, in ALA the system must often include additional information to clarify niche terminology and abstract concepts. To stimulate research on this task, we release a large collection of crowdsourced annotations for song lyrics. We analyze the performance of translation and retrieval models on this task, measuring performance with both automated and human evaluation. We find that each model captures a unique type of information important to the task.
Automated Machine Learning
The Current State of Automated Machine Learning
Automated Model Order Selection Algorithm
One of the longstanding problems in spectral graph clustering (SGC) is the so-called model order selection problem: automated selection of the correct number of clusters. This is equivalent to the problem of finding the number of connected components or communities in an undirected graph. In this paper, we propose AMOS, an automated model order selection algorithm for SGC. Based on a recent analysis of clustering reliability for SGC under the random interconnection model, AMOS works by incrementally increasing the number of clusters, estimating the quality of identified clusters, and providing a series of clustering reliability tests. Consequently, AMOS outputs clusters of minimal model order with statistical clustering reliability guarantees. Comparing to three other automated graph clustering methods on real-world datasets, AMOS shows superior performance in terms of multiple external and internal clustering metrics.
Automated Predictive Library
Automated Predictive Library (APL) is a HANA Application Function Library (AFL) which is meant to expose the features of InfiniteInsight (aka Kxen) inside HANA.
Automatic Bayesian Density Analysis
Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for density estimation, even when taking into account mixtures of probabilistic models, are not flexible enough to deal with the uncertainty inherent to real-world data: they are generally restricted to a priori fixed homogeneous likelihood model and to latent variable structures where expressiveness comes at the price of tractability. We propose Automatic Bayesian Density Analysis (ABDA) to go beyond classical mixture model density estimation, casting uncertainty estimation on both the underlying structure in the data, as well as the selection of adequate likelihood models for the data—thus statistical data types of the variable in the data—into a joint inference problem. Specifically, ABDA relies on a hierarchical model explicitly incorporating arbitrarily rich collections of likelihood models at a local level, while capturing global variable interactions by an expressive deep structure built on a sum-product network. Extensive empirical evidence shows that ABDA is more accurate than density estimators in the literature at dealing with both kinds of uncertainties, at modeling and predicting real-world (mixed continuous and discrete) data in both transductive and inductive scenarios, and at recovering the statistical data types.
Automatic Derivation Machine This paper presents an artificial intelligence algorithm that can be used to derive formulas from various scientific disciplines called automatic derivation machine. First, the formula is abstractly expressed as a multiway tree model, and then each step of the formula derivation transformation is abstracted as a mapping of multiway trees. Derivation steps similar can be expressed as a reusable formula template by a multiway tree map. After that, the formula multiway tree is eigen-encoded to feature vectors construct the feature space of formulas, the Q-learning model using in this feature space can achieve the derivation by making training data from derivation process. Finally, an automatic formula derivation machine is made to choose the next derivation step based on the current state and object. We also make an example about the nuclear reactor physics problem to show how the automatic derivation machine works.
Automatic Gradient Boosting Automatic machine learning performs predictive modeling with high performing machine learning tools without human interference. This is achieved by making machine learning applications parameter-free, i.e. only a dataset is provided while the complete model selection and model building process is handled internally through (often meta) optimization. Projects like Auto-WEKA and auto-sklearn aim to solve the Combined Algorithm Selection and Hyperparameter optimization (CASH) problem resulting in huge configuration spaces. However, for most real-world applications, the optimization over only a few different key learning algorithms can not only be sufficient, but also potentially beneficial. The latter becomes apparent when one considers that models have to be validated, explained, deployed and maintained. Here, less complex model are often preferred, for validation or efficiency reasons, or even a strict requirement. Automatic gradient boosting simplifies this idea one step further, using only gradient boosting as a single learning algorithm in combination with model-based hyperparameter tuning, threshold optimization and encoding of categorical features. We introduce this general framework as well as a concrete implementation called autoxgboost. It is compared to current AutoML projects on 16 datasets and despite its simplicity is able to achieve comparable results on about half of the datasets as well as performing best on two.
Automatic Interactive Data Exploration
In this paper, we argue that database systems be augmented with an automated data exploration service that methodically steers users through the data in a meaningful way. Such an automated system is crucial for deriving insights from complex datasets found in many big data applications such as scientific and healthcare applications as well as for reducing the human effort of data exploration. Towards this end, we present AIDE, an Automatic Interactive Data Exploration framework that assists users in discovering new interesting data patterns and eliminate expensive ad-hoc exploratory queries. AIDE relies on a seamless integration of classification algorithms and data management optimization techniques that collectively strive to accurately learn the user interests based on his relevance feedback on strategically collected samples. We present a number of exploration techniques as well as optimizations that minimize the number of samples presented to the user while offering interactive performance. AIDE can deliver highly accurate query predictions for very common conjunctive queries with small user effort while, given a reasonable number of samples, it can predict with high accuracy complex disjunctive queries. It provides interactive performance as it limits the user wait time per iteration of exploration to less than a few seconds.
Automatic License Plate Recognition
Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detection. The Convolutional Neural Networks (CNNs) are trained and fine-tuned for each ALPR stage so that they are robust under different conditions (e.g., variations in camera, lighting, and background). Specially for character segmentation and recognition, we design a two-stage approach employing simple data augmentation tricks such as inverted License Plates (LPs) and flipped characters. The resulting ALPR approach achieved impressive results in two datasets. First, in the SSIG dataset, composed of 2,000 frames from 101 vehicle videos, our system achieved a recognition rate of 93.53% and 47 Frames Per Second (FPS), performing better than both Sighthound and OpenALPR commercial systems (89.80% and 93.03%, respectively) and considerably outperforming previous results (81.80%). Second, targeting a more realistic scenario, we introduce a larger public dataset, called UFPR-ALPR dataset, designed to ALPR. This dataset contains 150 videos and 4,500 frames captured when both camera and vehicles are moving and also contains different types of vehicles (cars, motorcycles, buses and trucks). In our proposed dataset, the trial versions of commercial systems achieved recognition rates below 70%. On the other hand, our system performed better, with recognition rate of 78.33% and 35 FPS.
Automatic Machine Learning
The availability of large data sets and computational resources have encouraged the development of machine learning and data-driven models which pose an interesting alternative to explicit and fully structured models of behaviour. A battery of tools are now available which can automatically learn interesting mappings and simulate complex phenomena. These include techniques such as Hidden Markov Models, Neural Networks, Support Vector Machines, Mixture Models, Decision Trees and Bayesian Network Inference . In general, these tools have fallen into two classes: discriminative and generative models. The first attempts to optimize the learning for a particular task while the second models a phenomenon in its entirety. This difference between the two approaches will be addressed in this thesis in particular detail and the above probabilistic formalisms will be employed in deriving a machine learning system for our purposes.
ChaLearn Automatic Machine Learning Challenge
<a href="'Automate” target=”top”>https://…/
Automatic Multiscale-Based Peak Detection
A method for automatic detection of peaks in noisy periodic and quasi-periodic signals. This method, called automatic multiscale-based peak detection (AMPD), is based on the calculation and analysis of the local maxima scalogram, a matrix comprising the scale-dependent occurrences of local maxima.
Automatic Query Expansion
Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. In the context of search engines, query expansion involves evaluating a user’s input (what words were typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as:
· Finding synonyms of words, and searching for the synonyms as well
· Finding all the various morphological forms of words by stemming each word in the search query
· Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
· Re-weighting the terms in the original query
Query expansion is a methodology studied in the field of computer science, particularly within the realm of natural language processing and information retrieval.
Automatic Short Answer Grading
Automatic short answer grading (ASAG) techniques are designed to automatically assess short answers to questions in natural language, having a length of a few words to a few sentences. Supervised ASAG techniques have been demonstrated to be effective but suffer from a couple of key practical limitations. They are greatly reliant on instructor provided model answers and need labeled training data in the form of graded student answers for every assessment task. To overcome these, in this paper, we introduce an ASAG technique with two novel features. We propose an iterative technique on an ensemble of (a) a text classifier of student answers and (b) a classifier using numeric features derived from various similarity measures with respect to model answers. Second, we employ canonical correlation analysis based transfer learning on a common feature representation to build the classifier ensemble for questions having no labelled data. The proposed technique handsomely beats all winning supervised entries on the SCIENTSBANK dataset from the Student Response Analysis task of SemEval 2013. Additionally, we demonstrate generalizability and benefits of the proposed technique through evaluation on multiple ASAG datasets from different subject topics and standards.
Automatic Speech Recognition
In computer science and electrical engineering, speech recognition (SR) is the translation of spoken words into text. It is also known as ‘automatic speech recognition’ (ASR), ‘computer speech recognition’, or just ‘speech to text’ (STT). Some SR systems use ‘training’ (also called ‘enrolment’) where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person’s specific voice and uses it to fine-tune the recognition of that person’s speech, resulting in increased accuracy. Systems that do not use training are called ‘speaker independent’ systems. Systems that use training are called ‘speaker dependent’. Speech recognition applications include voice user interfaces such as voice dialling (e.g. ‘Call home’), call routing (e.g. ‘I would like to make a collect call’), domotic appliance control, search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input). The term voice recognition or speaker identification refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person’s voice or it can be used to authenticate or verify the identity of a speaker as part of a security process. From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data. The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the world-wide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems. These speech industry players include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, IflyTek (China), many of which have publicized the core technology in their speech recognition systems being based on deep learning.
Automatic Summarization Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. As the problem of information overload has grown, and as the quantity of data has increased, so has interest in automatic summarization. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. An example of the use of summarization technology is search engines such as Google. Document summarization is another. Generally, there are two approaches to automatic summarization: extraction and abstraction. Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary. In contrast, abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might generate. Such a summary might contain words not explicitly present in the original. Research into abstractive methods is an increasingly important and active research area, however due to complexity constraints, research to date has focused primarily on extractive methods.
AutoPruner Channel pruning is an important family of methods to speedup deep model’s inference. Previous filter pruning algorithms regard channel pruning and model fine-tuning as two independent steps. This paper argues that combining them in a single end-to-end trainable system will lead to better results. We propose an efficient channel selection layer, namely AutoPruner, to find less important filters automatically in a joint training manner. AutoPruner takes previous activation responses as input and generates a true binary index code for pruning. Hence, all the filters corresponding to zero index values can be removed safely after training. We empirically demonstrate that the gradient information of this channel selection layer is also helpful for the whole model training. Compared with previous state-of-the-art pruning algorithms, AutoPruner achieves significantly better performance. Furthermore, ablation experiments show that the proposed novel mini-batch pooling and binarization operations are vital for the success of filter pruning.
Autoregressive Conditional Duration
In financial econometrics, an autoregressive conditional duration (ACD, Engle and Russell (1998)) model considers irregularly spaced and autocorrelated intertrade durations. ACD is analogous to GARCH. Indeed, in a continuous double auction (a common trading mechanism in many financial markets) waiting times between two consecutive trades vary at random.
“Generalized Autoregressive Conditional Heteroscedasticity”
Autoregressive Conditional Heteroskedasticity
In econometrics, autoregressive conditional heteroskedasticity (ARCH) models are used to characterize and model observed time series. They are used whenever there is reason to believe that, at any point in a series, the error terms will have a characteristic size or variance. In particular ARCH models assume the variance of the current error term or innovation to be a function of the actual sizes of the previous time periods’ error terms: often the variance is related to the squares of the previous innovations. Such models are often called ARCH models (Engle, 1982), although a variety of other acronyms are applied to particular structures of model which have a similar basis. ARCH models are employed commonly in modeling financial time series that exhibit time-varying volatility clustering, i.e. periods of swings followed by periods of relative calm. ARCH-type models are sometimes considered to be part of the family of stochastic volatility models but strictly this is incorrect since at time t the volatility is completely pre-determined (deterministic) given previous values.
Autoregressive Conditional Poisson
When modelling count data that comes in the form of a time series, the static Poisson regression and standard time series models are often not appropriate. A current study therefore involves the evaluation of several observation-driven and parameter-driven time series models for count data. In the observation-driven class of models, a fairly simple model is the Autoregressive Conditional Poisson (ACP) model.
Autoregressive Distributed Lag Model
A distributed-lag model is a dynamic model in which the effect of a regressor x on y occurs over time rather than all at once. In the simple case of one explanatory variable and a linear relationship. This form is very similar to the infinite-moving-average representation of an ARMA process, except that the lag polynomial on the right-hand side is applied to the explanatory variable x rather than to a white-noise process e. The individual coefficients ßs are called lag weights and the collectively comprise the lag distribution. They define the pattern of how x affects y over time.
Autoregressive Fractionally Integrated Moving Average
In statistics, autoregressive fractionally integrated moving average models are time series models that generalize ARIMA (autoregressive integrated moving average) models by allowing non-integer values of the differencing parameter. These models are useful in modeling time series with long memory – that is, in which deviations from the long-run mean decay more slowly than an exponential decay. The acronyms “ARFIMA” or “FARIMA” are often used, although it is also conventional to simply extend the “ARIMA(p,d,q)” notation for models, by simply allowing the order of differencing, d, to take fractional values.
Autoregressive Integrated Moving Average
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). They are applied in some cases where data show evidence of non-stationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied to remove the non-stationarity.
The model is generally referred to as an ARIMA(p,d,q) model where parameters p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively. ARIMA models form an important part of the Box-Jenkins approach to time-series modelling.
When one of the three terms is zero, it is usual to drop “AR”, “I” or “MA” from the acronym describing the model. For example, ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1).
Autoregressive Integrated Moving Average With Explanatory Variable
Matlab: ARIMAX Model Specifications
ARIMA vs. ARIMAX – which approach is better to analyze and forecast macroeconomic time series?
Autoregressive Novelty Detector
We propose an unsupervised model for novelty detection. The subject is treated as a density estimation problem, in which a deep neural network is employed to learn a parametric function that maximizes probabilities of training samples. This is achieved by equipping an autoencoder with a novel module, responsible for the maximization of compressed codes’ likelihood by means of autoregression. We illustrate design choices and proper layers to perform autoregressive density estimation when dealing with both image and video inputs. Despite a very general formulation, our model shows promising results in diverse one-class novelty detection and video anomaly detection benchmarks.
Auto-Sklearn auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advantages in Bayesian optimization, meta-learning and ensemble construction. Learn more about the technology behind auto-sklearn by reading this paper published at the NIPS 2015 .
AutoSpearman The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated to defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly-used feature selection techniques. Through a case study of 13 publicly-available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2%pts. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend future studies use AutoSpearman in lieu of commonly-used feature selection techniques.
Autostacker We introduce an automatic machine learning (AutoML) modeling architecture called Autostacker, which combines an innovative hierarchical stacking architecture and an Evolutionary Algorithm (EA) to perform efficient parameter search. Neither prior domain knowledge about the data nor feature preprocessing is needed. Using EA, Autostacker quickly evolves candidate pipelines with high predictive accuracy. These pipelines can be used as is or as a starting point for human experts to build on. Autostacker finds innovative combinations and structures of machine learning models, rather than selecting a single model and optimizing its hyperparameters. Compared with other AutoML systems on fifteen datasets, Autostacker achieves state-of-art or competitive performance both in terms of test accuracy and time cost.
“Automatic Machine Learning”
AutoTune Big data analytics frameworks (BDAFs) have been widely used for data processing applications. These frameworks provide a large number of configuration parameters to users, which leads to a tuning issue that overwhelms users. To address this issue, many automatic tuning approaches have been proposed. However, it remains a critical challenge to generate enough samples in a high-dimensional parameter space within a time constraint. In this paper, we present AutoTune–an automatic parameter tuning system that aims to optimize application execution time on BDAFs. AutoTune first constructs a smaller-scale testbed from the production system so that it can generate more samples, and thus train a better prediction model, under a given time constraint. Furthermore, the AutoTune algorithm produces a set of samples that can provide a wide coverage over the high-dimensional parameter space, and searches for more promising configurations using the trained prediction model. AutoTune is implemented and evaluated using the Spark framework and HiBench benchmark deployed on a public cloud. Extensive experimental results illustrate that AutoTune improves on default configurations by 63.70% on average, and on the five state-of-the-art tuning algorithms by 6%-23%.
Auxiliary Inference Divergence Estimator
Approximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific dataset. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the output distributions of two inference algorithms. The paper illustrates application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics.
Ava Enterprises increasingly employ a wide array of tools and processes to make data-driven decisions. However, there are large ine ciencies in the enterprise-wide work ow that stem from the fact that business work ows are expressed in natural language but the actual computational work ow has to be manually translated into computational programs. In this paper, we present an initial approach to bridge this gap by targeting the data science component of enterprise workflows. In many cases, this component is the slowest part of the overall enterprise process, and focusing on it allows us to take an initial step in solving the larger enterprise-wide productivity problem. In this initial approach, we propose using a chatbot to allow a data scientist to assemble data analytics pipelines. A crucial insight is that while precise interpretation of general natural language continues to be challenging, controlled natural language methods are starting to become practical as natural interfaces in complex decision-making domains. In addition, we recognize that data science workflow components are often templatized. Putting these two insights together, we develop a practical system, called Ava, that uses (controlled) natural language to program data science work ows. We have an initial proof-of-concept that demonstrates the potential of our approach.
Average Causal Effect
The average causal effect (ACE) is defined as the average increase or decrease in value caused by the intervention.
Average Hazard Ratio
“Hazard Ratio”
Estimation of the Average Hazard Ratio
Average Nearest Neighbor Degree
Average Nearest Neighbor Rank
The average nearest neighbor degree (ANND) of a node of degree $k$, as a function of $k$, is often used to characterize dependencies between degrees of a node and its neighbors in a network. We study the limiting behavior of the ANND in undirected random graphs with general i.i.d. degree sequences and arbitrary joint degree distribution of neighbor nodes, when the graph size tends to infinity. When the degree distribution has finite variance, the ANND converges to a deterministic function and we prove that for the configuration model, where nodes are connected at random, this, naturally, is a constant. For degree distributions with infinite variance, the ANND in the configuration model scales with the size of the graph and we prove a central limit theorem that characterizes this behavior. As a result, the ANND is uninformative for graphs with infinite variance degree distributions. We propose an alternative measure, the average nearest neighbor rank (ANNR) and prove its convergence to a deterministic function whenever the degree distribution has finite mean. In addition to our theoretical results we provide numerical experiments to show the convergence of both functions in the configuration model and the erased configuration model, where self-loops and multiple edges are removed. These experiments also shed new light on the well-known `structural negative correlations’, or `finite-size effects’, that arise in simple graphs, because large nodes can only have a limited number of large neighbors. In particular we show that the majority of such effects for regularly varying distributions are due to a sampling bias.
“Average Nearest Neighbor Degree”
Average Shifted Histogram
A simple device has been proposed for eliminating the bin edge problem of the frequency polygon while retaining many of the computational advantages of a density estimate based on bin counts. Scott (1983, 1985b) considered the problem of choosing among the collection of multivariate frequency polygons, each with the same smoothing parameter but di ering bin origins. Rather than choosing the \smoothest’ such curve or surface, he proposed averaging several of the shifted frequency polygons. As the average of piecewise linear curves is also piecewise linear, the resulting curve appears to be a frequency polygon as well. If the weights are nonnegative and sum to 1, the resulting \averaged shifted frequency polygon’ (ASFP) is nonnegative and integrates to 1. A nearly equivalent device is to average several shifted histograms, which is just as general but simpler to describe and analyze. The result is the \averaged shifted histogram’ (ASH). Since the average of piecewise constant functions such as the histogram is also piecewise constant, the ASH appears to be a histogram as well. In practice, the ASH is made continuous using either of the linear interpolation schemes described for the frequency polygon in Chapter 4 and will be referred to as the frequency polygon of the averaged shifted histogram (FP-ASH). The ASH is the practical choice for computationally and statistically e cient density estimation.
Average Shifted Visual Predictive Checks
“Visual Predictive Checks”
Average Top-k Loss In this work, we introduce the average top-$k$ (AT$_k$) loss as a new ensemble loss for supervised learning, which is the average over the $k$ largest individual losses over a training dataset. We show that the AT$_k$ loss is a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss, but can combines their advantages and mitigate their drawbacks to better adapt to different data distributions. Furthermore, it remains a convex function over all individual losses, which can lead to convex optimization problems that can be solved effectively with conventional gradient-based method. We provide an intuitive interpretation of the AT$_k$ loss based on its equivalent effect on the continuous individual loss functions, suggesting that it can reduce the penalty on correctly classified data. We further give a learning theory analysis of MAT$_k$ learning on the classification calibration of the AT$_k$ loss and the error bounds of AT$_k$-SVM. We demonstrate the applicability of minimum average top-$k$ learning for binary classification and regression using synthetic and real datasets.
Average Treatment Effect
The average treatment effect (ATE) is a measure used to compare treatments (or interventions) in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control. In a randomized trial (i.e., an experimental study), the average treatment effect can be estimated from a sample using a comparison in mean outcomes for treated and untreated units. However, the ATE is generally understood as a causal parameter (i.e., an estimand or property of a population) that a researcher desires to know, defined without reference to the study design or estimation procedure. Both observational and experimental study designs may enable one to estimate an ATE in a variety of ways.
Averaged One-Dependence Estimators
Averaged one-dependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attribute-independence problem of the popular naive Bayes classifier. It frequently develops substantially more accurate classifiers than naive Bayes at the cost of a modest increase in the amount of computation.
Averaged Shifted Frequency Polygon
Awk AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems. AWK was created at Bell Labs in the 1970s, and its name is derived from the family names of its authors – Alfred Aho, Peter Weinberger, and Brian Kernighan. The acronym is pronounced the same as the name of the bird, auk (which acts as an emblem of the language such as on The AWK Programming Language book cover – the book is often referred to by the abbreviation TAPL). When written in all lowercase letters, as awk, it refers to the Unix or Plan 9 program that runs scripts written in the AWK programming language. The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline – for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. While AWK has a limited intended application domain, and was especially designed to support one-liner programs, the language is Turing-complete, and even the early Bell Labs users of AWK often wrote well-structured large AWK programs.
AxTrain The intrinsic error tolerance of neural network (NN) makes approximate computing a promising technique to improve the energy efficiency of NN inference. Conventional approximate computing focuses on balancing the efficiency-accuracy trade-off for existing pre-trained networks, which can lead to suboptimal solutions. In this paper, we propose AxTrain, a hardware-oriented training framework to facilitate approximate computing for NN inference. Specifically, AxTrain leverages the synergy between two orthogonal methods—one actively searches for a network parameters distribution with high error tolerance, and the other passively learns resilient weights by numerically incorporating the noise distributions of the approximate hardware in the forward pass during the training phase. Experimental results from various datasets with near-threshold computing and approximation multiplication strategies demonstrate AxTrain’s ability to obtain resilient neural network parameters and system energy efficiency improvement.
Azure Machine Learning Azure Machine Learning, a fully-managed service in the cloud that enables you to easily build, deploy and share advanced analytics solutions. No software to download, no servers to manage – all you need to start is a browser and internet connectivity.