Zap-Q Learning We propose a novel reinforcement learning algorithm that approximates solutions to the problem of discounted optimal stopping in an irreducible, uniformly ergodic Markov chain evolving on a compact subset of $\mathbb R^n$. A dynamic programming approach has been taken by Tsitsikilis and Van Roy to solve this problem, wherein they propose a Q-learning algorithm to estimate the value function, in a linear function approximation setting. The Zap-Q learning algorithm proposed in this work is the first algorithm that is designed to achieve {optimal asymptotic variance}. We prove convergence of the algorithm using ODE analysis, and the optimal asymptotic variance property is reflected via fast convergence in a finance example.
Zelig A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the collective library. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow–procedures and algorithms that may be essential to one user’s application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.
ZenLDA This paper presents our recent efforts, zenLDA, an efficient and scalable Collapsed Gibbs Sampling system for Latent Dirichlet Allocation training, which is thought to be challenging that both data parallelism and model parallelism are required because of the Big sampling data with up to billions of documents and Big model size with up to trillions of parameters. zenLDA combines both algorithm level improvements and system level optimizations. It first presents a novel CGS algorithm that balances the time complexity, model accuracy and parallelization flexibility. The input corpus in zenLDA is represented as a directed graph and model parameters are annotated as the corresponding vertex attributes. The distributed training is parallelized by partitioning the graph that in each iteration it first applies CGS step for all partitions in parallel, followed by synchronizing the computed model each other. In this way, both data parallelism and model parallelism are achieved by converting them to graph parallelism. We revisited the tradeoff between system efficiency and model accuracy and presented approximations such as unsynchronized model, sparse model initialization and ‘converged’ token exclusion. zenLDA is built on GraphX in Spark that provides distributed data abstraction (RDD) and expressive APIs to simplify the programming efforts and simultaneously hides the system complexities. This enables us to implement other CGS algorithm with a few lines of code change. To better fit in distributed data-parallel framework and achieve comparable performance with contemporary systems, we also presented several system level optimizations to push the performance limit. zenLDA was evaluated it against web-scale corpus, and the result indicates that zenLDA can achieve about much better performance than other CGS algorithm we implemented, and simultaneously achieve better model accuracy.
Zeno++ We propose Zeno++, a new robust asynchronous synchronous Stochastic Gradient Descent~(SGD) under a general Byzantine failure model with unbounded number of Byzantine workers.
Zero Inflation In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that allows for frequent zero-valued observations.
The zero-inflated Poisson model concerns a random event containing excess zero-count data in unit time. For example, the number of insurance claims within a population for a certain type of risk would be zero-inflated by those people who have not taken out insurance against the risk and thus are unable to claim. The zero-inflated Poisson (ZIP) model employs two components that correspond to two zero generating processes. The first process is governed by a binary distribution that generates structural zeros. The second process is governed by a Poisson distribution that generates counts, some of which may be zero.
Zero Initialization
Single layer Feedforward Neural Network(FNN) is used many a time as a last layer in models such as seq2seq or could be a simple RNN network. The importance of such layer is to transform the output to our required dimensions. When it comes to weights and biases initialization, there is no such specific technique that could speed up the learning process. We could depend on deep network initialization techniques such as Xavier or He initialization. But such initialization fails to show much improvement in learning speed or accuracy. In this paper we propose Zero Initialization (ZI) for weights of a single layer network. We first test this technique with on a simple RNN network and compare the results against Xavier, He and Identity initialization. As a final test we implement it on a seq2seq network. It was found that ZI considerably reduces the number of epochs used and improve the accuracy. The developed model has been applied for short-term load forecasting using the load data of Australian Energy Market. The model is able to forecast the day ahead load accurately with error of 0.94%.
Zero/One Inflated Beta Regression
A general class of regression models for continuous proportions when the data contain zeros or ones. The proposed class of models assumes that the response variable has a mixed continuous-discrete distribution with probability mass at zero or one. The beta distribution is used to describe the continuous component of the model, since its density has a wide range of different shapes depending on the values of the two parameters that index the distribution. We use a suitable parameterization of the beta law in terms of its mean and a precision parameter. The parameters of the mixture distribution are modeled as functions of regression parameters.
“Beta Regression”
Zeros Ones Inflated Proportional The ZOIP distribution (Zeros Ones Inflated Proportional) is a proportional data distribution inflated with zeros and/or ones, this distribution is defined on the most known proportional data distributions, the beta and simplex distribution, Jørgensen and Barndorff-Nielsen (1991) <doi:10.1016/0047-259X(91)90008-P>, also allows it to have different parameterizations of the beta distribution, Ferrari and Cribari-Neto (2004) <doi:10.1080/0266476042000214501>, Rigby and Stasinopoulos (2005) <doi:10.18637/jss.v023.i07>. The ZOIP distribution has four parameters, two of which correspond to the proportion of zeros and ones, and the other two correspond to the distribution of the proportional data of your choice. The ‘ZOIP’ package allows adjustments of regression models for fixed and mixed effects for proportional data inflated with zeros and/or ones.
Zero-Shot Detection
Current Zero-Shot Learning (ZSL) approaches are restricted to recognition of a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the `recognition’ and `localization’ of an unseen category. To address this limitation, we introduce a new \emph{`Zero-Shot Detection’} (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset, adhering to practical issues, e.g., the rarity of unseen objects. To the best of our knowledge, this is the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain information. To overcome the noise in the automatically derived semantic descriptions, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic space clustering. Furthermore, we present a baseline approach extended from recognition to detection setting. Our extensive experiments show significant performance boost over the baseline on the imperative yet difficult ZSD problem.
Zero-Shot Knowledge Distillation Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without even using any meta-data, we synthesize the Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to Student via knowledge distillation. We, therefore, dub our method ‘Zero-Shot Knowledge Distillation’ and demonstrate that our framework results in competitive generalization performance as achieved by distillation using the actual training data samples on multiple benchmark datasets.
Zero-Shot Learning
Zero-shot learning (ZSL) is a challenging task aiming at recognizing novel classes without any training instances.
Zero-Shot Learning by Generating Pseudo Feature Representations
Zest Programs expecting structured inputs often consist of both a syntactic analysis stage in which raw input is parsed into an internal data structure and a semantic analysis stage which conducts checks on this data structure and executes the core logic of the program. Existing random testing methodologies, like coverage-guided fuzzing (CGF) and generator-based fuzzing, tend to produce inputs that are rejected early in one of these two stages. We propose Zest, a random testing methodology that effectively explores the semantic analysis stages of such programs. Zest combines two key innovations to achieve this. First, we introduce validity fuzzing, which biases CGF towards generating semantically valid inputs. Second, we introduce parametric generators, which convert input from a simple parameter domain, such as a sequence of numbers, into a more structured domain, such as syntactically valid XML. These generators enable parameter-level mutations to map to structural mutations in syntactically valid test inputs. We implement Zest in Java and evaluate it against AFL and QuickCheck, popular CGF and generator-based fuzzing tools, on six real-world benchmarks: Apache Maven, Ant, and BCEL, ScalaChess, the Google Closure compiler, and Mozilla Rhino. We find that Zest achieves the highest coverage of the semantic analysis stage for five of these benchmarks. Further, we find 18 new bugs across the benchmarks, including 7 bugs that are uniquely found by Zest.
Z-Forcing Many efforts have been devoted to training generative latent variable models with autoregressive decoders, such as recurrent neural networks (RNN). Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech. We unify successful ideas from recently proposed architectures into a stochastic recurrent model: each step in the sequence is associated with a latent variable that is used to condition the recurrent dynamics for future steps. Training is performed with amortized variational inference where the approximate posterior is augmented with a RNN that runs backward through the sequence. In addition to maximizing the variational lower bound, we ease training of the latent variables by adding an auxiliary cost which forces them to reconstruct the state of the backward recurrent network. This provides the latent variables with a task-independent objective that enhances the performance of the overall model. We found this strategy to perform better than alternative approaches such as KL annealing. Although being conceptually simple, our model achieves state-of-the-art results on standard speech benchmarks such as TIMIT and Blizzard and competitive performance on sequential MNIST. Finally, we apply our model to language modeling on the IMDB dataset where the auxiliary cost helps in learning interpretable latent variables. Source Code: \url{https://…/zforcing_nips17}
ZhuSuan In this paper we introduce ZhuSuan, a python probabilistic programming library for Bayesian deep learning, which conjoins the complimentary advantages of Bayesian methods and deep learning. ZhuSuan is built upon Tensorflow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan is featured for its deep root into Bayesian inference, thus supporting various kinds of probabilistic models, including both the traditional hierarchical Bayesian models and recent deep generative models. We use running examples to illustrate the probabilistic programming on ZhuSuan, including Bayesian logistic regression, variational auto-encoders, deep sigmoid belief networks and Bayesian recurrent neural networks.
Zigzag Learning This paper addresses weakly supervised object detection with only image-level supervision at training stage. Previous approaches train detection models with entire images all at once, making the models prone to being trapped in sub-optimums due to the introduced false positive examples. Unlike them, we propose a zigzag learning strategy to simultaneously discover reliable object instances and prevent the model from overfitting initial seeds. Towards this goal, we first develop a criterion named mean Energy Accumulation Scores (mEAS) to automatically measure and rank localization difficulty of an image containing the target object, and accordingly learn the detector progressively by feeding examples with increasing difficulty. In this way, the model can be well prepared by training on easy examples for learning from more difficult ones and thus gain a stronger detection ability more efficiently. Furthermore, we introduce a novel masking regularization strategy over the high level convolutional feature maps to avoid overfitting initial samples. These two modules formulate a zigzag learning process, where progressive learning endeavors to discover reliable object instances, and masking regularization increases the difficulty of finding object instances properly. We achieve 47.6% mAP on PASCAL VOC 2007, surpassing the state-of-the-arts by a large margin.
Zig-Zag Sampler RZigZag
Zipf’s Law Zipf’s law, an empirical law formulated using mathematical statistics, refers to the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power law probability distributions.
ZK-GanDef Neural Network classifiers have been used successfully in a wide range of applications. However, their underlying assumption of attack free environment has been defied by adversarial examples. Researchers tried to develop defenses; however, existing approaches are still far from providing effective solutions to this evolving problem. In this paper, we design a generative adversarial net (GAN) based zero knowledge adversarial training defense, dubbed ZK-GanDef, which does not consume adversarial examples during training. Therefore, ZK-GanDef is not only efficient in training but also adaptive to new adversarial examples. This advantage comes at the cost of small degradation in test accuracy compared to full knowledge approaches. Our experiments show that ZK-GanDef enhances test accuracy on adversarial examples by up-to 49.17% compared to zero knowledge approaches. More importantly, its test accuracy is close to that of the state-of-the-art full knowledge approaches (maximum degradation of 8.46%), while taking much less training time.
ZNN Convolutional networks (ConvNets) have become a popular approach to computer vision. It is important to accelerate ConvNet training, which is computationally costly. We propose a novel parallel algorithm based on decomposition into a set of tasks, most of which are convolutions or FFTs. Applying Brent’s theorem to the task dependency graph implies that linear speedup with the number of processors is attainable within the PRAM model of parallel computation, for wide network architectures. To attain such performance on real shared-memory machines, our algorithm computes convolutions converging on the same node of the network with temporal locality to reduce cache misses, and sums the convergent convolution outputs via an almost wait-free concurrent method to reduce time spent in critical sections. We implement the algorithm with a publicly available software package called ZNN. Benchmarking with multi-core CPUs shows that ZNN can attain speedup roughly equal to the number of physical cores. We also show that ZNN can attain over 90x speedup on a many-core CPU (Xeon Phi Knights Corner). These speedups are achieved for network architectures with widths that are in common use. The task parallelism of the ZNN algorithm is suited to CPUs, while the SIMD parallelism of previous algorithms is compatible with GPUs. Through examples, we show that ZNN can be either faster or slower than certain GPU implementations depending on specifics of the network architecture, kernel sizes, and density and size of the output patch. ZNN may be less costly to develop and maintain, due to the relative ease of general-purpose CPU programming.
Zolotarev Distance In this paper the concept of a metric in the space of random variables defined on a probability space is introduced. The principle of three stages in the study of approximation problems is formulated, in particular problems of approximating distributions. Various facts connected with the use of metrics in these three stages are presented and proved. In the second part of the paper a series of results is introduced which are related to stability problems in characterizing distributions and to problems of estimating the remainder terms in limiting approximations of distributions of sums of independent random variables.
Rate of convergence to alpha stable law using Zolotarev distance : technical report
zoNNscan The training of deep neural network classifiers results in decision boundaries which geometry is still not well understood. This is in direct relation with classification problems such as so called adversarial examples. We introduce zoNNscan, an index that is intended to inform on the boundary uncertainty (in terms of the presence of other classes) around one given input datapoint. It is based on confidence entropy, and is implemented through sampling in the multidimensional ball surrounding that input. We detail the zoNNscan index, give an algorithm for approximating it, and finally illustrate its benefits on four applications, including two important problems for the adoption of deep networks in critical systems: adversarial examples and corner case inputs. We highlight that zoNNscan exhibits significantly higher values than for standard inputs in those two problem classes.
Zoom With the advancement of machine learning and deep learning, vector search becomes instrumental to many information retrieval systems, to search and find best matches to user queries based on their semantic similarities.These online services require the search architecture to be both effective with high accuracy and efficient with low latency and memory footprint, which existing work fails to offer. We develop, Zoom, a new vector search solution that collaboratively optimizes accuracy, latency and memory based on a multiview approach. (1) A ‘preview’ step generates a small set of good candidates, leveraging compressed vectors in memory for reduced footprint and fast lookup. (2) A ‘fullview’ step on SSDs reranks those candidates with their full-length vector, striking high accuracy. Our evaluation shows that, Zoom achieves an order of magnitude improvements on efficiency while attaining equal or higher accuracy, comparing with the state-of-the-art.
Zooming Network
Structural information is important in natural language understanding. Although some current neural net-based models have a limited ability to take local syntactic information, they fail to use high-level and large-scale structures of documents. This information is valuable for text understanding since it contains the author’s strategy to express information, in building an effective representation and forming appropriate output modes. We propose a neural net-based model, Zooming Network, capable of representing and leveraging text structure of long document and developing its own analyzing rhythm to extract critical information. Generally, ZN consists of an encoding neural net that can build a hierarchical representation of a document, and an interpreting neural model that can read the information at multi-levels and issuing labeling actions through a policy-net. Our model is trained with a hybrid paradigm of supervised learning (distinguishing right and wrong decision) and reinforcement learning (determining the goodness among multiple right paths). We applied the proposed model to long text sequence labeling tasks, with performance exceeding baseline model (biLSTM-crf) by 10 F1-measure.
Zoom-Net Recognizing visual relationships <subject-predicate-object> among any pair of localized objects is pivotal for image understanding. Previous studies have shown remarkable progress in exploiting linguistic priors or external textual information to improve the performance. In this work, we investigate an orthogonal perspective based on feature interactions. We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors. To this end, we present two new pooling cells to encourage feature interactions: (i) Contrastive ROI Pooling Cell, which has a unique deROI pooling that inversely pools local object features to the corresponding area of global predicate features. (ii) Pyramid ROI Pooling Cell, which broadcasts global predicate features to reinforce local object features.The two cells constitute a Spatiality-Context-Appearance Module (SCA-M), which can be further stacked consecutively to form our final Zoom-Net.We further shed light on how one could resolve ambiguous and noisy object and predicate annotations by Intra-Hierarchical trees (IH-tree). Extensive experiments conducted on Visual Genome dataset demonstrate the effectiveness of our feature-oriented approach compared to state-of-the-art methods (Acc@1 11.42% from 8.16%) that depend on explicit modeling of linguistic interactions. We further show that SCA-M can be incorporated seamlessly into existing approaches to improve the performance by a large margin. The source code will be released on https://…/ZoomNet.