W2VLDA  With the increase of online customer opinions on specialised websites and social networks, automatic systems that help organise and classify customer reviews by domain-specific aspect/category and sentiment polarity are more necessary than ever. Supervised approaches to Aspect Based Sentiment Analysis obtain good results for the domain/language they are trained on, but obtaining manually labelled data to train supervised systems for all domains and languages tends to be very costly and time consuming. In this work we describe W2VLDA, an unsupervised system based on topic modelling that, combined with some other unsupervised methods and a minimal configuration, performs aspect/category classification, aspect-term/opinion-word separation and sentiment polarity classification for any given domain and language. We also evaluate the performance of the aspect and sentiment classification on the multilingual SemEval 2016 task 5 (ABSA) dataset. We show competitive results for several languages (English, Spanish, French and Dutch) and domains (hotels, restaurants, electronic devices).
Waffle Chart / Square Pie Chart  A little-known alternative to the round pie chart is the square pie or waffle chart. It consists of a square that is divided into 10×10 cells, making it possible to read values precisely down to a single percent. Depending on how the areas are laid out (as square as possible seems to be the best idea), it is very easy to compare parts to the whole. http://…echartsinrwiththenewwafflepackage waffle
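As a rough illustration of the 10×10 layout described above, the following sketch allocates the 100 cells to categories and prints a text grid. The function names and the largest-remainder rounding are assumptions made for this example; they are not taken from the waffle package.

```python
def waffle_cells(values, rows=10, cols=10):
    """Allocate rows*cols cells to categories in proportion to their values."""
    total_cells = rows * cols
    total = sum(values.values())
    exact = {k: v / total * total_cells for k, v in values.items()}
    cells = {k: int(share) for k, share in exact.items()}
    # Largest-remainder rounding (an assumption of this sketch) so that
    # the cell counts always add up to the full grid.
    leftover = total_cells - sum(cells.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - cells[k], reverse=True)
    for k in by_remainder[:leftover]:
        cells[k] += 1
    return cells


def waffle_grid(cells, rows=10, cols=10):
    """Lay the cells out row by row as a list of rows of category labels."""
    flat = [label for label, n in cells.items() for _ in range(n)]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


shares = {"A": 47, "B": 33, "C": 20}  # hypothetical data, in percent
cells = waffle_cells(shares)
grid = waffle_grid(cells)
for row in grid:
    print("".join(row))
```

Each printed character stands for one percent, so a category's share can be read by counting its cells.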
Waikato Environment for Knowledge Analysis (WEKA) 
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License. 
Wait-if-Diff and Wait-if-Worse Agent  (Cho and Esipova, 2016) Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation
wakefield  wakefield is a GitHub-based R package designed to quickly generate random data sets. The user passes n (number of rows) and predefined vectors to the r_data_frame function to produce a dplyr::tbl_df object.
Wake-Sleep Algorithm  The wake-sleep algorithm is an unsupervised learning algorithm for a multilayer neural network (e.g. sigmoid belief net). Training is divided into two phases, ‘wake’ and ‘sleep’. In the ‘wake’ phase, neurons are driven by recognition connections (connections from what would normally be considered an input to what is normally considered an output), while generative connections (those from outputs to inputs) are modified to increase the probability that they would reconstruct the correct activity in the layer below (closer to the sensory input). In the ‘sleep’ phase the process is reversed: neurons are driven by generative connections, while recognition connections are modified to increase the probability that they would produce the correct activity in the layer above (further from sensory input). GitXiv
Walk-Steered Convolution (WSC)
Graph classification is a fundamental but challenging problem due to the non-Euclidean property of graphs. In this work, we jointly leverage the powerful representation ability of random walks and the essential success of the standard convolutional neural network (CNN) to propose a random-walk-based convolutional network, called walk-steered convolution (WSC). Unlike existing graph CNNs with deterministic neighbor searching, we randomly sample multi-scale walk fields by using random walks, which is more flexible to the scalability of graphs. To encode the walk field at each scale, which consists of several walk paths, we characterize the directions of the walk field by multiple Gaussian models so as to better mirror standard CNNs on images. Each Gaussian implicitly defines a direction, and all of them properly encode the spatial layout of walks after the gradient is projected to the space of Gaussian parameters. Further, a graph coarsening layer using dynamical clustering is stacked upon the Gaussian encoding to capture high-level semantics of the graph. Comprehensive evaluations on several public datasets demonstrate the superiority of our proposed graph learning method over other state-of-the-art methods for graph classification.
Walktrap Community Algorithm  Tries to find densely connected subgraphs, also called communities, in a graph via random walks. The idea is that short random walks tend to stay in the same community. igraph
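The intuition that short random walks tend to stay in the same community can be checked on a toy graph. The sketch below is not the Walktrap algorithm itself. It only computes, exactly, how much probability mass of a random walk remains in its starting community after a few steps. The graph (two 4-node cliques joined by one edge) and all function names are assumptions chosen for this example.

```python
def build_two_cliques(k=4):
    """Adjacency matrix of two k-cliques joined by a single bridge edge."""
    n = 2 * k
    adj = [[0] * n for _ in range(n)]
    for block in (range(0, k), range(k, n)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i][j] = 1
    adj[k - 1][k] = adj[k][k - 1] = 1  # the bridge between the communities
    return adj


def transition_matrix(adj):
    """Row-normalise the adjacency matrix: P[i][j] = 1/degree(i) per edge."""
    return [[a / sum(row) for a in row] for row in adj]


def step(dist, P):
    """Advance a probability distribution over nodes by one walk step."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


adj = build_two_cliques()
P = transition_matrix(adj)
dist = [1.0] + [0.0] * 7          # the walk starts at node 0 (first clique)
stay_probabilities = []
for t in range(1, 4):
    dist = step(dist, P)
    stay_probabilities.append(sum(dist[:4]))  # mass still in the first clique
    print(t, round(stay_probabilities[-1], 4))
```

Even after three steps most of the probability mass is still inside the starting clique, which is the signal that random-walk-based community detection exploits.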
Wallaroo  Wallaroo is a fast, elastic data processing engine that rapidly takes you from prototype to production by eliminating infrastructure complexity. We’ve designed it to handle demanding high-throughput, low-latency tasks where the accuracy of results is essential. Wallaroo takes care of the mechanics of scaling, resilience, state management, and message delivery. We’ve designed Wallaroo to make it easy to scale applications with no code changes, and to allow programmers to focus on business logic.
Walsh Figure of Merit  LowWAFOMNX 
Ward Hierarchical Clustering  ➘ “Ward’s Method” Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm 
Ward’s Method  In statistics, Ward’s method is a criterion applied in hierarchical cluster analysis. Ward’s minimum variance method is a special case of the objective function approach originally presented by Joe H. Ward, Jr. Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function. This objective function could be ‘any function that reflects the investigator’s purpose.’ Many of the standard clustering procedures are contained in this very general class. To illustrate the procedure, Ward used the example where the objective function is the error sum of squares, and this example is known as Ward’s method or more precisely Ward’s minimum variance method. Ward’s Method 
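When the objective function is the error sum of squares (ESS), the cost of merging two clusters can be written as |A||B|/(|A|+|B|) times the squared distance between their centroids, which equals the increase in ESS caused by the merge. The following is a minimal, unoptimised sketch of agglomerative clustering with that criterion. The function names and sample points are assumptions for illustration, and a real analysis would normally use a library implementation.

```python
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    dim = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]


def ess(cluster):
    """Error sum of squares: squared distances of points to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(a, b):
    """Increase in total ESS if clusters a and b were merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_cluster(points, n_clusters):
    """Repeatedly merge the pair of clusters with the smallest Ward cost."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
result = ward_cluster(points, 2)
print(result)
```

The pairwise search makes every merge quadratic in the number of clusters, so this form is only suitable for small examples.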
WarpFlow  WarpFlow is a fast, interactive data querying and processing system with a focus on petabyte-scale spatiotemporal datasets and Tesseract queries. With the rapid growth in smartphones and mobile navigation services, we now have an opportunity to radically improve urban mobility and reduce friction in how people and packages move globally every minute-mile, with data. WarpFlow speeds up three key metrics for data engineers working on such datasets: time-to-first-result, time-to-full-scale-result, and time-to-trained-model for machine learning.
WarpLDA  Developing efficient and scalable algorithms for Latent Dirichlet Allocation (LDA) is of wide interest for many applications. Previous work has developed an $O(1)$ Metropolis-Hastings sampling method for each token. However, the performance is far from optimal due to random accesses to the parameter matrices and frequent cache misses. In this paper, we propose WarpLDA, a novel $O(1)$ sampling algorithm for LDA. WarpLDA is a Metropolis-Hastings based algorithm which is designed to optimize the cache hit rate. Advantages of WarpLDA include 1) Efficiency and scalability: WarpLDA has good locality and a carefully designed partition method, and can be scaled to hundreds of machines; 2) Simplicity: WarpLDA does not have any complicated modules such as alias tables, hybrid data structures, or parameter servers, making it easy to understand and implement; 3) Robustness: WarpLDA is consistently faster than other algorithms, under various settings from small-scale to massive-scale datasets and models. WarpLDA is 5-15x faster than state-of-the-art LDA samplers, implying lower cost in time and money. With WarpLDA users can learn up to one million topics from hundreds of millions of documents in a few hours, at the speed of 2G tokens per second, or learn topics from small-scale datasets in seconds.
Wasserstein Auto-Encoder (WAE)
We propose the Wasserstein Auto-Encoder (WAE), a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
Wasserstein Barycenter  A Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry.
Wasserstein CNN (WCNN) 
Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities, with mission-critical applications in forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of large intra-class variations of heterogeneous face images and limited training samples of cross-modality face image pairs. This paper proposes a novel approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for short), to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in the visual spectrum. The high-level layer is divided into three parts, i.e., the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two layers aim to learn modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning thus aims to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid overfitting on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of the WCNN network to reduce the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods.
Wasserstein Discriminant Analysis (WDA) 
Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than the cross-variance measures that have usually been considered, notably in LDA. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at samples scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; we show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real life datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.
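The Sinkhorn matrix scaling algorithm mentioned above can be sketched in a few lines. This is a generic entropic-regularised optimal transport routine, not the WDA optimisation itself. The function name, the regularisation strength `eps`, the iteration count and the toy histograms are all assumptions made for this example.

```python
import math


def sinkhorn(a, b, cost, eps=1.0, n_iter=500):
    """Entropic-regularised transport plan between histograms a and b.

    a, b : marginal weights (each summing to 1)
    cost : cost[i][j] of moving mass from bin i of a to bin j of b
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(i - j) ** 2 for j in range(3)] for i in range(3)]  # squared distance
plan, transport_cost = sinkhorn(a, b, cost)
print(round(transport_cost, 4))
```

Every operation in the loop is a differentiable elementwise or matrix-vector operation, which is why gradients can be propagated through the iterations with automatic differentiation.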
Wasserstein Distance  ➘ “Wasserstein Metric” 
Wasserstein GAN (WGAN) 
Despite being impactful on a variety of problems and applications, generative adversarial nets (GANs) are remarkably difficult to train. This issue is formally analyzed by Arjovsky and Bottou (2017), who also propose an alternative direction to avoid the caveats in the min-max two-player training of GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the 1-Lipschitz continuity of the discriminator. In this paper, we propose a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs. Our approach seamlessly connects WGAN with one of the recent semi-supervised learning methods. As a result, it gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results. In particular, our approach gives rise to an inception score of more than 5.0 with only 1,000 CIFAR-10 images and is the first that exceeds an accuracy of 90% on the CIFAR-10 dataset using only 4,000 labeled images, to the best of our knowledge.
Wasserstein Identity Testing Problem  Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under $L_1$ distance. However, when the support is very large or even continuous, testing under $L_1$ distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider identity testing in Wasserstein distance (a.k.a. transportation distance and earth mover's distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called ‘Doubling Condition’, we provide nearly instance-optimal sample complexity.
Wasserstein Introspective Neural Network (WINN) 
We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model. WINN provides a significant improvement over the recent introspective neural networks (INN) method by enhancing INN’s generative modeling capability. WINN has three interesting properties: (1) A mathematical connection between the formulation of Wasserstein generative adversarial networks (WGAN) and the INN algorithm is made; (2) The explicit adoption of the WGAN term into INN results in a large enhancement to INN, achieving compelling results even with a single classifier, e.g., providing a 20 times reduction in model size over INN on texture modeling; (3) When applied to supervised classification, WINN also gives rise to greater robustness with an $88\%$ reduction of errors against adversarial examples, improved over the result of $39\%$ by an INN-family algorithm. In the experiments, we report encouraging results on unsupervised learning problems including texture, face, and object modeling, as well as a supervised classification task against adversarial attack.
Wasserstein Metric  In mathematics, the Wasserstein (or Vasershtein) metric is a distance function defined between probability distributions on a given metric space M. Intuitively, if each distribution is viewed as a unit amount of ‘dirt’ piled on M, the metric is the minimum ‘cost’ of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the distance it has to be moved. Because of this analogy, the metric is known in computer science as the earth mover’s distance. The name ‘Wasserstein distance’ was coined by R. L. Dobrushin in 1970, after the Russian mathematician Leonid Vaseršteĭn who introduced the concept in 1969. Most English-language publications use the German spelling ‘Wasserstein’ (attributed to the name ‘Vasershtein’ being of German origin). ➚ “Earth Mover’s Distance” Wasserstein Distance D3M
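For two histograms on the same equally spaced one-dimensional grid, the ‘dirt moving’ cost has a simple closed form: the first-order Wasserstein distance equals the grid spacing times the sum of absolute differences between the two cumulative distributions. The sketch below covers only this special case, and the function name is an assumption.

```python
from itertools import accumulate


def wasserstein_1d(p, q, spacing=1.0):
    """W1 distance between two histograms on the same equally spaced grid.

    p, q    : probability weights per grid cell (each summing to 1)
    spacing : distance between neighbouring grid cells
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same grid")
    # Mass that has to cross each cell boundary = difference of the CDFs.
    return spacing * sum(abs(cp - cq) for cp, cq in zip(accumulate(p), accumulate(q)))


# All mass moves two cells to the right: cost 1 unit of dirt * distance 2.
print(wasserstein_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))   # 2.0
# Every unit of mass shifts one cell to the right: cost 1.
print(wasserstein_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))   # 1.0
```

In higher dimensions no such shortcut exists and the distance is obtained by solving an optimal transport problem.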
Wasserstein Transform  We introduce the Wasserstein transform, a method for enhancing and denoising datasets defined on general metric spaces. The construction draws inspiration from Optimal Transportation ideas. We establish precise connections with the mean shift family of algorithms and establish the stability of both our method and mean shift under data perturbation. 
Wasserstein Variational Gradient Descent  Particle-based variational inference offers a flexible way of approximating complex posterior distributions with a set of particles. In this paper we introduce a new particle-based variational inference method based on the theory of semi-discrete optimal transport. Instead of minimizing the KL divergence between the posterior and the variational approximation, we minimize a semi-discrete optimal transport divergence. The solution of the resulting optimal transport problem provides both a particle approximation and a set of optimal transportation densities that map each particle to a segment of the posterior distribution. We approximate these transportation densities by minimizing the KL divergence between a truncated distribution and the optimal transport solution. The resulting algorithm can be interpreted as a form of ensemble variational inference where each particle is associated with a local variational approximation.
Wasserstein Variational Inference  This paper introduces Wasserstein variational inference, a new form of approximate Bayesian inference based on optimal transport theory. Wasserstein variational inference uses a new family of divergences that includes both f-divergences and the Wasserstein distance as special cases. The gradients of the Wasserstein variational loss are obtained by backpropagating through the Sinkhorn iterations. This technique results in a very stable likelihood-free training method that can be used with implicit distributions and probabilistic programs. Using the Wasserstein variational inference framework, we introduce several new forms of autoencoders and test their robustness and performance against existing variational autoencoding techniques.
Wasserstein-Wasserstein Auto-Encoder (WWAE)
To address the challenges in learning deep generative models (e.g., the blurriness of variational auto-encoders and the instability of training generative adversarial networks), we propose a novel deep generative model, named Wasserstein-Wasserstein auto-encoders (WWAE). We formulate WWAE as minimization of the penalized optimal transport between the target distribution and the generated distribution. By noticing that both the prior $P_Z$ and the aggregated posterior $Q_Z$ of the latent code Z can be well captured by Gaussians, the proposed WWAE utilizes the closed form of the squared Wasserstein-2 distance for two Gaussians in the optimization process. As a result, WWAE does not suffer from the sampling burden and it is computationally efficient by leveraging the reparameterization trick. Numerical results evaluated on multiple benchmark datasets including MNIST, Fashion-MNIST and CelebA show that WWAE learns better latent structures than VAEs and generates samples of better visual quality and higher FID scores than VAEs and GANs.
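For reference, the squared Wasserstein-2 distance between two Gaussians $N(m_1, \Sigma_1)$ and $N(m_2, \Sigma_2)$ has the standard closed form $\|m_1 - m_2\|^2 + \mathrm{Tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2})^{1/2}\big)$. The sketch below implements only the special case of diagonal covariances, where the trace term reduces to the squared difference of standard deviations. The diagonal restriction and the function name are assumptions of this example, not a description of the WWAE implementation.

```python
def w2_squared_diag_gaussians(mean_p, std_p, mean_q, std_q):
    """Squared Wasserstein-2 distance between two diagonal Gaussians.

    mean_* : per-dimension means
    std_*  : per-dimension standard deviations (not variances)
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))
    std_term = sum((a - b) ** 2 for a, b in zip(std_p, std_q))
    return mean_term + std_term


# Same shape, shifted mean: only the mean term contributes (3^2 + 4^2).
print(w2_squared_diag_gaussians([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # 25.0
# Same mean, different spread: only the standard deviations contribute.
print(w2_squared_diag_gaussians([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [2.0, 3.0]))  # 5.0
```

Because the expression depends only on means and standard deviations, it can be evaluated from batch statistics without drawing samples from either distribution.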
Watanabe-Akaike Information Criterion (WAIC)
WAIC (the Watanabe-Akaike or widely applicable information criterion; Watanabe, 2010) can be viewed as an improvement on the deviance information criterion (DIC) for Bayesian models. DIC has gained popularity in recent years in part through its implementation in the graphical modeling package BUGS (Spiegelhalter, Best, et al., 2002; Spiegelhalter, Thomas, et al., 1994, 2003), but is known to have some problems, arising in part from it not being fully Bayesian in that it is based on a point estimate (van der Linde, 2005; Plummer, 2008). For example, DIC can produce negative estimates of the effective number of parameters in a model and it is not defined for singular models. WAIC is fully Bayesian and closely approximates Bayesian cross-validation. Unlike DIC, WAIC is invariant to parametrization and also works for singular models. A Widely Applicable Bayesian Information Criterion
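One common way to compute WAIC from posterior draws uses the pointwise log-likelihoods: the log pointwise predictive density (lppd) minus a penalty given by the posterior variance of each observation's log-likelihood, reported on the deviance scale. The sketch below follows that formulation. The function name and the input layout `log_lik[s][i]` (draw `s`, observation `i`) are assumptions of this example.

```python
import math


def waic(log_lik):
    """Return (waic, lppd, p_waic) from a draws-by-observations matrix.

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    """
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # log of the posterior-mean likelihood, computed stably (log-sum-exp).
        top = max(column)
        lppd += top + math.log(sum(math.exp(x - top) for x in column) / n_draws)
        # Penalty: sample variance of the log-likelihood across draws.
        mean = sum(column) / n_draws
        p_waic += sum((x - mean) ** 2 for x in column) / (n_draws - 1)
    return -2.0 * (lppd - p_waic), lppd, p_waic


# Hypothetical log-likelihoods: 3 posterior draws, 2 observations.
draws = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]
value, lppd, p_waic = waic(draws)
print(round(value, 3), round(lppd, 3), round(p_waic, 3))
```

Lower WAIC indicates better estimated out-of-sample predictive fit, and `p_waic` plays the role of the effective number of parameters.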
Watchdog AI (WAI) 
Artificial Intelligence (AI) technologies could be broadly categorised into Analytics and Autonomy. Analytics focuses on algorithms offering perception, comprehension, and projection of knowledge gleaned from sensorial data. Autonomy revolves around decision making, and influencing and shaping the environment through action production. A smart autonomous system (SAS) combines analytics and autonomy to understand, learn, decide and act autonomously. To be useful, SAS must be trusted and that requires testing. Lifelong learning of a SAS compounds the testing process. In the remote chance that it is possible to fully test and certify the system pre-release, which is theoretically an undecidable problem, it is near impossible to predict the future behaviours that these systems, alone or collectively, will exhibit. While it may be feasible to severely restrict such systems’ learning abilities to limit the potential unpredictability of their behaviours, an undesirable consequence may be severely limiting their utility. In this paper, we propose the architecture for a watchdog AI (WAI) agent dedicated to lifelong functional testing of SAS. We further propose system specifications including a level of abstraction whereby humans shepherd a swarm of WAI agents to oversee an ecosystem made of humans and SAS. The discussion extends to the challenges, pros, and cons of the proposed concept.
Waterfall Bandits  A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then solve the optimization problem for the given demand model. This will incur a linear regret. We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret. We evaluate the algorithm on both synthetic and real-world data, and show that it quickly learns high quality pricing strategies. This is the first principled study of learning a waterfall design online by sequential experimentation.
Waterfall Chart  A waterfall chart is a form of data visualization that helps in understanding the cumulative effect of sequentially introduced positive or negative values. The waterfall chart is also known as a flying bricks chart or Mario chart due to the apparent suspension of columns (bricks) in mid-air. Often in finance, it will be referred to as a bridge. Waterfall charts were popularized by the strategic consulting firm McKinsey & Company in its presentations to clients. The waterfall chart is normally used for understanding how an initial value is affected by a series of intermediate positive or negative values. Usually the initial and the final values are represented by whole columns, while the intermediate values are denoted by floating columns. The columns are color-coded for distinguishing between positive and negative values. ➘ “Waterfall Chart” Understanding Waterfall Plots Waterfall plots – what and how?
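The geometry of the whole and floating columns follows directly from a running total. The sketch below computes, for each column, where it starts and ends on the value axis, leaving the actual drawing to any plotting tool. The function name, the tuple layout and the sample figures are assumptions made for this example.

```python
def waterfall_columns(initial, deltas):
    """Return (label, bottom, top, kind) for each column of a waterfall chart.

    initial : starting value, drawn as a whole column from zero
    deltas  : list of (label, change) pairs, drawn as floating columns
    """
    columns = [("start", min(0, initial), max(0, initial), "total")]
    running = initial
    for label, change in deltas:
        new_total = running + change
        # A floating column spans the running total before and after the change.
        kind = "increase" if change >= 0 else "decrease"
        columns.append((label, min(running, new_total), max(running, new_total), kind))
        running = new_total
    columns.append(("end", min(0, running), max(0, running), "total"))
    return columns


# Hypothetical figures: revenue bridge from an opening to a closing value.
for column in waterfall_columns(100, [("sales", 30), ("costs", -50), ("tax", -10)]):
    print(column)
```

The `kind` field is what a plotting layer would map to colors to distinguish positive from negative steps.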
Waterfall Plot  A waterfall plot is a three-dimensional plot in which multiple curves of data, typically spectra, are displayed simultaneously. Typically the curves are staggered both across the screen and vertically, with ‘nearer’ curves masking the ones behind. The result is a series of ‘mountain’ shapes that appear to be side by side. The waterfall plot is often used to show how two-dimensional information changes over time or some other variable such as rpm. The term ‘waterfall plot’ is sometimes used interchangeably with ‘spectrogram’ or ‘Cumulative Spectral Decay’ (CSD) plot.
wav2letter++  This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++’s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks. Introducing Wav2letter++
Wav2Pix  Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) with raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g. reference image or one-hot encoding). Our model is trained in a self-supervised approach by exploiting the audio and visual signals naturally aligned in videos. With the purpose of training from video data, we present a novel dataset collected for this work, with high-quality videos of youtubers with notable expressiveness in both the speech and visual signals.
wav2vec  We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task. Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 32% when only a few hours of transcribed data is available. Our approach achieves 2.78% WER on the nov92 test set. This outperforms Deep Speech 2, the best reported character-based system in the literature, while using three orders of magnitude less labeled training data.
Wave Oriented Swarm Programming Paradigm (WOSPP) 
In this work, we present a programming paradigm allowing the control of swarms with a minimum communication bandwidth in a simple manner, yet allowing the emergence of diverse complex behaviors and autonomy of the swarm. Communication in the proposed paradigm is based on single-bit ‘ping’ signals propagating as information waves throughout the swarm. We show that even this minimum bandwidth communication between agents suffices for the design of a substantial set of behaviors in the domain of essential behaviors of a collective, including locomotion and self-awareness of the swarm.
WaveGlow  In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online.
Wavelet Convolutional Neural Network  Spatial and spectral approaches are two major approaches for image processing tasks such as image classification and object recognition. Among many such algorithms, convolutional neural networks (CNNs) have recently achieved significant performance improvement in many challenging tasks. Since CNNs process images directly in the spatial domain, they are essentially spatial approaches. Given that spatial and spectral approaches are known to have different characteristics, it will be interesting to incorporate a spectral approach into CNNs. We propose a novel CNN architecture, wavelet CNNs, which combines a multiresolution analysis and CNNs into one model. Our insight is that a CNN can be viewed as a limited form of a multiresolution analysis. Based on this insight, we supplement missing parts of the multiresolution analysis via wavelet transform and integrate them as additional components in the entire architecture. Wavelet CNNs allow us to utilize spectral information which is mostly lost in conventional CNNs but useful in most image processing tasks. We evaluate the practical performance of wavelet CNNs on texture classification and image annotation. The experiments show that wavelet CNNs can achieve better accuracy in both tasks than existing models while having significantly fewer parameters than conventional CNNs. 
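To make the multiresolution idea concrete, here is one level of the Haar wavelet transform on a one-dimensional signal: a low-pass (approximation) half and a high-pass (detail) half, from which the input can be rebuilt exactly. The choice of the Haar wavelet, the 1-D setting and the function names are assumptions for illustration, and this is not the wavelet CNN architecture itself.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform of an even-length signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / s for a, b in pairs]   # low-frequency content
    detail = [(a - b) / s for a, b in pairs]   # high-frequency content
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, recovering the original signal."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 1.0, 3.0]
approx, detail = haar_dwt(signal)
restored = haar_idwt(approx, detail)
print(approx)
print(detail)
```

Applying `haar_dwt` again to `approx` yields the next coarser scale, which is the repeated decomposition that a multiresolution analysis refers to.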
WaveletFCNN  Wind power, as an alternative to burning fossil fuels, is plentiful and renewable. Data-driven approaches are increasingly popular for inspecting wind turbine failures. In this paper, we propose a novel classification-based anomaly detection system for icing detection of wind turbine blades. We effectively combine deep neural networks and wavelet transformation to identify such failures sequentially across time. In the training phase, we present a wavelet-based fully convolutional neural network (FCNN), namely WaveletFCNN, for time series classification. We improve the original FCNN by augmenting features with the wavelet coefficients. WaveletFCNN outperforms the state-of-the-art FCNN for univariate time series classification on the UCR time series archive benchmarks. In the detecting phase, we combine the sliding window and majority vote algorithms to provide timely monitoring of the anomalies. The system has been successfully implemented on a real-world dataset from Goldwind Inc, where the classifier is trained on a multivariate time series dataset and the monitoring algorithm is implemented to capture the abnormal condition on signals from a wind farm.
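The combination of a sliding window with a majority vote can be sketched generically: per-step classifier outputs are smoothed by reporting the most frequent label among the most recent predictions. The window length, label encoding and function name below are assumptions, since the entry does not give the system's actual settings.

```python
from collections import Counter, deque


def sliding_majority(predictions, window):
    """Majority label over each full window of consecutive predictions."""
    recent = deque(maxlen=window)   # keeps only the latest `window` labels
    smoothed = []
    for label in predictions:
        recent.append(label)
        if len(recent) == window:
            smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Hypothetical per-step classifier outputs: 1 = icing, 0 = normal.
raw = [0, 0, 1, 0, 1, 1, 1, 0, 1]
smoothed = sliding_majority(raw, window=3)
print(smoothed)   # [0, 0, 1, 1, 1, 1, 1]
```

An odd window with binary labels avoids ties. The isolated `1` at the third step and the isolated `0` near the end are both outvoted by their neighbours, which is the intended smoothing effect.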
Wavelet-like Auto-Encoder (WAE)
Accelerating deep neural networks (DNNs) has been attracting increasing attention as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to own powerful visual recognition ability. A practical strategy to this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and fine-tuning the amended network, leading to difficulty in balancing the trade-off between acceleration and maintaining recognition performance. In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training. The two decomposed channels, in particular, are encoded to carry the low-frequency information (e.g., image profiles) and high-frequency information (e.g., image details or noises), respectively, and enable reconstructing the original input image through the decoding process. Then, we feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network to fuse with the high-frequency channel to obtain the classification result. Compared to existing DNN acceleration solutions, our framework has the following advantages: i) it is tolerant to any existing convolutional neural networks for classification without amending their structures; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.
WaveletNet  We present a logarithmic-scale efficient convolutional neural network architecture for edge devices, named WaveletNet. Our model is based on the well-known depthwise convolution, and on two new layers, which we introduce in this work: a wavelet convolution and a depthwise fast wavelet transform. By breaking the symmetry in channel dimensions and applying a fast algorithm, WaveletNet shrinks the complexity of convolutional blocks by an O(logD/D) factor, where D is the number of channels. Experiments on CIFAR-10 and ImageNet classification show superior and comparable performances of WaveletNet compared to state-of-the-art models such as MobileNetV2. 
WaveNet  Various sources have reported the WaveNet deep learning architecture being able to generate highquality speech, but to our knowledge there haven’t been studies on the interpretation or visualization of trained WaveNets. This study investigates the possibility that WaveNet understands speech by unsupervisedly learning an acoustically meaningful latent representation of the speech signals in its receptive field; we also attempt to interpret the mechanism by which the feature extraction is performed. Suggested by singular value decomposition and linear regression analysis on the activations and known acoustic features (e.g. F0), the key findings are (1) activations in the higher layers are highly correlated with spectral features; (2) WaveNet explicitly performs pitch extraction despite being trained to directly predict the next audio sample and (3) for the said feature analysis to take place, the latent signal representation is converted back and forth between baseband and wideband components. How WaveNet Works 
Wavenilm  Non-intrusive load monitoring (NILM) helps meet energy conservation goals by estimating individual appliance power usage from a single aggregate measurement. Deep neural networks have become increasingly popular in attempting to solve NILM problems; however, many of them are not causal, a property that is important for real-time application. We present a causal 1D convolutional neural network inspired by WaveNet for NILM on low-frequency data. We also study using various components of the complex power signal for NILM, and demonstrate that using all four components available in a popular NILM dataset (current, active power, reactive power, and apparent power) we achieve faster convergence and higher performance than state-of-the-art results for the same dataset. 
W-Decorrelation  Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general decorrelation procedure, W-decorrelation, for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the W-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic W-decorrelation procedure in two different adaptive data settings: multi-armed bandits and autoregressive time series models. 
Weakly Structured Information Processing and Exploration (WIPE) 
WIPE is used for managing graph traversal and manipulation with BI-like data aggregation. WIPE stands for “Weakly-structured Information Processing and Exploration”. It is a data manipulation and query language built on top of the graph functionality in the SAP HANA Database. Like other domain-specific languages provided by the SAP HANA Database, WIPE is embedded in a transactional context, which means that multiple WIPE statements can be executed concurrently while guaranteeing atomicity, consistency, isolation and durability. With the help of this language, multiple graph operations, such as inserting, updating or deleting a node, and other query operations can be declared in one complex statement. It is the graph abstraction layer in the SAP HANA Database that provides interaction with the graph data stored in the database by exposing graph concepts directly to the application developer. The application developer can create or delete graphs, access the existing graphs, modify the vertices and edges of the graphs, or retrieve a set of vertices and edges based on their attributes. Besides retrieval and manipulation functions, a set of built-in graph operators is also provided by the SAP HANA Database. These operators, such as breadth-first or depth-first traversal algorithms, interact with the column store of the relational engine to execute efficiently and in a highly optimized manner. 
WeaklySupervised Hierarchical Text Classification  Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many realworld applications. Recently, deep neural models are gaining increasing popularity for text classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical text classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weaklysupervised neural method for hierarchical text classification. Our method does not require a large amount of training data but requires only easytoprovide weak supervision signals such as a few classrelated documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pretraining, and then performs selftraining on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines. 
WeaklySupervised Neural Text Classification  Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many realworld applications. Although many semisupervised and weaklysupervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weaklysupervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudodocument generator that leverages seed information to generate pseudolabeled documents for model pretraining, and (2) a selftraining module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three realworld datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly. 
Weakly-supervised Temporal Activity Localization (W-TALC) 
Most activity localization methods in the literature suffer from the burden of a frame-wise annotation requirement. Learning from weak labels may be a potential solution towards reducing such manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. The proposed network can be divided into two sub-networks, namely the Two-Stream based feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieve better performance than current state-of-the-art methods. 
Weaver  We introduce a new distributed graph store, called Weaver, which enables efficient, transactional graph analyses as well as strictly serializable read-write transactions on dynamic graphs. The key insight that enables Weaver to combine strict serializability with horizontal scalability and high performance is a novel request ordering mechanism called refinable timestamps. This technique couples coarse-grained vector timestamps with a fine-grained timeline oracle to pay the overhead of strong consistency only when needed. 
Web Analytics  Web analytics is the measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage. Web analytics is not just a tool for measuring web traffic but can be used as a tool for business and market research, and to assess and improve the effectiveness of a website. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns. It helps one to estimate how traffic to a website changes after the launch of a new advertising campaign. Web analytics provides information about the number of visitors to a website and the number of page views. It helps gauge traffic and popularity trends, which is useful for market research. There are two categories of web analytics: off-site and on-site web analytics. Off-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website’s potential audience (opportunity), share of voice (visibility), and buzz (comments) that is happening on the Internet as a whole. On-site web analytics measures a visitor’s behavior once on your website. This includes its drivers and conversions; for example, the degree to which different landing pages are associated with online purchases. On-site web analytics measures the performance of your website in a commercial context. This data is typically compared against key performance indicators for performance, and used to improve a website or marketing campaign’s audience response. Google Analytics is the most widely used on-site web analytics service, although new tools are emerging that provide additional layers of information, including heat maps and session replay. Historically, web analytics has been used to refer to on-site visitor measurement. However, in recent years this meaning has become blurred, mainly because vendors are producing tools that span both categories. 
Web Data Commons  The Web Data Commons project extracts structured data from the Common Crawl, the largest web corpus available to the public, and provides the extracted data for public download in order to support researchers and companies in exploiting the wealth of information that is available on the Web. http://…webdatacommonsdatawebscalemining.html 
Web Mining  Web mining is the application of data mining techniques to discover patterns from the Web. According to analysis targets, web mining can be divided into three different types: Web usage mining, Web content mining and Web structure mining. 
Web of Data (WoD) 
The Semantic Web is a Web of Data: of dates and titles and part numbers and chemical properties and any other data one might conceive of. The collection of Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.) provides an environment where applications can query that data, draw inferences using vocabularies, etc. However, to make the Web of Data a reality, it is important to have the huge amount of data on the Web available in a standard format, reachable and manageable by Semantic Web tools. Furthermore, not only does the Semantic Web need access to data, but relationships among data should be made available, too, to create a Web of Data (as opposed to a sheer collection of datasets). This collection of interrelated datasets on the Web can also be referred to as Linked Data. To achieve and create Linked Data, technologies should be available for a common format (RDF), to make either conversion or on-the-fly access to existing databases (relational, XML, HTML, etc). It is also important to be able to set up query endpoints to access that data more conveniently. W3C provides a palette of technologies (RDF, GRDDL, POWDER, RDFa, the upcoming R2RML, RIF, SPARQL) to get access to the data. 
Web Ontology Language (OWL) 
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects. Ontologies resemble class hierarchies in object-oriented programming but there are several critical differences. Class hierarchies are meant to represent structures used in source code that evolve fairly slowly (typically monthly revisions), whereas ontologies are meant to represent information on the Internet and are expected to be evolving almost constantly. Similarly, ontologies are typically far more flexible as they are meant to represent information on the Internet coming from all sorts of heterogeneous data sources. Class hierarchies, on the other hand, are meant to be fairly static and rely on far less diverse and more structured sources of data such as corporate databases. The OWL languages are characterized by formal semantics. They are built upon a W3C XML standard for objects called the Resource Description Framework (RDF). OWL and RDF have attracted significant academic, medical and commercial interest. In October 2007, a new W3C working group was started to extend OWL with several new features as proposed in the OWL 1.1 member submission. W3C announced the new version of OWL on 27 October 2009. This new version, called OWL 2, soon found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, RacerPro, FaCT++ and HermiT. The OWL family contains many species, serializations, syntaxes and specifications with similar names. OWL and OWL2 are used to refer to the 2004 and 2009 specifications, respectively. Full species names will be used, including specification version (for example, OWL2 EL). When referring more generally, OWL Family will be used. 
Web Scraping  Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox. scrapeR 
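As a rough illustration of the extraction step, the sketch below pulls link targets out of an HTML string using only the Python standard library. The class name `LinkExtractor`, the helper `extract_links` and the sample page are hypothetical; fetching the page over HTTP (for example with `urllib.request`) is left out so that the example stays self-contained.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML document given as a string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical page content standing in for a downloaded document.
sample_page = (
    '<html><body><a href="/a">A</a> <p>text</p> '
    '<a href="https://example.org/b">B</a></body></html>'
)
links = extract_links(sample_page)
```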
WebSeg  In this paper, we improve semantic segmentation by automatically learning from Flickr images associated with a particular keyword, without relying on any explicit user annotations, thus substantially alleviating the dependence on accurate annotations when compared to previous weakly supervised methods. To solve such a challenging problem, we leverage several low-level cues (such as saliency, edges, etc.) to help generate a proxy ground truth. Due to the diversity of web-crawled images, we anticipate a large amount of ‘label noise’ in which other objects might be present. We design an online noise filtering scheme which is able to deal with this label noise, especially in cluttered images. We use this filtering strategy as an auxiliary module to help the segmentation network learn cleaner proxy annotations. Extensive experiments on the popular PASCAL VOC 2012 semantic segmentation benchmark show surprisingly good results in both our WebSeg (mIoU = 57.0%) and weakly supervised (mIoU = 63.3%) settings. 
WeCURE  Missing data recovery is an important and yet challenging problem in imaging and data science. Successful models often adopt certain carefully chosen regularization. Recently, the low dimension manifold model (LDMM) was introduced by S.Osher et al. and shown effective in image inpainting. They observed that enforcing low dimensionality on image patch manifold serves as a good image regularizer. In this paper, we observe that having only the low dimension manifold regularization is not enough sometimes, and we need smoothness as well. For that, we introduce a new regularization by combining the low dimension manifold regularization with a higher order Curvature Regularization, and we call this new regularization CURE for short. The key step of solving CURE is to solve a biharmonic equation on a manifold. We further introduce a weighted version of CURE, called WeCURE, in a similar manner as the weighted nonlocal Laplacian (WNLL) method. Numerical experiments for image inpainting and semisupervised learning show that the proposed CURE and WeCURE significantly outperform LDMM and WNLL respectively. 
Weibull Distribution  In probability theory and statistics, the Weibull distribution /ˈveɪbʊl/ is a continuous probability distribution. It is named after Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin & Rammler (1933) to describe a particle size distribution. 
Weibull Hybrid Autoencoding Inference (WHAI) 
To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarchy of gamma distributions, while the inference network of WHAI is a Weibull upward-downward variational autoencoder, which integrates a deterministic-upward deep neural network and a stochastic-downward deep generative model based on a hierarchy of Weibull distributions. The Weibull distribution can be used to approximate a gamma distribution well, with an analytic Kullback-Leibler divergence, and has a simple reparameterization via uniform noise, which helps to efficiently compute the gradients of the evidence lower bound with respect to the parameters of the inference network. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora. 
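The reparameterization via uniform noise mentioned above can be sketched with the standard inverse-CDF form of the Weibull distribution. This is a minimal standalone illustration, not the WHAI inference network itself; the function names and the shape/scale parameter names `k` and `lam` are assumptions made for this example.

```python
import math
import random


def sample_weibull(k, lam, u):
    """Map uniform noise u in [0, 1) to a Weibull(shape=k, scale=lam) draw.

    Inverting the Weibull CDF F(x) = 1 - exp(-(x / lam) ** k) gives
    x = lam * (-log(1 - u)) ** (1 / k), a deterministic function of the
    parameters and the noise, so gradients can flow through k and lam.
    """
    return lam * (-math.log(1.0 - u)) ** (1.0 / k)


def weibull_mean(k, lam):
    """Closed-form mean of Weibull(shape=k, scale=lam)."""
    return lam * math.gamma(1.0 + 1.0 / k)


# Sanity check: the empirical mean of many draws approaches the closed form.
rng = random.Random(0)
k, lam = 2.0, 3.0
draws = [sample_weibull(k, lam, rng.random()) for _ in range(100_000)]
empirical_mean = sum(draws) / len(draws)
```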
Weibull Time To Event Recurrent Neural Network (WTTE-RNN) 
In this thesis we propose a new model for predicting time to events: the Weibull Time To Event RNN. This is a simple framework for time-series prediction of the time to the next event applicable when we have any or all of the problems of continuous or discrete time, right censoring, recurrent events, temporal patterns, time-varying covariates or time series of varying lengths. All these problems are frequently encountered in customer churn, remaining useful life, failure, spike-train and event prediction. The proposed model estimates the distribution of time to the next event as having a discrete or continuous Weibull distribution with parameters being the output of a recurrent neural network. The model is trained using a special objective function (log-likelihood loss for censored data) commonly used in survival analysis. The Weibull distribution is simple enough to avoid sparsity and can easily be regularized to avoid overfitting but is still expressive enough to encode concepts like increasing, stationary or decreasing risk and can converge to a point estimate if allowed. The predicted Weibull parameters can be used to predict expected value and quantiles of the time to the next event. It also leads to a natural 2-D embedding of future risk which can be used for monitoring and exploratory analysis. We describe the WTTE-RNN using a general framework for censored data which can easily be extended with other distributions and adapted for multivariate prediction. We show that the common Proportional Hazards model and the Weibull Accelerated Failure time model are special cases of the WTTE-RNN. The proposed model is evaluated on simulated data with varying degrees of censoring and temporal resolution. We compared it to binary fixed window forecast models and naive ways of handling censored data. 
The model outperforms naive methods and is found to have many advantages and comparable performance to binary fixed-window RNNs without the need to specify window size and the ability to train on more data. Application to the C-MAPSS dataset for PHM run-to-failure of simulated jet engines gives promising results. 
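To make the log-likelihood loss for censored data more concrete, here is a minimal sketch of the continuous-time Weibull case as it is commonly written in survival analysis. It covers a single observation with fixed parameters; in the model described above the parameters would be produced by a recurrent network, which is not shown. The function and argument names are assumptions for illustration, and the discrete-time variant is omitted.

```python
import math


def weibull_censored_loglik(t, observed, shape, scale):
    """Log-likelihood of one (possibly right-censored) waiting time.

    t        : time until the event, or until censoring (must be > 0)
    observed : 1 if the event was seen at t, 0 if the record is censored
    shape    : Weibull shape parameter (> 0)
    scale    : Weibull scale parameter (> 0)

    An observed event contributes log f(t) = log h(t) + log S(t);
    a censored record contributes only log S(t) = -(t / scale) ** shape.
    """
    cumulative_hazard = (t / scale) ** shape
    log_hazard = math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)
    return observed * log_hazard - cumulative_hazard


def censored_loss(records, shape, scale):
    """Mean negative log-likelihood over (t, observed) pairs."""
    total = sum(weibull_censored_loglik(t, u, shape, scale) for t, u in records)
    return -total / len(records)
```

With `shape = 1` the Weibull reduces to an exponential distribution, which gives an easy hand check: an observed event at `t = scale` has log-likelihood `-log(scale) - 1`, and a censored one has `-1`.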
Weight of Evidence (WoE) 
The Weight of Evidence or WoE value is a widely used measure of the ‘strength’ of a grouping for separating good and bad risk (default). It is computed from the basic odds ratio: (Distribution of Good Credit Outcomes) / (Distribution of Bad Credit Outcomes). Or the ratios of Distr Goods / Distr Bads for short, where Distr refers to the proportion of Goods or Bads in the respective group, relative to the column totals, i.e., expressed as relative proportions of the total number of Goods and Bads. Why Use Weight of Evidence? woe 
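The ratio above can be turned into a small worked example. The sketch below assumes the common convention of taking the natural logarithm of Distr Goods / Distr Bads (some implementations also multiply by 100); the function name, the group labels and the counts are made up for illustration.

```python
import math


def weight_of_evidence(goods, bads):
    """Compute WoE per group from counts of good and bad outcomes.

    goods, bads : dicts mapping group label -> count in that group.
    Assumes every group has at least one good and one bad outcome,
    otherwise the ratio or its logarithm is undefined.
    """
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    woe = {}
    for group in goods:
        distr_good = goods[group] / total_good  # share of all Goods in this group
        distr_bad = bads[group] / total_bad     # share of all Bads in this group
        woe[group] = math.log(distr_good / distr_bad)
    return woe


# Hypothetical counts for three groups of one predictor.
goods = {"low": 80, "mid": 100, "high": 20}
bads = {"low": 10, "mid": 50, "high": 40}
woe = weight_of_evidence(goods, bads)
```

A positive value means the group holds a larger share of Goods than of Bads, a negative value the opposite, and a value near zero means the group does not separate the two.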
Weight Standardization (WS) 
In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small batch sizes are not enough for training networks with Batch Normalization (BN), while other normalization methods that do not rely on batch knowledge still have difficulty matching the performances of BN in large-batch training. WS resolves this problem: when used with Group Normalization and trained with 1 image/GPU, WS is able to match or outperform the performances of BN trained with large batch sizes with only 2 more lines of code. In micro-batch training, WS significantly outperforms other normalization methods. WS achieves these superior results by standardizing the weights in the convolutional layers, which we show is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients. The effectiveness of WS is verified on many tasks, including image classification, object detection, instance segmentation, video recognition, semantic segmentation, and point cloud recognition. The code is available here: https://…/WeightStandardization. 
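The idea of standardizing convolutional weights can be sketched in plain Python. This is an illustrative reading of the description above, not the released code: it assumes that each output filter is standardized separately over all of its weights and that a small `eps` is added inside the square root for numerical stability; the function names are hypothetical.

```python
import math


def standardize_filter(weights, eps=1e-5):
    """Shift and scale one filter's weights to zero mean and (near) unit variance."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    std = math.sqrt(var + eps)
    return [(w - mean) / std for w in weights]


def weight_standardize(conv_weight, eps=1e-5):
    """Standardize every output filter of a convolution independently.

    conv_weight : list of filters, each given as a flat list holding the
                  in_channels * kernel_h * kernel_w weights of that filter.
    """
    return [standardize_filter(f, eps) for f in conv_weight]


# Two toy filters with four weights each.
filters = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 14.0]]
standardized = weight_standardize(filters)
```

In a training framework the standardized weights would be recomputed from the raw parameters on every forward pass, so that gradients flow through the standardization.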
Weighted Balanced Distribution Adaptation (WBDA) 
➚ “Balanced Distribution Adaptation” 
Weighted Bootstrap Markov Chain Monte Carlo  Many data sets, especially from surveys, are made available to users with weights. Where the derivation of such weights is known, this information can often be incorporated in the user’s substantive model (model of interest). When the derivation is unknown, the established procedure is to carry out a weighted analysis. However, with nontrivial proportions of missing data this is inefficient and may be biased when data are not missing at random. Bayesian approaches provide a natural approach for the imputation of missing data, but it is unclear how to handle the weights. We propose a weighted bootstrap Markov chain Monte Carlo algorithm for estimation and inference. A simulation study shows that it has good inferential properties. We illustrate its utility with an analysis of data from the Millennium Cohort Study. 
Weighted Effect Coding  Weighted effect coding refers to a specific coding matrix used to include factor variables in generalised linear regression models. With weighted effect coding, the effect for each category represents the deviation of that category from the weighted mean (which corresponds to the sample mean). This technique has particularly attractive properties when analysing observational data, which are commonly unbalanced. The wec package provides functions to apply weighted effect coding to factor variables, and to interactions between (a) a factor variable and a continuous variable and (b) two factor variables. wec 
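A coding matrix of this kind can be sketched as follows. The example assumes the usual formulation in which each non-reference category gets its own column, observations in that category are coded 1, other non-reference observations 0, and reference-category observations are coded minus the ratio of the two category counts. It is a plain Python illustration with hypothetical names, not the interface of the wec package.

```python
from collections import Counter


def weighted_effect_codes(values, reference):
    """Build weighted effect codes for a categorical variable.

    values    : list of category labels, one per observation
    reference : the category that gets no column of its own
    Returns (levels, rows): the non-reference levels in sorted order and
    one row of codes per observation.
    """
    counts = Counter(values)
    levels = [lv for lv in sorted(counts) if lv != reference]
    rows = []
    for v in values:
        if v == reference:
            # Reference rows carry -n_level / n_reference in every column.
            rows.append([-counts[lv] / counts[reference] for lv in levels])
        else:
            rows.append([1.0 if v == lv else 0.0 for lv in levels])
    return levels, rows


# Unbalanced toy data: three "a", one "b", two "c" (the reference).
levels, rows = weighted_effect_codes(["a", "a", "a", "b", "c", "c"], reference="c")
column_sums = [sum(r[j] for r in rows) for j in range(len(levels))]
```

Because every column sums to zero over the sample, the intercept of a linear model fitted with these codes corresponds to the sample mean, which matches the interpretation given in the entry.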
Weighted Entropy  The concept of weighted entropy takes into account values of different outcomes, i.e., makes entropy contextdependent, through the weight function. 
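One common way to write this down is H_w = -Σ w_i p_i log p_i, where the weight w_i expresses the value of outcome i. The sketch below assumes that form and the natural logarithm; the function name is chosen for illustration only.

```python
import math


def weighted_entropy(probs, weights):
    """Weighted entropy -sum(w_i * p_i * log(p_i)) of a discrete distribution.

    probs   : outcome probabilities, summing to 1
    weights : non-negative weights, one per outcome
    Outcomes with zero probability contribute nothing.
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    return -sum(w * p * math.log(p) for p, w in zip(probs, weights) if p > 0)
```

With all weights equal to 1 this reduces to the ordinary Shannon entropy, so `weighted_entropy([0.5, 0.5], [1, 1])` equals `log(2)`.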
Weighted Finite Automata (WFA) 
Approximating probabilistic models as weighted finite automata 
Weighted Hausdorff Distance  Recent advances in Convolutional Neural Networks (CNN) have achieved remarkable results in localizing objects in images. In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects. In this paper, we address the task of estimating object locations without annotated bounding boxes, which are typically handdrawn and time consuming to label. We propose a loss function that can be used in any Fully Convolutional Network (FCN) to estimate object locations. This loss function is a modification of the Average Hausdorff Distance between two unordered sets of points. The proposed method does not require one to ‘guess’ the maximum number of objects in the image, and has no notion of bounding boxes, region proposals, or sliding windows. We evaluate our method with three datasets designed to locate people’s heads, pupil centers and plant centers. We report an average precision and recall of 94% for the three datasets, and an average location error of 6 pixels in 256×256 images. 
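For orientation, the Average Hausdorff Distance that the loss builds on can be computed as below. This sketch covers only the unweighted distance between two point sets, not the weighted loss proposed in the paper; the function name is an assumption, and `math.dist` requires Python 3.8 or later.

```python
import math


def average_hausdorff_distance(xs, ys):
    """Average Hausdorff Distance between two non-empty sets of points.

    For every point in one set, take the distance to its nearest point in
    the other set, average these distances, and add the two directions.
    """
    if not xs or not ys:
        raise ValueError("both point sets must be non-empty")
    x_to_y = sum(min(math.dist(x, y) for y in ys) for x in xs) / len(xs)
    y_to_x = sum(min(math.dist(x, y) for x in xs) for y in ys) / len(ys)
    return x_to_y + y_to_x


# The sets need not have the same size and carry no ordering.
predicted = [(0.0, 0.0), (1.0, 0.0)]
ground_truth = [(0.0, 0.0)]
distance = average_hausdorff_distance(predicted, ground_truth)
```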
Weighted Inverse Laplacian (WIL) 
Community detection is a hot topic in network analysis, where the main aim is to perform unsupervised learning or clustering in networks. Recently, semi-supervised learning has received increasing attention among researchers. In this paper, we propose a new algorithm, called weighted inverse Laplacian (WIL), for predicting labels in partially labeled networks. The idea comes from the first hitting time in random walk, and it also has nice explanations both in information propagation and the regularization framework. We propose a partially labeled degree-corrected block model (pDCBM) to describe the generation of partially labeled networks. We show that WIL ensures the misclassification rate is of order $O(\frac{1}{d})$ for the pDCBM with average degree $d=\Omega(\log n),$ and that it can handle situations with greater imbalance than traditional Laplacian methods. WIL outperforms other state-of-the-art methods in most of our simulations and real datasets, especially in unbalanced networks and heterogeneous networks. 
Weighted Label Smoothing Regularization (WLSR) 
Conventional approaches used supervised learning for offline writer identification. In this study, we improved offline writer identification with a semi-supervised feature learning pipeline, which trained on the extra unlabeled data and the original labeled data simultaneously. Specifically, we proposed a weighted label smoothing regularization (WLSR) method, which assigned a weighted uniform label distribution to the extra unlabeled data. We regularized the convolutional neural network (CNN) baseline, which allows learning more discriminative features to represent the properties of different writing styles. Based on experiments on the ICDAR2013, CVL and IAM benchmark datasets, our results showed that semi-supervised feature learning improved the baseline measurement and achieved better performance compared with existing writer identification approaches. 
Weighted Majority Algorithm (WMA) 
In machine learning, the Weighted Majority Algorithm (WMA) is a meta-learning algorithm used to construct a compound algorithm from a pool of prediction algorithms, which could be any type of learning algorithms, classifiers, or even real human experts. The algorithm assumes that we have no prior knowledge about the accuracy of the algorithms in the pool, but there are sufficient reasons to believe that one or more will perform well. There are many variations of the Weighted Majority Algorithm to handle different situations, like shifting targets, infinite pools, or randomized predictions. The core mechanism remains similar, with the final performance of the compound algorithm bounded by a function of the performance of the specialist (best performing algorithm) in the pool. 
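The core mechanism can be sketched for binary predictions as follows. This assumes the basic deterministic variant: all experts start with weight 1, the compound prediction is the weighted vote, and every expert that errs has its weight multiplied by a penalty factor `beta` (0.5 here). The function name and the toy data are made up for illustration.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Run the basic Weighted Majority Algorithm on binary predictions.

    expert_predictions : list of rounds, each a tuple of 0/1 predictions,
                         one per expert
    outcomes           : list of true 0/1 labels, one per round
    beta               : factor in (0, 1) applied to experts that err
    Returns (final_weights, mistakes_of_the_compound_algorithm).
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        vote_for_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_for_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_for_1 >= vote_for_0 else 0
        mistakes += int(guess != y)
        # Penalize every expert whose prediction was wrong this round.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return weights, mistakes


# Expert 0 is always right; experts 1 and 2 are each wrong three times.
rounds = [(1, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
truth = [1, 1, 0, 1]
final_weights, n_mistakes = weighted_majority(rounds, truth)
```

In this toy run the compound algorithm is misled only in the first round; after that the reliable expert's weight dominates the vote.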
Weighted Mean Curvature  In image processing tasks, spatial priors are essential for robust computations, regularization, algorithmic design and Bayesian inference. In this paper, we introduce weighted mean curvature (WMC) as a novel image prior and present an efficient computation scheme for its discretization in practical image processing applications. We first demonstrate the favorable properties of WMC, such as sampling invariance, scale invariance, and contrast invariance with Gaussian noise model; and we show the relation of WMC to area regularization. We further propose an efficient computation scheme for discretized WMC, which is demonstrated herein to process over 33.2 gigapixels/second on GPU. This scheme yields itself to a convolutional neural network representation. Finally, WMC is evaluated on synthetic and real images, showing its superiority quantitatively to totalvariation and mean curvature. 
Weighted Multisource Tradaboost  In this paper we propose an improved method for transfer learning that takes into account the balance between target and source data. This method builds on the state-of-the-art Multisource Tradaboost, but weighs the importance of each data point taking into account the amount of target and source data available. A comparative study is then presented exposing the performance of four transfer learning methods as well as the proposed Weighted Multisource Tradaboost. The experimental results show that the proposed method is able to outperform the base method as the number of target samples increases. These results are promising in the sense that source-target ratio weighing may be a path to improve current methods of transfer learning. However, against the asymptotic conjecture, all transfer learning methods tested in this work are outperformed by a no-transfer SVM for a large number of target samples. 
Weighted Network (WeNet) 
In recent years, there has been increasing demand for automatic architecture search in deep learning. Numerous approaches have been proposed and led to stateoftheart results in various applications, including image classification and language modeling. In this paper, we propose a novel way of architecture search by means of weighted networks (WeNet), which consist of a number of networks, with each assigned a weight. These weights are updated with backpropagation to reflect the importance of different networks. Such weighted networks bear similarity to mixture of experts. We conduct experiments on Penn Treebank and WikiText2. We show that the proposed WeNet can find recurrent architectures which result in stateoftheart performance. 
Weighted Nonlinear Regression  Nonlinear Least Squares 
Weighted Object k-Means  Weighted-object version of the k-means algorithm, robust against outlier data. RWeightedKmeans 
Weighted Ontology Approximation Heuristic (WOAH) 
This paper presents the Weighted Ontology Approximation Heuristic (WOAH), a novel zero-shot approach to ontology estimation for conversational agent development environments. This methodology extracts verbs and nouns separately from data by distilling the dependencies obtained and applying similarity and sparsity metrics to generate an ontology estimation configurable in terms of the level of generalization. 
Weighted Ordered Weighted Aggregation (WOWA) 
From a formal point of view, the WOWA operator is a particular case of Choquet integral (using a particular type of measure: a distorted probability). 
Weighted Orthogonal Components Regression Analysis (WOCR) 
In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulated studies and real data examples are provided to assess and illustrate the advantages of the proposed methods. 
Weighted Parallel SGD (WPSGD) 
Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-performance nodes will exert a negative influence on the final result. In this paper, we propose an algorithm called weighted parallel SGD (WPSGD). WPSGD combines weighted model parameters from different nodes in the system to produce the final output. WPSGD makes use of the reduction in standard deviation to compensate for the loss from the inconsistency in performance of nodes in the cluster, which means that WPSGD does not require that all nodes consume equal quantities of data. We also analyze the theoretical feasibility of running two other parallel SGD algorithms combined with WPSGD in a heterogeneous environment. The experimental results show that WPSGD significantly outperforms the traditional parallel SGD algorithms on distributed training systems with an unbalanced workload. 
Weighted Quantile Sum (WQS) 
wqs 
Weighted Random Survival Forest  A weighted random survival forest is presented in the paper. It can be regarded as a modification of the random survival forest that improves its performance. The main idea underlying the proposed model is to replace the standard averaging procedure used to estimate the random survival forest hazard function with weighted averaging, where a weight is assigned to every tree. The weights can be viewed as training parameters, which are computed in an optimal way by solving a standard quadratic optimization problem that maximizes Harrell’s C-index. Numerical examples with real data show that the proposed model outperforms the original random survival forest. 
Weighted Score Table  
Weighted Sigmoid Gate Unit (WiG) 
An activation function plays a crucial role in a deep neural network. The simple rectified linear unit (ReLU) is widely used as the activation function. In this paper, a weighted sigmoid gate unit (WiG) is proposed as the activation function. The proposed WiG consists of a multiplication of the inputs and the weighted sigmoid gate. It is shown that the WiG includes the ReLU and similar activation functions as special cases. Many activation functions have been proposed to surpass the performance of the ReLU. In the literature, the performance is mainly evaluated with an object recognition task. The proposed WiG is evaluated with the object recognition task and the image restoration task. The experimental comparisons demonstrate that the proposed WiG outperforms the existing activation functions, including the ReLU. 
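The gating idea in the WiG entry can be sketched for a single scalar input. This is a minimal illustration, not the paper's implementation: the scalar form `x * sigmoid(w * x + b)` and the names `wig`, `w` and `b` are assumptions made here. With a very large gate weight the sigmoid approaches a step function, so the unit behaves like the ReLU.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def wig(x, w=1.0, b=0.0):
    """Scalar sketch of a weighted sigmoid gate unit (assumed form):
    the input multiplied by a sigmoid gate of the weighted input."""
    return x * sigmoid(w * x + b)


def relu(x):
    """Rectified linear unit, for comparison."""
    return max(0.0, x)


# With a very large gate weight, the gate is close to 0 for negative
# inputs and close to 1 for positive inputs, which mimics the ReLU.
for value in (-2.0, 0.0, 2.0):
    print(value, wig(value, w=1000.0), relu(value))
```

In a network, `w` and `b` would be learned parameters; here they are fixed only to show the limiting behavior.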
Weighted Source-to-Distortion Ratio (wSDR) 
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of the spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experiments were conducted on the mixed dataset showing that all three proposed approaches are empirically valid. Experimental results show that the proposed method achieves state-of-the-art performance in all metrics, outperforming previous approaches by a large margin. 
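The entry above does not state the wSDR formula. The sketch below uses one commonly cited form, which is an assumption here and should be checked against the paper: a negative cosine similarity between the clean and estimated speech, combined with the same measure on the noise and estimated noise, weighted by the relative energy of speech and noise. All function and variable names are illustrative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def sdr_loss(target, estimate, eps=1e-12):
    """Negative cosine similarity between target and estimate, in [-1, 1]."""
    return -_dot(target, estimate) / (_norm(target) * _norm(estimate) + eps)


def wsdr_loss(noisy, clean, estimate, eps=1e-12):
    """Assumed form of the weighted SDR loss on time-domain samples.

    noisy    : observed mixture (clean speech plus noise)
    clean    : reference clean speech
    estimate : model output, an estimate of the clean speech
    """
    noise = [x - y for x, y in zip(noisy, clean)]
    noise_estimate = [x - y for x, y in zip(noisy, estimate)]
    energy_clean = _dot(clean, clean)
    energy_noise = _dot(noise, noise)
    # Weight the speech term by its share of the total energy.
    alpha = energy_clean / (energy_clean + energy_noise + eps)
    return (alpha * sdr_loss(clean, estimate, eps)
            + (1.0 - alpha) * sdr_loss(noise, noise_estimate, eps))


clean = [1.0, -2.0, 3.0, 0.5]
noise = [0.1, 0.2, -0.3, 0.4]
noisy = [c + n for c, n in zip(clean, noise)]
print(wsdr_loss(noisy, clean, clean))   # perfect estimate, close to -1
print(wsdr_loss(noisy, clean, noisy))   # no enhancement, a larger loss
```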
Weighted Topological Overlaps (wTO) 
wTO 
WeightedSVD  Matrix factorization models, sometimes called latent factor models, are a family of methods in recommender system research that (1) generate latent factors for the users and the items and (2) predict users’ ratings on items based on their latent factors. However, current matrix factorization models presume that all the latent factors are equally weighted, which may not always be a reasonable assumption in practice. In this paper, we propose a new model, called WeightedSVD, which integrates the linear regression model with the SVD model so that each latent factor is accompanied by a corresponding weight parameter. This mechanism allows the latent factors to have different weights in influencing the final ratings. The complexity of the WeightedSVD model is slightly larger than that of the SVD model but much smaller than that of the SVD++ model. We compared the WeightedSVD model with several latent factor models on five public datasets based on the root mean squared error (RMSE). The results show that the WeightedSVD model outperforms the baseline methods on all the experimental datasets under almost all settings. 
Weight-Median Sketch  We introduce a new sublinear space data structure, the Weight-Median Sketch, that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. In contrast with related sketches that capture the most commonly occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis of this approach that establishes recovery guarantees in the online learning setting, and demonstrate substantial empirical improvements in accuracy-memory trade-offs over alternatives, including count-based sketches and feature hashing. 
Weka  Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. ➚ “Waikato Environment for Knowledge Analysis” 
Whale Optimization Algorithm (WOA) 
Whale Optimization Algorithm (WOA) is a recently proposed (2016) optimization algorithm mimicking the hunting mechanism of humpback whales in nature. It is worth mentioning here that bubble-net feeding is a unique behavior that can only be observed in humpback whales. In WOA the spiral bubble-net feeding maneuver is mathematically modeled in order to perform optimization. A Systematic and Meta-analysis Survey of Whale Optimization Algorithm 
What-If Tool  What If… you could inspect a machine learning model, with no coding required? Building effective machine learning systems means asking a lot of questions. It’s not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better. But answering these kinds of questions isn’t easy. Probing ‘what if’ scenarios often means writing custom, one-off code to analyze a specific model. Not only is this process inefficient, it makes it hard for non-programmers to participate in the process of shaping and improving machine learning models. For us, making it easier for a broad set of people to examine, evaluate, and debug machine learning systems is a key concern. That’s why we built the What-If Tool. Built into the open-source TensorBoard web application – a standard part of the TensorFlow platform – the tool allows users to analyze a machine learning model without the need for writing any further code. Given pointers to a TensorFlow model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results. 
WHInter  Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular for estimating sparse models, yet standard implementations fail to specifically address the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large l1-regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound that efficiently identifies working sets without scanning all features, and from fast computations inspired by solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art. 
White Noise  In signal processing, white noise is a random signal with a constant power spectral density. The term is used, with this or similar meanings, in many scientific and technical disciplines, including physics, acoustic engineering, telecommunications, statistical forecasting, and many more. White noise refers to a statistical model for signals and signal sources, rather than to any specific signal. In discrete time, white noise is a discrete signal whose samples are regarded as a sequence of serially uncorrelated random variables with zero mean and finite variance; a single realization of white noise is a random shock. Depending on the context, one may also require that the samples be independent and have the same probability distribution (in other words, i.i.d. samples are the simplest example of white noise). In particular, if each sample has a normal distribution with zero mean, the signal is said to be Gaussian white noise. The samples of a white noise signal may be sequential in time, or arranged along one or more spatial dimensions. In digital image processing, the pixels of a white noise image are typically arranged in a rectangular grid, and are assumed to be independent random variables with uniform probability distribution over some interval. The concept can also be defined for signals spread over more complicated domains, such as a sphere or a torus. An infinite-bandwidth white noise signal is a purely theoretical construction. The bandwidth of white noise is limited in practice by the mechanism of noise generation, by the transmission medium and by finite observation capabilities. Thus, a random signal is considered ‘white noise’ if it is observed to have a flat spectrum over the range of frequencies that is relevant to the context. For an audio signal, for example, the relevant range is the band of audible sound frequencies, from 20 to 20,000 Hz. 
Such a signal is heard as a hissing sound, resembling the /sh/ sound in ‘ash’. In music and acoustics, the term ‘white noise’ may be used for any signal that has a similar hissing sound. White noise draws its name from white light, although light that appears white generally does not have a flat spectral power density over the visible band. The term white noise is sometimes used in the context of phylogenetically based statistical methods to refer to a lack of phylogenetic pattern in comparative data. It is sometimes used in nontechnical contexts, in the metaphoric sense of ‘random talk without meaningful contents’. 
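The discrete-time definition of white noise (zero mean, finite variance, serially uncorrelated samples) can be checked on simulated data. The following is a minimal sketch using only the Python standard library; the function names, sample size and seed are arbitrary choices made for illustration.

```python
import random


def gaussian_white_noise(n, sigma=1.0, seed=0):
    """Return n i.i.d. zero-mean Gaussian samples (Gaussian white noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


samples = gaussian_white_noise(20000)
print("sample mean:", sum(samples) / len(samples))
for lag in (1, 2, 5):
    # For white noise these values should be close to zero.
    print("lag", lag, "autocorrelation:", autocorrelation(samples, lag))
```

A sample autocorrelation near zero at every nonzero lag is the time-domain counterpart of a flat power spectral density.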
White Noise Test  
Whitening Transformation  A whitening transformation is a decorrelation transformation that transforms a set of random variables having a known covariance matrix into a set of new random variables whose covariance is the identity matrix (meaning that they are uncorrelated and all have variance 1). The transformation is called “whitening” because it changes the input vector into a white noise vector. It differs from a general decorrelation transformation in that the latter only makes the covariances equal to zero, so that the correlation matrix may be any diagonal matrix. The inverse coloring transformation transforms a vector of uncorrelated variables (a white random vector) into a vector with a specified covariance matrix. 
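A whitening matrix W for a covariance matrix Σ must satisfy W Σ Wᵀ = I. The sketch below builds one such matrix for the 2×2 case from the Cholesky factor of Σ and verifies the identity numerically. Whitening matrices are not unique, so the Cholesky-based choice is only one option, and the helper names are illustrative.

```python
import math


def cholesky_whitening_2x2(cov):
    """Return W = L^{-1}, where cov = L L^T is the Cholesky factorization.

    cov must be a symmetric positive definite 2x2 matrix.
    """
    (s11, s12), (_, s22) = cov
    a = math.sqrt(s11)
    b = s12 / a
    c = math.sqrt(s22 - b * b)
    # Inverse of the lower-triangular factor [[a, 0], [b, c]].
    return [[1.0 / a, 0.0], [-b / (a * c), 1.0 / c]]


def matmul(p, q):
    """Plain matrix product of two lists-of-lists matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def whitened_covariance(cov):
    """Covariance of the transformed variables: W cov W^T."""
    w = cholesky_whitening_2x2(cov)
    return matmul(matmul(w, cov), transpose(w))


print(whitened_covariance([[4.0, 2.0], [2.0, 3.0]]))  # close to the identity
```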
Whittemore  This paper introduces Whittemore, a language for causal programming. Causal programming is based on the theory of structural causal models and consists of two primary operations: identification, which finds formulas that compute causal queries, and estimation, which applies formulas to transform probability distributions into other probability distributions. Causal programming provides abstractions to declare models, queries, and distributions with syntax similar to standard mathematical notation, and conducts rigorous causal inference, without requiring detailed knowledge of the underlying algorithms. Examples of causal inference with real data are provided, along with discussion of the implementation and possibilities for future extension. 
Widely Applicable Bayesian Information Criterion (WBIC) 
A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. Otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined as the negative logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such an approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if a statistical model is singular for or unrealizable by a statistical model. Since WBIC can be numerically calculated without any information about a true distribution, it is a generalized version of BIC onto singular statistical models. ➚ “Watanabe-Akaike Information Criteria” 
Widely Applicable Information Criterion (WAIC) 
➚ “Watanabe-Akaike Information Criteria” loo 
Width of the Language  We consider the problem of quantifying information flow in interactive systems, modelled as finite-state transducers in the style of Goguen and Meseguer. Our main result is that if the system is deterministic then the information flow is either logarithmic or linear, and there is a polynomial-time algorithm to distinguish the two cases and compute the rate of logarithmic flow. To achieve this we first extend the theory of information leakage through channels to the case of interactive systems, and establish a number of results which greatly simplify computation. We then show that for deterministic systems the information flow corresponds to the growth rate of antichains inside a certain regular language, a property called the width of the language. In a companion work we have shown that there is a dichotomy between polynomial and exponential antichain growth, and a polynomial-time algorithm to distinguish the two cases and to compute the order of polynomial growth. We observe that these two cases correspond to logarithmic and linear information flow respectively. Finally, we formulate several attractive open problems, covering the cases of probabilistic systems, systems with more than two users and nondeterministic systems where the nondeterminism is assumed to be innocent rather than demonic. 
Wiener Polarity Index  The Wiener polarity index Wp(G) of a graph G is the number of unordered pairs of vertices {u,v} in G such that the distance between u and v is equal to 3. 
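The definition translates directly into a breadth-first search from every vertex. The sketch below assumes an undirected, unweighted graph given as an adjacency dictionary in which each edge is listed in both directions; the function name is illustrative.

```python
from collections import deque


def wiener_polarity_index(adj):
    """Count unordered vertex pairs {u, v} at shortest-path distance exactly 3.

    adj maps each vertex to an iterable of its neighbours (undirected graph,
    every edge listed in both directions).
    """
    ordered_pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == 3:
                continue  # vertices farther than 3 are irrelevant
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ordered_pairs += sum(1 for d in dist.values() if d == 3)
    # Each unordered pair was counted once from each endpoint.
    return ordered_pairs // 2


# Path on four vertices 0-1-2-3: only the pair {0, 3} is at distance 3.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_polarity_index(path4))
```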
Wiener Process  In mathematics, the Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often called standard Brownian motion, after Robert Brown. It is one of the best known Lévy processes (càdlàg stochastic processes with stationary independent increments) and occurs frequently in pure and applied mathematics, economics, quantitative finance, and physics. The Wiener process plays an important role both in pure and applied mathematics. In pure mathematics, the Wiener process gave rise to the study of continuous-time martingales. It is a key process in terms of which more complicated stochastic processes can be described. As such, it plays a vital role in stochastic calculus, diffusion processes and even potential theory. It is the driving process of Schramm-Loewner evolution. In applied mathematics, the Wiener process is used to represent the integral of a Gaussian white noise process, and so is useful as a model of noise in electronics engineering, instrument errors in filtering theory and unknown forces in control theory. The Wiener process has applications throughout the mathematical sciences. In physics it is used to study Brownian motion, the diffusion of minute particles suspended in fluid, and other types of diffusion via the Fokker-Planck and Langevin equations. It also forms the basis for the rigorous path integral formulation of quantum mechanics (by the Feynman-Kac formula, a solution to the Schrödinger equation can be represented in terms of the Wiener process) and the study of eternal inflation in physical cosmology. It is also prominent in the mathematical theory of finance, in particular the Black-Scholes option pricing model. 
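Because the Wiener process has stationary, independent Gaussian increments, a path can be approximated on a time grid by summing independent normal draws with variance equal to the step size. The following is a minimal simulation sketch; the function name, grid size and seed are arbitrary choices.

```python
import math
import random


def simulate_wiener_path(n_steps, t_end=1.0, rng=None):
    """Simulate W(0), W(dt), ..., W(t_end) on a uniform grid with W(0) = 0.

    Each increment is an independent N(0, dt) draw.
    """
    rng = rng or random.Random()
    dt = t_end / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


rng = random.Random(42)
endpoints = [simulate_wiener_path(20, 1.0, rng)[-1] for _ in range(2000)]
# W(t_end) has mean 0 and variance t_end, so this should be close to 1.
print(sum(w * w for w in endpoints) / len(endpoints))
```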
Wiener Filter  In signal processing, the Wiener filter (Wiener-Kolmogorov filter) is a filter used to produce an estimate of a desired or target random process by linear time-invariant filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise. The Wiener filter minimizes the mean square error between the estimated random process and the desired process. 
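For the non-causal case with additive noise that is uncorrelated with the signal (an assumption added here, not stated in the entry), the filter reduces to a per-frequency gain H = S / (S + N), where S and N are the signal and noise power spectral densities. The sketch below computes this gain and checks in one frequency bin that it gives a lower mean square error than nearby gains. The function names and the numbers are illustrative.

```python
def wiener_gain(signal_psd, noise_psd):
    """Per-frequency Wiener gain S / (S + N) for uncorrelated additive noise."""
    return [s / (s + n) if (s + n) > 0 else 0.0
            for s, n in zip(signal_psd, noise_psd)]


def mse_for_gain(g, s, n):
    """Mean square error in one bin when the signal is estimated as
    g * (signal + noise), with signal power s and noise power n."""
    return (1.0 - g) ** 2 * s + g ** 2 * n


signal_psd = [1.0, 4.0, 0.0]
noise_psd = [1.0, 1.0, 2.0]
gains = wiener_gain(signal_psd, noise_psd)
print(gains)

# In the second bin the Wiener gain should beat slightly different gains.
best = gains[1]
for g in (best - 0.1, best, best + 0.1):
    print(g, mse_for_gain(g, signal_psd[1], noise_psd[1]))
```

Bins where the signal dominates are passed almost unchanged, while bins dominated by noise are attenuated toward zero.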
WikiAtomicEdits  We release a corpus of 43 million atomic edits across 8 languages. These edits are mined from Wikipedia edit history and consist of instances in which a human editor has inserted a single contiguous phrase into, or deleted a single contiguous phrase from, an existing sentence. We use the collected data to show that the language generated during editing differs from the language that we observe in standard corpora, and that models trained on edits encode different aspects of semantics and discourse than models trained on raw, unstructured text. We release the full corpus as a resource to aid ongoing research in semantics, discourse, and representation learning. 
WikibookBot  A Wikipedia book (known as a Wikibook) is a collection of Wikipedia articles on a particular theme that is organized as a book. We propose WikibookBot, a machine-learning based technique for automatically generating high-quality Wikibooks based on a concept provided by the user. In order to create the Wikibook we apply machine learning algorithms to the different steps of the proposed technique. First, we need to decide whether an article belongs to a specific Wikibook – a classification task. Then, we need to divide the chosen articles into chapters – a clustering task – and finally, we deal with the ordering task, which includes two subtasks: ordering articles within each chapter and ordering the chapters themselves. We propose a set of structural, text-based and unique Wikipedia features, and we show that by using these features, a machine learning classifier can successfully address the above challenges. The predictive performance of the proposed method is evaluated by comparing the auto-generated books to 407 existing Wikibooks which were manually generated by humans. For all the tasks we were able to obtain high and statistically significant results when comparing the WikibookBot books to books that were manually generated by Wikipedia contributors. 
WikiConv  We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions and restorations), this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus’ potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person’s conversational behavior depends on how they relate to the discussion’s venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated. Finally, the reconstruction framework is designed to be language-agnostic, and we show that it can extract high-quality conversational data in both Chinese and English. 
WikiLinkGraphs  Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset. 
Wikipedia WordNet Based QE Technique (WWQE) 
Query expansion (QE) is a well-known technique to enhance the effectiveness of information retrieval (IR). QE reformulates the initial query by adding similar terms that help in retrieving more relevant results. Several approaches have been proposed with remarkable outcomes, but they are not equally favorable for all types of queries. One of the main reasons for this is the use of the same data source when expanding both the individual and the phrase query terms. As a result, the holistic relationship among the query terms is not well captured. To address this issue, we have selected separate data sources for individual and phrase terms. Specifically, we have used WordNet for expanding individual terms and Wikipedia for expanding phrase terms. We have also proposed novel schemes for weighting expanded terms: an in-link score (for terms extracted from Wikipedia) and a tf-idf based scheme (for terms extracted from WordNet). In the proposed Wikipedia WordNet based QE technique (WWQE), we weigh the expansion terms twice: first, they are scored by the weighting scheme individually, and then, the weighting scheme scores the selected expansion terms in relation to the entire query using a correlation score. The experimental results show that the proposed approach successfully combines Wikipedia and WordNet, as demonstrated by better performance on standard evaluation metrics on the FIRE dataset. The proposed WWQE approach is also suitable for use with other standard weighting models for improving the effectiveness of IR. 
Wikipedia2Vec  We present Wikipedia2Vec, an open source tool for learning embeddings of words and entities from Wikipedia. This tool enables users to easily obtain high-quality embeddings of words and entities from a Wikipedia dump with a single command. The learned embeddings can be used as features in downstream natural language processing (NLP) models. The tool can be installed via PyPI. The source code, documentation, and pretrained embeddings for 12 major languages can be obtained at http://wikipedia2vec.github.io. 
WikiRank  Keyphrases are an efficient representation of the main ideas of documents. While background knowledge can provide valuable information about documents, it is rarely incorporated in keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we take the optimal keyphrase set as the output. Our method obtains improvements over other state-of-the-art models of more than 2% in F1-score. 
Wikistat 2.0  Big data, data science, deep learning and artificial intelligence are the key words of intense hype related to a job market in full evolution, which requires us to adapt the contents of our professional university training. Which kind of artificial intelligence is most relevant to the job offers? Which methodologies and technologies should be favored in the training programs? Which objectives, tools and educational resources do we need to put in place to meet these pressing needs? We answer these questions by describing the contents and operational resources of the Data Science orientation of the Applied Mathematics speciality at INSA Toulouse. We focus on basic mathematics training (Optimization, Probability, Statistics), associated with the practical implementation of the best-performing statistical learning algorithms, with the most appropriate technologies and on real examples. Considering the huge volatility of the technologies, it is imperative to train students in self-training; this will be their technology watch tool once they are in professional activity. This explains the structuring of the educational site https://…/wikistat into a set of tutorials. Finally, to motivate the thorough practice of these tutorials, a serious game is organized each year in the form of a prediction contest between students of Master’s degrees in Applied Mathematics for AI. 
Wild Scale-Enhanced Bootstrap (WiSE) 
WiSEBoot 
WildlyUnsupervised Domain Adaptation (WUDA) 
Unsupervised domain adaptation (UDA) trains with clean labeled data in the source domain and unlabeled data in the target domain to classify target-domain data. However, in real-world scenarios, it is hard to acquire fully clean labeled data in the source domain due to the expensive labeling cost. This brings us a new but practical adaptation setting called wildly-unsupervised domain adaptation (WUDA), which aims to transfer knowledge from noisy labeled data in the source domain to unlabeled data in the target domain. To tackle WUDA, we present a robust one-step approach called Butterfly, which trains four networks. Specifically, two networks are jointly trained on noisy labeled data in the source domain and pseudo-labeled data in the target domain (i.e., data in the mixture domain). Meanwhile, the other two networks are trained on pseudo-labeled data in the target domain. By using the dual-checking principle, Butterfly can obtain high-quality target-specific representations. We conduct experiments to demonstrate that Butterfly significantly outperforms other baselines on simulated and real-world WUDA tasks in most cases. 
Window-based Sentence Boundary Evaluation (WiSeBE) 
Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts have been used for further Natural Language Processing tasks like Part of Speech Tagging, Question Answering or Automatic Summarization. But what about evaluation? Are standard evaluation metrics like precision, recall, F-score or classification error, and, more importantly, the evaluation of an automatic system against a single reference, enough to conclude how well an SBD system is performing given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation gives an understanding of how WiSeBE is a more reliable metric for the SBD task. 
Window-Bounded Co-Occurrence  This paper focuses on a traditional relation extraction task in the context of limited annotated data and a narrow knowledge domain. We explore this task with a clinical corpus consisting of 200 breast cancer follow-up treatment letters in which 16 distinct types of relations are annotated. We experiment with an approach to extracting typed relations called window-bounded co-occurrence (WBC), which uses an adjustable context window around entity mentions of a relevant type, and compare its performance with a more typical intra-sentential co-occurrence baseline. We further introduce a new bag-of-concepts (BoC) approach to feature engineering based on state-of-the-art word embeddings and word synonyms. We demonstrate the competitiveness of BoC by comparing it with methods of higher complexity, and explore its effectiveness on this small dataset. 
Windowed Fourier Filtering  Interferometric phase (InPhase) imaging is an important part of many present-day coherent imaging technologies. Often in such imaging techniques, the acquired images, known as interferograms, suffer from two major degradations: 1) phase wrapping, caused by the fact that the sensing mechanism can only measure sinusoidal $2\pi$-periodic functions of the actual phase, and 2) noise introduced by the acquisition process or the system. This work focuses on InPhase denoising, which is a fundamental restoration step for many subsequent applications of InPhase, notably phase unwrapping. The presence of sharp fringes that arise from phase wrapping makes InPhase denoising a hard inverse problem. Motivated by the fact that InPhase images are often locally sparse in the Fourier domain, we propose a multi-resolution windowed Fourier filtering (WFF) analysis that fuses WFF estimates with different resolutions, thus overcoming the fixed-resolution limitation of WFF. The proposed fusion relies on an unbiased estimate of the mean square error derived using Stein’s lemma adapted to complex-valued signals. This estimate, known as SURE, is minimized using an optimization framework to obtain the fusion weights. Strong experimental evidence, using synthetic and real (InSAR & MRI) data, is provided that the developed algorithm, termed SURE-fuse WFF, outperforms the best hand-tuned fixed-resolution WFF as well as other state-of-the-art InPhase denoising algorithms. 
Wire Data  Wire data is the information that passes over computer and telecommunication networks defining communications between client and server devices. It is the result of decoding wire and transport protocols containing the bidirectional data payload. More precisely, wire data is the information that is communicated in each layer of the OSI model (Layer 1 not being included because those protocols are used to establish connections and do not communicate information). 
Wisdom of Crowds (WOC) 
The wisdom of the crowd is the collective opinion of a group of individuals rather than that of a single expert. A large group’s aggregated answers to questions involving quantity estimation, general world knowledge, and spatial reasoning have generally been found to be as good as, and often better than, the answer given by any of the individuals within the group. An explanation for this phenomenon is that there is idiosyncratic noise associated with each individual judgment, and taking the average over a large number of responses will go some way toward canceling the effect of this noise. This process, while not new to the Information Age, has been pushed into the mainstream spotlight by social information sites such as Wikipedia, Yahoo! Answers, Quora, and other web resources that rely on human opinion. Trial by jury can be understood as wisdom of the crowd, especially when compared to the alternative, trial by a judge, the single expert. In politics, sortition is sometimes held up as an example of what wisdom of the crowd would look like: decision-making would happen by a diverse group instead of by a fairly homogeneous political group or party. Research within cognitive science has sought to model the relationship between wisdom of the crowd effects and individual cognition. WoCE: a framework for clustering ensemble by exploiting the wisdom of Crowds theory 
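The noise-cancelling explanation can be sketched with a small simulation. It assumes, purely for illustration, that each person's guess is the true value plus independent Gaussian noise; the function name and parameters are hypothetical.

```python
import random
import statistics

def crowd_vs_individual(true_value, n_people, noise_sd, seed=0):
    """Compare the error of the averaged guess with the average individual error."""
    rng = random.Random(seed)
    # Assumption: unbiased guesses with independent Gaussian noise.
    guesses = [true_value + rng.gauss(0, noise_sd) for _ in range(n_people)]
    crowd_error = abs(statistics.mean(guesses) - true_value)
    typical_error = statistics.mean([abs(g - true_value) for g in guesses])
    return crowd_error, typical_error

crowd_error, typical_error = crowd_vs_individual(100.0, 1000, 20.0)
print(crowd_error, typical_error)
```

By the triangle inequality, the error of the averaged guess can never exceed the average individual error. How much smaller it is depends on how independent and unbiased the individual guesses are.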
Wishart Distribution  In statistics, the Wishart distribution is a generalization to multiple dimensions of the chi-squared distribution, or, in the case of non-integer degrees of freedom, of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928. It is a family of probability distributions defined over symmetric, nonnegative-definite matrix-valued random variables (‘random matrices’). These distributions are of great importance in the estimation of covariance matrices in multivariate statistics. In Bayesian statistics, the Wishart distribution is the conjugate prior of the inverse covariance matrix of a multivariate normal random vector. 
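For integer degrees of freedom, a Wishart-distributed matrix can be built as a sum of outer products of independent multivariate normal vectors. The sketch below assumes an identity scale matrix to stay within the standard library; the function name is hypothetical.

```python
import random

def sample_wishart_identity(p, n, rng):
    """Draw S = sum_i x_i x_i^T with x_i ~ N(0, I_p), so that S ~ W_p(I, n)."""
    s = [[0.0] * p for _ in range(p)]
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += x[i] * x[j]  # accumulate the outer product x x^T
    return s

rng = random.Random(42)
S = sample_wishart_identity(p=3, n=10, rng=rng)
for row in S:
    print(row)
```

With `p=1` the result is a sum of `n` squared standard normal draws, which is the chi-squared case that the Wishart distribution generalizes.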
Wishart Matrix  ➘ “Wishart Distribution” rWishart 
WitnessCounting Problem  Fast Witness Counting 
W-Net  Crowd management is of paramount importance when it comes to preventing stampedes and saving lives, especially in countries like China and India, whose combined population is a third of the global population. Millions of people convene annually all around the nation to celebrate a myriad of events, and crowd count estimation is the linchpin of a crowd management system that could prevent stampedes and save lives. We present a network for crowd counting which reports state-of-the-art results on crowd counting benchmarks. Our contributions are, first, a U-Net-inspired model which allows us to report state-of-the-art results. Second, we propose an independent decoding Reinforcement branch which helps the network converge much earlier and also enables the network to estimate density maps with a high Structural Similarity Index (SSIM). Third, we discuss the drawbacks of contemporary architectures and empirically show that even though our architecture achieves state-of-the-art results, the merit may be due to the encoder-decoder pipeline instead. Finally, we report an error analysis which shows that the contemporary line of work is at saturation and leaves certain prominent problems unsolved. 
Wolfson Polarization Index  affluenceIndex 
Word Embedding Association Test (WEAT) 
Universal Sentence Encoder 
Word Embedding Attention Network (WEAN) 
Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates the words by querying distributed word representations (i.e. neural word embeddings), aiming to capture the meaning of the corresponding words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performance on these three benchmark datasets. 
Word Encoded Sequence Transducer (WEST) 
Most of the parameters in large-vocabulary models are used in the embedding layer, to map categorical features to vectors, and in the softmax layer, for classification weights. This is a bottleneck in memory-constrained on-device training applications like federated learning and on-device inference applications like automatic speech recognition (ASR). One way of compressing the embedding and softmax layers is to substitute larger units such as words with smaller sub-units such as characters. However, the sub-unit models often perform poorly compared to the larger-unit models. We propose WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain-dependent sub-units, and demonstrate that this transduction can lead to significant compression without compromising performance. WEST bridges the gap between larger-unit and sub-unit models and can be interpreted as a MaxEnt model over sub-unit features, which can be of independent interest. 
Word ExtrAction for time SEries cLassification (WEASEL) 
Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices based on their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods for time series classification (TSC) cannot cope with such data volumes at acceptable accuracy; they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes. In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method which is both scalable and accurate. Like other state-of-the-art TSC methods, WEASEL transforms time series into feature vectors, using a sliding-window approach, which are then analyzed through a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms at orders-of-magnitude lower classification and training times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even for mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it achieves out of the box almost the same accuracy as highly tuned, domain-specific methods. 
Word Sense Induction (WSI) 
In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word (i.e. meanings). Given that the output of word-sense induction is a set of senses for the target word (sense inventory), this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context. 
Word Vectors  Word vectors (also referred to as distributed representations) are an amazing alternative that sweeps away most of the issues of dealing with NLP. They let us ignore the difficult-to-understand grammar and syntax of language while retaining the ability to ask and answer simple questions about a text. https://…/word2vec 
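A common way to ask simple questions of word vectors is to compare them with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real word vectors are learned from a corpus and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, not taken from any trained model.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
sim_royal = cosine_similarity(vectors["king"], vectors["queen"])
sim_fruit = cosine_similarity(vectors["king"], vectors["apple"])
print(sim_royal, sim_fruit)
```

In this toy data "king" is closer to "queen" than to "apple", which is the kind of relatedness query that word vectors make straightforward.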
Word2Bits  Word vectors require significant amounts of memory and storage, posing issues for resource-limited devices like mobile phones and GPUs. We show that high-quality quantized word vectors using 1-2 bits per parameter can be learned by introducing a quantization function into Word2Vec. We furthermore show that training with the quantization function acts as a regularizer. We train word vectors on English Wikipedia (2017) and evaluate them on standard word similarity and analogy tasks and on question answering (SQuAD). Our quantized word vectors not only take 8-16x less space than full-precision (32-bit) word vectors but also outperform them on word similarity tasks and question answering. 
word2vec  This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. http://…/w2vexp.pdf DL4J: Word2Vec 
Wordcloud  A tag cloud (word cloud, or weighted list in visual design) is a visual representation for text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color. This format is useful for quickly perceiving the most prominent terms and for locating a term alphabetically to determine its relative prominence. When used as website navigation aids, the terms are hyperlinked to items associated with the tag. tagcloud 
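The mapping from tag importance to font size can be sketched as a linear scaling of word counts. The function name, the size range and the whitespace tokenization are illustrative assumptions, not the behaviour of any particular tag cloud package.

```python
from collections import Counter

def tag_font_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for word, count in counts.items():
        if hi == lo:
            sizes[word] = max_size  # every word is equally frequent
        else:
            sizes[word] = min_size + (count - lo) * (max_size - min_size) / (hi - lo)
    return sizes

sizes = tag_font_sizes("data science data mining data text mining")
print(sizes)
```

Here "data" (3 occurrences) gets the largest size, "mining" (2) sits in the middle, and the words that occur once get the smallest.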
WordNet  WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and are freely available for download from the WordNet website. Both the lexicographic data (lexicographer files) and the compiler (called grind) for producing the distributed database are available. 
Wordswarm  WordSwarm generates dynamic word clouds in which the word size changes as the animation moves forward through the corpus. The top words from the preprocessing are colored randomly or from an assigned palette, sized according to their magnitude at the first date, and then displayed in a pseudorandom location on the screen. The animation progresses into the future by growing or shrinking each word according to its frequency in the corpus at the next date. Clash detection is achieved using a 2D physics engine, which also applies ‘gravitational force’ to each word, bringing the larger words closer to the center of the screen. 
Work Stealing Load Balancing Algorithm  Work stealing is a methodology for efficient load balancing of computational problems that can be easily decomposed into multiple tasks, but where it is hard to predict the computation cost of each task, and where new tasks are created dynamically during runtime. We present this methodology and its exploitation and feasibility in the context of graphics processors. Work stealing allows an idle core to acquire tasks from a core that is overloaded, causing the total work to be distributed evenly among cores while minimizing communication costs, as tasks are only redistributed when required. This will often lead to higher throughput than static partitioning. Work Stealing with latency 
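The stealing behaviour can be sketched as a single-threaded, round-based simulation. It assumes unit-cost tasks, an owner that takes work from the tail of its own deque, and an idle worker that steals from the head of the most loaded deque. These are common conventions rather than details taken from the text above, and a real scheduler would run concurrently.

```python
from collections import deque

def run_work_stealing(initial_queues):
    """Simulate rounds in which every worker runs one unit task, stealing if idle."""
    queues = [deque(q) for q in initial_queues]
    done = [0] * len(queues)
    rounds = 0
    while any(queues):
        rounds += 1
        for w, own in enumerate(queues):
            if own:
                own.pop()            # owner takes from the tail of its own deque
            else:
                victim = max(queues, key=len)
                if not victim:
                    continue         # nothing left to steal this round
                victim.popleft()     # thief takes from the head of the victim's deque
            done[w] += 1
    return rounds, done

# All eight tasks start on worker 0; the other three workers are idle.
rounds, done = run_work_stealing([list(range(8)), [], [], []])
print(rounds, done)
```

In this run the eight tasks finish in two rounds with two tasks per worker, whereas without stealing worker 0 would need eight rounds on its own.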
Workflow Satisfiability Problem  The workflow satisfiability problem (WSP) asks whether there exists an assignment of authorized users to the steps in a workflow specification that satisfies the constraints in the specification. The problem is NP-hard in general, but several subclasses of the problem are known to be fixed-parameter tractable (FPT) when parameterized by the number of steps in the specification. Bounded and Approximate Strong Satisfiability in Workflows 
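A brute-force check makes the problem statement concrete. The sketch below assumes only two hypothetical constraint kinds between pairs of steps: `"!="` (different users) and `"=="` (same user). The step and user names are made up, and the exhaustive search is exponential in the number of steps, so it only suits tiny instances.

```python
from itertools import product

def find_assignment(steps, authorized, constraints):
    """Return a step -> user mapping that satisfies all constraints, or None."""
    candidates = [sorted(authorized[s]) for s in steps]
    for users in product(*candidates):
        plan = dict(zip(steps, users))
        ok = all(
            (plan[a] != plan[b]) if kind == "!=" else (plan[a] == plan[b])
            for a, b, kind in constraints
        )
        if ok:
            return plan
    return None

steps = ["prepare", "approve", "pay"]
authorized = {
    "prepare": {"alice", "bob"},
    "approve": {"alice"},
    "pay": {"bob", "carol"},
}
constraints = [("prepare", "approve", "!="), ("approve", "pay", "!=")]
plan = find_assignment(steps, authorized, constraints)
print(plan)
```

If no combination of authorized users satisfies every constraint, the function returns `None`, meaning the workflow is unsatisfiable.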
Workforce Analytics  Workforce analytics is a combination of software and methodology that applies statistical models to worker-related data, allowing enterprise leaders to optimize human resource management (HRM). 
Working Memory Network  In recent years, there has been a lot of interest in achieving some kind of complex reasoning using deep neural networks. To do that, models like Memory Networks (MemNNs) have combined external memory storage and attention mechanisms. These architectures, however, lack more complex reasoning mechanisms that could allow, for instance, relational reasoning. Relation Networks (RNs), on the other hand, have shown outstanding results in relational reasoning tasks. Unfortunately, their computational cost grows quadratically with the number of memories, which is prohibitive for larger problems. To solve these issues, we introduce the Working Memory Network, a MemNN architecture with a novel working memory storage and reasoning module. Our model retains the relational reasoning abilities of the RN while reducing its computational complexity from quadratic to linear. We tested our model on the text QA dataset bAbI and the visual QA dataset NLVR. On the jointly trained bAbI-10k, we set a new state of the art, achieving a mean error of less than 0.5%. Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark. 
Workload-Aware Auto-Parallelization Framework (WAP) 
Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option to accelerate demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the implementation of the frameworks themselves, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, where the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16), and show competitive training throughput compared with state-of-the-art frameworks. We also demonstrate that WAP automatically optimizes GPU assignment based on the workload’s compute requirements, thereby improving energy efficiency. 
WPU-Net  Deep learning has driven great progress in natural and biological image processing. However, in materials science and engineering, microscopic images of materials often contain flaws and indistinct regions caused by complex sample preparation, or even by the material itself, which hinder the detection of target objects. In this work, we propose WPU-Net, which redesigns the architecture and weighted loss of U-Net to force the network to integrate information from adjacent slices and pay more attention to the topology in this boundary detection task. WPU-Net was then applied to a typical material example, i.e., the grain boundary detection of polycrystalline material. Experiments demonstrate that the proposed method achieves promising performance compared to state-of-the-art methods. In addition, we propose a new method for object tracking between adjacent slices, which can effectively reconstruct the 3D structure of the whole material while maintaining relative accuracy. 
Write Once, Deploy Anywhere (WODA) 

Write Once, Run Anywhere (WORA) 
‘Write once, run anywhere’ (WORA), or sometimes ‘write once, run everywhere’ (WORE), is a slogan created by Sun Microsystems to illustrate the cross-platform benefits of the Java language. Ideally, this means Java can be developed on any device, compiled into standard bytecode, and be expected to run on any device equipped with a Java virtual machine (JVM). The installation of a JVM or Java interpreter on chips, devices or software packages has become an industry standard practice. This means a programmer can develop code on a PC and expect it to run on Java-enabled cell phones, as well as on routers and mainframes equipped with Java, without any adjustments. This is intended to save software developers the effort of writing a different version of their software for each platform or operating system they intend to deploy on. The idea originated as early as the late 1970s, when the UCSD Pascal system was developed to produce and interpret p-code. UCSD Pascal (along with the Smalltalk virtual machine) was a key influence on the design of the Java virtual machine, as cited by James Gosling. The catch is that since there are multiple JVM implementations, on top of a wide variety of operating systems such as Windows, Linux, Solaris, NetWare, HP-UX, and Mac OS, there can be subtle differences in how a program executes on each JVM/OS combination, which may require an application to be tested on various target platforms. This has given rise to a joke among Java developers: ‘Write Once, Debug Everywhere’. This architecture has sometimes been criticized with the quip ‘Saying that Java is better because it works in all platforms is like saying that anal sex is better because it works with all genders.’ In comparison, the Squeak Smalltalk programming language and environment boasts of being ‘truly write once run anywhere’, because it ‘runs bit-identical images across its wide portability base’. 
WStream  In recent years, the scale of graph datasets has increased to such a degree that a single machine is not capable of efficiently processing large graphs. Efficient graph partitioning is therefore necessary for such large graph applications. Traditional graph partitioning generally loads the whole graph into memory before performing partitioning; this is not only time consuming but also creates memory bottlenecks. These issues of memory limitation and enormous time complexity can be resolved using stream-based graph partitioning. A streaming graph partitioning algorithm reads each vertex once and assigns it to a partition accordingly. This is also called a one-pass algorithm. This paper proposes an efficient window-based streaming graph partitioning algorithm called WStream. The WStream algorithm is an edge-cut partitioning algorithm, which distributes vertices among the partitions. Our results suggest that the WStream algorithm is able to partition large graph data efficiently while keeping the load balanced across different partitions and communication to a minimum. Evaluation results with real workloads also demonstrate the effectiveness of our proposed algorithm, which achieves a significant reduction in load imbalance and edge-cut across different ranges of datasets. 
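The one-pass idea can be illustrated with a generic greedy streaming partitioner. This is not the WStream algorithm (it uses no window): each arriving vertex simply goes to the open partition that already holds most of its neighbours, discounted by how full that partition is. The function name, the scoring rule and the example graph are assumptions for illustration.

```python
def stream_partition(vertex_stream, k, capacity):
    """Assign each (vertex, neighbours) pair to one of k partitions in a single pass."""
    assignment = {}
    sizes = [0] * k
    for vertex, neighbours in vertex_stream:
        def score(p):
            placed = sum(1 for n in neighbours if assignment.get(n) == p)
            # Prefer partitions holding many neighbours; break ties by lower load.
            return (placed * (1 - sizes[p] / capacity), -sizes[p])
        # Assumes k * capacity is at least the number of vertices in the stream.
        open_parts = [p for p in range(k) if sizes[p] < capacity]
        best = max(open_parts, key=score)
        assignment[vertex] = best
        sizes[best] += 1
    return assignment

# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
stream = [
    ("a", ["b", "c"]), ("b", ["a", "c"]), ("c", ["a", "b", "d"]),
    ("d", ["c", "e", "f"]), ("e", ["d", "f"]), ("f", ["d", "e"]),
]
assignment = stream_partition(stream, k=2, capacity=3)
print(assignment)
```

On this toy graph each triangle lands in its own partition, so only the edge c-d is cut and both partitions carry the same load.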