W2VLDA With the increase of online customer opinions in specialised websites and social networks, the necessity of automatic systems to help to organise and classify customer reviews by domain-specific aspect/categories and sentiment polarity is more important than ever. Supervised approaches to Aspect Based Sentiment Analysis obtain good results for the domain/language they are trained on, but obtaining manually labelled data to train supervised systems for all domains and languages is usually very costly and time consuming. In this work we describe W2VLDA, an unsupervised system based on topic modelling that, combined with some other unsupervised methods and a minimal configuration, performs aspect/category classification, aspect-term/opinion-word separation and sentiment polarity classification for any given domain and language. We also evaluate the performance of the aspect and sentiment classification in the multilingual SemEval 2016 task 5 (ABSA) dataset. We show competitive results for several languages (English, Spanish, French and Dutch) and domains (hotels, restaurants, electronic-devices).
Waffle Chart / Square Pie Chart A little-known alternative to the round pie chart is the square pie or waffle chart. It consists of a square that is divided into 10×10 cells, making it possible to read values precisely down to a single percent. Depending on how the areas are laid out (as square as possible seems to be the best idea), it is very easy to compare parts to the whole.
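As a rough illustration of the 10×10 layout described above, the sketch below assigns 100 cells to categories in proportion to their values. The function names and the largest-remainder rounding rule are assumptions made for this example, not part of any particular charting library.

```python
def waffle_cells(values, n_cells=100):
    """Allocate n_cells grid cells to categories in proportion to values.

    Largest-remainder rounding (an assumed choice) keeps the total at n_cells.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    exact = [v * n_cells / total for v in values]
    counts = [int(e) for e in exact]  # floor, since all shares are non-negative
    leftover = n_cells - sum(counts)
    # Give the remaining cells to the categories with the largest fractional parts.
    order = sorted(range(len(values)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def waffle_grid(counts, side=10):
    """Lay out category indices row by row in a side x side grid."""
    flat = [cat for cat, c in enumerate(counts) for _ in range(c)]
    return [flat[r * side:(r + 1) * side] for r in range(side)]
```

Each cell of the returned grid holds a category index, so one cell corresponds to one percent when the grid has 100 cells.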
Waikato Environment for Knowledge Analysis
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License.
Wait-if-Diff and Wait-if-Worse Agent (Cho and Esipova, 2016)
Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation
wakefield wakefield is a GitHub-based R package designed to quickly generate random data sets. The user passes n (number of rows) and predefined vectors to the r_data_frame function to produce a dplyr::tbl_df object.
Wake-Sleep Algorithm The wake-sleep algorithm is an unsupervised learning algorithm for a multilayer neural network (e.g. sigmoid belief net). Training is divided into two phases, ‘wake’ and ‘sleep’. In the ‘wake’ phase, neurons are driven by recognition connections (connections from what would normally be considered an input to what is normally considered an output), while generative connections (those from outputs to inputs) are modified to increase the probability that they would reconstruct the correct activity in the layer below (closer to the sensory input). In the ‘sleep’ phase the process is reversed: neurons are driven by generative connections, while recognition connections are modified to increase the probability that they would produce the correct activity in the layer above (further from sensory input).
Walk-Steered Convolution
Graph classification is a fundamental but challenging problem due to the non-Euclidean property of graphs. In this work, we jointly leverage the powerful representation ability of random walks and the success of standard convolutional neural networks (CNNs) to propose a random walk based convolutional network, called walk-steered convolution (WSC). Unlike existing graph CNNs with deterministic neighbor searching, we randomly sample multi-scale walk fields by using random walks, which scales more flexibly with graph size. To encode each scale of walk field, consisting of several walk paths, we characterize the directions of the walk field by multiple Gaussian models so as to better mirror standard CNNs on images. Each Gaussian implicitly defines a direction, and together they encode the spatial layout of walks after the gradient is projected to the space of Gaussian parameters. Further, a graph coarsening layer using dynamical clustering is stacked upon the Gaussian encoding to capture high-level semantics of the graph. Comprehensive evaluations on several public datasets demonstrate the superiority of our proposed graph learning method over other state-of-the-art methods for graph classification.
Walktrap Community Algorithm Tries to find densely connected subgraphs, also called communities, in a graph via random walks. The idea is that short random walks tend to stay in the same community.
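The intuition that short walks stay inside a community can be made concrete with a random-walk distance between vertices. The sketch below is not the full Walktrap algorithm (which also merges communities agglomeratively); it only computes a degree-weighted distance between the t-step walk distributions of two vertices. The function names and the example graph are illustrative, and the code assumes an undirected graph with no isolated vertices and t >= 1.

```python
import math


def transition_matrix(adj):
    """Row-normalise an adjacency matrix into random-walk transition probabilities."""
    return [[a / sum(row) for a in row] for row in adj]


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def walk_probabilities(adj, t):
    """Probability of being at each vertex after t random-walk steps (t >= 1)."""
    p = transition_matrix(adj)
    result = p
    for _ in range(t - 1):
        result = mat_mul(result, p)
    return result


def walk_distance(adj, i, j, t=3):
    """Degree-weighted distance between the t-step walk distributions of i and j."""
    degrees = [sum(row) for row in adj]
    pt = walk_probabilities(adj, t)
    return math.sqrt(
        sum((pt[i][k] - pt[j][k]) ** 2 / degrees[k] for k in range(len(adj)))
    )


# Example graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
TWO_TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
```

On this graph, vertices in the same triangle end up closer to each other than vertices in different triangles, which is the signal a community detection method can exploit.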
Wallaroo Wallaroo is a fast, elastic data processing engine that rapidly takes you from prototype to production by eliminating infrastructure complexity. We’ve designed it to handle demanding high-throughput, low-latency tasks where the accuracy of results is essential. Wallaroo takes care of the mechanics of scaling, resilience, state management, and message delivery. We’ve designed Wallaroo to make it easy to scale applications with no code changes, and to allow programmers to focus on business logic.
Walsh Figure of Merit LowWAFOMNX
Ward Hierarchical Clustering “Ward’s Method”
Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm
Ward’s Method In statistics, Ward’s method is a criterion applied in hierarchical cluster analysis. Ward’s minimum variance method is a special case of the objective function approach originally presented by Joe H. Ward, Jr. Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function. This objective function could be ‘any function that reflects the investigator’s purpose.’ Many of the standard clustering procedures are contained in this very general class. To illustrate the procedure, Ward used the example where the objective function is the error sum of squares, and this example is known as Ward’s method or more precisely Ward’s minimum variance method.
Ward’s Method
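For the error-sum-of-squares objective described in the Ward’s Method entry, merging clusters A and B increases the objective by |A||B|/(|A|+|B|) times the squared distance between their centroids. The following is a minimal, unoptimised sketch of the resulting agglomerative procedure; the function names are chosen for this example, and real implementations use faster update rules.

```python
def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in the error sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_cluster(points, n_clusters):
    """Greedily merge the cheapest pair of clusters until n_clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

For example, `ward_cluster([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` groups the two points near the origin together and the two points near (10, 10) together.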
WarpFlow WarpFlow is a fast, interactive data querying and processing system with a focus on petabyte-scale spatiotemporal datasets and Tesseract queries. With the rapid growth in smartphones and mobile navigation services, we now have an opportunity to radically improve urban mobility and reduce friction in how people and packages move globally every minute-mile, with data. WarpFlow speeds up three key metrics for data engineers working on such datasets — time-to-first-result, time-to-full-scale-result, and time-to-trained-model for machine learning.
WarpLDA Developing efficient and scalable algorithms for Latent Dirichlet Allocation (LDA) is of wide interest for many applications. Previous work has developed an $O(1)$ Metropolis-Hastings sampling method for each token. However, the performance is far from being optimal due to random accesses to the parameter matrices and frequent cache misses. In this paper, we propose WarpLDA, a novel $O(1)$ sampling algorithm for LDA. WarpLDA is a Metropolis-Hastings based algorithm which is designed to optimize the cache hit rate. Advantages of WarpLDA include 1) Efficiency and scalability: WarpLDA has good locality and carefully designed partition method, and can be scaled to hundreds of machines; 2) Simplicity: WarpLDA does not have any complicated modules such as alias tables, hybrid data structures, or parameter servers, making it easy to understand and implement; 3) Robustness: WarpLDA is consistently faster than other algorithms, under various settings from small-scale to massive-scale dataset and model. WarpLDA is 5-15x faster than state-of-the-art LDA samplers, implying less cost of time and money. With WarpLDA users can learn up to one million topics from hundreds of millions of documents in a few hours, at the speed of 2G tokens per second, or learn topics from small-scale datasets in seconds.
Wasserstein Auto-Encoder
We propose the Wasserstein Auto-Encoder (WAE)—a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
Wasserstein Barycenter A Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry.
Wasserstein CNN
Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities with mission-critical applications in forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of large intra-class variations of heterogeneous face images and limited training samples of cross-modality face image pairs. This paper proposes a novel approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for short), to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in the visual spectrum. The high-level layer is divided into three parts, i.e., NIR layer, VIS layer and NIR-VIS shared layer. The first two layers aim to learn modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning thus aims to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of the WCNN network to reduce the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods.
Wasserstein Discriminant Analysis
Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than cross-variance measures which have been usually considered, notably in LDA. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at samples scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; we show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real life datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.
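The Sinkhorn matrix scaling step mentioned above can be sketched in a few lines. This is a generic illustration of entropy-regularized optimal transport between two discrete distributions, not the WDA method itself; the function name, the regularization value and the fixed iteration count are assumptions for the example.

```python
import math


def sinkhorn(a, b, cost, reg=0.1, n_iter=200):
    """Approximate an entropy-regularized transport plan between histograms a and b.

    cost[i][j] is the cost of moving mass from bin i of a to bin j of b.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: small cost -> large kernel value.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of a transport plan."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))
```

Very small values of `reg` make the kernel entries underflow in plain floating point, so practical implementations usually work in the log domain.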
Wasserstein Distance “Wasserstein Metric”
Wasserstein GAN
Despite being impactful on a variety of problems and applications, the generative adversarial nets (GANs) are remarkably difficult to train. This issue is formally analyzed by Arjovsky and Bottou (2017), who also propose an alternative direction to avoid the caveats in the minmax two-player training of GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the 1-Lipschitz continuity of the discriminator. In this paper, we propose a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs. Our approach seamlessly connects WGAN with one of the recent semi-supervised learning methods. As a result, it gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results. In particular, our approach gives rise to the inception score of more than 5.0 with only 1,000 CIFAR-10 images and is the first that exceeds the accuracy of 90% on the CIFAR-10 dataset using only 4,000 labeled images, to the best of our knowledge.
Wasserstein Identity Testing Problem Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under $L_1$-distance. However, when the support is very large or even continuous, testing under $L_1$-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider the identity testing in Wasserstein distance (a.k.a. transportation distance and earthmover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called ‘Doubling Condition’, we provide nearly instance-optimal sample complexity.
Wasserstein Introspective Neural Network
We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model. WINN provides a significant improvement over the recent introspective neural networks (INN) method by enhancing INN’s generative modeling capability. WINN has three interesting properties: (1) A mathematical connection between the formulation of Wasserstein generative adversarial networks (WGAN) and the INN algorithm is made; (2) The explicit adoption of the WGAN term into INN results in a large enhancement to INN, achieving compelling results even with a single classifier on e.g., providing a 20 times reduction in model size over INN within texture modeling; (3) When applied to supervised classification, WINN also gives rise to greater robustness with an $88\%$ reduction of errors against adversarial examples — improved over the result of $39\%$ by an INN-family algorithm. In the experiments, we report encouraging results on unsupervised learning problems including texture, face, and object modeling, as well as a supervised classification task against adversarial attack.
Wasserstein Metric In mathematics, the Wasserstein (or Vasershtein) metric is a distance function defined between probability distributions on a given metric space M. Intuitively, if each distribution is viewed as a unit amount of ‘dirt’ piled on M, the metric is the minimum ‘cost’ of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the distance it has to be moved. Because of this analogy, the metric is known in computer science as the earth mover’s distance. The name ‘Wasserstein distance’ was coined by R. L. Dobrushin in 1970, after the Russian mathematician Leonid Vaseršteĭn who introduced the concept in 1969. Most English-language publications use the German spelling ‘Wasserstein’ (attributed to the name ‘Vasershtein’ being of German origin).
“Earth Mover’s Distance”
Wasserstein Distance
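In one dimension the earth mover’s picture from the Wasserstein Metric entry has a simple closed form for two samples of equal size: sort both samples and average the absolute differences between matched points. The sketch below covers only that special case (order 1, equal sample sizes, uniform weights); the function name is illustrative.

```python
def wasserstein_1d(xs, ys):
    """Order-1 Wasserstein distance between two equal-size 1-D samples.

    Each sample point carries mass 1/n; the optimal plan matches the
    k-th smallest point of xs with the k-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)
```

Shifting every point of a sample by a constant c changes the distance to the original sample by exactly |c|, which matches the "amount of dirt times distance moved" intuition.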
Wasserstein Transform We introduce the Wasserstein transform, a method for enhancing and denoising datasets defined on general metric spaces. The construction draws inspiration from Optimal Transportation ideas. We establish precise connections with the mean shift family of algorithms and establish the stability of both our method and mean shift under data perturbation.
Wasserstein Variational Gradient Descent Particle-based variational inference offers a flexible way of approximating complex posterior distributions with a set of particles. In this paper we introduce a new particle-based variational inference method based on the theory of semi-discrete optimal transport. Instead of minimizing the KL divergence between the posterior and the variational approximation, we minimize a semi-discrete optimal transport divergence. The solution of the resulting optimal transport problem provides both a particle approximation and a set of optimal transportation densities that map each particle to a segment of the posterior distribution. We approximate these transportation densities by minimizing the KL divergence between a truncated distribution and the optimal transport solution. The resulting algorithm can be interpreted as a form of ensemble variational inference where each particle is associated with a local variational approximation.
Wasserstein Variational Inference This paper introduces Wasserstein variational inference, a new form of approximate Bayesian inference based on optimal transport theory. Wasserstein variational inference uses a new family of divergences that includes both f-divergences and the Wasserstein distance as special cases. The gradients of the Wasserstein variational loss are obtained by backpropagating through the Sinkhorn iterations. This technique results in a very stable likelihood-free training method that can be used with implicit distributions and probabilistic programs. Using the Wasserstein variational inference framework, we introduce several new forms of autoencoders and test their robustness and performance against existing variational autoencoding techniques.
Wasserstein-Wasserstein Auto-Encoder
To address the challenges in learning deep generative models (e.g., the blurriness of variational auto-encoders and the instability of training generative adversarial networks), we propose a novel deep generative model, named Wasserstein-Wasserstein auto-encoders (WWAE). We formulate WWAE as minimization of the penalized optimal transport between the target distribution and the generated distribution. By noticing that both the prior $P_Z$ and the aggregated posterior $Q_Z$ of the latent code Z can be well captured by Gaussians, the proposed WWAE utilizes the closed-form of the squared Wasserstein-2 distance for two Gaussians in the optimization process. As a result, WWAE does not suffer from the sampling burden and it is computationally efficient by leveraging the reparameterization trick. Numerical results evaluated on multiple benchmark datasets including MNIST, Fashion-MNIST and CelebA show that WWAE learns better latent structures than VAEs and generates samples of better visual quality and higher FID scores than VAEs and GANs.
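The closed form referred to above is easiest to see when both Gaussians have diagonal covariance: the squared Wasserstein-2 distance is then the squared distance between the means plus the squared distance between the per-dimension standard deviations. The sketch below covers only this diagonal case, which is an assumption made for the example and not necessarily the setting used in the paper; the general case involves a matrix square root of the covariances.

```python
def w2_squared_diag_gaussians(mean1, std1, mean2, std2):
    """Squared Wasserstein-2 distance between two diagonal-covariance Gaussians.

    mean*/std* are sequences of per-dimension means and standard deviations.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean1, mean2))
    std_term = sum((s - t) ** 2 for s, t in zip(std1, std2))
    return mean_term + std_term
```

Because the expression is a plain sum of squares, it is differentiable in the means and standard deviations and needs no sampling to evaluate.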
Watanabe-Akaike Information Criteria
WAIC (the Watanabe-Akaike or widely applicable information criterion; Watanabe, 2010) can be viewed as an improvement on the deviance information criterion (DIC) for Bayesian models. DIC has gained popularity in recent years in part through its implementation in the graphical modeling package BUGS (Spiegelhalter, Best, et al., 2002; Spiegelhalter, Thomas, et al., 1994, 2003), but is known to have some problems, arising in part from it not being fully Bayesian in that it is based on a point estimate (van der Linde, 2005; Plummer, 2008). For example, DIC can produce negative estimates of the effective number of parameters in a model and it is not defined for singular models. WAIC is fully Bayesian and closely approximates Bayesian cross-validation. Unlike DIC, WAIC is invariant to parametrization and also works for singular models.
A Widely Applicable Bayesian Information Criterion
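In practice WAIC is usually computed from a matrix of pointwise log-likelihoods evaluated at posterior draws. The sketch below follows one commonly used formulation: the log pointwise predictive density minus a penalty equal to the summed posterior variance of the log-likelihood, reported on the deviance scale. The function names and the input layout (`log_lik[s][i]` for draw s and observation i) are assumptions for this example, and at least two draws are required.

```python
import math


def log_mean_exp(xs):
    """Numerically stable log of the mean of exp(x) over xs."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def sample_variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


def waic(log_lik):
    """Return (waic, lppd, p_waic) from log_lik[s][i], draws s by observations i."""
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters (variance form)
    for i in range(n_obs):
        column = [row[i] for row in log_lik]
        lppd += log_mean_exp(column)
        p_waic += sample_variance(column)
    return -2.0 * (lppd - p_waic), lppd, p_waic
```

Lower values indicate better estimated out-of-sample predictive fit on this scale.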
Watchdog AI
Artificial Intelligence (AI) technologies could be broadly categorised into Analytics and Autonomy. Analytics focuses on algorithms offering perception, comprehension, and projection of knowledge gleaned from sensorial data. Autonomy revolves around decision making, and influencing and shaping the environment through action production. A smart autonomous system (SAS) combines analytics and autonomy to understand, learn, decide and act autonomously. To be useful, SAS must be trusted and that requires testing. Lifelong learning of a SAS compounds the testing process. In the remote chance that it is possible to fully test and certify the system pre-release, which is theoretically an undecidable problem, it is near impossible to predict the future behaviours that these systems, alone or collectively, will exhibit. While it may be feasible to severely restrict such systems’ learning abilities to limit the potential unpredictability of their behaviours, an undesirable consequence may be severely limiting their utility. In this paper, we propose the architecture for a watchdog AI (WAI) agent dedicated to lifelong functional testing of SAS. We further propose system specifications including a level of abstraction whereby humans shepherd a swarm of WAI agents to oversee an ecosystem made of humans and SAS. The discussion extends to the challenges, pros, and cons of the proposed concept.
Waterfall Bandits A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then subsequently solve the optimization problem for the given demand model. This will incur a linear regret. We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret. We evaluate the algorithm on both synthetic and real-world data, and show that it quickly learns high quality pricing strategies. This is the first principled study of learning a waterfall design online by sequential experimentation.
Waterfall Chart A waterfall chart is a form of data visualization that helps in understanding the cumulative effect of sequentially introduced positive or negative values. The waterfall chart is also known as a flying bricks chart or Mario chart due to the apparent suspension of columns (bricks) in mid-air. Often in finance, it will be referred to as a bridge. Waterfall charts were popularized by the strategic consulting firm McKinsey & Company in its presentations to clients. The waterfall chart is normally used for understanding how an initial value is affected by a series of intermediate positive or negative values. Usually the initial and the final values are represented by whole columns, while the intermediate values are denoted by floating columns. The columns are color-coded for distinguishing between positive and negative values.
“Waterfall Chart”
Understanding Waterfall Plots
Waterfall plots – what and how?
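The floating columns of a waterfall chart follow directly from a running total: each intermediate column starts at the lower of the totals before and after the change and has the height of the change. The sketch below computes these positions without drawing anything; the function name and the labels are illustrative, and it assumes the running total stays non-negative.

```python
def waterfall_columns(initial, changes):
    """Return (label, bottom, height) triples for a waterfall chart.

    The first and last columns are whole columns starting at zero;
    intermediate columns float at the level of the running total.
    """
    columns = [("start", 0.0, float(initial))]
    running = float(initial)
    for k, delta in enumerate(changes):
        new_total = running + delta
        bottom = min(running, new_total)  # a negative change hangs below the old total
        columns.append((f"step {k + 1}", bottom, abs(delta)))
        running = new_total
    columns.append(("end", 0.0, running))
    return columns
```

For an initial value of 100 followed by changes of +30 and -50, the intermediate columns sit at 100 to 130 and 80 to 130, and the final whole column has height 80. The sign of each change can then drive the color coding.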
Waterfall Plot A waterfall plot is a three-dimensional plot in which multiple curves of data, typically spectra, are displayed simultaneously. Typically the curves are staggered both across the screen and vertically, with ‘nearer’ curves masking the ones behind. The result is a series of ‘mountain’ shapes that appear to be side by side. The waterfall plot is often used to show how two-dimensional information changes over time or some other variable such as rpm. The term ‘waterfall plot’ is sometimes used interchangeably with ‘spectrogram’ or ‘Cumulative Spectral Decay’ (CSD) plot.
wav2letter++ This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++’s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.
Introducing Wav2letter++
Wav2Pix Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) with raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g. reference image or one-hot encoding). Our model is trained in a self-supervised approach by exploiting the audio and visual signals naturally aligned in videos. With the purpose of training from video data, we present a novel dataset collected for this work, with high-quality videos of YouTubers with notable expressiveness in both the speech and visual signals.
wav2vec We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task. Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 32% when only a few hours of transcribed data is available. Our approach achieves 2.78% WER on the nov92 test set. This outperforms Deep Speech 2, the best reported character-based system in the literature while using three orders of magnitude less labeled training data.
Wave Oriented Swarm Programming Paradigm
In this work, we present a programming paradigm allowing the control of swarms with a minimum communication bandwidth in a simple manner, yet allowing the emergence of diverse complex behaviors and autonomy of the swarm. Communication in the proposed paradigm is based on single bit ‘ping’-signals propagating as information-waves throughout the swarm. We show that even this minimum bandwidth communication between agents suffices for the design of a substantial set of behaviors in the domain of essential behaviors of a collective, including locomotion and self awareness of the swarm.
WaveGlow In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online.
Wavelet Convolutional Neural Network Spatial and spectral approaches are two major approaches for image processing tasks such as image classification and object recognition. Among many such algorithms, convolutional neural networks (CNNs) have recently achieved significant performance improvement in many challenging tasks. Since CNNs process images directly in the spatial domain, they are essentially spatial approaches. Given that spatial and spectral approaches are known to have different characteristics, it will be interesting to incorporate a spectral approach into CNNs. We propose a novel CNN architecture, wavelet CNNs, which combines a multiresolution analysis and CNNs into one model. Our insight is that a CNN can be viewed as a limited form of a multiresolution analysis. Based on this insight, we supplement missing parts of the multiresolution analysis via wavelet transform and integrate them as additional components in the entire architecture. Wavelet CNNs allow us to utilize spectral information which is mostly lost in conventional CNNs but useful in most image processing tasks. We evaluate the practical performance of wavelet CNNs on texture classification and image annotation. The experiments show that wavelet CNNs can achieve better accuracy in both tasks than existing models while having significantly fewer parameters than conventional CNNs.
WaveletFCNN Wind power, as an alternative to burning fossil fuels, is plentiful and renewable. Data-driven approaches are increasingly popular for inspecting the wind turbine failures. In this paper, we propose a novel classification-based anomaly detection system for icing detection of the wind turbine blades. We effectively combine the deep neural networks and wavelet transformation to identify such failures sequentially across the time. In the training phase, we present a wavelet based fully convolutional neural network (FCNN), namely WaveletFCNN, for the time series classification. We improve the original (FCNN) by augmenting features with the wavelet coefficients. WaveletFCNN outperforms the state-of-the-art FCNN for the univariate time series classification on the UCR time series archive benchmarks. In the detecting phase, we combine the sliding window and majority vote algorithms to provide the timely monitoring of the anomalies. The system has been successfully implemented on a real-world dataset from Goldwind Inc, where the classifier is trained on a multivariate time series dataset and the monitoring algorithm is implemented to capture the abnormal condition on signals from a wind farm.
Wavelet-like Auto-Encoder
Accelerating deep neural networks (DNNs) has been attracting increasing attention as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to own powerful visual recognition ability. A practical strategy to this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and fine-tuning the amended network, leading to difficulty in balancing the trade-off between acceleration and maintaining recognition performance. In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training. The two decomposed channels, in particular, are encoded to carry the low-frequency information (e.g., image profiles) and high-frequency (e.g., image details or noises), respectively, and enable reconstructing the original input image through the decoding process. Then, we feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network to fuse with the high-frequency channel to obtain the classification result. Compared to existing DNN acceleration solutions, our framework has the following advantages: i) it is tolerant to any existing convolutional neural networks for classification without amending their structures; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.
WaveletNet We present a logarithmic-scale efficient convolutional neural network architecture for edge devices, named WaveletNet. Our model is based on the well-known depthwise convolution, and on two new layers, which we introduce in this work: a wavelet convolution and a depthwise fast wavelet transform. By breaking the symmetry in channel dimensions and applying a fast algorithm, WaveletNet shrinks the complexity of convolutional blocks by an O(logD/D) factor, where D is the number of channels. Experiments on CIFAR-10 and ImageNet classification show superior and comparable performances of WaveletNet compared to state-of-the-art models such as MobileNetV2.
WaveNet Various sources have reported the WaveNet deep learning architecture being able to generate high-quality speech, but to our knowledge there haven’t been studies on the interpretation or visualization of trained WaveNets. This study investigates the possibility that WaveNet understands speech by unsupervisedly learning an acoustically meaningful latent representation of the speech signals in its receptive field; we also attempt to interpret the mechanism by which the feature extraction is performed. Suggested by singular value decomposition and linear regression analysis on the activations and known acoustic features (e.g. F0), the key findings are (1) activations in the higher layers are highly correlated with spectral features; (2) WaveNet explicitly performs pitch extraction despite being trained to directly predict the next audio sample and (3) for the said feature analysis to take place, the latent signal representation is converted back and forth between baseband and wideband components.
How WaveNet Works
Wavenilm Non-intrusive load monitoring (NILM) helps meet energy conservation goals by estimating individual appliance power usage from a single aggregate measurement. Deep neural networks have become increasingly popular in attempting to solve NILM problems; however, many of them are not causal which is important for real-time application. We present a causal 1-D convolutional neural network inspired by WaveNet for NILM on low-frequency data. We also study using various components of the complex power signal for NILM, and demonstrate that using all four components available in a popular NILM dataset (current, active power, reactive power, and apparent power) we achieve faster convergence and higher performance than state-of-the-art results for the same dataset.
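To make the notion of causality concrete, here is a minimal sketch of a causal, optionally dilated, 1-D convolution in plain Python. It is not the network from the paper; the function name and the zero-padding of the past are assumptions for illustration.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """Causal 1-D convolution: output[t] depends only on signal[t], signal[t-d], ...

    Samples before the start of the signal are treated as zero (an assumption).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look backwards in time only
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out

aggregate = [1.0, 2.0, 3.0, 4.0]          # hypothetical aggregate power readings
y_dense = causal_conv1d(aggregate, [1.0, 1.0])
y_dilated = causal_conv1d(aggregate, [1.0, 1.0], dilation=2)
```

Because no future sample enters the sum, the output at time t can be produced as soon as the reading at time t arrives, which is what real-time use requires.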
W-Decorrelation Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general decorrelation procedure, W-decorrelation, for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the W-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic W-decorrelation procedure in two different adaptive data settings: multi-armed bandits and autoregressive time series models.
Weakly Structured Information Processing and Exploration
WIPE is used for managing the graph traversal manipulation with BI-like data aggregation. WIPE stands for “Weakly-structured Information Processing and Exploration”. It is a data manipulation and query language built on top of the graph functionality in the SAP HANA Database. Like other domain-specific languages provided by SAP HANA Database, WIPE is embedded in a transactional context, which means that multiple WIPE statements can be executed concurrently while guaranteeing atomicity, consistency, isolation and durability. With the help of this language, multiple graph operations such as inserting, updating or deleting a node and other query operations can be declared in one complex statement. It is the graph abstraction layer in the SAP HANA Database that provides interaction with the graph data stored in the database by exposing graph concepts directly to the application developer. The application developer can create or delete graphs, access the existing graphs, modify the vertices and edges of the graphs, or retrieve a set of vertices and edges based on their attributes. Besides retrieval and manipulation functions, a set of built-in graph operators is also provided by the SAP HANA Database. These operators, such as breadth-first or depth-first traversal algorithms, interact with the column store of the relational engine to execute efficiently and in a highly optimized manner.
Weakly-Supervised Hierarchical Text Classification Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for text classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical text classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical text classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
Weakly-Supervised Neural Text Classification Deep neural networks are gaining increasing popularity for the classic text classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural text classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.
Weakly-supervised Temporal Activity Localization
Most activity localization methods in the literature suffer from the burden of frame-wise annotation requirement. Learning from weak labels may be a potential solution towards reducing such manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. The proposed network can be divided into two sub-networks, namely the Two-Stream based feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieve better performance than current state-of-the-art methods.
Weaver We introduce a new distributed graph store, called Weaver, which enables efficient, transactional graph analyses as well as strictly serializable read-write transactions on dynamic graphs. The key insight that enables Weaver to combine strict serializability with horizontal scalability and high performance is a novel request ordering mechanism called refinable timestamps. This technique couples coarse-grained vector timestamps with a fine-grained timeline oracle to pay the overhead of strong consistency only when needed.
Web Analytics Web analytics is the measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage. Web analytics is not just a tool for measuring web traffic but can be used as a tool for business and market research, and to assess and improve the effectiveness of a website. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns. It helps one to estimate how traffic to a website changes after the launch of a new advertising campaign. Web analytics provides information about the number of visitors to a website and the number of page views. It helps gauge traffic and popularity trends, which is useful for market research. There are two categories of web analytics: off-site and on-site web analytics. Off-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website’s potential audience (opportunity), share of voice (visibility), and buzz (comments) that is happening on the Internet as a whole. On-site web analytics measures a visitor’s behavior once on your website. This includes its drivers and conversions; for example, the degree to which different landing pages are associated with online purchases. On-site web analytics measures the performance of your website in a commercial context. This data is typically compared against key performance indicators for performance, and used to improve a website or marketing campaign’s audience response. Google Analytics is the most widely used on-site web analytics service, although new tools are emerging that provide additional layers of information, including heat maps and session replay. Historically, web analytics has been used to refer to on-site visitor measurement. However, in recent years this meaning has become blurred, mainly because vendors are producing tools that span both categories.
Web Data Commons The Web Data Commons project extracts structured data from the Common Crawl, the largest web corpus available to the public, and provides the extracted data for public download in order to support researchers and companies in exploiting the wealth of information that is available on the Web.
Web Mining Web mining is the application of data mining techniques to discover patterns from the Web. According to the analysis target, web mining can be divided into three types: Web usage mining, Web content mining and Web structure mining.
Web of Data
The Semantic Web is a Web of Data – of dates and titles and part numbers and chemical properties and any other data one might conceive of. The collection of Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.) provides an environment where applications can query that data, draw inferences using vocabularies, etc. However, to make the Web of Data a reality, it is important to have the huge amount of data on the Web available in a standard format, reachable and manageable by Semantic Web tools. Furthermore, not only does the Semantic Web need access to data, but relationships among data should be made available, too, to create a Web of Data (as opposed to a sheer collection of datasets). This collection of interrelated datasets on the Web can also be referred to as Linked Data. To achieve and create Linked Data, technologies should be available for a common format (RDF), to enable either conversion of or on-the-fly access to existing databases (relational, XML, HTML, etc.). It is also important to be able to set up query endpoints to access that data more conveniently. W3C provides a palette of technologies (RDF, GRDDL, POWDER, RDFa, the upcoming R2RML, RIF, SPARQL) to get access to the data.
Web Ontology Language
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects. Ontologies resemble class hierarchies in object-oriented programming but there are several critical differences. Class hierarchies are meant to represent structures used in source code that evolve fairly slowly (typically monthly revisions), whereas ontologies are meant to represent information on the Internet and are expected to be evolving almost constantly. Similarly, ontologies are typically far more flexible as they are meant to represent information on the Internet coming from all sorts of heterogeneous data sources. Class hierarchies, on the other hand, are meant to be fairly static and rely on far less diverse and more structured sources of data such as corporate databases. The OWL languages are characterized by formal semantics. They are built upon a W3C XML standard for objects called the Resource Description Framework (RDF). OWL and RDF have attracted significant academic, medical and commercial interest. In October 2007, a new W3C working group was started to extend OWL with several new features as proposed in the OWL 1.1 member submission. W3C announced the new version of OWL on 27 October 2009. This new version, called OWL 2, soon found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, RacerPro, FaCT++ and HermiT. The OWL family contains many species, serializations, syntaxes and specifications with similar names. OWL and OWL2 are used to refer to the 2004 and 2009 specifications, respectively. Full species names will be used, including specification version (for example, OWL2 EL). When referring more generally, OWL Family will be used.
Web Scraping Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox.
WebSeg In this paper, we improve semantic segmentation by automatically learning from Flickr images associated with a particular keyword, without relying on any explicit user annotations, thus substantially alleviating the dependence on accurate annotations when compared to previous weakly supervised methods. To solve such a challenging problem, we leverage several low-level cues (such as saliency, edges, etc.) to help generate a proxy ground truth. Due to the diversity of web-crawled images, we anticipate a large amount of ‘label noise’ in which other objects might be present. We design an online noise filtering scheme which is able to deal with this label noise, especially in cluttered images. We use this filtering strategy as an auxiliary module to help the segmentation network learn cleaner proxy annotations. Extensive experiments on the popular PASCAL VOC 2012 semantic segmentation benchmark show surprisingly good results in both our WebSeg (mIoU = 57.0%) and weakly supervised (mIoU = 63.3%) settings.
WeCURE Missing data recovery is an important and yet challenging problem in imaging and data science. Successful models often adopt certain carefully chosen regularization. Recently, the low dimension manifold model (LDMM) was introduced by S. Osher et al. and shown to be effective in image inpainting. They observed that enforcing low dimensionality on the image patch manifold serves as a good image regularizer. In this paper, we observe that the low dimension manifold regularization alone is sometimes not enough, and we need smoothness as well. For that, we introduce a new regularization by combining the low dimension manifold regularization with a higher order Curvature Regularization, and we call this new regularization CURE for short. The key step of solving CURE is to solve a biharmonic equation on a manifold. We further introduce a weighted version of CURE, called WeCURE, in a similar manner as the weighted nonlocal Laplacian (WNLL) method. Numerical experiments for image inpainting and semi-supervised learning show that the proposed CURE and WeCURE significantly outperform LDMM and WNLL respectively.
Weibull Distribution In probability theory and statistics, the Weibull distribution /ˈveɪbʊl/ is a continuous probability distribution. It is named after Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin & Rammler (1933) to describe a particle size distribution.
Weibull Hybrid Autoencoding Inference
To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarchy of gamma distributions, while the inference network of WHAI is a Weibull upward-downward variational autoencoder, which integrates a deterministic-upward deep neural network, and a stochastic-downward deep generative model based on a hierarchy of Weibull distributions. The Weibull distribution can be used to well approximate a gamma distribution with an analytic Kullback-Leibler divergence, and has a simple reparameterization via the uniform noise, which help efficiently compute the gradients of the evidence lower bound with respect to the parameters of the inference network. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora.
Weibull Time To Event Recurrent Neural Network
In this thesis we propose a new model for predicting time to events: the Weibull Time To Event RNN. This is a simple framework for time-series prediction of the time to the next event applicable when we have any or all of the problems of continuous or discrete time, right censoring, recurrent events, temporal patterns, time varying covariates or time series of varying lengths. All these problems are frequently encountered in customer churn, remaining useful life, failure, spike-train and event prediction. The proposed model estimates the distribution of time to the next event as having a discrete or continuous Weibull distribution with parameters being the output of a recurrent neural network. The model is trained using a special objective function (log-likelihood-loss for censored data) commonly used in survival analysis. The Weibull distribution is simple enough to avoid sparsity and can easily be regularized to avoid overfitting but is still expressive enough to encode concepts like increasing, stationary or decreasing risk and can converge to a point-estimate if allowed. The predicted Weibull-parameters can be used to predict expected value and quantiles of the time to the next event. It also leads to a natural 2d-embedding of future risk which can be used for monitoring and exploratory analysis. We describe the WTTE-RNN using a general framework for censored data which can easily be extended with other distributions and adapted for multivariate prediction. We show that the common Proportional Hazards model and the Weibull Accelerated Failure time model are special cases of the WTTE-RNN. The proposed model is evaluated on simulated data with varying degrees of censoring and temporal resolution. We compared it to binary fixed window forecast models and naive ways of handling censored data. 
The model outperforms naive methods and is found to have many advantages and comparable performance to binary fixed-window RNNs without the need to specify window size and the ability to train on more data. Application to the CMAPSS-dataset for PHM-run-to-failure of simulated Jet-Engines gives promising results.
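The censored log-likelihood objective mentioned above can be sketched for the continuous Weibull case. This is the standard survival-analysis form (event indicator times log-hazard, minus cumulative hazard), written under the assumption of a scale parameter `alpha` and shape parameter `beta`; the function name is hypothetical and the thesis's exact implementation may differ.

```python
import math

def weibull_censored_loglik(t, u, alpha, beta):
    """Log-likelihood of one observation under a continuous Weibull(alpha, beta).

    t: observed time (> 0)
    u: 1 if the event was observed at t, 0 if the observation is right-censored
    """
    # log of the hazard (beta/alpha) * (t/alpha)**(beta - 1)
    log_hazard = (math.log(beta) - math.log(alpha)
                  + (beta - 1.0) * (math.log(t) - math.log(alpha)))
    # cumulative hazard (t/alpha)**beta; its negative is the log-survival function
    cumulative_hazard = (t / alpha) ** beta
    return u * log_hazard - cumulative_hazard

observed = weibull_censored_loglik(t=2.0, u=1, alpha=2.0, beta=1.5)
censored = weibull_censored_loglik(t=2.0, u=0, alpha=2.0, beta=1.5)
```

For a censored observation only the survival term contributes, so the model is rewarded for pushing probability mass beyond the censoring time. In the WTTE-RNN setting, `alpha` and `beta` would be the outputs of the recurrent network at each time step.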
Weight of Evidence
The Weight of Evidence or WoE value is a widely used measure of the ‘strength’ of a grouping for separating good and bad risk (default). It is computed from the basic odds ratio: (Distribution of Good Credit Outcomes) / (Distribution of Bad Credit Outcomes). Or the ratios of Distr Goods / Distr Bads for short, where Distr refers to the proportion of Goods or Bads in the respective group, relative to the column totals, i.e., expressed as relative proportions of the total number of Goods and Bads.
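A minimal sketch of the computation follows. The entry above describes the ratio Distr Goods / Distr Bads; the sketch takes the natural logarithm of that ratio, which is the usual convention for WoE. The group names and counts are hypothetical.

```python
import math

def weight_of_evidence(goods, bads):
    """Return the WoE of each group from per-group counts of good and bad outcomes."""
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    woe = {}
    for group in goods:
        distr_good = goods[group] / total_good  # share of all Goods in this group
        distr_bad = bads[group] / total_bad     # share of all Bads in this group
        woe[group] = math.log(distr_good / distr_bad)
    return woe

# Hypothetical counts per income band
goods = {"low_income": 100, "mid_income": 300, "high_income": 600}
bads = {"low_income": 50, "mid_income": 30, "high_income": 20}
woe = weight_of_evidence(goods, bads)
```

A positive WoE means the group holds a larger share of the Goods than of the Bads; a negative value means the opposite. Groups with zero Goods or zero Bads need special handling (for example, smoothing), which this sketch omits.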
Why Use Weight of Evidence?
Weight Standardization
In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small batch sizes are not enough for training networks with Batch Normalization (BN), while other normalization methods that do not rely on batch knowledge still have difficulty matching the performances of BN in large-batch training. Our WS ends this problem because when used with Group Normalization and trained with 1 image/GPU, WS is able to match or outperform the performances of BN trained with large batch sizes with only 2 more lines of code. In micro-batch training, WS significantly outperforms other normalization methods. WS achieves these superior results by standardizing the weights in the convolutional layers, which we show is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients. The effectiveness of WS is verified on many tasks, including image classification, object detection, instance segmentation, video recognition, semantic segmentation, and point cloud recognition. The code is available here: https://…/WeightStandardization.
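The core operation, standardizing the weights of each output channel, can be sketched in plain Python. Each inner list below stands for the flattened weights (input channels × kernel height × kernel width) of one output channel; the `eps` term and the function name are assumptions for illustration, and the sketch is not the authors' released code.

```python
import math

def standardize_weights(weights, eps=1e-5):
    """Give each output channel's weights zero mean and (approximately) unit variance."""
    standardized = []
    for row in weights:
        mean = sum(row) / len(row)
        var = sum((w - mean) ** 2 for w in row) / len(row)
        std = math.sqrt(var + eps)  # eps guards against division by zero
        standardized.append([(w - mean) / std for w in row])
    return standardized

# Hypothetical layer with 2 output channels and 4 weights per channel
weights = [[0.2, -0.1, 0.4, 0.3], [1.0, 2.0, 3.0, 4.0]]
ws = standardize_weights(weights)
```

In a deep learning framework the same transformation would be applied to the convolution kernel inside the forward pass, so that gradients flow through the standardization.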
Weighted Balanced Distribution Adaptation
“Balanced Distribution Adaptation”
Weighted Bootstrap Markov Chain Monte Carlo Many data sets, especially from surveys, are made available to users with weights. Where the derivation of such weights is known, this information can often be incorporated in the user’s substantive model (model of interest). When the derivation is unknown, the established procedure is to carry out a weighted analysis. However, with non-trivial proportions of missing data this is inefficient and may be biased when data are not missing at random. Bayesian approaches provide a natural approach for the imputation of missing data, but it is unclear how to handle the weights. We propose a weighted bootstrap Markov chain Monte Carlo algorithm for estimation and inference. A simulation study shows that it has good inferential properties. We illustrate its utility with an analysis of data from the Millennium Cohort Study.
Weighted Effect Coding Weighted effect coding refers to a specific coding matrix to include factor variables in generalised linear regression models. With weighted effect coding, the effect for each category represents the deviation of that category from the weighted mean (which corresponds to the sample mean). This technique has particularly attractive properties when analysing observational data, that commonly are unbalanced. The wec package is introduced, that provides functions to apply weighted effect coding to factor variables, and to interactions between (a.) a factor variable and a continuous variable and between (b.) two factor variables.
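As an illustration of the coding matrix, the sketch below builds weighted effect codes for a single factor in plain Python (the wec package itself is written for R and is not used here). Observations in the omitted reference category receive minus the ratio of the category count to the reference count; the function name and the example data are hypothetical.

```python
from collections import Counter

def weighted_effect_coding(categories, reference):
    """Return (levels, design rows) of weighted effect codes for one factor.

    Column for level j: 1 if the observation is in j,
    -n_j / n_ref if it is in the reference category, 0 otherwise.
    """
    counts = Counter(categories)
    levels = [c for c in sorted(counts) if c != reference]
    rows = []
    for cat in categories:
        row = []
        for level in levels:
            if cat == level:
                row.append(1.0)
            elif cat == reference:
                row.append(-counts[level] / counts[reference])
            else:
                row.append(0.0)
        rows.append(row)
    return levels, rows

categories = ["a", "a", "a", "b", "b", "c"]  # unbalanced hypothetical factor
levels, design = weighted_effect_coding(categories, reference="c")
```

Every column of the resulting design sums to zero over the observations, which is why the intercept of a linear model fitted with these codes equals the sample mean and each coefficient is a deviation from it.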
Weighted Entropy The concept of weighted entropy takes into account values of different outcomes, i.e., makes entropy context-dependent, through the weight function.
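One common form of weighted entropy multiplies each term of the Shannon entropy by a non-negative weight for the corresponding outcome. The sketch below assumes that form and uses natural logarithms; the function name and the example weights are hypothetical.

```python
import math

def weighted_entropy(probs, weights):
    """Compute -sum_i w_i * p_i * log(p_i); terms with p_i == 0 contribute zero."""
    return -sum(w * p * math.log(p) for p, w in zip(probs, weights) if p > 0)

probs = [0.5, 0.5]
shannon = weighted_entropy(probs, [1.0, 1.0])   # unit weights recover Shannon entropy
weighted = weighted_entropy(probs, [2.0, 1.0])  # first outcome valued twice as much
```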
Weighted Finite Automata
Approximating probabilistic models as weighted finite automata
Weighted Hausdorff Distance Recent advances in Convolutional Neural Networks (CNN) have achieved remarkable results in localizing objects in images. In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects. In this paper, we address the task of estimating object locations without annotated bounding boxes, which are typically hand-drawn and time consuming to label. We propose a loss function that can be used in any Fully Convolutional Network (FCN) to estimate object locations. This loss function is a modification of the Average Hausdorff Distance between two unordered sets of points. The proposed method does not require one to ‘guess’ the maximum number of objects in the image, and has no notion of bounding boxes, region proposals, or sliding windows. We evaluate our method with three datasets designed to locate people’s heads, pupil centers and plant centers. We report an average precision and recall of 94% for the three datasets, and an average location error of 6 pixels in 256×256 images.
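For reference, the unweighted Average Hausdorff Distance that the loss modifies can be sketched as follows. This shows only the base distance between two point sets, not the paper's weighted, differentiable loss; the function name is hypothetical.

```python
import math

def average_hausdorff(xs, ys):
    """Average Hausdorff distance between two non-empty sets of points.

    Sum of the mean nearest-neighbour distance from xs to ys and from ys to xs.
    """
    def mean_min_dist(a, b):
        return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

    return mean_min_dist(xs, ys) + mean_min_dist(ys, xs)

predicted = [(0.0, 0.0)]
ground_truth = [(3.0, 4.0)]
d = average_hausdorff(predicted, ground_truth)
```

Because the two sets are unordered and may differ in size, the distance needs no matching between predicted and true points and no bound on the number of objects.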
Weighted Inverse Laplacian
Community detection is a hot topic in network analysis, where the main aim is to perform unsupervised learning or clustering in networks. Recently, semi-supervised learning has received increasing attention among researchers. In this paper, we propose a new algorithm, called weighted inverse Laplacian (WIL), for predicting labels in partially labeled networks. The idea comes from the first hitting time in random walk, and it also has natural interpretations in both the information propagation and regularization frameworks. We propose a partially labeled degree-corrected block model (pDCBM) to describe the generation of partially labeled networks. We show that WIL ensures the misclassification rate is of order $O(\frac{1}{d})$ for the pDCBM with average degree $d=\Omega(\log n),$ and that it can handle situations with greater imbalance than traditional Laplacian methods. WIL outperforms other state-of-the-art methods in most of our simulations and real datasets, especially in unbalanced networks and heterogeneous networks.
Weighted Label Smoothing Regularization
Conventional approaches use supervised learning for off-line writer identification. In this study, we improved off-line writer identification with a semi-supervised feature learning pipeline, which trained on the extra unlabeled data and the original labeled data simultaneously. Specifically, we proposed a weighted label smoothing regularization (WLSR) method, which assigned a weighted uniform label distribution to the extra unlabeled data. We regularized the convolutional neural network (CNN) baseline, which allows learning more discriminative features to represent the properties of different writing styles. Based on experiments on the ICDAR2013, CVL and IAM benchmark datasets, our results showed that semi-supervised feature learning improved on the baseline and achieved better performance than existing writer identification approaches.
Weighted Majority Algorithm
In machine learning, the Weighted Majority Algorithm (WMA) is a meta-learning algorithm used to construct a compound algorithm from a pool of prediction algorithms, which could be any type of learning algorithms, classifiers, or even real human experts. The algorithm assumes that we have no prior knowledge about the accuracy of the algorithms in the pool, but there are sufficient reasons to believe that one or more will perform well. There are many variations of the Weighted Majority Algorithm to handle different situations, like shifting targets, infinite pools, or randomized predictions. The core mechanism remains similar, with the final performance of the compound algorithm bounded by a function of the performance of the specialist (best performing algorithm) in the pool.
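A minimal sketch of the basic deterministic variant for binary predictions is given below. Every expert starts with weight 1, the compound prediction is the weighted vote, and each expert that errs has its weight multiplied by a penalty factor. The penalty `beta=0.5`, the tie-breaking rule, and the example data are assumptions for illustration.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Run the basic Weighted Majority Algorithm on binary (0/1) predictions.

    expert_predictions: one list per round, holding each expert's prediction
    outcomes: the true label of each round
    Returns the final expert weights and the number of compound mistakes.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0  # ties go to 1 (an assumption)
        if guess != truth:
            mistakes += 1
        # Penalise every expert that was wrong this round
        weights = [w * beta if p != truth else w for w, p in zip(weights, preds)]
    return weights, mistakes

# Hypothetical pool of three experts; the first one is always right
rounds = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
truths = [1, 0, 1, 0]
final_weights, n_mistakes = weighted_majority(rounds, truths)
```

After a few rounds the reliable expert dominates the vote, which is the mechanism behind the mistake bound relative to the best expert in the pool.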
Weighted Mean Curvature In image processing tasks, spatial priors are essential for robust computations, regularization, algorithmic design and Bayesian inference. In this paper, we introduce weighted mean curvature (WMC) as a novel image prior and present an efficient computation scheme for its discretization in practical image processing applications. We first demonstrate the favorable properties of WMC, such as sampling invariance, scale invariance, and contrast invariance with Gaussian noise model; and we show the relation of WMC to area regularization. We further propose an efficient computation scheme for discretized WMC, which is demonstrated herein to process over 33.2 giga-pixels/second on GPU. This scheme lends itself to a convolutional neural network representation. Finally, WMC is evaluated on synthetic and real images, showing its quantitative superiority to total variation and mean curvature.
Weighted Multisource Tradaboost In this paper we propose an improved method for transfer learning that takes into account the balance between target and source data. This method builds on the state-of-the-art Multisource Tradaboost, but weighs the importance of each datapoint taking into account the amount of target and source data available. A comparative study is then presented exposing the performance of four transfer learning methods as well as the proposed Weighted Multisource Tradaboost. The experimental results show that the proposed method is able to outperform the base method as the number of target samples increases. These results are promising in the sense that source-target ratio weighing may be a path to improve current methods of transfer learning. However, against the asymptotic conjecture, all transfer learning methods tested in this work are outperformed by a no-transfer SVM for a large number of target samples.
Weighted Network
In recent years, there has been increasing demand for automatic architecture search in deep learning. Numerous approaches have been proposed and led to state-of-the-art results in various applications, including image classification and language modeling. In this paper, we propose a novel way of architecture search by means of weighted networks (WeNet), which consist of a number of networks, with each assigned a weight. These weights are updated with back-propagation to reflect the importance of different networks. Such weighted networks bear similarity to mixture of experts. We conduct experiments on Penn Treebank and WikiText-2. We show that the proposed WeNet can find recurrent architectures which result in state-of-the-art performance.
Weighted Nonlinear Regression Nonlinear Least Squares
Weighted Object k-Means Weighted object version of k-means algorithm, robust against outlier data.
Weighted Ontology Approximation Heuristic
This paper presents the Weighted Ontology Approximation Heuristic (WOAH), a novel zero-shot approach to ontology estimation for conversational agent development environments. This methodology extracts verbs and nouns separately from data by distilling the dependencies obtained and applying similarity and sparsity metrics to generate an ontology estimation configurable in terms of the level of generalization.
Weighted Ordered Weighted Aggregation
From a formal point of view, the WOWA operator is a particular case of Choquet integral (using a particular type of measure: a distorted probability).
Weighted Orthogonal Components Regression Analysis
In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulated studies and real data examples are provided to assess and illustrate the advantages of the proposed methods.
Weighted Parallel SGD
Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-performance nodes will exert a negative influence on the final result. In this paper, we propose an algorithm called weighted parallel SGD (WP-SGD). WP-SGD combines weighted model parameters from different nodes in the system to produce the final output. WP-SGD makes use of the reduction in standard deviation to compensate for the loss from the inconsistency in performance of nodes in the cluster, which means that WP-SGD does not require that all nodes consume equal quantities of data. We also analyze the theoretical feasibility of running two other parallel SGD algorithms combined with WP-SGD in a heterogeneous environment. The experimental results show that WP-SGD significantly outperforms the traditional parallel SGD algorithms on distributed training systems with an unbalanced workload.
Weighted Quantile Sum
Weighted Random Survival Forest A weighted random survival forest is presented in the paper. It can be regarded as a modification of the random forest improving its performance. The main idea underlying the proposed model is to replace the standard procedure of averaging used for estimation of the random survival forest hazard function by weighted averaging, where the weights are assigned to every tree and can be viewed as training parameters which are computed in an optimal way by solving a standard quadratic optimization problem maximizing Harrell’s C-index. Numerical examples with real data illustrate that the proposed model outperforms the original random survival forest.
Weighted Score Table
Weighted Sigmoid Gate Unit
An activation function has a crucial role in a deep neural network. A simple rectified linear unit (ReLU) is widely used as the activation function. In this paper, a weighted sigmoid gate unit (WiG) is proposed as the activation function. The proposed WiG consists of a multiplication of inputs and the weighted sigmoid gate. It is shown that the WiG includes the ReLU and similar activation functions as special cases. Many activation functions have been proposed to surpass the performance of the ReLU. In the literature, the performance is mainly evaluated with an object recognition task. The proposed WiG is evaluated with the object recognition task and the image restoration task. The experimental comparisons demonstrate that the proposed WiG outperforms the existing activation functions, including the ReLU.
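As a rough illustration of the "inputs multiplied by a weighted sigmoid gate" idea, the following minimal sketch uses a scalar weight `w` and bias `b`. Both names and the scalar form are simplifying assumptions made here, not the paper's exact parameterisation. With a very large `w` the gate behaves like a step function, so the unit approaches the ReLU.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def wig(x: float, w: float = 1.0, b: float = 0.0) -> float:
    """Input multiplied by a sigmoid gate of the weighted input.

    The scalar weight `w` and bias `b` are assumptions for this sketch.
    """
    return x * sigmoid(w * x + b)


def relu(x: float) -> float:
    """Rectified linear unit, used here only for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, wig(x), wig(x, w=1000.0), relu(x))
```

Running the script prints the gated value next to the ReLU value, which makes the limiting behaviour for large `w` easy to inspect.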
Weighted Source-to-Distortion Ratio
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experiments were conducted on the mixed dataset showing that all three proposed approaches are empirically valid. Experimental results show that the proposed method achieves state-of-the-art performance in all metrics, outperforming previous approaches by a large margin.
Weighted Topological Overlaps
Weighted-SVD The Matrix Factorization models, sometimes called the latent factor models, are a family of methods in the recommender system research area to (1) generate the latent factors for the users and the items and (2) predict users’ ratings on items based on their latent factors. However, current Matrix Factorization models presume that all the latent factors are equally weighted, which may not always be a reasonable assumption in practice. In this paper, we propose a new model, called Weighted-SVD, to integrate the linear regression model with the SVD model such that each latent factor is accompanied by a corresponding weight parameter. This mechanism allows the latent factors to have different weights in their influence on the final ratings. The complexity of the Weighted-SVD model is slightly larger than that of the SVD model but much smaller than that of the SVD++ model. We compared the Weighted-SVD model with several latent factor models on five public datasets based on the Root-Mean-Squared-Errors (RMSEs). The results show that the Weighted-SVD model outperforms the baseline methods in all the experimental datasets under almost all settings.
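The idea of giving each latent factor its own weight can be sketched as a weighted inner product of user and item factors. This is only an illustration: the function name, the absence of bias terms, and the example numbers are assumptions, and the paper's full model may differ.

```python
from typing import Sequence


def predict_rating(
    user_factors: Sequence[float],
    item_factors: Sequence[float],
    factor_weights: Sequence[float],
) -> float:
    """Weighted inner product: sum over k of w_k * p_uk * q_ik.

    With all weights equal to 1 this reduces to the plain dot product
    used by an unweighted latent factor model.
    """
    if not (len(user_factors) == len(item_factors) == len(factor_weights)):
        raise ValueError("all vectors must have the same number of factors")
    return sum(
        w * p * q for w, p, q in zip(factor_weights, user_factors, item_factors)
    )


if __name__ == "__main__":
    p_u = [0.5, 1.0, -0.2]   # hypothetical user factors
    q_i = [2.0, 0.5, 1.0]    # hypothetical item factors
    print(predict_rating(p_u, q_i, [1.0, 1.0, 1.0]))  # unweighted
    print(predict_rating(p_u, q_i, [2.0, 0.5, 0.0]))  # weighted
```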
Weight-Median Sketch We introduce a new sub-linear space data structure—the Weight-Median Sketch—that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. In contrast with related sketches that capture the most commonly occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis of this approach that establishes recovery guarantees in the online learning setting, and demonstrate substantial empirical improvements in accuracy-memory trade-offs over alternatives, including count-based sketches and feature hashing.
Weka Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
“Waikato Environment for Knowledge Analysis”
Whale Optimization Algorithm
Whale Optimization Algorithm (WOA) is a recently proposed (2016) optimization algorithm mimicking the hunting mechanism of humpback whales in nature. It is worth mentioning here that bubble-net feeding is a unique behavior that can only be observed in humpback whales. In WOA the spiral bubble-net feeding maneuver is mathematically modeled in order to perform optimization.
A Systematic and Meta-analysis Survey of Whale Optimization Algorithm
What-If Tool What If… you could inspect a machine learning model, with no coding required? Building effective machine learning systems means asking a lot of questions. It’s not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better. But answering these kinds of questions isn’t easy. Probing ‘what if’ scenarios often means writing custom, one-off code to analyze a specific model. Not only is this process inefficient, it makes it hard for non-programmers to participate in the process of shaping and improving machine learning models. For us, making it easier for a broad set of people to examine, evaluate, and debug machine learning systems is a key concern. That’s why we built the What-If Tool. Built into the open-source TensorBoard web application – a standard part of the TensorFlow platform – the tool allows users to analyze a machine learning model without the need for writing any further code. Given pointers to a TensorFlow model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
WHInter Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular to estimate sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large l1-regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound to efficiently identify working sets without scanning all features, and from fast computations inspired by solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art.
White Noise In signal processing, white noise is a random signal with a constant power spectral density. The term is used, with this or similar meanings, in many scientific and technical disciplines, including physics, acoustic engineering, telecommunications, statistical forecasting, and many more. White noise refers to a statistical model for signals and signal sources, rather than to any specific signal. In discrete time, white noise is a discrete signal whose samples are regarded as a sequence of serially uncorrelated random variables with zero mean and finite variance; a single realization of white noise is a random shock. Depending on the context, one may also require that the samples be independent and have the same probability distribution (in other words, i.i.d. samples are the simplest representative of white noise). In particular, if each sample has a normal distribution with zero mean, the signal is said to be Gaussian white noise. The samples of a white noise signal may be sequential in time, or arranged along one or more spatial dimensions. In digital image processing, the pixels of a white noise image are typically arranged in a rectangular grid, and are assumed to be independent random variables with uniform probability distribution over some interval. The concept can be defined also for signals spread over more complicated domains, such as a sphere or a torus. An infinite-bandwidth white noise signal is a purely theoretical construction. The bandwidth of white noise is limited in practice by the mechanism of noise generation, by the transmission medium and by finite observation capabilities. Thus, a random signal is considered ‘white noise’ if it is observed to have a flat spectrum over the range of frequencies that is relevant to the context. For an audio signal, for example, the relevant range is the band of audible sound frequencies, between 20 and 20,000 Hz. Such a signal is heard as a hissing sound, resembling the /sh/ sound in ‘ash’. In music and acoustics, the term ‘white noise’ may be used for any signal that has a similar hissing sound. White noise draws its name from white light, although light that appears white generally does not have a flat spectral power density over the visible band. The term white noise is sometimes used in the context of phylogenetically based statistical methods to refer to a lack of phylogenetic pattern in comparative data. It is sometimes used in non-technical contexts, in the metaphoric sense of ‘random talk without meaningful contents’.
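The discrete-time definition above (zero mean, finite variance, serially uncorrelated samples) can be checked numerically. The sketch below draws Gaussian white noise with Python's standard library and estimates the sample autocorrelation. The function names, sample size and seed are arbitrary choices made for illustration.

```python
import random
from typing import List


def gaussian_white_noise(n: int, sigma: float = 1.0, seed: int = 0) -> List[float]:
    """Draw n i.i.d. zero-mean normal samples (Gaussian white noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def sample_autocorrelation(x: List[float], lag: int) -> float:
    """Sample autocorrelation of x at the given non-negative lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) * (v - mean) for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


if __name__ == "__main__":
    noise = gaussian_white_noise(10_000)
    for lag in (0, 1, 2, 10):
        print(lag, round(sample_autocorrelation(noise, lag), 4))
```

For a white sequence the estimate is 1 at lag 0 and close to 0 at every other lag, with fluctuations on the order of one over the square root of the sample size.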
White Noise Test
Whitening Transformation A whitening transformation is a decorrelation transformation that transforms a set of random variables having a known covariance matrix into a set of new random variables whose covariance is the identity matrix (meaning that they are uncorrelated and all have variance 1). The transformation is called “whitening” because it changes the input vector into a white noise vector. It differs from a general decorrelation transformation in that the latter only makes the covariances equal to zero, so that the correlation matrix may be any diagonal matrix. The inverse coloring transformation transforms a vector of uncorrelated variables (a white random vector) into a vector with a specified covariance matrix.
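A whitening matrix is not unique. One common choice, assumed here purely for illustration, is the inverse of the Cholesky factor of the covariance matrix. The sketch below works on a 2×2 covariance matrix in plain Python and verifies that the transformed covariance W Σ Wᵀ is the identity; all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky_2x2(cov: Matrix) -> Matrix:
    """Lower-triangular L with cov = L L^T for a 2x2 positive-definite matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def whitening_matrix_2x2(cov: Matrix) -> Matrix:
    """W = L^{-1}, so that W cov W^T equals the identity matrix."""
    (l11, _), (l21, l22) = cholesky_2x2(cov)
    return [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two small matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


if __name__ == "__main__":
    sigma = [[4.0, 2.0], [2.0, 3.0]]
    w = whitening_matrix_2x2(sigma)
    print(matmul(matmul(w, sigma), transpose(w)))  # approximately the identity
```

Applying the inverse of `w` (the Cholesky factor itself) to a white vector gives the coloring transformation mentioned in the entry.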
Whittemore This paper introduces Whittemore, a language for causal programming. Causal programming is based on the theory of structural causal models and consists of two primary operations: identification, which finds formulas that compute causal queries, and estimation, which applies formulas to transform probability distributions to other probability distributions. Causal programming provides abstractions to declare models, queries, and distributions with syntax similar to standard mathematical notation, and conducts rigorous causal inference, without requiring detailed knowledge of the underlying algorithms. Examples of causal inference with real data are provided, along with discussion of the implementation and possibilities for future extension.
Widely Applicable Bayesian Information Criterion
A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. Otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined by the minus logarithm of Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if the true distribution is singular for or unrealizable by the statistical model. Since WBIC can be numerically calculated without any information about a true distribution, it is a generalized version of BIC onto singular statistical models.
“Watanabe-Akaike Information Criteria”
Widely Applicable Information Criterion
“Watanabe-Akaike Information Criteria”
Width of the Language We consider the problem of quantifying information flow in interactive systems, modelled as finite-state transducers in the style of Goguen and Meseguer. Our main result is that if the system is deterministic then the information flow is either logarithmic or linear, and there is a polynomial-time algorithm to distinguish the two cases and compute the rate of logarithmic flow. To achieve this we first extend the theory of information leakage through channels to the case of interactive systems, and establish a number of results which greatly simplify computation. We then show that for deterministic systems the information flow corresponds to the growth rate of antichains inside a certain regular language, a property called the width of the language. In a companion work we have shown that there is a dichotomy between polynomial and exponential antichain growth, and a polynomial time algorithm to distinguish the two cases and to compute the order of polynomial growth. We observe that these two cases correspond to logarithmic and linear information flow respectively. Finally, we formulate several attractive open problems, covering the cases of probabilistic systems, systems with more than two users and nondeterministic systems where the nondeterminism is assumed to be innocent rather than demonic.
Wiener Polarity Index The Wiener polarity index Wp(G) of a graph G is the number of unordered pairs of vertices {u,v} in G such that the distance between u and v is equal to 3.
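The definition translates directly into a breadth-first search from every vertex, counting vertices at distance exactly 3 and halving the total because each unordered pair is seen twice. The sketch below assumes an undirected graph given as an adjacency dictionary; the function name is chosen for illustration.

```python
from collections import deque
from typing import Dict, Hashable, List


def wiener_polarity_index(adj: Dict[Hashable, List[Hashable]]) -> int:
    """Number of unordered vertex pairs {u, v} at shortest-path distance 3."""
    ordered_pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == 3:
                continue  # vertices farther than 3 are irrelevant
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ordered_pairs += sum(1 for d in dist.values() if d == 3)
    return ordered_pairs // 2  # each pair was counted from both endpoints


if __name__ == "__main__":
    path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(wiener_polarity_index(path4))   # only {0, 3} is at distance 3
    print(wiener_polarity_index(cycle6))  # the three pairs of opposite vertices
```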
Wiener Process In mathematics, the Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often called standard Brownian motion, after Robert Brown. It is one of the best known Lévy processes (càdlàg stochastic processes with stationary independent increments) and occurs frequently in pure and applied mathematics, economics, quantitative finance, and physics. The Wiener process plays an important role both in pure and applied mathematics. In pure mathematics, the Wiener process gave rise to the study of continuous time martingales. It is a key process in terms of which more complicated stochastic processes can be described. As such, it plays a vital role in stochastic calculus, diffusion processes and even potential theory. It is the driving process of Schramm-Loewner evolution. In applied mathematics, the Wiener process is used to represent the integral of a Gaussian white noise process, and so is useful as a model of noise in electronics engineering, instrument errors in filtering theory and unknown forces in control theory. The Wiener process has applications throughout the mathematical sciences. In physics it is used to study Brownian motion, the diffusion of minute particles suspended in fluid, and other types of diffusion via the Fokker-Planck and Langevin equations. It also forms the basis for the rigorous path integral formulation of quantum mechanics (by the Feynman-Kac formula, a solution to the Schrödinger equation can be represented in terms of the Wiener process) and the study of eternal inflation in physical cosmology. It is also prominent in the mathematical theory of finance, in particular the Black-Scholes option pricing model.
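Because the increments of a Wiener process over disjoint intervals are independent and normally distributed with variance equal to the interval length, a path can be approximated on a grid by summing Gaussian steps. This is a minimal sketch; the function name, grid size and seed are illustrative assumptions.

```python
import math
import random
from typing import List


def simulate_wiener_path(t_end: float, n_steps: int, seed: int = 0) -> List[float]:
    """Approximate W(0), W(dt), ..., W(t_end) on an evenly spaced grid.

    Each increment is drawn from N(0, dt), and the path starts at W(0) = 0.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    step_sd = math.sqrt(dt)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


if __name__ == "__main__":
    path = simulate_wiener_path(t_end=1.0, n_steps=1000)
    print(len(path), path[0], path[-1])
```

Across many independent paths the sample variance of the endpoint should be close to `t_end`, which is a quick sanity check on the step scaling.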
Wiener-Filter In signal processing, the Wiener Filter (Wiener-Kolmogorov Filter) is a filter used to produce an estimate of a desired or target random process by linear time-invariant filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise. The Wiener filter minimizes the mean square error between the estimated random process and the desired process.
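In the simplest textbook setting, assumed here and not spelled out in the entry (a non-causal filter with signal and additive noise uncorrelated), the mean-square-optimal gain at each frequency is the signal power divided by the total power, S/(S + N). The sketch below computes that gain from two power spectra given as lists; the function name and the example numbers are hypothetical.

```python
from typing import List, Sequence


def wiener_gain(signal_psd: Sequence[float], noise_psd: Sequence[float]) -> List[float]:
    """Per-frequency gain S / (S + N) of a non-causal Wiener filter.

    Assumes known, non-negative power spectra and uncorrelated signal and noise.
    Bins with no power at all get a gain of 0.
    """
    if len(signal_psd) != len(noise_psd):
        raise ValueError("spectra must have the same number of frequency bins")
    gains = []
    for s, n in zip(signal_psd, noise_psd):
        total = s + n
        gains.append(s / total if total > 0 else 0.0)
    return gains


if __name__ == "__main__":
    signal = [10.0, 4.0, 1.0, 0.0]  # hypothetical signal spectrum
    noise = [0.0, 4.0, 9.0, 1.0]    # hypothetical noise spectrum
    print(wiener_gain(signal, noise))
```

Bins dominated by signal pass almost unchanged, while bins dominated by noise are attenuated toward zero.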
WikiAtomicEdits We release a corpus of 43 million atomic edits across 8 languages. These edits are mined from Wikipedia edit history and consist of instances in which a human editor has inserted a single contiguous phrase into, or deleted a single contiguous phrase from, an existing sentence. We use the collected data to show that the language generated during editing differs from the language that we observe in standard corpora, and that models trained on edits encode different aspects of semantics and discourse than models trained on raw, unstructured text. We release the full corpus as a resource to aid ongoing research in semantics, discourse, and representation learning.
Wikibook-Bot A Wikipedia book (known as Wikibook) is a collection of Wikipedia articles on a particular theme that is organized as a book. We propose Wikibook-Bot, a machine-learning based technique for automatically generating high quality Wikibooks based on a concept provided by the user. In order to create the Wikibook we apply machine learning algorithms to the different steps of the proposed technique. First, we need to decide whether an article belongs to a specific Wikibook – a classification task. Then, we need to divide the chosen articles into chapters – a clustering task – and finally, we deal with the ordering task which includes two subtasks: order articles within each chapter and order the chapters themselves. We propose a set of structural, text-based and unique Wikipedia features, and we show that by using these features, a machine learning classifier can successfully address the above challenges. The predictive performance of the proposed method is evaluated by comparing the auto-generated books to 407 existing Wikibooks which were manually generated by humans. For all the tasks we were able to obtain high and statistically significant results when comparing the Wikibook-bot books to books that were manually generated by Wikipedia contributors.
WikiConv We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations—including not only comments and replies, but also their modifications, deletions and restorations—this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus’ potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person’s conversational behavior depends on how they relate to the discussion’s venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated. Finally the reconstruction framework is designed to be language agnostic, and we show that it can extract high quality conversational data in both Chinese and English.
WikiLinkGraphs Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the 9 largest language editions. The dataset contains yearly snapshots of the network and spans 17 years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph which includes also links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset.
Wikipedia WordNet Based QE Technique
Query expansion (QE) is a well-known technique to enhance the effectiveness of information retrieval (IR). QE reformulates the initial query by adding similar terms that help in retrieving more relevant results. Several approaches have been proposed with remarkable outcomes, but they are not evenly favorable for all types of queries. One of the main reasons for this is the use of the same data source while expanding both the individual and the phrase query terms. As a result, the holistic relationship among the query terms is not well captured. To address this issue, we have selected separate data sources for individual and phrase terms. Specifically, we have used WordNet for expanding individual terms and Wikipedia for expanding phrase terms. We have also proposed novel schemes for weighting expanded terms: inlink score (for terms extracted from Wikipedia) and a tf-idf based scheme (for terms extracted from WordNet). In the proposed Wikipedia WordNet based QE technique (WWQE), we weigh the expansion terms twice: first, they are scored by the weighting scheme individually, and then, the weighting scheme scores the selected expansion terms in relation to the entire query using correlation score. The experimental results show that the proposed approach successfully combines Wikipedia and WordNet, as demonstrated by better performance on standard evaluation metrics on the FIRE dataset. The proposed WWQE approach is also compatible with other standard weighting models for improving the effectiveness of IR.
Wikipedia2Vec We present Wikipedia2Vec, an open source tool for learning embeddings of words and entities from Wikipedia. This tool enables users to easily obtain high-quality embeddings of words and entities from a Wikipedia dump with a single command. The learned embeddings can be used as features in downstream natural language processing (NLP) models. The tool can be installed via PyPI. The source code, documentation, and pretrained embeddings for 12 major languages can be obtained at http://wikipedia2vec.github.io.
WikiRank A keyphrase is an efficient representation of the main idea of a document. While background knowledge can provide valuable information about documents, it is rarely incorporated in keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on the background knowledge from Wikipedia. Firstly, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we take the optimal keyphrase set as the output. Our method obtains improvements over other state-of-the-art models by more than 2% in F1-score.
Wikistat 2.0 Big data, data science, deep learning and artificial intelligence are the key words of intense hype related to a job market in full evolution, which requires us to adapt the contents of our university professional training programs. Which artificial intelligence is mostly concerned by the job offers? Which methodologies and technologies should be favored in the training programs? Which objectives, tools and educational resources do we need to put in place to meet these pressing needs? We answer these questions by describing the contents and operational resources in the Data Science orientation of the speciality Applied Mathematics at INSA Toulouse. We focus on basic mathematics training (Optimization, Probability, Statistics), associated with the practical implementation of the best-performing statistical learning algorithms, with the most appropriate technologies and on real examples. Considering the huge volatility of the technologies, it is imperative to train students in self-training; this will be their technological watch tool once they are in professional activity. This explains the structuring of the educational site https://…/wikistat into a set of tutorials. Finally, to motivate the thorough practice of these tutorials, a serious game is organized each year in the form of a prediction contest between students of Master degrees in Applied Mathematics for AI.
Wild Scale-Enhanced Bootstrap
Wildly-Unsupervised Domain Adaptation
Unsupervised domain adaptation (UDA) trains with clean labeled data in the source domain and unlabeled data in the target domain to classify target-domain data. However, in real-world scenarios, it is hard to acquire fully clean labeled data in the source domain due to the expensive labeling cost. This brings us a new but practical adaptation setting called wildly-unsupervised domain adaptation (WUDA), which aims to transfer knowledge from noisy labeled data in the source domain to unlabeled data in the target domain. To tackle WUDA, we present a robust one-step approach called Butterfly, which trains four networks. Specifically, two networks are jointly trained on noisy labeled data in the source domain and pseudo-labeled data in the target domain (i.e., data in the mixture domain). Meanwhile, the other two networks are trained on pseudo-labeled data in the target domain. By using the dual-checking principle, Butterfly can obtain high-quality target-specific representations. We conduct experiments to demonstrate that Butterfly significantly outperforms other baselines on simulated and real-world WUDA tasks in most cases.
Window-based Sentence Boundary Evaluation
Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts have been used for further Natural Language Processing tasks like Part of Speech Tagging, Question Answering or Automatic Summarization. But what about evaluation? Are standard evaluation metrics like precision, recall, F-score or classification error and, more importantly, the evaluation of an automatic system against a single reference enough to conclude how well an SBD system is performing given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation gives an understanding of how WiSeBE is a more reliable metric for the SBD task.
Window-Bounded co-Occurrence This paper focuses on a traditional relation extraction task in the context of limited annotated data and a narrow knowledge domain. We explore this task with a clinical corpus consisting of 200 breast cancer follow-up treatment letters in which 16 distinct types of relations are annotated. We experiment with an approach to extracting typed relations called window-bounded co-occurrence (WBC), which uses an adjustable context window around entity mentions of a relevant type, and compare its performance with a more typical intra-sentential co-occurrence baseline. We further introduce a new bag-of-concepts (BoC) approach to feature engineering based on the state-of-the-art word embeddings and word synonyms. We demonstrate the competitiveness of BoC by comparing with methods of higher complexity, and explore its effectiveness on this small dataset.
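The general idea of a context window around typed entity mentions can be sketched as follows: two mentions become a candidate relation when their types match the relation's argument types and they lie within a configurable number of tokens of each other. The data layout, the type names and the function name are hypothetical, and the paper's actual procedure may differ in its details.

```python
from typing import List, Tuple

# A mention is (token_index, entity_type, surface_text); this layout is an assumption.
Mention = Tuple[int, str, str]


def window_bounded_pairs(
    mentions: List[Mention], head_type: str, tail_type: str, window: int
) -> List[Tuple[str, str]]:
    """Pair each head-type mention with every tail-type mention within `window` tokens."""
    pairs = []
    for head_pos, h_type, head_text in mentions:
        if h_type != head_type:
            continue
        for tail_pos, t_type, tail_text in mentions:
            if t_type != tail_type or tail_pos == head_pos:
                continue
            if abs(head_pos - tail_pos) <= window:
                pairs.append((head_text, tail_text))
    return pairs


if __name__ == "__main__":
    doc_mentions = [
        (2, "Drug", "tamoxifen"),
        (5, "Dose", "20 mg"),
        (40, "Dose", "10 mg"),
    ]
    print(window_bounded_pairs(doc_mentions, "Drug", "Dose", window=10))
    print(window_bounded_pairs(doc_mentions, "Drug", "Dose", window=50))
```

Unlike a sentence-level co-occurrence baseline, the window can cross sentence boundaries or be narrower than a sentence, which is what makes it adjustable.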
Windowed Fourier Filtering Interferometric phase (InPhase) imaging is an important part of many present-day coherent imaging technologies. Often in such imaging techniques, the acquired images, known as interferograms, suffer from two major degradations: 1) phase wrapping caused by the fact that the sensing mechanism can only measure sinusoidal 2π-periodic functions of the actual phase, and 2) noise introduced by the acquisition process or the system. This work focusses on InPhase denoising, which is a fundamental restoration step for many subsequent applications of InPhase, notably phase unwrapping. The presence of sharp fringes that arise from phase wrapping makes InPhase denoising a hard inverse problem. Motivated by the fact that InPhase images are often locally sparse in the Fourier domain, we propose a multi-resolution windowed Fourier filtering (WFF) analysis that fuses WFF estimates with different resolutions, thus overcoming the WFF fixed resolution limitation. The proposed fusion relies on an unbiased estimate of the mean square error derived using Stein’s lemma adapted to complex-valued signals. This estimate, known as SURE, is minimized using an optimization framework to obtain the fusion weights. Strong experimental evidence, using synthetic and real (InSAR & MRI) data, shows that the developed algorithm, termed SURE-fuse WFF, outperforms the best hand-tuned fixed-resolution WFF as well as other state-of-the-art InPhase denoising algorithms.
Wire Data Wire data is the information that passes over computer and telecommunication networks defining communications between client and server devices. It is the result of decoding wire and transport protocols containing the bi-directional data payload. More precisely, wire data is the information that is communicated in each layer of the OSI model (Layer 1 not being included because those protocols are used to establish connections and do not communicate information).
Wisdom of Crowds
The wisdom of the crowd is the collective opinion of a group of individuals rather than that of a single expert. A large group’s aggregated answers to questions involving quantity estimation, general world knowledge, and spatial reasoning have generally been found to be as good as, and often better than, the answer given by any of the individuals within the group. An explanation for this phenomenon is that there is idiosyncratic noise associated with each individual judgment, and taking the average over a large number of responses will go some way toward canceling the effect of this noise.[1] This process, while not new to the Information Age, has been pushed into the mainstream spotlight by social information sites such as Wikipedia, Yahoo! Answers, Quora, and other web resources that rely on human opinion.[2] Trial by jury can be understood as wisdom of the crowd, especially when compared to the alternative, trial by a judge, the single expert. In politics, sortition is sometimes held up as an example of what wisdom of the crowd would look like: decision-making would happen by a diverse group instead of by a fairly homogeneous political group or party. Research within cognitive science has sought to model the relationship between wisdom of the crowd effects and individual cognition.
WoCE: a framework for clustering ensemble by exploiting the wisdom of Crowds theory
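The noise-cancelling explanation given for the wisdom of the crowd can be illustrated with a small simulation. This is only a sketch under assumed conditions: the true value, the Gaussian noise model, and the crowd size are all hypothetical choices.

```python
import random
from statistics import mean

rng = random.Random(0)          # fixed seed so the run is repeatable
true_value = 100.0              # hypothetical quantity being estimated

# Each individual reports the truth plus independent, zero-mean noise.
guesses = [true_value + rng.gauss(0, 20) for _ in range(1000)]

# Error of the aggregated (averaged) answer versus the typical individual error.
crowd_error = abs(mean(guesses) - true_value)
mean_individual_error = mean(abs(g - true_value) for g in guesses)
```

By the triangle inequality the averaged answer is never worse than the average individual error, and with independent noise it is usually far better. If the individual errors share a common bias, averaging does not remove it.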
Wishart Distribution In statistics, the Wishart distribution is a generalization to multiple dimensions of the chi-squared distribution, or, in the case of non-integer degrees of freedom, of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928. It is a family of probability distributions defined over symmetric, nonnegative-definite matrix-valued random variables (‘random matrices’). These distributions are of great importance in the estimation of covariance matrices in multivariate statistics. In Bayesian statistics, the Wishart distribution is the conjugate prior of the inverse covariance-matrix of a multivariate-normal random-vector.
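For integer degrees of freedom, a Wishart-distributed matrix can be built as a sum of outer products of independent multivariate-normal vectors, which is also how it arises when estimating covariance matrices. The sketch below uses only the standard library; the function name `sample_wishart` and the 2×2 Cholesky factor are assumptions chosen for illustration.

```python
import random


def sample_wishart(chol, dof, rng):
    """Draw one p x p Wishart matrix as a sum of `dof` outer products x x^T,
    where x = chol * z and z is a standard normal vector,
    so that x has covariance chol * chol^T (the scale matrix)."""
    p = len(chol)
    s = [[0.0] * p for _ in range(p)]
    for _ in range(dof):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x = [sum(chol[i][k] * z[k] for k in range(p)) for i in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += x[i] * x[j]
    return s


rng = random.Random(0)
chol = [[1.0, 0.0], [0.5, 1.0]]   # lower-triangular factor of an example scale matrix
w = sample_wishart(chol, dof=5, rng=rng)
```

The result is symmetric and nonnegative-definite by construction, matching the support described above.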
Wishart Matrix “Wishart Distribution”
Witness-Counting Problem Fast Witness Counting
W-Net Crowd management is of paramount importance when it comes to preventing stampedes and saving lives, especially in countries like China and India, whose combined population is a third of the global population. Millions of people convene annually all around the nation to celebrate a myriad of events, and crowd count estimation is the linchpin of the crowd management system that could prevent stampedes and save lives. We present a network for crowd counting which reports state-of-the-art results on crowd counting benchmarks. Our contributions are, first, a U-Net inspired model which allows us to report state-of-the-art results. Second, we propose an independent decoding Reinforcement branch which helps the network converge much earlier and also enables the network to estimate density maps with high Structural Similarity Index (SSIM). Third, we discuss the drawbacks of the contemporary architectures and empirically show that even though our architecture achieves state-of-the-art results, the merit may be due to the encoder-decoder pipeline instead. Finally, we report the error analysis, which shows that the contemporary line of work is at saturation and leaves certain prominent problems unsolved.
Wolfson Polarization Index affluenceIndex
Word Embedding Association Test
Universal Sentence Encoder
Word Embedding Attention Network
Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates the words by querying distributed word representations (i.e. neural word embeddings), hoping to capture the meaning of the corresponding words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performance on these three benchmark datasets.
Word Encoded Sequence Transducer
Most of the parameters in large vocabulary models are used in the embedding layer to map categorical features to vectors and in the softmax layer for classification weights. This is a bottleneck in memory-constrained on-device training applications like federated learning and on-device inference applications like automatic speech recognition (ASR). One way of compressing the embedding and softmax layers is to substitute larger units such as words with smaller sub-units such as characters. However, the sub-unit models often perform poorly compared to the larger unit models. We propose WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain-dependent sub-units, and demonstrate that this transduction can lead to significant compression without compromising performance. WEST bridges the gap between larger unit and sub-unit models and can be interpreted as a MaxEnt model over sub-unit features, which can be of independent interest.
Word ExtrAction for time SEries cLassification
Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to the smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices based on their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods to time series classification (TSC) cannot cope with such data volumes at acceptable accuracy; they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes. In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method which is both scalable and accurate. Like other state-of-the-art TSC methods, WEASEL transforms time series into feature vectors, using a sliding-window approach, which are then analyzed through a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms at orders-of-magnitude lower classification and training times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even for mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it out-of-the-box achieves almost the same accuracy as highly tuned, domain-specific methods.
Word Sense Induction
In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word (i.e. meanings). Given that the output of word-sense induction is a set of senses for the target word (sense inventory), this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.
Word Vectors Word vectors (also referred to as distributed representations) are an alternative that sweeps away many of the difficulties of working with natural language. They let us ignore the difficult-to-understand grammar & syntax of language while retaining the ability to ask and answer simple questions about a text.
Word2Bits Word vectors require significant amounts of memory and storage, posing issues to resource limited devices like mobile phones and GPUs. We show that high quality quantized word vectors using 1-2 bits per parameter can be learned by introducing a quantization function into Word2Vec. We furthermore show that training with the quantization function acts as a regularizer. We train word vectors on English Wikipedia (2017) and evaluate them on standard word similarity and analogy tasks and on question answering (SQuAD). Our quantized word vectors not only take 8-16x less space than full precision (32 bit) word vectors but also outperform them on word similarity tasks and question answering.
word2vec This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
DL4J: Word2Vec
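To make the skip-gram architecture slightly more concrete, the sketch below shows how (center, context) training pairs are typically formed from a token sequence with a context window. It is not the tool's code, and it leaves out the neural model itself as well as refinements such as subsampling or negative sampling; the function name `skipgram_pairs` is an assumption.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                      # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window=1)
```

Skip-gram learns vectors by predicting the context word from the center word in each pair; continuous bag-of-words reverses the direction and predicts the center from its pooled context.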
Wordcloud A tag cloud (word cloud, or weighted list in visual design) is a visual representation for text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color. This format is useful for quickly perceiving the most prominent terms and for locating a term alphabetically to determine its relative prominence. When used as website navigation aids, the terms are hyperlinked to items associated with the tag.
WordNet WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and are freely available for download from the WordNet website. Both the lexicographic data (lexicographer files) and the compiler (called grind) for producing the distributed database are available.
Wordswarm WordSwarm generates dynamic word clouds in which the word size changes as the animation moves forward through the corpus. The top words from the preprocessing are colored randomly or from an assigned palette, sized according to their magnitude at the first date, and then displayed in a pseudo-random location on the screen. The animation progresses into the future by growing or shrinking each word according to its frequency in the corpus at the next date. Clash detection is achieved using a 2D physics engine, which also applies ‘gravitational force’ to each word, bringing the larger words closer to the center of the screen.
Work Stealing Load Balancing Algorithm A methodology for efficient load balancing of computational problems that can be easily decomposed into multiple tasks, but where it is hard to predict the computation cost of each task, and where new tasks are created dynamically during runtime. We present this methodology and its exploitation and feasibility in the context of graphics processors. Work-stealing allows an idle core to acquire tasks from a core that is overloaded, causing the total work to be distributed evenly among cores, while minimizing the communication costs, as tasks are only redistributed when required. This will often lead to higher throughput than using static partitioning.
Work Stealing with latency
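The stealing rule can be sketched as a small sequential simulation. This is an illustrative toy, not the GPU methodology described above: there is no real concurrency, every task costs one step, and the choice of the most loaded worker as victim is an assumption.

```python
from collections import deque


def run_work_stealing(queues):
    """Simulate rounds in which every worker runs one task; idle workers steal.

    `queues` is a list of deques of task names, one per worker.
    Returns (executed, steals): the tasks each worker ran and the steal count.
    """
    executed = [[] for _ in queues]
    steals = 0
    while any(queues):
        for worker, own in enumerate(queues):
            if own:
                task = own.pop()                # owner takes from its own end
            else:
                victim = max(queues, key=len)   # most loaded worker
                if not victim:
                    continue                    # nothing left anywhere
                task = victim.popleft()         # thief takes from the opposite end
                steals += 1
            executed[worker].append(task)
    return executed, steals


# Worker 0 starts with all six tasks; worker 1 starts idle.
queues = [deque(f"t{i}" for i in range(6)), deque()]
executed, steals = run_work_stealing(queues)
```

Tasks move only when a worker is idle, which is why communication stays low compared with rebalancing on every step.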
Workflow Satisfiability Problem The Workflow Satisfiability Problem (WSP) asks whether there exists an assignment of authorized users to the steps in a workflow specification that satisfies the constraints in the specification. The problem is NP-hard in general, but several subclasses of the problem are known to be fixed-parameter tractable (FPT) when parameterized by the number of steps in the specification.
Bounded and Approximate Strong Satisfiability in Workflows
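A brute-force check makes the problem statement concrete. The sketch below simply enumerates every assignment of users to steps, so its cost grows exponentially with the number of steps; it is not one of the FPT algorithms mentioned above. The step names, users, and the separation-of-duty constraint are hypothetical.

```python
from itertools import product


def solve_wsp(steps, users, authorized, constraints):
    """Return one valid step->user assignment, or None if none exists.

    authorized: dict mapping each step to the set of users allowed to perform it.
    constraints: list of predicates taking the assignment dict and returning bool.
    """
    for combo in product(users, repeat=len(steps)):
        plan = dict(zip(steps, combo))
        if all(plan[s] in authorized[s] for s in steps) and \
           all(check(plan) for check in constraints):
            return plan
    return None


steps = ["prepare", "approve"]
users = ["alice", "bob"]
authorized = {"prepare": {"alice", "bob"}, "approve": {"alice"}}
# Separation of duty: the two steps must be done by different users.
constraints = [lambda plan: plan["prepare"] != plan["approve"]]

plan = solve_wsp(steps, users, authorized, constraints)
unsat = solve_wsp(steps, ["alice"], authorized, constraints)
```

With both users available the workflow is satisfiable; with only one user the separation-of-duty constraint cannot be met and the function returns `None`.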
Workforce Analytics Workforce analytics is a combination of software and methodology that applies statistical models to worker-related data, allowing enterprise leaders to optimize human resource management (HRM).
Working Memory Network In recent years, there has been a lot of interest in achieving some kind of complex reasoning using deep neural networks. To do that, models like Memory Networks (MemNNs) have combined external memory storages and attention mechanisms. These architectures, however, lack more complex reasoning mechanisms that could allow, for instance, relational reasoning. Relation Networks (RNs), on the other hand, have shown outstanding results in relational reasoning tasks. Unfortunately, their computational cost grows quadratically with the number of memories, which is prohibitive for larger problems. To solve these issues, we introduce the Working Memory Network, a MemNN architecture with a novel working memory storage and reasoning module. Our model retains the relational reasoning abilities of the RN while reducing its computational complexity from quadratic to linear. We tested our model on the text QA dataset bAbI and the visual QA dataset NLVR. In the jointly trained bAbI-10k, we set a new state-of-the-art, achieving a mean error of less than 0.5%. Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark.
Workload-Aware Auto-Parallelization Framework
Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option to accelerate demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the implementation of the frameworks themselves, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, where the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16), and show competitive training throughput compared with the state-of-the-art frameworks, and also demonstrate that WAP automatically optimizes GPU assignment based on the workload’s compute requirements, thereby improving energy efficiency.
WPU-Net Deep learning has driven great progress in natural and biological image processing. However, in materials science and engineering, microscopic images of materials often contain flaws and indistinct regions caused by complex sample preparation, or even by the material itself, which hinder the detection of target objects. In this work, we propose WPU-net, which redesigns the architecture and weighted loss of U-Net to force the network to integrate information from adjacent slices and pay more attention to the topology in this boundary detection task. The WPU-net was then applied to a typical material example, i.e., the grain boundary detection of polycrystalline material. Experiments demonstrate that the proposed method achieves promising performance compared to state-of-the-art methods. In addition, we propose a new method for object tracking between adjacent slices, which can effectively reconstruct the 3D structure of the whole material while maintaining relative accuracy.
Write Once, Deploy Anywhere
Write Once, Run Anywhere
‘Write once, run anywhere’ (WORA), or sometimes write once, run everywhere (WORE), is a slogan created by Sun Microsystems to illustrate the cross-platform benefits of the Java language. Ideally, this means Java programs can be developed on any device, compiled into a standard bytecode and be expected to run on any device equipped with a Java virtual machine (JVM). The installation of a JVM or Java interpreter on chips, devices or software packages has become an industry standard practice. This means a programmer can develop code on a PC and can expect it to run on Java enabled cell phones, as well as on routers and mainframes equipped with Java, without any adjustments. This is intended to save software developers the effort of writing a different version of their software for each platform or operating system they intend to deploy on. This idea originated as early as the late 1970s, when the UCSD Pascal system was developed to produce and interpret p-code. UCSD Pascal (along with the Smalltalk virtual machine) was a key influence on the design of the Java virtual machine, as is cited by James Gosling. The catch is that since there are multiple JVM implementations, on top of a wide variety of different operating systems such as Windows, Linux, Solaris, NetWare, HP-UX, and Mac OS, there can be subtle differences in how a program may execute on each JVM/OS combination, which may require an application to be tested on various target platforms. This has given rise to a joke among Java developers, ‘Write Once, Debug Everywhere’. The portability argument has also sometimes been mocked with the quip ‘Saying that Java is better because it works in all platforms is like saying that Anal Sex is better because it works with all genders.’ In comparison, the Squeak Smalltalk programming language and environment claims to be ‘truly write once run anywhere’, because it ‘runs bit-identical images across its wide portability base’.
WStream In recent years, the scale of graph datasets has increased to such a degree that a single machine is not capable of efficiently processing large graphs. Efficient graph partitioning is therefore necessary for those large graph applications. Traditional graph partitioning generally loads the whole graph data into memory before performing partitioning; this is not only a time-consuming task but also creates memory bottlenecks. These issues of memory limitation and enormous time complexity can be resolved using stream-based graph partitioning. A streaming graph partitioning algorithm reads each vertex once and assigns it to a partition accordingly. This is also called a one-pass algorithm. This paper proposes an efficient window-based streaming graph partitioning algorithm called WStream. The WStream algorithm is an edge-cut partitioning algorithm, which distributes vertices among the partitions. Our results suggest that the WStream algorithm is able to partition large graph data efficiently while keeping the load balanced across different partitions and communication to a minimum. Evaluation results with real workloads also demonstrate the effectiveness of our proposed algorithm, which achieves a significant reduction in load imbalance and edge-cut across datasets of different sizes.
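The one-pass idea can be sketched with a generic greedy streaming partitioner. This is not the WStream algorithm, whose window mechanism is not described in the abstract; the scoring rule (already-placed neighbours weighted by remaining capacity) and all names below are assumptions for illustration.

```python
def stream_partition(vertex_stream, k, capacity):
    """Assign each arriving vertex to one of k partitions in a single pass.

    vertex_stream: iterable of (vertex, neighbours) pairs.
    A vertex goes to the partition holding most of its already-placed
    neighbours, weighted by that partition's remaining capacity.
    """
    assignment = {}
    sizes = [0] * k
    for vertex, neighbours in vertex_stream:
        def score(p):
            placed = sum(1 for n in neighbours if assignment.get(n) == p)
            return placed * (1 - sizes[p] / capacity)
        # Prefer the highest score; break ties toward the emptiest partition.
        best = max(range(k), key=lambda p: (score(p), -sizes[p]))
        assignment[vertex] = best
        sizes[best] += 1
    return assignment, sizes


# Two triangles joined by a single edge (c-d), streamed in a fixed order.
graph = [("a", ["b", "c"]), ("b", ["a", "c"]), ("c", ["a", "b", "d"]),
         ("d", ["c", "e", "f"]), ("e", ["d", "f"]), ("f", ["d", "e"])]
assignment, sizes = stream_partition(graph, k=2, capacity=3)
# Each cut edge appears in both endpoints' neighbour lists, hence the division by 2.
cut = sum(1 for v, ns in graph for n in ns if assignment[v] != assignment[n]) // 2
```

On this toy graph each triangle lands in its own partition, so the load is balanced and only the bridging edge is cut. Results on real graphs depend heavily on the order in which vertices arrive.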