U*F Clustering  In this paper, we propose a new clustering method consisting of automated “flood fill segmentation” of the U*-matrix of a Self-Organizing Map after training. Using several artificial datasets as a benchmark, we find that the clustering results of our U*F method are good over a wide range of critical dataset types. Furthermore, comparison to standard clustering algorithms (K-means, single-linkage and Ward) applied directly to the same datasets shows that each of the latter performs very poorly on at least one kind of dataset, contrary to our U*F clustering method: while not always the best, U*F clustering has the great advantage of exhibiting consistently good results. Another advantage of U*F is that the computation cost of the SOM segmentation phase is negligible, contrary to other SOM-based clustering approaches which apply O(n² log n) standard clustering algorithms to the SOM prototypes. Finally, it should be emphasized that U*F clustering does not require a priori knowledge of the number of clusters, making it a true “cluster-mining” algorithm.
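The core segmentation step can be illustrated with a short sketch (the function name, the fixed threshold, and the 4-connected variant are our own simplifications, not the paper's exact procedure): cells of the U*-matrix whose height lies below a threshold are treated as cluster interiors, and a flood fill assigns one label per connected basin.

```python
from collections import deque

def flood_fill_segment(umatrix, threshold):
    """Label connected low-distance basins of a U*-matrix grid.

    Cells whose U-height is below `threshold` are treated as cluster
    interiors; 4-connected regions of such cells share a label.
    Cells at or above the threshold (cluster borders) are labeled 0.
    """
    rows, cols = len(umatrix), len(umatrix[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if umatrix[r][c] < threshold and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one basin
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and umatrix[ny][nx] < threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels
```

Each pass over the grid visits every cell a constant number of times, which reflects why the segmentation phase is cheap relative to running a standard clustering algorithm on the SOM prototypes.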
UDify  We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages. By leveraging a multilingual BERT self-attention model pretrained on 104 languages, we found that fine-tuning it on all datasets concatenated together, with simple softmax classifiers for each UD task, can result in state-of-the-art UPOS, UFeats, Lemmas, UAS, and LAS scores, without requiring any recurrent or language-specific components. We evaluate UDify for multilingual learning, showing that low-resource languages benefit the most from cross-linguistic annotations. We also evaluate for zero-shot learning, with results suggesting that multilingual training provides strong UD predictions even for languages that neither UDify nor BERT have ever been trained on. Code for UDify is available at https://…/udify.
Ukkonen’s Algorithm  In computer science, Ukkonen’s algorithm is a linear-time, online algorithm for constructing suffix trees, proposed by Esko Ukkonen in 1995. The algorithm begins with an implicit suffix tree containing the first character of the string. It then steps through the string, adding successive characters until the tree is complete. This order of addition of characters gives Ukkonen’s algorithm its “online” property. The original algorithm presented by P. Weiner proceeded backward from the last character to the first one, building from the shortest to the longest suffix. A simpler algorithm was found by Edward M. McCreight, going from the longest to the shortest suffix. A naive implementation that builds a suffix tree going forward requires O(n²) or even O(n³) time in big O notation, where n is the length of the string. By exploiting a number of algorithmic techniques, Ukkonen reduced this to O(n) (linear) time for constant-size alphabets, and O(n log n) in general, matching the runtime performance of the earlier two algorithms.
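To make the complexity contrast concrete, here is a sketch of the naive forward construction the entry mentions (our own illustrative code, not Ukkonen's algorithm itself): inserting every suffix of the string into a trie costs up to O(n²) time and space, which is exactly the cost Ukkonen's online construction avoids.

```python
def build_suffix_trie(s):
    """Naive suffix-trie construction: insert every suffix of s$.

    This is the quadratic baseline contrasted with Ukkonen's online,
    linear-time construction: for each of the n suffixes we walk and
    extend up to n characters, giving O(n^2) work overall.
    """
    s += "$"  # unique terminator so every suffix ends at a leaf
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root

def is_substring(trie, pattern):
    """A pattern occurs in s iff it spells out a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

The resulting structure supports the same substring queries as a suffix tree; Ukkonen's contribution is building an edge-compressed version of it online in linear time.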
Ultra-Scalable Ensemble Clustering (USENC)
This paper focuses on the scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (USPEC) and ultra-scalable ensemble clustering (USENC). In USPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity submatrix. By interpreting the sparse submatrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In USENC, multiple USPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of USPEC while maintaining high efficiency. Based on the ensemble generated by multiple USPECs, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both USPEC and USENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning ten-million-level nonlinearly-separable datasets on a PC with 64 GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://…/330760669.
Ultra-Scalable Spectral Clustering (USPEC)
This paper focuses on the scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (USPEC) and ultra-scalable ensemble clustering (USENC). In USPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity submatrix. By interpreting the sparse submatrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In USENC, multiple USPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of USPEC while maintaining high efficiency. Based on the ensemble generated by multiple USPECs, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both USPEC and USENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning ten-million-level nonlinearly-separable datasets on a PC with 64 GB memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://…/330760669.
Unbiased Implicit Variational Inference (UIVI) 
We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family. UIVI considers an implicit variational distribution obtained in a hierarchical manner, using a simple reparameterizable distribution whose variational parameters are defined by arbitrarily flexible deep neural networks. Unlike previous works, UIVI directly optimizes the evidence lower bound (ELBO) rather than an approximation to the ELBO. We demonstrate UIVI on several models, including Bayesian multinomial logistic regression and variational autoencoders, and show that UIVI achieves both a tighter ELBO and better predictive performance than existing approaches, at a similar computational cost.
UnBounded output network (UBnet) 
We proposed the expected energy-based restricted Boltzmann machine (EE-RBM) as a discriminative RBM method for classification. Two characteristics of the EE-RBM are that the output is unbounded and that the target value of correct classification is set to a value much greater than one. In this study, by adopting features of the EE-RBM approach to feedforward neural networks, we propose the UnBounded output network (UBnet), which is characterized by three features: (1) unbounded output units; (2) the target value of correct classification is set to a value much greater than one; and (3) the models are trained by a modified mean-squared error objective. We evaluate our approach using the MNIST, CIFAR-10, and CIFAR-100 benchmark datasets. We first demonstrate, for shallow UBnets on MNIST, that setting the target value equal to the number of hidden units significantly outperforms setting it equal to one, and also outperforms standard neural networks by about 25%. We then validate our approach by achieving high-level classification performance on the three datasets using unbounded output residual networks. We finally use MNIST to analyze the learned features and weights, and we demonstrate that UBnets are much more robust against adversarial examples than the standard approach of using a softmax output layer and training the networks with a cross-entropy objective.
Uncertain Knowledge Graph Embedding Model (UKGE) 
Embedding models for deterministic Knowledge Graphs (KG) have been extensively studied, with the purpose of capturing latent semantic relations between entities and incorporating the structured knowledge into machine learning. However, there are many KGs that model uncertain knowledge, typically representing the inherent uncertainty of relation facts with a confidence score, and embedding such uncertain knowledge remains an unresolved challenge. Capturing uncertain knowledge will benefit many knowledge-driven applications, such as question answering and semantic search, by providing a more natural characterization of the knowledge. In this paper, we propose a novel uncertain KG embedding model, UKGE, which aims to preserve both the structural and the uncertainty information of relation facts in the embedding space. Unlike previous models that characterize relation facts with binary classification techniques, UKGE learns embeddings according to the confidence scores of uncertain relation facts. To further enhance the precision of UKGE, we also introduce probabilistic soft logic to infer confidence scores for unseen relation facts during training. We propose and evaluate two variants of UKGE based on different learning objectives. Experiments are conducted on three real-world uncertain KGs via three tasks, i.e., confidence prediction, relation fact ranking, and relation fact classification. UKGE shows effectiveness in capturing uncertain knowledge by achieving promising results, and consistently outperforms the baselines on these tasks.
Uncertainty Annotated Database (UA-DB)
Certain answers are a principled method for coping with the uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and an over-approximation of certain answers to achieve the reliability of certain answers with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notions of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.
Uncertainty Autoencoder  The goal of statistical compressive sensing is to efficiently acquire and reconstruct high-dimensional signals with far fewer measurements than the data dimensionality, given access to a finite set of training signals. Current approaches do not learn the acquisition and recovery procedures end-to-end and are typically hand-crafted for sparsity-based priors. We propose Uncertainty Autoencoders, a framework that jointly learns the acquisition (i.e., encoding) and recovery (i.e., decoding) procedures while implicitly modeling domain structure. Our learning objective optimizes a variational lower bound on the mutual information between the signal and the measurements. We show how our framework provides a unified treatment of several lines of research in dimensionality reduction, compressive sensing, and generative modeling. Empirically, we demonstrate improvements of 32% on average over competing approaches for statistical compressive sensing of high-dimensional datasets.
Uncertainty in Artificial Intelligence (UAI) 
The Association for Uncertainty in Artificial Intelligence is a nonprofit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty. ➚ “Association for Uncertainty in Artificial Intelligence” 
Uncertainty Prediction Problem  Machine learning algorithms have been effectively applied to various real-world tasks. However, it is difficult to provide high-quality machine learning solutions that accommodate an unknown distribution of input datasets; this difficulty is known as the uncertainty prediction problem.
Uncertainty Quantification (UQ) 
A problem of considerable importance within the field of uncertainty quantification (UQ) is the development of efficient methods for the construction of accurate surrogate models. Such efforts are particularly important for applications constrained by high-dimensional uncertain parameter spaces. The difficulty of accurate surrogate modeling in such systems is further compounded by data scarcity brought about by the large cost of forward model evaluations. Traditional response surface techniques, such as Gaussian process regression (or Kriging) and polynomial chaos, are difficult to scale to high dimensions. To make surrogate modeling tractable in expensive high-dimensional systems, one must resort to dimensionality reduction of the stochastic parameter space. A recent dimensionality reduction technique that has shown great promise is the method of ‘active subspaces’. The classical formulation of active subspaces, unfortunately, requires gradient information from the forward model, which is often impossible to obtain. In this work, we present a simple, scalable method for recovering active subspaces in high-dimensional stochastic systems without gradient information; it relies on a reparameterization of the orthogonal active subspace projection matrix, and we couple this formulation with deep neural networks. We demonstrate our approach on synthetic and real-world datasets and show favorable predictive comparison to classical active subspaces.
Uncertainty Robust Bellman Equation (URBE) 
Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set, and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE), which encourages safe exploration for adapting the uncertainty set to new observations while preserving robustness. We propose a URBE-based algorithm, DQN-URBE, that scales this method to higher-dimensional domains. Our experiments show that the derived URBE-based strategy leads to a better trade-off between less conservative solutions and robustness in the presence of model misspecification. In addition, we show that the DQN-URBE algorithm can adapt significantly faster to changing dynamics online compared to existing robust techniques with fixed uncertainty sets.
Uncertainty-Aware Feature Selection (UAFS)
Missing data are a concern in many real-world data sets, and imputation methods are often needed to estimate the values of missing data, but data sets with excessive missingness and high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputation and subsequent model predictions. The key feature of this preprocessing is that it incorporates uncertainty: by accounting for uncertainty due to missingness when selecting features, we can reduce the degree of missingness while also limiting the number of uninformative features being used to make predictive models. We introduce a method to perform uncertainty-aware feature selection (UAFS), provide a theoretical motivation, and test UAFS on both real and synthetic problems, demonstrating that across a variety of data sets and levels of missingness we can improve the accuracy of imputations. Improved imputation due to UAFS also results in improved prediction accuracy when performing supervised learning using these imputed data sets. Our UAFS method is general and can be fruitfully coupled with a variety of imputation methods.
Uncertainty-Aware Imitation Learning (UAIL)
Estimating statistical uncertainties allows autonomous agents to communicate their confidence during task execution and is important for applications in safety-critical domains such as autonomous driving. In this work, we present the uncertainty-aware imitation learning (UAIL) algorithm for improving end-to-end control systems via data aggregation. UAIL applies Monte Carlo Dropout to estimate uncertainty in the control output of end-to-end systems, using states where it is uncertain to selectively acquire new training data. In contrast to prior data aggregation algorithms that force human experts to visit suboptimal states at random, UAIL can anticipate its own mistakes and switch control to the expert in order to prevent visiting a series of suboptimal states. Our experimental results from simulated driving tasks demonstrate that our proposed uncertainty estimation method can be leveraged to reliably predict infractions. Our analysis shows that UAIL outperforms existing data aggregation algorithms on a series of benchmark tasks.
Uncertainty-Aware Principal Component Analysis  We present a technique to perform dimensionality reduction on data that is subject to uncertainty. Our method is a generalization of traditional principal component analysis (PCA) to multivariate probability distributions. In comparison to nonlinear methods, linear dimensionality reduction techniques have the advantage that the characteristics of such probability distributions remain intact after projection. We derive a representation of the covariance matrix that respects potential uncertainty in each of the observations, building the mathematical foundation of our new method, uncertainty-aware PCA. In addition to the accuracy and performance gained by our approach over sampling-based strategies, our formulation allows us to perform sensitivity analysis with regard to the uncertainty in the data. For this, we propose factor traces as a novel visualization that enables us to better understand the influence of uncertainty on the chosen principal components. We provide multiple examples of our technique using real-world datasets and show how to propagate multivariate normal distributions through PCA in closed form. Furthermore, we discuss extensions and limitations of our approach.
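One natural way to build such an uncertainty-respecting covariance matrix is via the law of total covariance: the covariance of the observation means plus the average of the per-observation covariances. The sketch below is our own illustrative construction under that assumption; the paper's exact derivation and normalization may differ in details.

```python
import numpy as np

def uncertainty_aware_covariance(means, covs):
    """Covariance of per-observation Gaussians: the empirical covariance
    of the means plus the average per-observation covariance
    (law of total covariance). Divides by n for simplicity."""
    means = np.asarray(means, dtype=float)
    centered = means - means.mean(axis=0)
    between = centered.T @ centered / len(means)   # spread of the means
    within = np.mean(np.asarray(covs, dtype=float), axis=0)  # uncertainty
    return between + within

def uncertainty_aware_pca(means, covs, n_components):
    """Principal directions of the uncertainty-aware covariance."""
    cov = uncertainty_aware_covariance(means, covs)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    return eigvals[order], eigvecs[:, order]
```

Because the per-observation covariances enter the decomposition in closed form, no sampling from the input distributions is needed, matching the entry's point about gains over sampling-based strategies.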
Unconditional Maximum Likelihood Estimation (UCON) 
➚ “Joint Maximum Likelihood Estimation” 
Unconstrained Optimization  Unconstrained optimization works, in general, by performing a search: starting at some initial values and taking steps that decrease (or, for maximization routines such as FindMaximum, increase) an objective or merit function.
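As a minimal illustration of this search, here is a one-dimensional gradient descent sketch (the function name and step-size choice are our own and not tied to any particular solver): starting from an initial value, it repeatedly steps against the gradient until the gradient is nearly zero.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimize an objective by stepping against its gradient,
    starting from an initial guess x0, as the entry describes."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:   # gradient ~ 0: (local) minimum reached
            break
        x -= step * g      # step in the direction of decrease
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

For maximization one simply steps along the gradient instead of against it (or minimizes the negated objective).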
Underfitting  Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough. Specifically, underfitting occurs if the model or algorithm shows low variance but high bias. Underfitting is often a result of an excessively simple model. Both overfitting and underfitting lead to poor predictions on new data sets. http://…ttingoverfittingprobleminmclearning 
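A quick numerical illustration (our own synthetic example): fitting a straight line to clearly quadratic data leaves a large residual error that a degree-2 model removes, the excessively-simple-model signature described above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(scale=0.1, size=x.size)  # quadratic trend + noise

def train_mse(degree):
    """Mean squared training error of a polynomial fit of given degree."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

mse_linear = train_mse(1)     # underfit: a line cannot capture the bend
mse_quadratic = train_mse(2)  # matches the data-generating trend
```

The linear model's error stays large even on the training data itself, the hallmark of underfitting (high bias), whereas the quadratic model's error drops to roughly the noise level.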
UNet  Machine reading comprehension with unanswerable questions is a challenging new task for natural language processing. A key subtask is to reliably predict whether the question is unanswerable. In this paper, we propose a unified model, called UNet, with three important components: answer pointer, no-answer pointer, and answer verifier. We introduce a universal node and thus process the question and its context passage as a single contiguous sequence of tokens. The universal node encodes the fused information from both the question and the passage, and plays an important role in predicting whether the question is answerable; it also greatly improves the conciseness of the UNet. Different from the state-of-the-art pipeline models, UNet can be learned in an end-to-end fashion. The experimental results on the SQuAD 2.0 dataset show that UNet can effectively predict the unanswerability of questions, achieving an F1 score of 71.7 on SQuAD 2.0.
UNet++  In this paper, we present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially a deeply-supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip pathways. The redesigned skip pathways aim at reducing the semantic gap between the feature maps of the encoder and decoder sub-networks. We argue that the optimizer would deal with an easier learning task when the feature maps from the decoder and encoder networks are semantically similar. We have evaluated UNet++ in comparison with UNet and wide UNet architectures across multiple medical image segmentation tasks: nodule segmentation in low-dose CT scans of the chest, nuclei segmentation in microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Our experiments demonstrate that UNet++ with deep supervision achieves an average IoU gain of 3.9 and 3.4 points over UNet and wide UNet, respectively.
Uneven Group Convolution  In this paper, we are interested in boosting the representation capability of convolutional neural networks that utilize the inverted residual structure. Building on the success of the Inverted Residual structure [Sandler et al. 2018] and Interleaved Low-Rank Group Convolutions [Sun et al. 2018], we rethink these two patterns of neural network structure; rather than using NAS (neural architecture search) methods [Zoph and Le 2017; Pham et al. 2018; Liu et al. 2018b], we introduce uneven pointwise group convolution, which provides a novel search space for designing basic blocks to obtain a better trade-off between representation capability and computational cost. Meanwhile, we propose two novel information flow patterns that enable cross-group information flow for multiple group convolution layers, with and without any channel permute/shuffle operation. Extensive experiments on the image classification task show that our proposed model, named SeesawNet, achieves state-of-the-art (SOTA) performance with limited computation and memory cost. Our code will be open-sourced and made available together with pretrained models.
Unicorn  Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent’s competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that has resisted quick progress. To test continual learning capabilities we consider a challenging 3D domain with an implicit sequence of tasks and sparse rewards. We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.
Unidirectional Mass Transfer Model (MTM) 
Motivated by the classical Susceptible-Infected-Recovered (SIR) epidemic models proposed by Kermack and McKendrick, we consider a class of stochastic compartmental dynamical systems with a notion of partial ordering among the compartments. We call such systems unidirectional Mass Transfer Models (MTMs). We show that there is a natural way of interpreting a unidirectional MTM as a Survival Dynamical System (SDS) that is described in terms of survival functions instead of population counts. This SDS interpretation allows us to employ tools from survival analysis to address various issues with data collection and statistical inference of unidirectional MTMs. In particular, we propose and numerically validate a statistical inference procedure based on SDS-likelihoods. We use the SIR model as a running example throughout the paper to illustrate the ideas.
Unified Attention Network (UAN) 
We propose a new architecture that learns to attend to different Convolutional Neural Network (CNN) layers (i.e., different levels of abstraction) and different spatial locations (i.e., specific locations within a given feature map) in a sequential manner to perform the task at hand. Specifically, at each Recurrent Neural Network (RNN) timestep, a CNN layer is selected and its output is processed by a spatial soft-attention mechanism. We refer to this architecture as the Unified Attention Network (UAN), since it combines the ‘what’ and ‘where’ aspects of attention, i.e., ‘what’ level of abstraction to attend to, and ‘where’ the network should look. We demonstrate the effectiveness of this approach on two computer vision tasks: (i) image-based camera pose and orientation regression and (ii) indoor scene classification. We evaluate our method on standard benchmarks for camera localization (Cambridge, 7-Scenes, and TUM-LSI datasets) and for scene classification (MIT-67 indoor dataset), and show that our method improves upon the results of previous methods. Empirically, we show that combining the ‘what’ and ‘where’ aspects of attention improves network performance on both tasks.
Unified pretrained Language Model (UniLM) 
This paper presents a new Unified pretrained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pretrained using three types of language modeling objectives: unidirectional (both left-to-right and right-to-left), bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. We can fine-tune UniLM as a unidirectional decoder, a bidirectional encoder, or a sequence-to-sequence model to support various downstream natural language understanding and generation tasks. UniLM compares favorably with BERT on the GLUE benchmark and the SQuAD 2.0 and CoQA question answering tasks. Moreover, our model achieves new state-of-the-art results on three natural language generation tasks, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.63 (2.16 absolute improvement), pushing the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), and the SQuAD question generation BLEU-4 to 22.88 (6.50 absolute improvement).
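The mask mechanism can be sketched as follows. This is a simplified illustration under our own assumption of a packed [segment 1; segment 2] token layout (the actual UniLM implementation also handles special tokens and batching): each objective corresponds to a different 0/1 matrix that gates which positions a token may attend to.

```python
import numpy as np

def unilm_mask(len1, len2, mode):
    """Self-attention mask over a packed pair (segment 1, segment 2).

    Entry (i, j) is 1 when position i may attend to position j.
    'bidirectional'  : every token attends to every token.
    'unidirectional' : token i attends only to positions <= i.
    'seq2seq'        : segment-1 tokens attend within segment 1;
                       segment-2 tokens attend to all of segment 1
                       plus their own left context in segment 2.
    """
    n = len1 + len2
    if mode == "bidirectional":
        return np.ones((n, n), dtype=int)
    if mode == "unidirectional":
        return np.tril(np.ones((n, n), dtype=int))
    if mode == "seq2seq":
        mask = np.zeros((n, n), dtype=int)
        mask[:len1, :len1] = 1                      # seg1 <-> seg1
        mask[len1:, :len1] = 1                      # seg2 -> seg1
        mask[len1:, len1:] = np.tril(np.ones((len2, len2), dtype=int))
        return mask
    raise ValueError(f"unknown mode: {mode}")
```

In attention, positions with mask 0 are typically given a large negative score before the softmax, so the same shared Transformer weights can serve all three objectives by swapping the mask alone.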
Uniform Manifold Approximation and Projection (UMAP) 
UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical, scalable algorithm that applies to real-world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run-time performance. Furthermore, UMAP has no computational restrictions on embedding dimension, making it viable as a general-purpose dimension reduction technique for machine learning.
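One concrete ingredient of UMAP's construction is the fuzzy membership strength it assigns to each point's nearest neighbours. The sketch below is our simplified reading of that step (the reference implementation in umap-learn differs in tolerances and edge cases): the nearest neighbour gets weight 1, and a per-point scale is tuned so the weights sum to log2(k).

```python
import math

def local_fuzzy_weights(dists, k):
    """Fuzzy membership strengths for a point's k nearest neighbours,
    given their sorted distances `dists`.

    rho is the distance to the nearest neighbour (local connectivity);
    sigma is tuned by bisection so the memberships sum to log2(k).
    """
    rho = dists[0]
    target = math.log2(k)

    def total(sigma):
        return sum(math.exp(-max(d - rho, 0.0) / sigma) for d in dists)

    lo, hi = 1e-6, 1e6
    for _ in range(100):        # bisection: total(sigma) is increasing
        mid = (lo + hi) / 2
        if total(mid) > target:
            hi = mid
        else:
            lo = mid
    sigma = (lo + hi) / 2
    return [math.exp(-max(d - rho, 0.0) / sigma) for d in dists]
```

This per-point normalization is part of what lets UMAP adapt to varying local density before the fuzzy sets are combined and the low-dimensional layout is optimized.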
Unifying Heterogeneous Classifiers (UHC) 
In this paper, we study the problem of unifying knowledge from a set of classifiers with different architectures and target classes into a single classifier, given only a generic set of unlabelled data. We call this problem Unifying Heterogeneous Classifiers (UHC). This problem is motivated by scenarios where data is collected from multiple sources, but the sources cannot share their data, e.g., due to privacy concerns, and only privately trained models can be shared. In addition, each source may not be able to gather data to train all classes due to data availability at each source, and may not be able to train the same classification model due to different computational resources. To tackle this problem, we propose a generalisation of knowledge distillation to merge HCs. We derive a probabilistic relation between the outputs of HCs and the probability over all classes. Based on this relation, we propose two classes of methods based on cross-entropy minimisation and matrix factorisation, which allow us to estimate soft labels over all classes from unlabelled samples and use them in lieu of ground truth labels to train a unified classifier. Our extensive experiments on ImageNet, LSUN, and Places365 datasets show that our approaches significantly outperform a naive extension of distillation and can achieve almost the same accuracy as classifiers that are trained in a centralised, supervised manner.
Unigram Model  A unigram model used in information retrieval can be treated as the combination of several one-state finite automata. It splits the probabilities of different terms in a context. In this model, the probability of hitting each word depends only on the word itself, so we have only one-state finite automata as units. For each automaton, there is only one way to hit its single state, assigned one probability. Viewed over the whole model, the sum of all the one-state hitting probabilities should be 1.
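A minimal sketch of such a model (the function name is our own): estimate each word's hit probability by its relative frequency in the corpus, so that the probabilities over all one-state automata sum to 1 as described.

```python
from collections import Counter

def unigram_model(corpus_tokens):
    """Estimate per-word probabilities by relative frequency; each word
    is its own one-state automaton whose hit probability depends only
    on the word itself."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

probs = unigram_model("the cat sat on the mat".split())
```

A document's probability under the model is then simply the product of its words' individual probabilities, with no conditioning on context.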
UniParse  This paper describes the design and use of the graph-based parsing framework and toolkit UniParse, released as an open-source Python software package. As a framework, UniParse streamlines research prototyping, development, and evaluation of graph-based dependency parsing architectures. It does this by enabling highly efficient, sufficiently independent, easily readable, and easily extensible implementations of all dependency parser components. We distribute the toolkit with ready-made configurations as re-implementations of all current state-of-the-art first-order graph-based parsers, including even more efficient Cython implementations of both encoders and decoders, as well as the required specialised loss functions.
Unique Trait Combinations (UTC) 
UniSent  In this paper, we introduce UniSent, universal sentiment lexica for 1000 languages, created using an English sentiment lexicon and a massively parallel corpus in the Bible domain. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low-resource languages. To create UniSent, we propose Adapted Sentiment Pivot, a novel method that combines annotation projection, vocabulary expansion, and unsupervised domain adaptation. We evaluate the quality of UniSent for Macedonian, Czech, German, Spanish, and French, and show that its quality is comparable to manually or semi-manually created sentiment resources. With the publication of this paper, we release the UniSent lexica as well as the code for the Adapted Sentiment Pivot method.
Unit of Analysis  One of the most important ideas in a research project is the unit of analysis. The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study: individuals; groups; artifacts (books, photos, newspapers); geographical units (town, census tract, state); social interactions (dyadic relations, divorces, arrests). Why is it called the ‘unit of analysis’ and not something else (like, the unit of sampling)? Because it is the analysis you do in your study that determines what the unit is. For instance, if you are comparing the children in two classrooms on achievement test scores, the unit is the individual child because you have a score for each child. On the other hand, if you are comparing the two classes on classroom climate, your unit of analysis is the group, in this case the classroom, because you only have a classroom climate score for the class as a whole and not for each individual student. For different analyses in the same study you may have different units of analysis. If you decide to base an analysis on student scores, the individual is the unit. But you might decide to compare average classroom performance. In this case, since the data that goes into the analysis is the average itself (and not the individuals’ scores), the unit of analysis is actually the group. Even though you had data at the student level, you use aggregates in the analysis. In many areas of social research these hierarchies of analysis units have become particularly important and have spawned a whole area of statistical analysis sometimes referred to as hierarchical modeling. This is true in education, for instance, where we often compare classroom performance but collect achievement data at the individual student level.
Unit Root Processes  A unit root is a feature of processes that evolve through time and can cause problems in statistical inference involving time series models. A linear stochastic process has a unit root if 1 is a root of the process’s characteristic equation. Such a process is non-stationary. If the other roots of the characteristic equation lie inside the unit circle, that is, have a modulus (absolute value) less than one, then the first difference of the process will be stationary. http://…/intuitiveexplanationofunitroot http://…/08_unitroottests_2pp.pdf 
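A short illustration of the differencing claim: a random walk y_t = y_{t-1} + e_t has characteristic root 1, but its first difference recovers the stationary shocks e_t (simulated, hypothetical data):

```python
import random

random.seed(0)
shocks = [random.gauss(0, 1) for _ in range(500)]  # stationary white noise e_t

# unit-root process: y_t = y_{t-1} + e_t (a random walk);
# the level series is non-stationary and its variance grows with t
walk, level = [], 0.0
for e in shocks:
    level += e
    walk.append(level)

# first difference: y_t - y_{t-1} = e_t, which is stationary
diffed = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
```

The differenced series is exactly the white-noise shock sequence again, which is what a unit-root test such as the augmented Dickey-Fuller test exploits.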
Unitary Group Convolution (UGConv) 
We propose unitary group convolutions (UGConvs), a building block for CNNs which composes a group convolution with unitary transforms in feature space to learn a richer set of representations than group convolution alone. UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e. ShuffleNet) and block-circulant networks (i.e. CirCNN), and provide unifying insights that lead to a deeper understanding of each technique. We experimentally demonstrate that dense unitary transforms can outperform channel shuffling in DNN accuracy. On the other hand, different dense transforms exhibit comparable accuracy. Based on these observations we propose HadaNet, a UGConv network using Hadamard transforms. HadaNets achieve similar accuracy to circulant networks with lower computational complexity, and better accuracy than ShuffleNets with the same number of parameters and floating-point multiplies. 
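Since HadaNet's feature-space mixing is a Hadamard transform (orthogonal up to a 1/√n scale), the transform itself can be computed in O(n log n) additions with the fast Walsh-Hadamard algorithm. A hedged sketch of that transform alone, not of the full UGConv block the paper defines:

```python
def fwht(v):
    # in-place fast Walsh-Hadamard transform (unnormalized);
    # len(v) must be a power of two
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    return v
```

Because the Hadamard matrix H satisfies H·H = n·I, applying `fwht` twice returns n times the input, which is a quick sanity check on the implementation.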
Unitary Recurrent Neural Network (uRNN) 

Unit-B  We present Unit-B, a formal method inspired by Event-B and UNITY. Unit-B aims at the stepwise design of software systems satisfying safety and liveness properties. The method features the novel notion of coarse and fine schedules, a generalisation of weak and strong fairness for specifying events’ scheduling assumptions. Based on event schedules, we propose proof rules to reason about progress properties, and a refinement order preserving both liveness and safety properties. We illustrate our approach with an example to show that systems development can be driven not only by safety but also by liveness requirements. 
Unity ML-Agents Toolkit (Unity) 
Recent advances in Deep Reinforcement Learning and Robotics have been driven by the presence of increasingly realistic and complex simulation environments. Many of the existing platforms, however, provide either unrealistic visuals, inaccurate physics, low task complexity, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, turning the simulation environment into a black box from the perspective of the learning system. Here we describe a new open-source toolkit for creating and interacting with simulation environments using the Unity platform: the Unity ML-Agents Toolkit. By taking advantage of Unity as a simulation platform, the toolkit enables the development of learning environments which are rich in sensory and physical complexity, provide compelling cognitive challenges, and support dynamic multi-agent interaction. We detail the platform design, communication protocol, set of example environments, and variety of training scenarios made possible via the toolkit. 
Universal Approximation Theorem  In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feedforward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; it does not touch upon the algorithmic learnability of those parameters. One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions. Kurt Hornik showed in 1991 that it is not the specific choice of the activation function but rather the multilayer feedforward architecture itself which gives neural networks the potential of being universal approximators. The output units are always assumed to be linear. For notational convenience, only the single-output case will be shown. The general case can easily be deduced from the single-output case. 
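The constructive intuition behind the theorem can be shown directly: a difference of two steep sigmoids approximates an indicator function, and a weighted sum of such "bumps" (one hidden layer of sigmoid units with linear output weights) approximates a continuous function on [0, 1]. A minimal hand-built sketch, not a trained network:

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bump(x, a, b, k=2000.0):
    # difference of two steep sigmoids: roughly 1 on [a, b], roughly 0 elsewhere
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

def one_layer_approx(f, x, n=100):
    # 2n sigmoid hidden units with linear output weights:
    # a piecewise-constant approximation of f on [0, 1]
    return sum(f((i + 0.5) / n) * bump(x, i / n, (i + 1) / n) for i in range(n))
```

Increasing n drives the approximation error to zero for any continuous target on the interval, which is the content of the theorem for sigmoid activations.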
Universal Denoising Network  We design a novel network architecture for learning discriminative image models that are employed to efficiently tackle the problem of grayscale and color image denoising. Based on the proposed architecture, we introduce two different variants. The first network involves convolutional layers as a core component, while the second one relies instead on non-local filtering layers and is thus able to exploit the inherent non-local self-similarity property of natural images. As opposed to most of the existing neural networks, which require the training of a specific model for each considered noise level, the proposed networks are able to handle a wide range of different noise levels, while they are very robust when the noise degrading the latent image does not match the statistics of the noise used during training. The latter argument is supported by results that we report on publicly available images corrupted by unknown noise, which we compare against solutions obtained by alternative state-of-the-art methods. At the same time the introduced networks achieve excellent results under additive white Gaussian noise (AWGN), which are comparable to those of the current state-of-the-art network, while they depend on a shallower architecture with the number of trained parameters being one order of magnitude smaller. These properties make the proposed networks ideal candidates to serve as sub-solvers in restoration methods that deal with general inverse imaging problems such as deblurring, demosaicking, super-resolution, etc. 
Universal Hypercomputer  This paper describes a type of infinitary computer (a hypercomputer) capable of computing truth in initial levels of the set-theoretic universe, V. The proper class of such hypercomputers is called a universal hypercomputer. There are two basic variants of hypercomputer: a serial hypercomputer and a parallel hypercomputer. The set of computable functions of the two variants is identical, but the parallel hypercomputer is in general faster than a serial hypercomputer (as measured by an ordinal complexity measure). Insights into set theory using information theory and a universal hypercomputer are possible, and it is argued that the Generalised Continuum Hypothesis can be regarded as an information-theoretic principle which follows from an information minimization principle. 
Universal Language Model Fine-Tuning (ULMFiT) 
Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18–24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100× more data. We open-source our pretrained models and code. Multi-Task Learning in Language Model for Text Classification 
Universal Machine Learning Workflow  Applying the Universal Machine Learning Workflow to the UCI Mushroom Dataset Deep Learning with Python 
Universal Numeric Fingerprint (UNF) 
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within; a cryptographic hash of that normalized (or canonicalized) representation is then computed. The signature is thus independent of the storage format: e.g., the same data object stored in, say, SPSS and Stata will have the same UNF. A universal numeric fingerprint is used to guarantee that two digital objects (or parts thereof) in different formats represent the same intellectual object (or work). UNFs are formed by generating an approximation of the intellectual content of the object, putting this in a normalized form, and applying a cryptographic hash to produce a unique key. (Altman, et al. 2003) http://…/index.html UNF 
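The normalize-then-hash idea can be sketched as follows. Note that the actual UNF specification fixes the rounding rules, character encoding, and hash/truncation details; this toy version only illustrates why the fingerprint is format-independent:

```python
import base64
import hashlib

def normalize_number(x, digits=7):
    # canonical exponential rendering rounded to `digits` significant figures,
    # so the int 1 and the float 1.0 normalize to the same string
    return f"{x:+.{digits - 1}e}"

def toy_unf(values, digits=7):
    # canonicalize every value, join, hash: the storage format no longer matters
    canon = "\n".join(normalize_number(v, digits) for v in values).encode("utf-8")
    return base64.b64encode(hashlib.sha256(canon).digest()[:16]).decode("ascii")
```

The same data held in different numeric types (as if loaded from SPSS versus Stata) yields the same fingerprint, and values that agree to seven significant digits collide by design, which is the intended "approximation" of the semantic content.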
Universal Planning Network (UPN) 
A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient-descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities. 
Universal Sentence Encoder  We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings, as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pretrained sentence encoding models are made freely available for download and on TF Hub. 
Universal Successor Features Approximator (USFA) 
The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpolation power of a function approximator that is given the task description as input; one of its most common forms is the universal value function approximator (UVFA). Another way to generalise to new tasks is to exploit structure in the RL problem itself. Generalised policy improvement (GPI) combines solutions of previous tasks into a policy for the unseen task; this relies on instantaneous policy evaluation of old policies under the new reward function, which is made possible through successor features (SFs). Our proposed universal successor features approximators (USFAs) combine the advantages of all of these, namely the scalability of UVFAs, the instant inference of SFs, and the strong generalisation of GPI. We discuss the challenges involved in training a USFA and its generalisation properties, and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate in a first-person-perspective three-dimensional environment. 
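The GPI step the abstract relies on is mechanically simple once successor features are available: since Q^pi(s, a) = psi^pi(s, a) · w for a task vector w, the GPI policy maximizes over both the set of old policies and the actions. A hedged sketch with hypothetical tabular successor features:

```python
def gpi_action(sf_set, w, state):
    # sf_set: one successor-feature table per old policy,
    #         psi[state][action] -> feature vector (hypothetical tabular form)
    # Q^pi(s, a) = psi^pi(s, a) . w ; GPI maxes over policies, then actions
    def q(psi_sa):
        return sum(f * wi for f, wi in zip(psi_sa, w))
    n_actions = len(sf_set[0][state])
    return max(range(n_actions),
               key=lambda a: max(q(psi[state][a]) for psi in sf_set))
```

With two old policies that each accumulate a different feature, changing the task vector w switches which old policy's behaviour the GPI action follows, without any re-learning.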
Universal Transformer  Self-attentive feedforward sequence models have been shown to achieve impressive results on sequence modeling tasks, thereby presenting a compelling alternative to recurrent neural networks (RNNs), which have remained the de facto standard architecture for many sequence modeling problems to date. Despite these successes, however, feedforward sequence models like the Transformer fail to generalize in many tasks that recurrent models handle with ease (e.g. copying when the string lengths exceed those observed at training time). Moreover, and in contrast to RNNs, the Transformer model is not computationally universal, limiting its theoretical expressivity. In this paper we propose the Universal Transformer, which addresses these practical and theoretical shortcomings, and we show that it leads to improved performance on several tasks. Instead of recurring over the individual symbols of sequences like RNNs, the Universal Transformer repeatedly revises its representations of all symbols in the sequence with each recurrent step. In order to combine information from different parts of a sequence, it employs a self-attention mechanism in every recurrent step. Assuming sufficient memory, its recurrence makes the Universal Transformer computationally universal. We further employ an adaptive computation time (ACT) mechanism to allow the model to dynamically adjust the number of times the representation of each position in a sequence is revised. Beyond saving computation, we show that ACT can improve the accuracy of the model. Our experiments show that on various algorithmic tasks and a diverse set of large-scale language understanding tasks the Universal Transformer generalizes significantly better than and outperforms both a vanilla Transformer and an LSTM in machine translation, and achieves a new state of the art on the bAbI linguistic reasoning task and the challenging LAMBADA language modeling task. 
Unmixing Transducer  The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers overlap. While speech overlaps have been regarded as a major obstacle in accurately transcribing meetings, a traditional beamformer with a single output has been used almost exclusively because previously proposed speech separation techniques have critical constraints for application to real meetings. This paper proposes a new signal processing module, called an unmixing transducer, and describes its implementation using a windowed BLSTM. The unmixing transducer has a fixed number, say J, of output channels, where J may be different from the number of meeting attendees, and transforms an input multichannel acoustic signal into J time-synchronous audio streams. Each utterance in the meeting is separated and emitted from one of the output channels. Then, each output signal can be simply fed to a speech recognition back-end for segmentation and transcription. Our meeting transcription system using the unmixing transducer outperforms a system based on a state-of-the-art neural mask-based beamformer by 10.8%. Significant improvements are observed in overlapped segments. To the best of our knowledge, this is the first report that applies overlapped speech recognition to unconstrained real meeting audio. 
Unobserved Component Models (UCM) 
A UCM decomposes the response series into components such as trend, seasons, cycles, and the regression effects due to predictor series. rucm 
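For intuition, the component split can be illustrated with a classical moving-average decomposition. A real UCM instead estimates stochastic components in state-space form (e.g., via a Kalman filter, as the rucm package does), so this is only a hedged sketch of the trend/season/remainder idea:

```python
from statistics import mean

def decompose(y, period):
    # additive split y = trend + seasonal + remainder (illustrative only)
    n = len(y)
    half = period // 2
    # centered moving average as a crude trend estimate
    trend = [mean(y[max(0, t - half):t + half + 1]) for t in range(n)]
    detrended = [y[t] - trend[t] for t in range(n)]
    # seasonal effect = average of the detrended series per seasonal slot
    season_means = [mean(detrended[s::period]) for s in range(period)]
    seasonal = [season_means[t % period] for t in range(n)]
    remainder = [y[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, remainder
```

On a series with a stable seasonal pattern, the recovered seasonal component tracks the true pattern and the interior remainder is small; a UCM adds to this the ability to let the trend and seasonal effects themselves evolve stochastically, plus regression effects from predictor series.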
Unremarkable AI  Clinical decision support tools (DST) promise improved healthcare outcomes by offering data-driven insights. While effective in lab settings, almost all DSTs have failed in practice. Empirical research diagnosed poor contextual fit as the cause. This paper describes the design and field evaluation of a radically new form of DST. It automatically generates slides for clinicians’ decision meetings with subtly embedded machine prognostics. This design took inspiration from the notion of ‘Unremarkable Computing’: by augmenting the users’ routines, technology/AI can have significant importance for the users yet remain unobtrusive. Our field evaluation suggests clinicians are more likely to encounter and embrace such a DST. Drawing on their responses, we discuss the importance and intricacies of finding the right level of unremarkableness in DST design, and share lessons learned in prototyping critical AI systems as a situated experience. 
Unrooted Binary Tree  In mathematics and computer science, an unrooted binary tree is an unrooted tree in which each vertex has either one or three neighbors. A free tree or unrooted tree is a connected undirected graph with no cycles. The vertices with one neighbor are the leaves of the tree, and the remaining vertices are the internal nodes of the tree. The degree of a vertex is its number of neighbors; in a tree with more than one node, the leaves are the vertices of degree one. An unrooted binary tree is a free tree in which all internal nodes have degree exactly three. In some applications it may make sense to distinguish subtypes of unrooted binary trees: a planar embedding of the tree may be fixed by specifying a cyclic ordering for the edges at each vertex, making it into a plane tree. In computer science, binary trees are often rooted and ordered when they are used as data structures, but in the applications of unrooted binary trees in hierarchical clustering and evolutionary tree reconstruction, unordered trees are more common. Additionally, one may distinguish between trees in which all vertices have distinct labels, trees in which the leaves only are labeled, and trees in which the nodes are not labeled. In an unrooted binary tree with n leaves, there will be n − 2 internal nodes, so the labels may be taken from the set of integers from 1 to 2n − 2 when all nodes are to be labeled, or from the set of integers from 1 to n when only the leaves are to be labeled. 
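The degree and counting facts are easy to check programmatically. The sketch below grows a random unrooted binary tree by repeatedly subdividing an edge and attaching a new leaf, which keeps every internal vertex at degree exactly three:

```python
import random

def random_unrooted_binary_tree(n_leaves, seed=0):
    # start from the smallest case: two leaves joined by a single edge
    rng = random.Random(seed)
    edges = {(0, 1)}
    next_id = 2
    for _ in range(n_leaves - 2):
        u, v = rng.choice(sorted(edges))        # pick an edge to subdivide
        edges.remove((u, v))
        mid, leaf = next_id, next_id + 1        # new internal node + new leaf
        edges |= {(u, mid), (mid, v), (mid, leaf)}
        next_id += 2
    return edges

def degree_count(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg
```

Each subdivision adds one leaf and one internal node, so a tree with n leaves ends up with n − 2 internal vertices and 2n − 3 edges, matching the counts above.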
Unrooted Tree  In a context where trees are supposed to have a root, a tree without any designated root is called a free tree / unrooted tree. 
Unstructured Information Management Architecture (UIMA) 
UIMA stands for Unstructured Information Management Architecture. An OASIS standard as of March 2009, UIMA is to date the only industry standard for content analytics. Other general frameworks used for natural language processing include the General Architecture for Text Engineering (GATE) and the Natural Language Toolkit (NLTK). 
Unstructured Semantic Model (USM) 
With the rapid development of online advertising and recommendation systems, click-through rate prediction is expected to play an increasingly important role. Recently, many DNN-based models following a similar Embedding&MLP paradigm have been proposed and have achieved good results in the image/voice and NLP fields. Among these methods, the Wide&Deep model announced by Google plays a key role. Most models first map large-scale sparse input features into low-dimensional vectors, which are transformed into fixed-length vectors and concatenated together before being fed into a multilayer perceptron (MLP) to learn non-linear relations among input features. The number of trainable variables normally grows dramatically as the number of feature fields and the embedding dimension grow. It is a big challenge to get state-of-the-art results by training the deep neural network and embeddings together, as this easily falls into local optima or overfitting. In this paper, we propose an Unstructured Semantic Model (USM) to tackle this challenge by designing an orthogonal-base convolution and pooling model which adaptively learns the multi-scale base semantic representation between features, supervised by the click label. The output of USM is then used in the Wide&Deep model for CTR prediction. Experiments on two public datasets as well as a real Weibo production dataset with over 1 billion samples have demonstrated the effectiveness of our proposed approach, with superior performance compared to state-of-the-art methods. 
Unsupervised Adversarial Invariance  ➚ “Invariance Induction Framework” 
Unsupervised Continual Learning (UCL) 
We first pose the Unsupervised Continual Learning (UCL) problem: learning salient representations from a non-stationary stream of unlabeled data in which the number of object classes varies with time. Given limited labeled data just before inference, those representations can also be associated with specific object types to perform classification. To solve the UCL problem, we propose an architecture that involves a single module, called Self-Taught Associative Memory (STAM), which loosely models the function of a cortical column in the mammalian brain. Hierarchies of STAM modules learn based on a combination of Hebbian learning, online clustering, detection of novel patterns, forgetting outliers, and top-down predictions. We illustrate the operation of STAMs in the context of learning handwritten digits in a continual manner with only 3-12 labeled examples per class. STAMs suggest a promising direction to solve the UCL problem without catastrophic forgetting. 
Unsupervised Correlation Analysis (UCA) 
Linking between two data sources is a basic building block in numerous computer vision problems. In this paper, we set out to answer a fundamental cognitive question: are prior correspondences necessary for linking between different domains? One of the most popular methods for linking between domains is Canonical Correlation Analysis (CCA). All current CCA algorithms require correspondences between the views. We introduce a new method, Unsupervised Correlation Analysis (UCA), which requires no prior correspondences between the two domains. The correlation maximization term in CCA is replaced by a combination of a reconstruction term (similar to autoencoders), a full cycle loss, orthogonality, and multiple domain confusion terms. Due to the lack of supervision, the optimization leads to multiple alternative solutions with similar scores, and we therefore introduce a consensus-based mechanism that is often able to recover the desired solution. Remarkably, this suffices in order to link remote domains such as text and images. We also present results on well-accepted CCA benchmarks, showing that performance far exceeds other unsupervised baselines, and approaches supervised performance in some cases. 
Unsupervised Coupled Cycle Generative Adversarial Hashing Network (UCH) 
In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, increasingly compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn powerful common representations and an inner-cycle network is used to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, so that they can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms state-of-the-art unsupervised cross-modal hashing methods. 
Unsupervised Data Augmentation (UDA) 
Despite its success, deep learning still needs large labeled datasets to succeed. Data augmentation has shown much promise in alleviating the need for more labeled data, but it has so far mostly been applied in supervised settings and achieved limited gains. In this work, we propose to apply data augmentation to unlabeled data in a semi-supervised learning setting. Our method, named Unsupervised Data Augmentation or UDA, encourages the model predictions to be consistent between an unlabeled example and an augmented unlabeled example. Unlike previous methods that use random noise such as Gaussian noise or dropout noise, UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods. This small twist leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small. For example, on the IMDb text classification dataset, with only 20 labeled examples, UDA outperforms the state-of-the-art model trained on 25,000 labeled examples. On standard semi-supervised learning benchmarks, CIFAR-10 with 4,000 examples and SVHN with 1,000 examples, UDA outperforms all previous approaches and reduces the error rates of state-of-the-art methods by more than 30%: going from 7.66% to 5.27% and from 3.53% to 2.46%, respectively. UDA also works well on datasets that have a lot of labeled data. For example, on ImageNet, with 1.3M extra unlabeled examples, UDA improves the top-1/top-5 accuracy from 78.28/94.36% to 79.04/94.45% compared to AutoAugment. 
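The objective UDA adds to ordinary supervised training can be sketched as follows. A toy scalar "model" and plain-Python losses stand in for a real network and its augmentation pipeline, and the actual method additionally stops gradients through the clean prediction and uses learned augmentation policies:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    # KL(p || q) between two discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def uda_objective(model, labeled, unlabeled, augment, lam=1.0):
    # supervised cross-entropy on the (small) labeled set
    sup = 0.0
    if labeled:
        sup = -sum(math.log(softmax(model(x))[y]) for x, y in labeled) / len(labeled)
    # unsupervised consistency: the prediction on an example and on its
    # augmented version should agree
    cons = 0.0
    if unlabeled:
        cons = sum(kl(softmax(model(x)), softmax(model(augment(x))))
                   for x in unlabeled) / len(unlabeled)
    return sup + lam * cons
```

If the model is already invariant to the augmentation, the consistency term vanishes; augmentations the model is not invariant to incur a positive penalty, which is the training signal extracted from unlabeled data.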
Unsupervised Dialog Structure Learning  Learning a shared dialog structure from a set of taskoriented dialogs is an important challenge in computational linguistics. The learned dialog structure can shed light on how to analyze human dialogs, and more importantly contribute to the design and evaluation of dialog systems. We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. Different from existing HMMbased models, our model is based on variationalautoencoder (VAE). Such model is able to capture more dynamics in dialogs beyond the surface forms of the language. We find that qualitatively, our method extracts meaningful dialog structure, and quantitatively, outperforms previous models on the ability to predict unseen data. We further evaluate the model’s effectiveness in a downstream task, the dialog system building task. Experiments show that, by integrating the learned dialog structure into the reward function design, the model converges faster and to a better outcome in a reinforcement learning setting. 
Unsupervised Domain Adaptation (UDA) 
Unsupervised domain adaptation (UDA) is important for applications where large-scale annotation of representative data is challenging. For semantic segmentation in particular, it helps deploy, on real ‘target domain’ data, models that are trained on annotated images from a different ‘source domain’, notably a virtual environment. To this end, most previous works consider semantic segmentation as the only mode of supervision for source-domain data, while ignoring other, possibly available, information like depth. In this work, we aim to make the best use of such privileged information while training the UDA model. We propose a unified depth-aware UDA framework that leverages in several complementary ways the knowledge of dense depth in the source domain. As a result, the performance of the trained semantic segmentation model on the target domain is boosted. Our novel approach indeed achieves state-of-the-art performance on different challenging synthetic2real benchmarks. 
Unsupervised Ensemble Learning via Ising Model Approximation (unElisa) 
Unsupervised ensemble learning has long been an interesting yet challenging problem that has come to prominence in recent years with the increasing demand for crowdsourcing in various applications. In this paper, we propose a novel method, unsupervised ensemble learning via Ising model approximation (unElisa), that combines a pruning step with a predicting step. We focus on the binary case and use an Ising model to characterize interactions between the ensemble and the underlying true classifier. The presence of an edge between an observed classifier and the true classifier indicates a direct dependence, whereas the absence indicates that the corresponding classifier provides no additional information and shall be eliminated. This observation leads to the pruning step, where the key is to recover the neighborhood of the true classifier. We show that it can be recovered successfully with exponentially decaying error in the high-dimensional setting by performing node-wise ℓ1-regularized logistic regression. The pruned ensemble allows us to get a consistent estimate of the Bayes classifier for prediction. We also propose an augmented version of majority voting that reverses all labels given by a subgroup of the pruned ensemble. We demonstrate the efficacy of our method through extensive numerical experiments and through an application to EHR-based phenotyping prediction of Rheumatoid Arthritis (RA) using data from the Partners Healthcare System. 
Unsupervised Generative Adversarial CrossModal Hashing (UGACH) 
Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space, which enables fast and flexible retrieval across different modalities. Unsupervised cross-modal hashing is more flexible and applicable than supervised methods, since no intensive labeling work is involved. However, existing unsupervised methods learn hashing functions by preserving inter- and intra-correlations, while ignoring the underlying manifold structure across different modalities, which is extremely helpful for capturing meaningful nearest neighbors of different modalities for cross-modal retrieval. To address the above problem, in this paper we propose an Unsupervised Generative Adversarial Cross-modal Hashing approach (UGACH), which makes full use of the GAN’s ability for unsupervised representation learning to exploit the underlying manifold structure of cross-modal data. The main contributions can be summarized as follows: (1) We propose a generative adversarial network to model cross-modal hashing in an unsupervised fashion. In the proposed UGACH, given data of one modality, the generative model tries to fit the distribution over the manifold structure and select informative data of another modality to challenge the discriminative model. The discriminative model learns to distinguish the generated data from the true positive data sampled from the correlation graph to achieve better retrieval accuracy. These two models are trained in an adversarial way to improve each other and promote hashing function learning. (2) We propose a correlation-graph-based approach to capture the underlying manifold structure across different modalities, so that data of different modalities but within the same manifold can have smaller Hamming distance and promote retrieval accuracy. Extensive experiments compared with 6 state-of-the-art methods verify the effectiveness of our proposed approach. 
Unsupervised Open Domain Recognition (UODR) 
We address the unsupervised open domain recognition (UODR) problem, where the categories in a labeled source domain S are only a subset of those in an unlabeled target domain T. The task is to correctly classify all samples in T, including known and unknown categories. UODR is challenging due to the domain discrepancy, which becomes even harder to bridge when a large number of unknown categories exist in T. Moreover, the classification rules propagated by a graph CNN (GCN) may be distracted by unknown categories and lack generalization capability. To measure the domain discrepancy for the asymmetric label space between S and T, we propose Semantic-Guided Matching Discrepancy (SGMD), which first employs instance matching between S and T; the discrepancy is then measured by a weighted feature distance between matched instances. We further design a limited balance constraint to achieve a more balanced classification output on known and unknown categories. We develop the Unsupervised Open Domain Transfer Network (UODTN), which learns both the backbone classification network and the GCN jointly by reducing the SGMD, enforcing the limited balance constraint, and minimizing the classification loss on S. UODTN better preserves the semantic structure and enforces consistency between the learned domain-invariant visual features and the semantic embeddings. Experimental results show the superiority of our method in recognizing images of both known and unknown categories. 
Unsupervised Recurrent Neural Network Grammars (URNNG) 
Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms. 
Unsupervised Semantic Deep Hashing (USDH) 
In recent years, deep hashing methods have proved efficient, since they employ convolutional neural networks to learn features and hashing codes simultaneously. However, these methods are mostly supervised, and in real-world applications annotating a large number of images is a time-consuming and laborious task. In this paper, we propose a novel unsupervised deep hashing method for large-scale image retrieval. Our method, namely unsupervised semantic deep hashing (USDH), uses semantic information preserved in the CNN feature layer to guide the training of the network. We enforce four criteria on hashing code learning based on the VGG-19 model: 1) preserving relevant information of the feature space in the hashing space; 2) minimizing quantization loss between binary-like codes and hashing codes; 3) improving the usage of each bit in the hashing codes by maximizing information entropy; and 4) invariance to image rotation. Extensive experiments on CIFAR-10 and NUS-WIDE have demonstrated that USDH outperforms several state-of-the-art unsupervised hashing methods for image retrieval. We also conduct experiments on the Oxford 17 dataset for fine-grained classification to verify its efficiency for other computer vision tasks. 
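Two of the four criteria lend themselves to a compact sketch: the quantization loss between binary-like codes and hashing codes (criterion 2), and the maximum-information-entropy bit-usage criterion (criterion 3). The formulations below are plausible instantiations of those ideas, not necessarily the exact losses used by USDH:

```python
import numpy as np

def quantization_loss(h):
    """Criterion 2 (sketch): mean squared gap between continuous
    hash-layer outputs h (e.g. tanh activations in [-1, 1]) and the
    binarized codes sign(h). Driving this toward zero makes the
    final thresholding step nearly lossless."""
    return float(np.mean((h - np.sign(h)) ** 2))

def entropy_per_bit(codes):
    """Criterion 3 (sketch): empirical entropy of each bit over a
    batch of binary codes; maximizing it drives every bit toward
    50/50 usage, i.e. maximum information carried per bit."""
    p = np.clip(np.mean(codes > 0, axis=0), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
```

A bit that is always -1 over the batch carries no information, so its entropy term is (near) zero, while a perfectly balanced bit scores the maximum of 1.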
Unsupervised Temperature Scaling (UTS) 
The performance of deep learning is undeniable, with impressive results on a wide range of tasks. However, the output confidence of these models is usually not well calibrated, which can be an issue for applications where confidence in the decisions is central to bringing trust and reliability (e.g., autonomous driving or medical diagnosis). For models using a softmax at the last layer, Temperature Scaling (TS) is a state-of-the-art calibration method, with low time and memory complexity as well as demonstrated effectiveness. TS relies on a parameter T to rescale and calibrate the values of the softmax layer, using a labelled dataset to determine the value of that parameter. We propose an Unsupervised Temperature Scaling (UTS) approach, which does not depend on labelled samples to calibrate the model, allowing, for example, the use of part of the test samples to calibrate a pretrained model before it goes into inference mode. We provide theoretical justifications for UTS and assess its effectiveness on a wide range of deep models and datasets. We also demonstrate calibration results of UTS on skin lesion detection, a problem where a well-calibrated output can play an important role in accurate decision-making. 
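The rescaling step shared by TS and UTS is a one-liner: divide the logits by T before the softmax. A minimal sketch (how T itself is fitted, with labels for TS or without labels for UTS, is omitted; the logits are made up):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def temperature_scaled_softmax(logits, T):
    """Temperature Scaling: divide logits by a scalar T > 0 before the
    softmax. T > 1 softens (lowers) confidence, T < 1 sharpens it, and
    T = 1 recovers the uncalibrated model."""
    return softmax(np.asarray(logits, dtype=float) / T)

logits = [2.0, 1.0, 0.1]
p_uncal = temperature_scaled_softmax(logits, 1.0)
p_soft = temperature_scaled_softmax(logits, 2.0)   # less confident
```

Note that the predicted class is unchanged by any T > 0; only the confidence assigned to it moves, which is exactly why temperature scaling calibrates without hurting accuracy.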
Unsupervised Tracklet Association Learning (UTAL) 
Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data. This leads to poor scalability in practical re-id deployment, due to the lack of exhaustive identity labelling of positive and negative image pairs for every camera pair. In this work, we present an unsupervised re-id deep learning approach. It is capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data, end-to-end. We formulate an Unsupervised Tracklet Association Learning (UTAL) framework. This is done by jointly learning within-camera tracklet discrimination and cross-camera tracklet association, in order to maximise the discovery of tracklet identity matching both within and across camera views. Extensive experiments demonstrate the superiority of the proposed model over state-of-the-art unsupervised learning and domain adaptation person re-id methods on eight benchmarking datasets. 
Unum Number Format (Unum) 
The unum (universal number) format is a floating point format proposed by John Gustafson as an alternative to the now ubiquitous IEEE 754 format. The proposal and justification are explained in his book The End of Error. The two defining features of the unum format (while unum 2.0 is different) are: · a variable-width storage format for both the significand and exponent, and · a ubit, which determines whether the unum corresponds to an exact number (u=0), or an interval between consecutive exact unums (u=1). In this way, the unums cover the entire extended real number line. For performing computation with the format, Gustafson proposes using interval arithmetic with a pair of unums, which he calls a ubound, providing the guarantee that the resulting interval contains the exact solution. Unum implementations have been explored in Julia, including unum 2.0 (or at least a modified version of his new proposal). Recently, unum has been explored in MATLAB. The Unum Number Format: Mathematical Foundations, Implementation and Comparison to IEEE 754 Floating-Point Numbers 
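The ubit and ubound ideas can be illustrated in a few lines. The class below is a toy, not a real unum bit-level encoding: it only captures the exact-vs-interval distinction and the guaranteed-containment property of ubound arithmetic, using exact rationals:

```python
from fractions import Fraction

class ToyUnum:
    """Toy illustration of the ubit idea (not a real unum encoding):
    a value is either an exact rational (ubit = 0) or the interval
    between two exact values (ubit = 1)."""
    def __init__(self, lo, hi=None):
        self.lo = Fraction(lo)
        self.hi = self.lo if hi is None else Fraction(hi)
        self.ubit = int(self.lo != self.hi)

def ubound_add(x, y):
    """Interval addition on a pair of toy unums (an 'ubound'): the
    resulting interval is guaranteed to contain the exact sum."""
    return ToyUnum(x.lo + y.lo, x.hi + y.hi)
```

Adding two exact values yields an exact result (ubit stays 0), while adding anything to an interval yields an interval that still contains the true sum, which is the containment guarantee Gustafson emphasizes.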
Uplift Modeling  Uplift modelling, also known as incremental modelling, true lift modelling, or net modelling, is a predictive modelling technique that directly models the incremental impact of a treatment (such as a direct marketing action) on an individual’s behaviour. Uplift modelling has applications in customer relationship management for up-sell, cross-sell and retention modelling. It has also been applied to political elections and personalised medicine. Unlike the related Differential Prediction concept in psychology, Uplift Modelling assumes an active agent. 
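The entry above does not fix an estimation method; one common baseline is the "two-model" approach: fit a response model on the treated group, another on the control group, and predict uplift as the difference of their outputs. A sketch using a per-segment response rate as a stand-in for any classifier (the campaign data is made up):

```python
import numpy as np

def response_rate_by_segment(segments, responded):
    """Tiny stand-in 'model': per-segment response rate (any
    classifier, e.g. logistic regression, could be used instead)."""
    return {s: responded[segments == s].mean() for s in np.unique(segments)}

def uplift_two_model(seg_t, y_t, seg_c, y_c, segment):
    """Two-model uplift baseline for one segment:
    P(response | treated) - P(response | control)."""
    treated = response_rate_by_segment(seg_t, y_t)
    control = response_rate_by_segment(seg_c, y_c)
    return treated[segment] - control[segment]

# Made-up campaign data: segment "A", 5 treated and 5 control customers.
seg_t = np.array(["A"] * 5); y_t = np.array([1, 1, 1, 0, 0])  # 60% respond
seg_c = np.array(["A"] * 5); y_c = np.array([1, 0, 0, 0, 0])  # 20% respond
uplift = uplift_two_model(seg_t, y_t, seg_c, y_c, "A")        # about +0.4
```

A plain response model would only see the 60% rate; the uplift model isolates the 40 percentage points actually caused by the treatment, which is the quantity that justifies spending on the marketing action.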
UPM  The constant growth of the e-commerce industry has rendered the problem of product retrieval particularly important. As more enterprises move their activities to the Web, the volume and the diversity of product-related information increase quickly. These factors make it difficult for users to identify and compare the features of their desired products. Recent studies proved that standard similarity metrics cannot effectively identify identical products, since similar titles often refer to different products and vice versa. Other studies employed external data sources (search engines) to enrich the titles; these solutions are rather impractical, mainly because the external data fetching is slow. In this paper we introduce UPM, an unsupervised algorithm for matching products by their titles. UPM is independent of any external sources, since it analyzes the titles and extracts combinations of words out of them. These combinations are evaluated according to several criteria, and the most appropriate of them constitutes the cluster where a product is classified. UPM is also parameter-free, avoids product pairwise comparisons, and includes a post-processing verification stage which corrects erroneous matches. The experimental evaluation of UPM demonstrated its superiority over state-of-the-art approaches in terms of both efficiency and effectiveness. 
Upper Confidence Bound (UCB) 
Bayesian optimisation (BO) has been a successful approach to optimising functions which are expensive to evaluate and whose observations are noisy. Classical BO algorithms, however, do not account for errors about the location where observations are taken, which is a common issue in problems with physical components. In these cases, the estimation of the actual query location is also subject to uncertainty. In this context, we propose an upper confidence bound (UCB) algorithm for BO problems where both the outcome of a query and the true query location are uncertain. The algorithm employs a Gaussian process model that takes probability distributions as inputs. Theoretical results are provided for both the proposed algorithm and a conventional UCB approach within the uncertain-inputs setting. Finally, we evaluate each method’s performance experimentally, comparing them to other input-noise-aware BO approaches on simulated scenarios involving synthetic and real data. 
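The classical UCB acquisition rule underlying such algorithms scores each candidate by posterior mean plus a multiple of posterior standard deviation, and queries the argmax. A sketch (the beta weight and the posterior values are illustrative, not taken from the paper):

```python
import numpy as np

def ucb_acquisition(mu, sigma, beta=2.0):
    """UCB acquisition: posterior mean plus beta times posterior
    standard deviation. High mu exploits known good regions, high
    sigma explores uncertain ones; beta is an assumed trade-off weight."""
    return np.asarray(mu) + beta * np.asarray(sigma)

# Illustrative GP posterior over three candidate query locations.
mu = np.array([0.80, 0.50, 0.90])      # posterior means
sigma = np.array([0.05, 0.60, 0.01])   # posterior standard deviations
scores = ucb_acquisition(mu, sigma)
next_query = int(np.argmax(scores))    # candidate 1: uncertain but promising
```

Here the highest-mean candidate is not chosen: the bonus on the highly uncertain candidate dominates, which is the exploration behaviour that gives UCB its regret guarantees.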
Upper Edge Cover  Optimization problems consist of either maximizing or minimizing an objective function. Instead of looking for a maximum solution (resp. minimum solution), one can look for a minimum maximal solution (resp. maximum minimal solution). Such ‘flipping’ of the objective function has been done for many classical optimization problems. For example, Minimum Vertex Cover becomes Maximum Minimal Vertex Cover, Maximum Independent Set becomes Minimum Maximal Independent Set, and so on. In this paper, we propose to study the weighted version of Maximum Minimal Edge Cover, called Upper Edge Cover, a problem with applications in genomic sequence alignment. It is well-known that Minimum Edge Cover is polynomial-time solvable and the ‘flipped’ version is NP-hard but constant approximable. We show that the weighted Upper Edge Cover is much more difficult than Upper Edge Cover, because it is neither $O(\frac{1}{n^{1/2-\varepsilon}})$ approximable nor $O(\frac{1}{\Delta^{1-\varepsilon}})$ approximable in edge-weighted graphs of size $n$ and maximum degree $\Delta$, respectively. Indeed, we give some hardness-of-approximation results for some special restricted graph classes such as bipartite graphs, split graphs and $k$-trees. We counterbalance these negative results by giving some positive approximation results in specific graph classes. 
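The ‘flipped’ objective can be made concrete: a minimal edge cover is one from which no single edge can be dropped without uncovering a vertex, and Upper Edge Cover asks for the largest such cover. A brute-force check on toy graphs (the paper studies the weighted variant; this sketch is unweighted):

```python
def is_edge_cover(vertices, edges):
    """Every vertex is an endpoint of at least one chosen edge."""
    covered = set()
    for u, v in edges:
        covered.update((u, v))
    return covered >= set(vertices)

def is_minimal_edge_cover(vertices, edges):
    """Minimal: an edge cover where dropping any single edge leaves
    some vertex uncovered. Upper Edge Cover seeks the LARGEST such
    minimal cover (maximum minimal edge cover)."""
    if not is_edge_cover(vertices, edges):
        return False
    return all(
        not is_edge_cover(vertices, [f for f in edges if f != e])
        for e in edges
    )

# Star K_{1,3}: all three edges are forced, so the cover is minimal.
star_ok = is_minimal_edge_cover("cxyz", [("c", "x"), ("c", "y"), ("c", "z")])
# Path a-b-c-d: all three edges form a cover, but it is not minimal,
# because the middle edge (b, c) is redundant.
path_not_minimal = is_minimal_edge_cover(
    "abcd", [("a", "b"), ("b", "c"), ("c", "d")])
```

On the path a-b-c-d, the only minimal cover is the two end edges, so here the minimum and maximum minimal covers coincide; on larger graphs the two objectives diverge, which is what makes the flipped problem hard.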
Upsampling  In digital signal processing, upsampling, expansion, and interpolation are terms associated with the process of resampling in a multirate digital signal processing system. Upsampling can be synonymous with expansion, or it can describe an entire process of expansion and filtering (interpolation). When upsampling is performed on a sequence of samples of a signal or other continuous function, it produces an approximation of the sequence that would have been obtained by sampling the signal at a higher rate (or density, as in the case of a photograph). For example, if compact disc audio at 44,100 samples/second is upsampled by a factor of 5/4, the resulting sample rate is 55,125 samples/second. 
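The expansion step is easy to sketch: insert L-1 zeros between consecutive samples (an interpolation filter would then fill in the gaps), and a rational factor such as 5/4 combines expansion by 5 with decimation by 4:

```python
import numpy as np

def expand(x, L):
    """Expansion by integer factor L: insert L - 1 zeros between
    consecutive samples. A lowpass interpolation filter would
    normally follow to replace the zeros with interpolated values."""
    y = np.zeros(len(x) * L)
    y[::L] = x
    return y

y = expand([1.0, 2.0, 3.0], 2)   # -> [1, 0, 2, 0, 3, 0]
# Rational factor 5/4: expand by 5, lowpass-filter, keep every 4th sample.
new_rate = 44100 * 5 // 4        # 55,125 samples/second, as in the example
```

Zero insertion alone creates spectral images at multiples of the original sample rate; the follow-up lowpass filter is what turns expansion into interpolation.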
Uranie  The high-performance computing resources and the constant improvement of both numerical simulation accuracy and the experimental measurements with which they are confronted bring a new compulsory step to strengthen the credence given to simulation results: uncertainty quantification. This can have different meanings according to the requested goals (rank uncertainty sources, reduce them, estimate precisely a critical threshold or an optimal working point), and it may require mathematical methods of greater or lesser complexity. This paper introduces the Uranie platform, an open-source framework currently developed at the Alternative Energies and Atomic Energy Commission (CEA), in the nuclear energy division, to deal with uncertainty propagation, surrogate models, optimisation issues, code calibration… This platform benefits both from its dependencies and from in-house developments to offer an efficient data handling model, a C++ and Python interpreter, advanced graphical tools, several parallelisation solutions… These methods are very generic and can thus be applied to many kinds of code (Uranie treats them as black boxes) and so to many fields of physics as well. In this paper, the example of thermal exchange between a plate sheet and a fluid is introduced to show how Uranie can be used to perform a large range of analyses. The code used to produce the figures of this paper can be found at https://…/uranie along with the sources of the platform. 
UR-FUNNY  Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the usage of words (text), gestures (vision) and prosodic cues (acoustic). Understanding humor from these three modalities falls within the boundaries of multimodal language, a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, in a multimodal context it is an understudied area. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research. 
U-SAID  Several recent works discussed application-driven image restoration neural networks, which are capable of not only removing noise in images but also preserving their semantic-aware details, making them suitable for various high-level computer vision tasks as a preprocessing step. However, such approaches require extra annotations for their high-level vision tasks in order to train the joint pipeline using hybrid losses. The availability of those annotations is often limited to a few image sets, potentially restricting the general applicability of these methods to denoising more unseen and unannotated images. Motivated by that, we propose a segmentation-aware image denoising model dubbed U-SAID, based on a novel unsupervised approach with a pixel-wise uncertainty loss. U-SAID does not need any ground-truth segmentation map and thus can be applied to any image dataset. It generates denoised images of comparable or even better quality, and the denoised results show stronger robustness for subsequent semantic segmentation tasks when compared to either its supervised counterpart or classical ‘application-agnostic’ denoisers. Moreover, we demonstrate the superior generalizability of U-SAID in three ways, by plugging in its ‘universal’ denoiser without fine-tuning: (1) denoising unseen types of images; (2) denoising as preprocessing for segmenting unseen noisy images; and (3) denoising for unseen high-level tasks. Extensive experiments demonstrate the effectiveness, robustness and generalizability of the proposed U-SAID over various popular image sets. 
uSer and Agent Model IntegrAtion (SAMIA) 
Task-oriented dialogue systems can efficiently serve a large number of customers and relieve people from tedious work. However, existing task-oriented dialogue systems depend on handcrafted actions and states or extra semantic labels, which sometimes degrades the user experience despite the intensive human intervention. Moreover, current user simulators have limited expressive ability, so deep reinforcement Seq2Seq models have to rely on self-play and only work in some special cases. To address those problems, we propose a uSer and Agent Model IntegrAtion (SAMIA) framework, inspired by the observation that the roles of the user and agent models are asymmetric. First, the SAMIA framework models the user model as a Seq2Seq learning problem instead of ranking or designing rules. Then the built user model is used as leverage to train the agent model by deep reinforcement learning. In the test phase, the output of the agent model is filtered by the user model to enhance stability and robustness. Experiments on a real-world coffee ordering dataset verify the effectiveness of the proposed SAMIA framework. 
User Behavior Analytics (UBA) 
User Behavior Analytics (UBA) is rocking this year’s security conferences. Rather than trying to build an ever stronger perimeter, the discussion has changed substantially. Security professionals are investing more resources than ever before into collecting and analyzing vast amounts of user-specific event and access logs, which hold the promise of major security benefits, including the opportunity to: · Quickly identify anomalous user behaviors. · Investigate a prioritized list of potential threats. · Leverage machine learning techniques to isolate evolving threats. · Minimize reliance on predefined rules or heuristics. · Detect and respond to Insider Threats much faster. The future of UBA is promising; however, with significant interest and hype surrounding the benefits of UBA for both enterprises and large organizations, how can someone begin to incorporate UBA into their existing security infrastructure? ➚ “Behavioral Analytics” 
User Generated Content (UGC) 
User-generated content (UGC) refers to a variety of media available through a range of modern communications technologies. UGC is often produced through open collaboration: it is created by goal-oriented yet loosely coordinated participants, who interact to create a product or service of economic value, which they make available to contributors and non-contributors alike. UGC also collectively denotes data originating from Facebook, LinkedIn, Twitter, Instagram, YouTube, and many other networking sites: the social media content shared by users and the associated metadata. 
User-Centric Composable Services  Machine Learning (ML) techniques, such as neural networks, are widely used in today’s applications. However, there is still a big gap between current ML systems and users’ requirements. ML systems focus on improving the performance of models during training, while individual users care more about response time and the expressiveness of the tool. Much existing research and many products have begun to move computation towards edge devices. Based on the numerical computing system Owl, we propose to build the Zoo system to support the construction, composition, and deployment of ML models on edge and local devices. 
User-Sensitive Recommendation Ensemble with Clustered Multi-Task Learning (UREC) 
This paper considers recommendation algorithm ensembles in a user-sensitive manner. Recently, researchers have proposed various effective recommendation algorithms, which utilize different aspects of the data and different techniques. However, the ‘user-skewed prediction’ problem may exist for almost all recommendation algorithms: an algorithm with the best average predictive accuracy may mask the fact that it performs poorly for some users, which leads to biased services in real scenarios. In this paper, we propose a user-sensitive ensemble method named ‘UREC’ to address this issue. We first cluster users based on the recommendation predictions, then we use multi-task learning to learn the user-sensitive ensemble function for the users. In addition, to alleviate the negative effects of the new-user problem on clustering users, we propose an approximate approach based on a spectral relaxation. Experiments on real-world datasets demonstrate the superiority of our methods. 
uTensor  uTensor is an extremely lightweight machine learning inference framework built on Mbed and TensorFlow. It consists of a runtime library and an offline tool. The total size of the graph definition and algorithm implementation of a 3-layer MLP produced by uTensor is less than 32 kB in the resulting binary (excluding the weights). Simple Neural Network on MCUs 
Utility-Oriented Pattern Mining (UPM) 
A Survey of UtilityOriented Pattern Mining 