R  R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
R Consortium  The R Consortium, Inc. is a group of businesses organized under an open source governance and foundation model to provide support to the R community, the R Foundation and groups and individuals using, maintaining and distributing R software. The R language is an open source environment for statistical computing and graphics, and runs on a wide variety of computing platforms. The R language has enjoyed significant growth, and now supports over 2 million users. A broad range of industries have adopted the R language, including biotech, finance, research and high technology industries. The R language is often integrated into third party analysis, visualization and reporting applications. The central mission of the R Consortium is to work with and provide support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects. From a governance perspective, the business of the consortium is managed by a Board of Directors. The technical aspects of the project, including the development and implementation of infrastructure projects, are overseen by an Infrastructure Steering Committee. While the initial members of the Infrastructure Steering Committee consist of representatives of the founding members of the R Consortium, Inc., project leads of key infrastructure projects will become voting members of the Infrastructure Steering Committee. Potential infrastructure projects include: · strengthening the R-Forge infrastructure; · assisting the Stanford University group running useR! 2016; · developing documentation; and · encouraging increased communication and collaboration among users and developers of the R language.
R Service Bus (RSB) 
Having the right algorithm is a big first step toward getting advanced analytics to solve your problem and inform your decisions. The next step is to have the algorithm work for you and integrate it into your workflows and business processes. The R Service Bus is a Swiss Army knife that allows you to plug R into your processes independently of the technology used by other software applications involved in the workflow. The prime objective of the R Service Bus is to integrate smoothly into your existing infrastructure, and it therefore supports communication using a plethora of protocols such as · SOAP and RESTful web services · various email protocols · folder monitoring, (s)ftp · messaging protocols such as JMS or STOMP · … The R Service Bus is based on mature open source projects and was developed to maximize reliability, flexibility, high availability and scalability of R-based analytics applications. It is in use at major pharmaceutical and financial institutions to power business-critical modeling activities. The R Service Bus is open source and freely available from our downloads page. The R Service Bus has also been packaged for all current versions of Debian/Ubuntu and is available from our repository.
R.NET  R.NET enables the .NET Framework to interoperate with the R statistical language in the same process. R.NET requires .NET Framework 4 and the native R DLLs installed with the R environment. R.NET works on Windows, Linux and MacOS. Enjoy statistics and programming in your special language with R. 
R2CNN++  Object detection plays a vital role in natural scenes and aerial scenes and is full of challenges. Although many advanced algorithms have succeeded in natural scenes, progress in aerial scenes has been slow due to the complexity of aerial images and the large degree of freedom of remote sensing objects in scale, orientation, and density. In this paper, a novel multi-category rotation detector is proposed, which can efficiently detect small objects, arbitrary-direction objects, and dense objects in complex remote sensing images. Specifically, the proposed model adopts a targeted feature fusion strategy called inception fusion network, which fully considers factors such as feature fusion, anchor sampling, and receptive field to improve the ability to handle small objects. Then we combine the pixel attention network and the channel attention network to weaken the noise information and highlight the object features. Finally, the rotational object detection algorithm is realized by redefining the rotating bounding box. Experiments on public datasets including DOTA and NWPU VHR-10 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods. The code and models will be available at https://…/R2CNNPlusPlus_Tensorflow.
r2d3  The r2d3 package provides a suite of tools for using D3 visualizations with R, including: • Translating R objects into D3 friendly data structures • Rendering D3 scripts within the RStudio Viewer and R Notebooks • Publishing D3 visualizations to the web • Incorporating D3 scripts into R Markdown reports, presentations, and dashboards • Creating interactive D3 applications with Shiny • Distributing D3 based htmlwidgets in R packages 
Rabix  An open-source toolkit for developing and running portable workflows based on the Common Workflow Language specification and Docker. liftr
Race Track Concordance Charts  One way to help keep track of things from the perspective of a particular driver, rather than the race leader, is to rebase the origin of the x-axis relative to that driver.
RadegastXDB  A lot of advances in the processing of XML data have been proposed in the previous decade. There were many approaches focused on the efficient processing of twig pattern queries (TPQ). However, including the TPQ into an XQuery compiler is not a straightforward problem and current XML DBMSs process XQueries without any TPQ detection. In this paper, we demonstrate our prototype of a native XML DBMS called RadegastXDB that uses a TPQ detection to accelerate structural XQueries. Such a detection allows us to utilize state-of-the-art TPQ processing algorithms. Our experiments show that, for the structural queries, these algorithms and state-of-the-art XML indexing techniques make our prototype faster than all of the current XML DBMSs, especially for large data collections. We also show that using the same techniques is also efficient for the processing of queries with value predicates.
Radial Basis Function (RBF) 
A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin, so that $\phi(\mathbf{x}) = \phi(\|\mathbf{x}\|)$; or alternatively on the distance from some other point $\mathbf{c}$, called a center, so that $\phi(\mathbf{x}, \mathbf{c}) = \phi(\|\mathbf{x} - \mathbf{c}\|)$. Any function $\phi$ that satisfies this property is a radial function. The norm is usually Euclidean distance, although other distance functions are also possible. For example, using the Łukaszyk-Karmowski metric, it is possible for some radial functions to avoid problems with ill conditioning of the matrix solved to determine the coefficients $w_i$, since $\|\mathbf{x}\|$ is always greater than zero. Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally invented, by David Broomhead and David Lowe in 1988. RBFs are also used as a kernel in support vector classification.
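The definition can be illustrated with a minimal Python sketch. The Gaussian shape, the `epsilon` width parameter, and the function names are assumptions chosen for illustration; any function of the distance alone would qualify as radial.

```python
import math

def gaussian_rbf(x, center, epsilon=1.0):
    """Gaussian radial function (an assumed choice of shape).

    The value depends on x only through its Euclidean distance to center.
    """
    r = math.dist(x, center)
    return math.exp(-(epsilon * r) ** 2)

def rbf_sum(x, centers, weights, epsilon=1.0):
    """Weighted sum of radial basis functions, as used for approximation."""
    return sum(w * gaussian_rbf(x, c, epsilon) for w, c in zip(weights, centers))
```

Two points at the same distance from the center, such as (3, 4) and (5, 0) relative to the origin, give the same value, which is the defining property.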
Radial Basis Function Kernel (RBF) 
In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function used in support vector machine classification. 
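As a hedged illustration, the sketch below uses the usual Gaussian form K(x, y) = exp(-||x - y||^2 / (2 sigma^2)). The bandwidth name `sigma` and the helper `gram_matrix` are assumptions made for this example, not part of any particular library.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, sigma=1.0):
    """Kernel (Gram) matrix of pairwise similarities, as consumed by an SVM."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]
```

The resulting matrix is symmetric with ones on the diagonal, and entries decay toward zero as points move apart.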
Radial Basis Function Networks (RBF) 
In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment. 
RadialGAN  Training complex machine learning models for prediction often requires a large amount of data that is not always readily available. Leveraging external datasets from related but different sources is therefore an important task if good predictive models are to be built for deployment in settings where data can be rare. In this paper we propose a novel approach to the problem in which we use multiple GAN architectures to learn to translate from one dataset to another, thereby allowing us to effectively enlarge the target dataset, and therefore learn better predictive models than if we simply used the target dataset. We show the utility of such an approach, demonstrating that our method improves the prediction performance on the target domain over using just the target dataset, and also show that our framework outperforms several other benchmarks on a collection of real-world medical datasets.
RadiX-Net  The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them. Research over the past few decades has explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology. The resulting neural network is known as a sparse neural network. More recent work has demonstrated the remarkable result that certain sparse DNNs can train to the same precision as dense DNNs at lower runtime and storage cost. An intriguing class of these sparse DNNs is the X-Nets, which are initialized and trained upon a sparse topology with neither reference to a parent dense DNN nor subsequent pruning. We present an algorithm that deterministically generates RadiX-Nets: sparse DNN topologies that, as a whole, are much more diverse than X-Net topologies, while preserving X-Nets’ desired characteristics. We further present a functional-analytic conjecture based on the longstanding observation that sparse neural network topologies can attain the same expressive power as dense counterparts.
RadViz3D  This paper develops methodology for 3D radial visualization of high-dimensional datasets. Our display engine is called RadViz3D and extends the classic RadViz that visualizes multivariate data in the 2D plane by mapping every record to a point inside the unit circle. The classic RadViz display has equally-spaced anchor points on the unit circle, with each of them associated with an attribute or feature of the dataset. RadViz3D obtains equispaced anchor points exactly for the five Platonic solids and approximately for the other cases via a Fibonacci grid. We show that distributing anchor points at least approximately uniformly on the 3D unit sphere provides a better visualization than in 2D. We also propose a Max-Ratio Projection (MRP) method that utilizes the group information in high dimensions to provide distinctive lower-dimensional projections that are then displayed using RadViz3D. Our methodology is extended to datasets with discrete and mixed features where a generalized distributional transform is used in conjunction with copula models before applying MRP and RadViz3D visualization.
Rafiki  Big data analytics has gained massive momentum in the last few years. Applying machine learning models to big data has become an implicit requirement or an expectation for most analysis tasks, especially in high-stakes applications. Typical applications include sentiment analysis of reviews for analyzing online products, image classification in food logging applications for monitoring users’ daily intake, and stock movement prediction. Extending traditional database systems to support the above analysis is intriguing but challenging. First, it is almost impossible to implement all machine learning models in the database engines. Second, expert knowledge is required to optimize the training and inference procedures in terms of efficiency and effectiveness, which imposes a heavy burden on the system users. In this paper, we develop and present a system, called Rafiki, to provide the training and inference service of machine learning models, and facilitate complex analytics on top of cloud platforms. Rafiki provides distributed hyper-parameter tuning for the training service, and online ensemble modeling for the inference service which trades off between latency and accuracy. Experimental results confirm the efficiency, effectiveness, scalability and usability of Rafiki.
Rainforest Plots  Research has shown that forest plots are a gold standard in the visualization of meta-analytic results. However, research on the general interpretation of forest plots and the role of researchers’ meta-analysis experience and field of study is still unavailable. Additionally, the traditional display of effect sizes, confidence intervals, and weights has repeatedly been criticized. The current work presents an online statistical cognition experiment in which a total of 279 researchers with experience in meta-analysis from 36 countries evaluated conventional forest plots and two novel versions of forest plots, namely, thick forest plots and rainforest plots. The results indicate certain biases in the interpretation of forest plots, especially with regard to heterogeneity, the distribution of weights, and the theoretical concept of confidence intervals. Although the two novel displays (thick forest plots and rainforest plots) are associated with slightly longer viewing times, they are at least as well-suited and esthetically and perceptively pleasing as the conventional displays while facilitating the correct and exhaustive interpretation of the meta-analytic information. Furthermore, it is advisable to combine conventional forest plots with distribution information of the individual effects, to make confidence lines more visually striking, and to display a background grid in the graph. metaviz
Ramer-Douglas-Peucker Algorithm (RDP)
The Ramer-Douglas-Peucker algorithm (RDP) is an algorithm for reducing the number of points in a curve that is approximated by a series of points. The initial form of the algorithm was independently suggested in 1972 by Urs Ramer and 1973 by David Douglas and Thomas Peucker and several others in the following decade. This algorithm is also known under the names Douglas-Peucker algorithm, iterative end-point fit algorithm and split-and-merge algorithm. The purpose of the algorithm is, given a curve composed of line segments, to find a similar curve with fewer points. The algorithm defines ‘dissimilar’ based on the maximum distance between the original curve and the simplified curve. The simplified curve consists of a subset of the points that defined the original curve. http://…/rdp
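The recursive idea can be sketched in a few lines of Python. This is a minimal version for 2D points under stated assumptions: the tolerance name `epsilon` is illustrative, and distance is measured to the infinite line through the two endpoints, which is a common simplification.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from p to the line through a and b (to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def rdp(points, epsilon):
    """Return a subset of points that stays within epsilon of the original curve."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first -> last.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist > epsilon:
        # Keep that point and simplify both halves recursively.
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    # Every interior point is close enough to the chord: keep the endpoints only.
    return [first, last]
```

A larger `epsilon` discards more points; with `epsilon = 0` only exactly collinear interior points are removed.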
RAMODO  Learning expressive low-dimensional representations of ultrahigh-dimensional data, e.g., data with thousands/millions of features, has been a major way to enable learning methods to address the curse of dimensionality. However, existing unsupervised representation learning methods mainly focus on preserving the data regularity information and learning the representations independently of subsequent outlier detection methods, which can result in suboptimal and unstable performance of detecting irregularities (i.e., outliers). This paper introduces a ranking model-based framework, called RAMODO, to address this issue. RAMODO unifies representation learning and outlier detection to learn low-dimensional representations that are tailored for a state-of-the-art outlier detection approach: the random distance-based approach. This customized learning yields more optimal and stable representations for the targeted outlier detectors. Additionally, RAMODO can leverage little labeled data as prior knowledge to learn more expressive and application-relevant representations. We instantiate RAMODO to an efficient method called REPEN to demonstrate the performance of RAMODO. Extensive empirical results on eight real-world ultrahigh-dimensional data sets show that REPEN (i) enables a random distance-based detector to obtain significantly better AUC performance and two orders of magnitude speedup; (ii) performs substantially better and more stably than four state-of-the-art representation learning methods; and (iii) leverages less than 1% labeled data to achieve up to 32% AUC improvement.
Ramp-Based Twin Support Vector Clustering (RampTWSVC)
Traditional plane-based clustering methods measure the within-cluster and between-cluster costs by quadratic, linear or some other unbounded functions, which may amplify the impact of cost. This letter introduces a ramp cost function into plane-based clustering to propose a new clustering method, called ramp-based twin support vector clustering (RampTWSVC). RampTWSVC is more robust because of its boundedness, and thus it finds the intrinsic clusters more easily than other plane-based clustering methods. The non-convex programming problem in RampTWSVC is solved efficiently through an alternating iteration algorithm, and its local solution can be obtained in a finite number of iterations theoretically. In addition, the nonlinear manifold-based formation of RampTWSVC is also proposed by the kernel trick. Experimental results on several benchmark datasets show the better performance of our RampTWSVC compared with other plane-based clustering methods.
Rand Index  The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. From a mathematical standpoint, the Rand index is related to the accuracy, but is applicable even when class labels are not used. mri
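A minimal sketch of the unadjusted Rand index, computed as the fraction of element pairs on which two clusterings agree (both place the pair in the same cluster, or both place it in different clusters). The function name and the label-list representation are assumptions for illustration; the chance-adjusted variant is not shown.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which the two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same elements")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two elements are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

Because only pair co-membership is compared, the cluster names themselves do not matter: `[0, 0, 1, 1]` and `[1, 1, 0, 0]` describe the same partition and score 1.0.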
Rand-Interleaving  Ranking functions return ranked lists of items, and users often interact with these items. How to evaluate ranking functions using historical interaction logs, also known as off-policy evaluation, is an important but challenging problem. The commonly used Inverse Propensity Scores (IPS) approaches work better for the single item case, but suffer from extremely low data efficiency for the ranked list case. In this paper, we study how to improve the data efficiency of IPS approaches in the offline comparison setting. We propose two approaches, Trunc-match and Rand-interleaving, for offline comparison using uniformly randomized data. We show that these methods can improve the data efficiency and also the comparison sensitivity based on one of the largest email search engines.
Random Assignment  Random assignment or random placement is an experimental technique for assigning subjects to different treatments (or no treatment). The thinking behind random assignment is that, by randomizing treatment assignment, the group attributes for the different treatments will be roughly equivalent, and therefore any effect observed between treatment groups can be linked to the treatment effect rather than to a characteristic of the individuals in the group. In experimental design, random assignment of participants in experiments to treatment and control groups helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Random assignment does not guarantee that the groups are “matched” or equivalent, only that any differences are due to chance. Random assignment facilitates comparison in experiments by creating similar groups, so that the experiment compares “Apple to Apple” and “Orange to Orange”. Random assignment proceeds in three steps. Step 1: Begin with a collection of subjects, for example 20 people. Step 2: Devise a method to randomize that is purely mechanical (e.g. flip a coin). Step 3: Assign subjects with “Heads” to one group, the Control Group, and subjects with “Tails” to the other group, the Experimental Group.
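The three steps can be sketched in Python as follows. The function name, the group keys, and the optional `seed` (used only to make a run reproducible) are assumptions for illustration.

```python
import random

def assign_by_coin_flip(subjects, seed=None):
    """Assign each subject to a group with an independent fair coin flip."""
    rng = random.Random(seed)
    groups = {"control": [], "experimental": []}
    for subject in subjects:
        heads = rng.random() < 0.5  # Step 2: a purely mechanical randomizer
        # Step 3: heads -> control group, tails -> experimental group
        groups["control" if heads else "experimental"].append(subject)
    return groups

# Step 1: begin with a collection of subjects, for example 20 people.
groups = assign_by_coin_flip(range(1, 21), seed=42)
```

Independent coin flips do not guarantee equal group sizes; when balanced groups are required, a common alternative is to shuffle the subject list and split it in half.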
Random Average Shifted Histogram (RASH) 
A new density estimator called RASH, for Random Average Shifted Histogram, obtained by averaging several histograms as proposed in Average Shifted Histograms, is presented. The principal difference between the two methods is that in RASH each histogram is built over a grid with random shifted breakpoints. The asymptotic behavior of this estimator is established and its performance through several simulations is analyzed. RASH is compared to several classic density estimators and to some recent ensemble methods. Although RASH does not always outperform the other methods, it is very simple to implement, being also more intuitive. 
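A rough sketch of the averaging idea for one-dimensional data is given below. Drawing each grid shift uniformly from one bin width, the fixed `bin_width`, and the function name are assumptions for illustration; the original paper may construct the random breakpoints differently.

```python
import math
import random

def rash_density(x, data, bin_width, n_histograms=50, seed=None):
    """Average of several histogram density estimates, each on a randomly shifted grid."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(n_histograms):
        # Assumed scheme: shift the whole grid by a uniform random offset.
        shift = rng.uniform(0.0, bin_width)
        bin_of_x = math.floor((x - shift) / bin_width)
        count = sum(1 for v in data if math.floor((v - shift) / bin_width) == bin_of_x)
        total += count / (n * bin_width)  # histogram density estimate at x
    return total / n_histograms
```

Averaging over shifted grids smooths out the dependence of a single histogram on where its breakpoints happen to fall.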
Random Boost (RB) 
Inspired by theoretical readings on randomization techniques in boosting, I developed a new algorithm that I call Random Boost (RB). In essence, Random Boost sequentially grows regression trees with random depth. More precisely, the algorithm is almost identical to MART and has exactly the same input arguments. The only difference is the parameter d_{max}. In MART, d_{max} determines the maximum depth of all trees in the ensemble. In Random Boost, the argument constitutes the upper bound of the possible tree depths. In each boosting iteration i, a random number d_i between 1 and d_{max} is drawn, which then defines the maximum depth of that tree T_i(d_i).
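The only step that differs from MART, drawing a depth per boosting iteration, can be sketched as below. The function name is hypothetical, and drawing d_i uniformly over the integers 1..d_max is an assumption, since the text only says that a random number in that range is drawn; the tree fitting itself is omitted.

```python
import random

def draw_tree_depths(n_iterations, d_max, seed=None):
    """Draw the maximum depth d_i for each boosting iteration i."""
    rng = random.Random(seed)
    # Assumption: d_i is uniform on {1, ..., d_max}.
    return [rng.randint(1, d_max) for _ in range(n_iterations)]

# In MART every tree would use d_max; here tree T_i is grown with depth limit d_i.
depths = draw_tree_depths(n_iterations=100, d_max=6, seed=0)
```

Under this uniform assumption the expected depth is (1 + d_max) / 2, so the ensemble contains shallower trees on average than a MART ensemble with the same d_{max}.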
Random Conditional Distribution  The need to condition distributional properties such as expectation, variance, and entropy arises in algorithmic fairness, model simplification, robustness and many other areas. At face value, however, distributional properties are not random variables, and hence conditioning them is a semantic error and type error in probabilistic programming languages. On the other hand, distributional properties are contingent on other variables in the model, change in value when we observe more information, and hence in a precise sense are random variables too. In order to capture the uncertainty over distributional properties, we introduce a probability construct, the random conditional distribution, and incorporate it into a probabilistic programming language, Omega. A random conditional distribution is a higher-order random variable whose realizations are themselves conditional random variables. In Omega we extend distributional properties of random variables to random conditional distributions, such that, for example, while the expectation of a real-valued random variable is a real value, the expectation of a random conditional distribution is a distribution over expectations. As a consequence, it requires minimal syntax to encode inference problems over distributional properties, which so far have evaded treatment within probabilistic programming systems and probabilistic modeling in general. We demonstrate our approach with case studies in algorithmic fairness and robustness.
Random Connectivity LSTM  Time series prediction can be generalized as a process that extracts useful information from historical records and then determines future values. Learning long-range dependencies that are embedded in time series is often an obstacle for most algorithms, whereas Long Short-Term Memory (LSTM) solutions, as a specific kind of scheme in deep learning, promise to effectively overcome the problem. In this article, we first give a brief introduction to the structure and forward propagation mechanism of the LSTM model. Then, aiming at reducing the considerable computing cost of LSTM, we put forward the Random Connectivity LSTM (RCLSTM) model and test it by predicting traffic and user mobility in telecommunication networks. Compared to LSTM, RCLSTM is formed via stochastic connectivity between neurons, which achieves a significant breakthrough in the architecture formation of neural networks. In this way, the RCLSTM model exhibits a certain level of sparsity, which leads to an appealing decrease in the computational complexity and makes the RCLSTM model more applicable in latency-stringent application scenarios. In the field of telecommunication networks, the prediction of traffic series and mobility traces could directly benefit from this improvement as we further demonstrate that the prediction accuracy of RCLSTM is comparable to that of the conventional LSTM no matter how we change the number of training samples or the length of input sequences.
Random Cut Forest  In this paper we focus on the anomaly detection problem for dynamic data streams through the lens of random cut forests. We investigate a robust random cut data structure that can be used as a sketch or synopsis of the input stream. We provide a plausible definition of nonparametric anomalies based on the influence of an unseen point on the remainder of the data, i.e., the externality imposed by that point. We show how the sketch can be efficiently updated in a dynamic data stream. We demonstrate the viability of the algorithm on publicly available real data. 
Random Decision Forests (RDF) 

Random Dot Product Graph (RDPG) 

Random Effects Model  In statistics, a random effect(s) model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy. In econometrics, random effects models are used in the analysis of hierarchical or panel data when one assumes no fixed effects (it allows for individual effects). The random effects model is a special case of the fixed effects model. Contrast this to the biostatistics definitions, as biostatisticians use ‘fixed’ and ‘random’ effects to respectively refer to the population-average and subject-specific effects (and where the latter are generally assumed to be unknown, latent variables).
Random Energy Model  In statistical physics of disordered systems, the random energy model is a toy model of a system with quenched disorder. It concerns the statistics of a system of N particles, such that the number of possible states for the system grows as $2^N$, while the energy of such states is a Gaussian stochastic variable. The model has an exact solution. Its simplicity makes this model suitable for pedagogical introduction of concepts like quenched disorder and replica symmetry. Random Energy Models, Optimal Learning Machines and Beyond
Random Erasing  In this paper, we introduce Random Erasing, a simple yet effective data augmentation technique for training convolutional neural networks (CNNs). In the training phase, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of network overfitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated into most of the CNN-based recognition models. Albeit simple, Random Erasing yields consistent improvement in image classification, object detection and person re-identification (re-ID). For image classification, our method improves the top-1 error rate of WRN-28-10 from 3.72% to 3.08% on CIFAR-10, and from 18.68% to 17.65% on CIFAR-100. For object detection on PASCAL VOC 2007, Random Erasing improves Fast-RCNN from 74.8% to 76.2% in mAP. For person re-ID, when using Random Erasing in recent deep models, we achieve the state-of-the-art accuracy: the rank-1 accuracy is 89.13% for Market-1501, 84.02% for DukeMTMC-reID, and 63.93% for CUHK03 under the new evaluation protocol.
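The core operation can be sketched on a grayscale image stored as a list of rows. How the rectangle is sized here (uniform side lengths bounded by the hypothetical parameters `max_height` and `max_width`) and the 0..255 pixel range are simplifying assumptions, not the sampling scheme of the paper.

```python
import random

def random_erase(image, max_height, max_width, seed=None):
    """Return a copy of image with one random rectangle overwritten by random values."""
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    # Assumed sampling: rectangle size and position are drawn uniformly.
    h = rng.randint(1, min(max_height, rows))
    w = rng.randint(1, min(max_width, cols))
    top = rng.randint(0, rows - h)
    left = rng.randint(0, cols - w)
    erased = [row[:] for row in image]  # leave the input image untouched
    for r in range(top, top + h):
        for c in range(left, left + w):
            erased[r][c] = rng.randint(0, 255)  # random pixel value
    return erased, (top, left, h, w)
```

Applied with a fresh random draw for each training image, this yields occluded variants of the data without changing the labels.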
Random Ferns Method / Classifier  Random ferns is a machine learning algorithm proposed by Ozuysal, Fua, and Lepetit (2007) for matching the same elements between two images of the same scene, allowing one to recognize certain objects or trace them on videos. The original motivation behind this method was to create a simple and efficient algorithm by extending the naive Bayes classifier; still the authors acknowledged its strong connection to decision tree ensembles like the random forest algorithm (Breiman 2001). Since their introduction, random ferns have been applied in numerous computer vision applications, like image recognition (Bosch, Zisserman, and Munoz 2007), action recognition (Oshin, Gilbert, Illingworth, and Bowden 2009) or augmented reality (Wagner, Reitmayr, Mulloni, Drummond, and Schmalstieg 2010). However, it has not gathered attention outside this field; thus, this work aims to bring this algorithm to a much wider spectrum of applications. In order to do that, I propose a generalized version of the algorithm, implemented in the R (R Core Team 2014) package rFerns (Kursa 2014) which is available from the Comprehensive R Archive Network (CRAN) at http://…/package=rFerns. rFerns
Random Fields  A random field is a generalization of a stochastic process such that the underlying parameter need no longer be a simple real or integer valued “time”, but can instead take values that are multidimensional vectors, or points on some manifold. At its most basic, discrete case, a random field is a list of random numbers whose indices are mapped onto a space (of n dimensions). When used in the natural sciences, values in a random field are often spatially correlated in one way or another. In its most basic form this might mean that adjacent values (i.e. values with adjacent indices) do not differ as much as values that are further apart. This is an example of a covariance structure, many different types of which may be modeled in a random field. More generally, the values might be defined over a continuous domain, and the random field might be thought of as a “function valued” random variable. 
Random Forest  Random forests are an ensemble learning method for classification (and regression) that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes output by individual trees. The algorithm for inducing a random forest was developed by Leo Breiman and Adele Cutler, and “Random Forests” is their trademark. The term came from random decision forests, which were first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman’s “bagging” idea and the random selection of features, introduced independently by Ho and by Amit and Geman, in order to construct a collection of decision trees with controlled variance. The selection of a random subset of features is an example of the random subspace method, which, in Ho’s formulation, is a way to implement classification proposed by Eugene Kleinberg. ranger
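The aggregation step for classification, taking the mode of the individual tree outputs, can be sketched as below. Representing each trained tree as a plain Python callable is an assumption for illustration; growing the trees (bagging plus random feature selection) is not shown.

```python
from collections import Counter

def forest_predict(trees, x):
    """Return the most common class among the predictions of the individual trees."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: simple threshold rules on one feature.
trees = [
    lambda x: "high" if x[0] > 2.0 else "low",
    lambda x: "high" if x[0] > 3.0 else "low",
    lambda x: "high" if x[1] > 0.5 else "low",
]
prediction = forest_predict(trees, (2.5, 0.9))  # two of three trees vote "high"
```

For regression, the same structure applies with the mean of the tree outputs in place of the mode.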
Random Geometric Graph (RGG) 
We propose an interdependent random geometric graph (RGG) model for interdependent networks. Based on this model, we study the robustness of two interdependent spatially embedded networks where interdependence exists between geographically nearby nodes in the two networks. We study the emergence of the giant mutual component in two interdependent RGGs as node densities increase, and define the percolation threshold as a pair of node densities above which the giant mutual component first appears. In contrast to the case for a single RGG, where the percolation threshold is a unique scalar for a given connection distance, for two interdependent RGGs, multiple pairs of percolation thresholds may exist, given that a smaller node density in one RGG may increase the minimum node density in the other RGG in order for a giant mutual component to exist. We derive analytical upper bounds on the percolation thresholds of two interdependent RGGs by discretization, and obtain $99\%$ confidence intervals for the percolation thresholds by simulation. Based on these results, we derive conditions for the interdependent RGGs to be robust under random failures and geographical attacks. 
Random Image Cropping and Patching (RICAP) 
Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expression ability risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP) which randomly crops four images and patches them to create a new training image. Moreover, RICAP mixes the class labels of the four images, resulting in an advantage similar to label smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) by comparison with competitive data augmentation techniques such as cutout and mixup. RICAP achieves a new state-of-the-art test error of $2.19\%$ on CIFAR-10. We also confirmed that deep CNNs with RICAP achieve better results on classification tasks using CIFAR-100 and ImageNet and an image-caption retrieval task using Microsoft COCO. 
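The crop-and-patch step and the label mixing can be sketched as follows. This is a rough illustration under stated assumptions: images are plain nested lists of equal size, the boundary position is drawn uniformly (the abstract does not say how it is chosen), labels are mixed in proportion to patch area, and the function name `ricap` is hypothetical.

```python
import random

def ricap(images, labels, n_classes, rng):
    """Patch random crops of four H x W images into one H x W image.

    Returns the patched image and a soft label whose weights are the
    area fractions of the four patches (an assumption of this sketch).
    """
    H, W = len(images[0]), len(images[0][0])
    w = rng.randint(1, W - 1)  # vertical boundary position (assumed uniform)
    h = rng.randint(1, H - 1)  # horizontal boundary position (assumed uniform)
    # Patch sizes: upper-left, upper-right, lower-left, lower-right.
    sizes = [(h, w), (h, W - w), (H - h, w), (H - h, W - w)]
    patches, mixed = [], [0.0] * n_classes
    for img, lab, (ph, pw) in zip(images, labels, sizes):
        top = rng.randint(0, H - ph)   # random crop position
        left = rng.randint(0, W - pw)
        patches.append([row[left:left + pw] for row in img[top:top + ph]])
        mixed[lab] += ph * pw / (H * W)
    upper = [a + b for a, b in zip(patches[0], patches[1])]
    lower = [a + b for a, b in zip(patches[2], patches[3])]
    return upper + lower, mixed
```

For example, `ricap(four_images, [0, 1, 2, 3], n_classes=4, rng=random.Random(0))` returns an image of the original size together with four label weights that sum to one.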
Random KNN (RKNN) 
Random KNN consists of an ensemble of base k-nearest neighbor models, each constructed from a random subset of the input variables. Random KNN can be used to select important features using the RKNNFS algorithm. RKNNFS is an innovative feature selection procedure for ‘small n, large p problems.’ Random KNN (no bootstrapping) is fast and stable compared with Random Forests. The rknn R package implements Random KNN classification, regression and variable selection algorithms. · KNN is stable, no hierarchical structure · Final model can be a single KNN (vs. many trees) · Local method: robust for complex data structure · Automatically retrain, incremental learning · Easy to implement rknn 
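The ensemble idea can be illustrated with a short Python sketch. It is not the rknn package's implementation: the function names, the squared Euclidean distance and the default parameters (`n_models`, `m`, `k`) are assumptions chosen for illustration.

```python
import random
from collections import Counter

def knn_predict(train, labels, query, features, k):
    """Plain k-nearest-neighbor vote restricted to the given features."""
    def dist(row):
        return sum((row[f] - query[f]) ** 2 for f in features)
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def random_knn_predict(train, labels, query, n_models=15, m=2, k=3, seed=0):
    """Majority vote over base KNN models, each on m random features."""
    rng = random.Random(seed)
    p = len(train[0])
    votes = Counter()
    for _ in range(n_models):
        features = rng.sample(range(p), m)  # random subset of input variables
        votes[knn_predict(train, labels, query, features, k)] += 1
    return votes.most_common(1)[0][0]
```

Note that no bootstrap sample is drawn: every base model sees all training rows and differs only in its feature subset, which matches the "no bootstrapping" remark above.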
Random KNN Feature Selection (RKNNFS) 
We present RKNNFS, an innovative feature selection procedure for ‘small n, large p problems.’ RKNNFS is based on Random KNN (RKNN), a novel generalization of traditional nearest-neighbor modeling. RKNN consists of an ensemble of base k-nearest neighbor models, each constructed from a random subset of the input variables. To rank the importance of the variables, we define a criterion on the RKNN framework, using the notion of support. A two-stage backward model selection method is then developed based on this criterion. Empirical results on microarray data sets with thousands of variables and relatively few samples show that RKNNFS is an effective feature selection approach for high-dimensional data. RKNN is similar to Random Forests in terms of classification accuracy without feature selection. However, RKNN provides much better classification accuracy than RF when each method incorporates a feature-selection step. Our results show that RKNN is significantly more stable and more robust than Random Forests for feature selection when the input data are noisy and/or unbalanced. Further, RKNNFS is much faster than the Random Forests feature selection method (RFFS), especially for large-scale problems involving thousands of variables and multiple classes. rknn 
Random Labeled Point Process (RLPP) 
Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goals are to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values, and then apply the clustering algorithm on the completed data. We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing-value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches. Optimal clustering with missing values obviates the need for imputation-based preprocessing of the data, while at the same time possessing smaller clustering errors. 
Random Linear Feedback  Finite Time Adaptive Stabilization of LQ Systems 
Random Multimodel Deep Learning (RMDL) 
The exponential growth in the number of complex datasets every year requires more enhancement in machine learning methods to provide robust and accurate data classification. Lately, deep learning approaches have achieved surpassing results in comparison to previous machine learning algorithms. However, finding a suitable structure for these models has been a challenge for researchers. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble deep learning approach for classification. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures. In short, RMDL trains multiple randomly generated Deep Neural Network (DNN), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models in parallel and combines their results to produce better results than any of those models individually. In this paper, we describe the RMDL model and compare the results for image and text classification as well as face recognition. We used the MNIST and CIFAR-10 datasets as ground truth datasets for image classification and the WOS, Reuters, IMDB, and 20newsgroup datasets for text classification. Lastly, we used the ORL dataset to compare model performance on the face recognition task. 
Random Network Distillation (RND) 
We’ve developed Random Network Distillation (RND), a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, which for the first time exceeds average human performance on Montezuma’s Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms and solves the first level without using demonstrations or having access to the underlying state of the game. RND incentivizes visiting unfamiliar states by measuring how hard it is to predict the output of a fixed random neural network on visited states. In unfamiliar states it’s hard to guess the output, and hence the reward is high. It can be applied to any reinforcement learning algorithm, is simple to implement and efficient to scale. Below we release a reference implementation of RND that can reproduce the results from our paper. 
Random Projection  Random projection is a foundational research topic that connects a number of machine learning algorithms under a common mathematical basis. It is used to reduce the dimensionality of a dataset by efficiently projecting the data points to a smaller number of dimensions while preserving the original relative distances between the data points. In this paper, we explain the random projection method by presenting its mathematical background and foundation, the applications that currently adopt it, and an overview of its current research perspective. 
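A minimal sketch of one common variant, a Gaussian random projection, shows how pairwise distances are approximately preserved. The entry above does not fix a construction; the i.i.d. normal entries scaled by 1/sqrt(k) and the function names below are assumptions of this example.

```python
import math
import random

def random_projection_matrix(d, k, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries (Gaussian construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Map a d-dimensional point to k dimensions."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in matrix]

def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```

With this scaling the expected squared distance between two projected points equals the original squared distance, so for moderately large k the ratio `distance(project(R, x), project(R, y)) / distance(x, y)` tends to be close to 1.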
Random Projection Ensemble Classification  The random projection ensemble classifier is a very general method for classification of high-dimensional data, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. The random projections are divided into non-overlapping blocks, and within each block the projection yielding the smallest estimate of the test error is selected. The random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. http://…/randproj.pdf RPEnsemble 
Random Projection Forest (rpForest) 
K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy in terms of fast decay in the missing rate of kNNs and in the discrepancy of the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points would be separated by ensemble random projection trees when the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. 
Random Regression Model (RRM) 
Random regressions are types of hierarchical models in which data are structured in groups and (regression) coefficients can vary by groups. MultiRR 
Random Sample Consensus (RANSAC) 
Random sample consensus (RANSAC) is a successful algorithm in model fitting applications. It is vital to have a strong exploration phase when there is an enormous number of outliers within the dataset. Achieving a proper model is guaranteed by the pure exploration strategy of RANSAC. However, finding the optimum result requires exploitation. GASAC is an evolutionary paradigm that adds exploitation capability to the algorithm. Although GASAC improves the results of RANSAC, it has a fixed strategy for balancing between exploration and exploitation. In this paper, a new paradigm is proposed based on a genetic algorithm with an adaptive strategy. We utilize an adaptive genetic operator to select high-fitness individuals as parents and mutate low-fitness ones. In the mutation phase, a training method is used to gradually learn which gene is the best replacement for the mutated gene. The proposed method adaptively balances between exploration and exploitation by learning about genes. During the final iterations, the algorithm draws on this information to improve the final results. The proposed method is extensively evaluated on two sets of experiments. In all tests, our method outperformed the other methods in terms of both the number of inliers found and the speed of the algorithm. 
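For orientation, the pure-exploration baseline that the abstract builds on can be sketched for 2D line fitting. This is plain RANSAC only, not GASAC or the adaptive genetic method proposed above; the function name, the vertical-distance inlier test and the default parameters are assumptions of this example.

```python
import random

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Fit y = slope * x + intercept by repeatedly sampling two points.

    Keeps the candidate line with the largest consensus set (inliers
    whose vertical distance to the line is at most tol).
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample
        if x1 == x2:
            continue  # vertical line: not representable in this form
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers
```

Every iteration draws a fresh random sample without reusing what earlier iterations learned, which is the lack of exploitation that the abstract sets out to address.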
Random Self-Ensemble (RSE) 
Recent studies have revealed the vulnerability of deep neural networks: a small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defensive algorithm called Random Self-Ensemble (RSE) by combining two important concepts: ${\bf randomness}$ and ${\bf ensemble}$. To protect a targeted model, RSE adds random noise layers to the neural network to prevent state-of-the-art gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models $f_\epsilon$ without any additional memory overhead, and the proposed training procedure based on noisy stochastic gradient descent can ensure the ensemble model has good predictive capability. Our algorithm significantly outperforms previous defense techniques on real datasets. For instance, on CIFAR-10 with a VGG network (which has $92\%$ accuracy without any attack), under the state-of-the-art C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than $10\%$, the best previous defense technique has $48\%$ accuracy, while our method still has $86\%$ prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network. 
Random Self-Reducibility  Random self-reducibility (RSR) is the rule that a good algorithm for the average case implies a good algorithm for the worst case. RSR is the ability to solve all instances of a problem by solving a large fraction of the instances. 
Random Subsampling  Random subsampling, which is also known as Monte Carlo cross-validation, multiple holdout or repeated evaluation set, is based on randomly splitting the data into subsets, whereby the size of the subsets is defined by the user. The random partitioning of the data can be repeated arbitrarily often. In contrast to a full cross-validation procedure, random subsampling has been shown to be asymptotically consistent, resulting in more pessimistic predictions of the test data compared with cross-validation. The predictions of the test data give a realistic estimation of the predictions of external validation data. 
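The repeated random partitioning can be sketched as a small generator of train/test index splits. The function name and the default values for `test_fraction` and `n_repeats` are hypothetical; they stand for the user-defined subset size and repetition count mentioned above.

```python
import random

def random_subsampling_splits(n_samples, test_fraction=0.25, n_repeats=5, seed=0):
    """Yield (train_indices, test_indices) for repeated random holdout."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))  # user-defined test size
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # fresh random partition on every repeat
        yield indices[n_test:], indices[:n_test]
```

Unlike k-fold cross-validation, the test sets of different repeats may overlap and a given sample may never be held out; the final performance estimate is typically the average of the per-repeat test scores.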
Random Swap  We formulate a probabilistic clustering method based on a sequence of random swaps of cluster centroids. We show that the algorithm has a linear dependency on the number of data vectors, a quadratic dependency on the number of clusters, and an inverse dependency on the dimensionality. Each halving of the probability of failure (e.g. from 1% to 0.5%) is achieved at the cost of only a linear increase in the processing time. Efficiency of random swap clustering 
Random Utility Model (RUM) 

Random Variable  In probability and statistics, a random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables. A random variable’s possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, due to imprecise measurements or quantum uncertainty). They may also conceptually represent either the results of an ‘objectively’ random process (such as rolling a die) or the ‘subjective’ randomness that results from incomplete knowledge of a quantity. The meaning of the probabilities assigned to the potential values of a random variable is not part of probability theory itself but is instead related to philosophical arguments over the interpretation of probability. The mathematics works the same regardless of the particular interpretation in use. The mathematical function describing the possible values of a random variable and their associated probabilities is known as a probability distribution. Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function, characteristic of a probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable, that is, the results of randomly choosing values according to the variable’s probability distribution function, are called random variates. The formal mathematical treatment of random variables is a topic in probability theory. 
In that context, a random variable is understood as a function defined on a sample space whose outputs are numerical values. 
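As a small worked example, the roll of a fair six-sided die is a discrete random variable. The sketch below writes down its probability mass function, computes its expectation exactly, and draws random variates from it; the names `pmf`, `expectation` and `draw_variates` are chosen for illustration only.

```python
import random
from fractions import Fraction

# Probability mass function of a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expectation: sum of value times probability over all possible values.
expectation = sum(value * prob for value, prob in pmf.items())

def draw_variates(pmf, n, seed=0):
    """Draw n realizations (random variates) according to the pmf."""
    rng = random.Random(seed)
    values = list(pmf)
    weights = [float(p) for p in pmf.values()]
    return rng.choices(values, weights=weights, k=n)
```

Here `expectation` evaluates to the exact fraction 7/2, while `draw_variates(pmf, 10)` returns ten simulated rolls whose sample mean only approximates that value.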
Random Vector Functional Link Network (RVFL+) 
In school, a teacher plays an important role in various classroom teaching patterns. Similar to this human learning activity, the learning using privileged information (LUPI) paradigm provides additional information generated by the teacher to ‘teach’ learning algorithms during the training stage. Therefore, this novel learning paradigm is a typical Teacher-Student Interaction mechanism. This paper is the first to present a random vector functional link network based on the LUPI paradigm, called RVFL+. Rather than simply combining two existing approaches, the newly derived RVFL+ fills the gap between neural networks and the LUPI paradigm, which offers an alternative way to train RVFL networks. Moreover, the proposed RVFL+ can perform in conjunction with the kernel trick for highly complicated nonlinear feature learning, which is termed KRVFL+. Furthermore, the statistical property of the proposed RVFL+ is investigated, and we derive a sharp and high-quality generalization error bound based on the Rademacher complexity. Competitive experimental results on 14 real-world datasets illustrate the great effectiveness and efficiency of the novel RVFL+ and KRVFL+, which can achieve better generalization performance than state-of-the-art algorithms. 
Random Walk Covariance Model  rwc 
Random Warping Series (RWS) 
Time series data analytics has been a problem of substantial interest for decades, and Dynamic Time Warping (DTW) has been the most widely adopted technique to measure dissimilarity between time series. A number of global-alignment kernels have since been proposed in the spirit of DTW to extend its use to kernel-based estimation methods such as support vector machines. However, those kernels suffer from diagonal dominance of the Gram matrix and a quadratic complexity w.r.t. the sample size. In this work, we study a family of alignment-aware positive definite (p.d.) kernels, with its feature embedding given by a distribution of Random Warping Series (RWS). The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoying a Random Features (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in terms of both the number and the length of time series. We also study the convergence of the RF approximation for the domain of time series of unbounded length. Our extensive experiments on 16 benchmark datasets demonstrate that RWS outperforms or matches state-of-the-art classification and clustering methods in both accuracy and computational time. Our code and data are available at https://…/RandomWarpingSeries. 
Random Weighting  This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementation of the Wald test. Third, we construct a portmanteau test for model checking, and use the RW method to obtain its critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE) is proposed and proved to have the same efficiency as its infeasible counterpart. The importance of our entire methodology based on the feasible ALADE is illustrated by simulation results and the real data analysis on three U.S. economic data sets. 
Randomised Bayesian Least-Squares Policy Iteration (RBLSPI) 
We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free, policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. An online variant of BLSPI, called randomised Bayesian least-squares policy iteration (RBLSPI), is also proposed; it improves its policy based on an incomplete policy evaluation step. In the online setting, the exploration-exploitation dilemma must be addressed, as we try to discover the optimal policy by using samples collected by ourselves. RBLSPI exploits the ability of BLSTD to quantify our uncertainty about the value function. Inspired by Thompson sampling, RBLSPI first samples a value function from a posterior distribution over value functions, and then selects actions based on the sampled value function. The effectiveness and the exploration abilities of RBLSPI are demonstrated experimentally in several environments. 
Randomized Adversarial Training (RAT) 
Since the discovery of adversarial examples in machine learning, researchers have designed several techniques to train neural networks that are robust against different types of attacks (most notably $\ell_\infty$ and $\ell_2$ based attacks). However, it has been observed that the defense mechanisms designed to protect against one type of attack often offer poor performance against the other. In this paper, we introduce Randomized Adversarial Training (RAT), a technique that is efficient both against $\ell_2$ and $\ell_\infty$ attacks. To obtain this result, we build upon adversarial training, a technique that is efficient against $\ell_\infty$ attacks, and demonstrate that adding random noise at training and inference time further improves performance against $\ell_2$ attacks. We then show that RAT is as efficient as adversarial training against $\ell_\infty$ attacks while being robust against strong $\ell_2$ attacks. Our final comparative experiments demonstrate that RAT outperforms all state-of-the-art approaches against $\ell_2$ and $\ell_\infty$ attacks. 
Randomized Block Cubic Newton (RBCN) 
We study the problem of minimizing the sum of three convex functions: a differentiable, a twice-differentiable and a non-smooth term in a high-dimensional setting. To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice-differentiable term, and a perfect (proximal) model for the non-smooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods and matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state of the art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression. 
Randomized Canonical Correlation  Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance and the Kernel Canonical Correlation, which have been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples: the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size, and they provide a controllable approximation of their kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error an order of magnitude faster than kernel-based measures. 
Randomized Generalized Variance  Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance and the Kernel Canonical Correlation, which have been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples: the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size, and they provide a controllable approximation of their kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error an order of magnitude faster than kernel-based measures. 
Randomized Gradient Boosting Machine (RGBM) 
Gradient Boosting Machine (GBM) introduced by Friedman is an extremely powerful supervised learning algorithm that is widely used in practice — it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In spite of the usefulness of GBM in practice, there is a big gap between its theoretical understanding and its success in practice. In this work, we propose Randomized Gradient Boosting Machine (RGBM) which leads to significant computational gains compared to GBM, by using a randomization scheme to reduce the search in the space of weak learners. Our analysis provides a formal justification of commonly used ad hoc heuristics employed by GBM implementations such as XGBoost, and suggests alternatives. In particular, we also provide a principled guideline towards better stepsize selection in RGBM that does not require a line search. The analysis of RGBM is inspired by a special variant of coordinate descent that combines the benefits of randomized coordinate descent and greedy coordinate descent; and may be of independent interest as an optimization algorithm. As a special case, our results for RGBM lead to superior computational guarantees for GBM. Our computational guarantees depend upon a curious geometric quantity that we call Minimal Cosine Angle, which relates to the density of weak learners in the prediction space. We demonstrate the effectiveness of RGBM over GBM in terms of obtaining a model with good training/test data fidelity with a fraction of the computational cost, via numerical experiments on several real datasets. 
Randomized Hierarchical Alternating Least Squares  Nonnegative matrix factorization (NMF) is a powerful tool for data mining. However, the emergence of ‘big data’ has severely challenged our ability to compute this fundamental decomposition using deterministic algorithms. This paper presents a randomized hierarchical alternating least squares (HALS) algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative input data, a more efficient nonnegative decomposition can be computed. Our algorithm scales to big data applications while attaining a near-optimal factorization, i.e., the algorithm scales with the target rank of the data rather than the ambient dimension of measurement space. The proposed algorithm is evaluated using synthetic and real-world data and shows substantial speedups compared to deterministic HALS. 
Randomized Independent Component Analysis (RICA) 
Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance and the Kernel Canonical Correlation, which have been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples: the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size, and they provide a controllable approximation of their kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error an order of magnitude faster than kernel-based measures. 
Randomized Principal Component Analysis (RPCA) 
Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy, even on parallel processors, unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer’s RAM. https://…/100804139 
Randomized Response  Randomized response is a research method used in structured survey interviews. It was first proposed by S. L. Warner in 1965 and later modified by B. G. Greenberg in 1969. It allows respondents to respond to sensitive issues (such as criminal behavior or sexuality) while maintaining confidentiality. Chance decides, unknown to the interviewer, whether the question is to be answered truthfully, or “yes”, regardless of the truth. For example, social scientists have used it to ask people whether they use drugs, whether they have illegally installed telephones, or whether they have evaded paying taxes. Before abortions were legal, social scientists used the method to ask women whether they had had abortions. rr 
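The design described above, where chance decides between a truthful answer and a forced “yes”, still allows the population proportion to be estimated. If a respondent answers truthfully with probability p, then P(yes) = p·π + (1 − p), where π is the true prevalence, so π can be recovered as (P(yes) − (1 − p)) / p. The sketch below assumes this forced-“yes” design; the function names and parameters are hypothetical and are not taken from the rr package.

```python
import random

def simulate_survey(true_prevalence, n, p_truth, rng):
    """Simulate n respondents; return the number of 'yes' answers."""
    n_yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_prevalence
        answers_truthfully = rng.random() < p_truth  # decided by chance
        # Forced 'yes' when chance says so, otherwise the truthful answer.
        if has_trait or not answers_truthfully:
            n_yes += 1
    return n_yes

def estimate_prevalence(n_yes, n_total, p_truth):
    """Invert P(yes) = p_truth * prevalence + (1 - p_truth)."""
    observed = n_yes / n_total
    return (observed - (1 - p_truth)) / p_truth
```

For instance, with 400 “yes” answers out of 1000 and p = 0.75, the estimate is (0.40 − 0.25) / 0.75 = 0.20, even though no individual answer reveals the respondent's true status.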
Randomized Singular Value Decomposition (rSVD) 
Matrix completion is a widely used technique for image inpainting, personalized recommender systems, etc. In this work, we focus on accelerating matrix completion using faster randomized singular value decomposition (rSVD). Firstly, two fast randomized algorithms (rSVD-PI and rSVD-BKI) are proposed for handling sparse matrices. They make use of an eigSVD procedure and several acceleration techniques. Then, with the rSVD-BKI algorithm and a new subspace recycling technique, we accelerate the singular value thresholding (SVT) method in [1] to realize faster matrix completion. Experiments show that the proposed rSVD algorithms can be 6X faster than the basic rSVD algorithm [2] while keeping the same accuracy. For image inpainting and movie-rating estimation problems, the proposed accelerated SVT algorithm consumes 15X and 8X less CPU time than the methods using svds and lansvd respectively, without loss of accuracy. 
Randomized Weighted Majority Algorithm (RWMA) 
The randomized weighted majority algorithm is an algorithm in machine learning theory. It improves the mistake bound of the weighted majority algorithm. Imagine that every morning before the stock market opens, we get a prediction from each of our ‘experts’ about whether the stock market will go up or down. Our goal is to somehow combine this set of predictions into a single prediction that we then use to make a buy or sell decision for the day. The RWMA gives us a way to do this combination such that our prediction record will be nearly as good as that of the single best expert in hindsight. ➘ “Weighted Majority Algorithm” 
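The combination rule can be sketched as follows, assuming the standard multiplicative-weights formulation: each round one expert is followed with probability proportional to its weight, and every expert that was wrong has its weight multiplied by a penalty factor. The function name `rwma` and the default `beta=0.5` are assumptions of this example.

```python
import random

def rwma(expert_predictions, outcomes, beta=0.5, seed=0):
    """Run the randomized weighted majority algorithm.

    expert_predictions[t][i] is expert i's prediction in round t and
    outcomes[t] is the true result. Returns the number of mistakes made
    by the algorithm and the final expert weights.
    """
    rng = random.Random(seed)
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        # Follow one expert, chosen with probability proportional to weight.
        chosen = rng.choices(range(n_experts), weights=weights, k=1)[0]
        if preds[chosen] != truth:
            mistakes += 1
        # Penalize every expert that predicted wrongly in this round.
        weights = [w * beta if p != truth else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights
```

In the stock market example, `preds` would hold each expert's "up" or "down" call for the day; experts that are often wrong quickly lose influence, so the algorithm's record approaches that of the best expert.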
Range Entropy  Sample entropy ($SampEn$) has been accepted as an alternate, and sometimes a replacement, measure to approximate entropy ($ApEn$) for characterizing temporal complexity of time series. However, it still suffers from issues such as inconsistency over short-length signals and its tolerance parameter $r$, susceptibility to signal amplitude changes and insensitivity to self-similarity of time series. We propose modifications to the $ApEn$ and $SampEn$ measures which are defined for 0<$r$<1, are more robust to signal amplitude changes and sensitive to the self-similarity property of time series. We modified $ApEn$ and $SampEn$ by redefining the distance function used originally in their definitions. We then evaluated the new entropy measures, called range entropies ($RangeEn$), using different random processes and nonlinear deterministic signals. We further applied the proposed entropies to normal and epileptic electroencephalographic (EEG) signals under different states. Our results suggest that, unlike $ApEn$ and $SampEn$, $RangeEn$ measures are robust to stationary and nonstationary signal amplitude variations and that their trajectories in the tolerance $r$-plane are constrained between 0 (maximum entropy) and 1 (minimum entropy). We also showed that $RangeEn$ measures have direct relationships with the Hurst exponent, suggesting that the new definitions are sensitive to self-similarity structures of signals. $RangeEn$ analysis of epileptic EEG data showed distinct behaviours in the $r$-domain for extracranial versus intracranial recordings as well as different states of epileptic EEG data. The constrained trajectory of $RangeEn$ in the $r$-plane makes them a good candidate for studying complex biological signals such as EEG during seizure and non-seizure states. The Python package used to generate the results shown in this paper is publicly available at: https://…/RangeEn. 
Rank-1 Convolutional Neural Network  In this paper, we propose a convolutional neural network (CNN) with 3D rank-1 filters which are formed by the outer product of 1D filters. After being trained, the 3D rank-1 filters can be decomposed into 1D filters at test time for fast inference. The reason that we train 3D rank-1 filters in the training stage instead of consecutive 1D filters is that a better gradient flow can be obtained with this setting, which makes the training possible even in the case where the network with consecutive 1D filters cannot be trained. The 3D rank-1 filters are updated by both the gradient flow and the outer product of the 1D filters in every epoch, where the gradient flow tries to obtain a solution which minimizes the loss function, while the outer product operation tries to make the parameters of the filter lie on a rank-1 subspace. Furthermore, we show that the convolution with the rank-1 filters results in low-rank outputs, constraining the final output of the CNN also to lie on a low-dimensional subspace. 
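The decomposition into 1D filters can be illustrated on a single input patch. In the sketch below, a 3D filter is built as the outer product of three 1D filters, and its response is compared with three consecutive 1D contractions. The filter size of 3 and the variable names are assumptions; this shows only the algebraic identity, not the training procedure of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)

# 3D rank-1 filter: W[i, j, k] = a[i] * b[j] * c[k].
W = np.einsum("i,j,k->ijk", a, b, c)

# Response of the full 3D filter on one 3x3x3 input patch.
patch = rng.standard_normal((3, 3, 3))
full_response = np.sum(W * patch)

# Same response from three consecutive 1D contractions.
step1 = np.einsum("i,ijk->jk", a, patch)   # contract first axis, shape (3, 3)
step2 = np.einsum("j,jk->k", b, step1)     # contract second axis, shape (3,)
separable_response = float(c @ step2)      # contract last axis
```

The full filter costs 27 multiplications per output position, while the three 1D passes over a whole feature map cost 9 per position, which is where the inference speed-up comes from.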
Rank-Aware Factorization Machine (RaFM) 
Factorization machines (FM) are a popular model class to learn pairwise interactions by a low-rank approximation. Different from existing FM-based approaches which use a fixed rank for all features, this paper proposes a Rank-Aware Factorization Machine (RaFM) model which adopts pairwise interactions from embeddings with different ranks. The proposed model achieves a better performance on real-world datasets where different features have significantly varying frequencies of occurrences. Moreover, we prove that the RaFM model can be stored, evaluated, and trained as efficiently as one single FM, and under some reasonable conditions it can be even significantly more efficient than FM. RaFM improves the performance of FMs in both regression tasks and classification tasks while incurring less computational burden, and therefore also has attractive potential in industrial applications. 
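As background, the score of a standard fixed-rank second-order FM can be sketched as below; this is the plain FM, not the RaFM model. The pairwise term sum over i<j of <V[i], V[j]> x[i] x[j] is computed in time linear in the number of features through the usual square-of-sums identity, and checked against an explicit double loop. All names and sizes are illustrative assumptions.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order FM score for one feature vector x; V has shape (n, k)."""
    xv = x @ V                                  # shape (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return w0 + w @ x + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit double loop over feature pairs."""
    n = len(x)
    total = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total


rng = np.random.default_rng(0)
n_features, rank = 6, 3
x = rng.standard_normal(n_features)
w0, w = 0.1, rng.standard_normal(n_features)
V = rng.standard_normal((n_features, rank))
fast, slow = fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V)
```

Here every feature shares the same embedding rank; the entry above concerns letting that rank vary per feature.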
Rank-Breaking-Then-Composite-Marginal-Likelihood (RBCML) 
We propose a novel and flexible rank-breaking-then-composite-marginal-likelihood (RBCML) framework for learning random utility models (RUMs), which include the Plackett-Luce model. We characterize conditions for the objective function of RBCML to be strictly log-concave by proving that strict log-concavity is preserved under convolution and marginalization. We characterize necessary and sufficient conditions for RBCML to satisfy consistency and asymptotic normality. Experiments on synthetic data show that RBCML for Gaussian RUMs achieves better statistical efficiency and computational efficiency than the state-of-the-art algorithm, and our RBCML for the Plackett-Luce model provides flexible trade-offs between running time and statistical efficiency. 
RankCGAN  In this paper, we investigate the use of generative adversarial networks in the task of image generation according to subjective measures of semantic attributes. Unlike the standard conditional GAN (CGAN), which generates images from discrete categorical labels, our architecture handles both continuous and discrete scales. Given pairwise comparisons of images, our model, called RankCGAN, performs two tasks: it learns to rank images using a subjective measure; and it learns a generative model that can be controlled by that measure. RankCGAN associates each subjective measure of interest to a distinct dimension of some latent space. We perform experiments on UT-Zap50K, PubFig and OSR datasets and demonstrate that the model is expressive and diverse enough to conduct two-attribute exploration and image editing. 
Ranked Set Sampling (RSS) 
Ranked set sampling (RSS) was introduced by McIntyre (1952) (reprinted in 2005) <doi:10.1198/000313005X54180> as an advanced method of data collection that supports the statistical and methodological analysis in scientific studies. RSSampling 
Ranking Distillation (RD) 
We propose a novel way to train ranking models, such as recommender systems, that are both effective and efficient. Knowledge distillation (KD) was shown to be successful in image recognition to achieve both effectiveness and efficiency. We propose a KD technique for learning-to-rank problems, called ranking distillation (RD). Specifically, we train a smaller student model to learn to rank documents/items from both the training data and the supervision of a larger teacher model. The student model achieves a similar ranking performance to that of the large teacher model, but its smaller model size makes the online inference more efficient. RD is flexible because it is orthogonal to the choices of ranking models for the teacher and student. We address the challenges of RD for ranking problems. The experiments on public data sets and state-of-the-art recommendation models showed that RD achieves its design purposes: the student model learnt with RD has a model size less than half of the teacher model while achieving a ranking performance similar to the teacher model and much better than the student model learnt without RD. 
Ranking Relative Principal Component Attributes Network Model (RELPCANet) 
In 2018, at the World Economic Forum in Davos, a new metric of countries’ economic performance was presented: the Inclusive Development Index (IDI), composed of 12 indicators. The new metric implies that countries might need to realize structural reforms for improving both economic expansion and social inclusion performance. It is therefore vital for the IDI calculation method to have a strong statistical and mathematical basis, so that results are accurate and transparent for public purposes. In the current work, we propose a novel approach for the IDI estimation, the Ranking Relative Principal Component Attributes Network Model (RELPCANet). The model is based on RELARM and RankNet principles and combines elements of PCA, techniques applied in image recognition and learning-to-rank mechanisms. Also, we define a new approach for estimation of the target probabilities matrix to reflect dynamic changes in countries’ inclusive development. An empirical study showed that RELPCANet ensures reliable and robust scores and rankings, and thus is recommended for practical implementation. 
RankLib  RankLib is a library of learning to rank algorithms. Currently eight popular algorithms have been implemented: · MART (Multiple Additive Regression Trees, a.k.a. Gradient boosted regression tree) · RankNet · RankBoost · AdaRank · Coordinate Ascent · LambdaMART · ListNet · Random Forests. With appropriate parameters for Random Forests, it can also bag several MART/LambdaMART rankers. It also implements many retrieval metrics and provides many ways to carry out evaluation. 
Rank-Ordered Logit (ROLogit) 
Tan et al. (2017) <doi:10.1177/0962280217747309> The control of confounding is an area of extensive epidemiological research, especially in the field of causal inference for observational studies. Matched cohort and case-control study designs are commonly implemented to control for confounding effects without specifying the functional form of the relationship between the outcome and confounders. This paper extends the commonly used regression models in matched designs for binary and survival outcomes (i.e. conditional logistic and stratified Cox proportional hazards) to studies of continuous outcomes through a novel interpretation and application of logit-based regression models from the econometrics and marketing research literature. We compare the performance of the maximum likelihood estimators using simulated data and propose a heuristic argument for obtaining the residuals for model diagnostics. We illustrate our proposed approach with two real data applications. Our simulation studies demonstrate that our stratification approach is robust to model misspecification and that the distribution of the estimated residuals provides a useful diagnostic when the strata are of moderate size. In our applications to real data, we demonstrate that parity and menopausal status are associated with percent mammographic density, and that the mean level and variability of inpatient blood glucose readings vary between medical and surgical wards within a national tertiary hospital. Our work highlights how the same class of regression models, available in most statistical software, can be used to adjust for confounding in the study of binary, time-to-event and continuous outcomes. ROlogit 
RankPL  In this paper we introduce RankPL, a modeling language that can be thought of as a qualitative variant of a probabilistic programming language with a semantics based on Spohn’s ranking theory. Broadly speaking, RankPL can be used to represent and reason about processes that exhibit uncertainty expressible by distinguishing ‘normal’ from ‘surprising’ events. RankPL allows (iterated) revision of rankings over alternative program states and supports various types of reasoning, including abduction and causal inference. We present the language, its denotational semantics, and a number of practical examples. We also discuss an implementation of RankPL that is available for download. 
Rank-Regret Representative (RRR) 
We propose the rank-regret representative as a way of choosing a small subset of the database guaranteed to contain at least one of the top-k of any linear ranking function. We provide techniques for finding such a set and conduct experiments on real datasets to confirm the efficiency and effectiveness of our proposal. 
Rant  Rant is an all-purpose procedural text engine that is most simply described as the opposite of Regex. It has been refined to include a dizzying array of features for handling everything from the most basic of string generation tasks to advanced dialogue generation, code templating, automatic formatting, and more. The goal of the project is to enable developers of all kinds to automate repetitive writing tasks with a high degree of creative freedom. 
Rao-Scott Cochran-Armitage by Slices Trend Test (RSCABS) 
RSCABS 
RApache  rApache is a project supporting web application development using the R statistical language and environment and the Apache web server. The current software distribution runs on UNIX/Linux and Mac OS X operating systems. Apache servers with threaded Multi-Processing Modules are now supported, but the Apache Prefork Multi-Processing Module is still recommended (refer to the Multi-Processing Modules chapter from Apache for more about this). The rApache software distribution provides the Apache module named mod_R that embeds the R interpreter inside the web server. It also comes bundled with libapreq, an Apache module for manipulating client request data. Together, they provide the glue to transform R into a server-side scripting environment. Another important project that’s not bundled with rApache, but plays an important role in server-side scripting, is the R package brew (also available on CRAN). It implements a templating framework for report generation, and it’s perfect for generating HTML on the fly. Its syntax is similar to PHP, Ruby’s erb module, Java Server Pages, and Python’s psp module. brew can be used standalone as well, so it’s not part of the distribution. http://…/rscriptasserviceapi 
Rapid Automatic Keyword Extraction (RAKE) 
Keywords are widely used to define queries within information retrieval (IR) systems as they are easy to define, revise, remember, and share. This chapter describes rapid automatic keyword extraction (RAKE), an unsupervised, domain-independent, and language-independent method for extracting keywords from individual documents. It provides details of the algorithm and its configuration parameters, and presents results on a benchmark dataset of technical abstracts, showing that RAKE is more computationally efficient than TextRank while achieving higher precision and comparable recall scores. The chapter then describes a novel method for generating stoplists, which is used to configure RAKE for specific domains and corpora. Finally, it applies RAKE to a corpus of news articles and defines metrics for evaluating the exclusivity, essentiality, and generality of extracted keywords, enabling a system to identify keywords that are essential or general to documents in the absence of manual annotations. rapidraker 
Rapid Orthogonal Approximate Slepian Transform (ROAST) 
In this paper, we provide a Rapid Orthogonal Approximate Slepian Transform (ROAST) for the discrete vector one obtains when collecting a finite set of uniform samples from a baseband analog signal. The ROAST offers an orthogonal projection which is an approximation to the orthogonal projection onto the leading discrete prolate spheroidal sequence (DPSS) vectors (also known as Slepian basis vectors). As such, the ROAST is guaranteed to accurately and compactly represent not only oversampled bandlimited signals but also the leading DPSS vectors themselves. Moreover, the subspace angle between the ROAST subspace and the corresponding DPSS subspace can be made arbitrarily small. The complexity of computing the representation of a signal using the ROAST is comparable to the FFT, which is much less than the complexity of using the DPSS basis vectors. We also give non-asymptotic results to guarantee that the proposed basis not only provides a very high degree of approximation accuracy in a mean-square error sense for bandlimited sample vectors, but also that it can provide high-quality approximations of all sampled sinusoids within the band of interest. 
RAPIDNN  Classification of very high resolution (VHR) satellite images has three major challenges: 1) inherent low intra-class and high inter-class spectral similarities, 2) mismatching resolution of available bands, and 3) the need to regularize noisy classification maps. Conventional methods have addressed these challenges by adopting separate stages of image fusion, feature extraction, and post-classification map regularization. These processing stages, however, are not jointly optimizing the classification task at hand. In this study, we propose a single-stage framework embedding the processing stages in a recurrent multiresolution convolutional network trained in an end-to-end manner. The feedforward version of the network, called FuseNet, aims to match the resolution of the panchromatic and multispectral bands in a VHR image using convolutional layers with corresponding downsampling and upsampling operations. Contextual label information is incorporated into FuseNet by means of a recurrent version called ReuseNet. We compared FuseNet and ReuseNet against the use of separate processing steps for both image fusion, e.g. pansharpening and resampling through interpolation, and map regularization such as conditional random fields. We carried out our experiments on a land cover classification task using a Worldview-03 image of Quezon City, Philippines and the ISPRS 2D semantic labeling benchmark dataset of Vaihingen, Germany. FuseNet and ReuseNet surpass the baseline approaches in both quantitative and qualitative results. 
Rasch Model  The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between (a) the respondent’s abilities, attitudes or personality traits and (b) the item difficulty. For example, Rasch models may be used to estimate a student’s reading ability, or the extremity of a person’s attitude to capital punishment from responses on a questionnaire. In addition to psychometrics and educational research, the Rasch model and its extensions are used in other areas, including the health profession and market research because of their general applicability. The mathematical theory underlying Rasch models is a special case of item response theory and, more generally, a special case of a generalized linear model. However, there are important differences in the interpretation of the model parameters and its philosophical implications that separate proponents of the Rasch model from the item response modeling tradition. A central aspect of this divide relates to the role of specific objectivity, a defining property of the Rasch model according to Georg Rasch, as a requirement for successful measurement. ➚ “Item Response Theory” mixRasch 
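The trade-off between ability and item difficulty can be made concrete with the dichotomous form of the model, in which the probability of a correct response is exp(ability - difficulty) / (1 + exp(ability - difficulty)). The sketch below evaluates that formula directly; the function name and the example values are illustrative assumptions and have no connection to the mixRasch package interface.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct response) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


p_equal = rasch_probability(ability=0.5, difficulty=0.5)
p_easy = rasch_probability(ability=1.0, difficulty=-1.0)
p_hard = rasch_probability(ability=-1.0, difficulty=1.0)
```

When ability equals difficulty the probability is exactly one half, and only the difference between the two parameters matters, so shifting both by the same amount leaves the probability unchanged.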
Raster Time Series  The raster model is widely used in Geographic Information Systems to represent data that vary continuously in space, such as temperature, precipitation, and elevation, among other spatial attributes. In applications like weather forecast systems, not just a single raster, but a sequence of rasters covering the same region at different timestamps, known as a raster time series, needs to be stored and queried. Compact data structures have proven successful in providing space-efficient representations of rasters with query capabilities. Hence, a naive approach to save space is to use such a representation for each raster in a time series. 
Rating Scale  A rating scale is a set of categories designed to elicit information about a quantitative or a qualitative attribute. In the social sciences, particularly psychology, common examples are the Likert scale and 1-10 rating scales in which a person selects the number which is considered to reflect the perceived quality of a product. 
Rationalization  We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had done the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of the autonomous agent into natural language. We evaluate our technique in the Frogger game environment. The natural language is collected from human players thinking out loud as they play the game. We motivate the use of rationalization as an approach to explanation generation, show the results of experiments on the accuracy of our rationalization technique, and describe a future research agenda. 
Raw Data  Raw data (also known as primary data) is a term for data collected from a source. Raw data has not been subjected to processing or any other manipulation. Raw data is a relative term. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. 
Ray  The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray, a distributed system to address them. Ray implements a dynamic task graph computation model that supports both the task-parallel and the actor programming models. To meet the performance requirements of AI applications, we propose an architecture that logically centralizes the system’s control state using a sharded storage system and a novel bottom-up distributed scheduler. In our experiments, we demonstrate sub-millisecond remote task latencies and linear throughput scaling beyond 1.8 million tasks per second. We empirically validate that Ray speeds up challenging benchmarks and serves as both a natural and performant fit for an emerging class of reinforcement learning applications and algorithms. Ray 
Ray RLLib  Reinforcement learning (RL) algorithms involve the deep nesting of distinct components, where each component typically exhibits opportunities for distributed computation. Current RL libraries offer parallelism at the level of the entire program, coupling all the components together and making existing implementations difficult to extend, combine, and reuse. We argue for building composable RL components by encapsulating parallelism and resource requirements within individual components, which can be achieved by building on top of a flexible task-based programming model. We demonstrate this principle by building Ray RLLib on top of Ray and show that we can implement a wide range of state-of-the-art algorithms by composing and reusing a handful of standard components. This composability does not come at the cost of performance: in our experiments, RLLib matches or exceeds the performance of highly optimized reference implementations. Ray RLLib is available as part of Ray at https://…/. 
RBFI Unit  In adversarial attacks on machine-learning classifiers, small perturbations are added to input that is correctly classified. The perturbations yield adversarial examples, which are virtually indistinguishable from the unperturbed input, and yet are misclassified. In standard neural networks used for deep learning, attackers can craft adversarial examples from most input to cause a misclassification of their choice. We introduce a new type of network unit, called RBFI units, whose nonlinear structure makes them inherently resistant to adversarial attacks. On permutation-invariant MNIST, in the absence of adversarial attacks, networks using RBFI units match the performance of networks using sigmoid units, and are slightly below the accuracy of networks with ReLU units. When subjected to adversarial attacks, networks with RBFI units retain accuracies above 90% for attacks that degrade the accuracy of networks with ReLU or sigmoid units to below 2%. RBFI networks trained with regular input are superior in their resistance to adversarial attacks even to ReLU and sigmoid networks trained with the help of adversarial examples. The nonlinear structure of RBFI units makes them difficult to train using standard gradient descent. We show that networks of RBFI units can be efficiently trained to high accuracies using pseudogradients, computed using functions especially crafted to facilitate learning instead of their true derivatives. We show that the use of pseudogradients makes training deep RBFI networks practical, and we compare several structural alternatives of RBFI networks for their accuracy. 
RCGAN-U  ➘ “Robust Conditional GAN” 
RDeepSense  Recent advances in deep learning have led various applications to unprecedented achievements, which could potentially bring higher intelligence to a broad spectrum of mobile and ubiquitous applications. Although existing studies have demonstrated the effectiveness and feasibility of running deep neural network inference operations on mobile and embedded devices, they overlooked the reliability of mobile computing models. Reliability measurements such as predictive uncertainty estimations are key factors for improving the decision accuracy and user experience. In this work, we propose RDeepSense, the first deep learning model that provides well-calibrated uncertainty estimations for resource-constrained mobile and embedded devices. RDeepSense enables the predictive uncertainty by adopting a tunable proper scoring rule as the training criterion and dropout as the implicit Bayesian approximation, which theoretically proves its correctness. To reduce the computational complexity, RDeepSense employs efficient dropout and predictive distribution estimation instead of model ensemble or sampling-based methods for inference operations. We evaluate RDeepSense with four mobile sensing applications using Intel Edison devices. Results show that RDeepSense can reduce around 90% of the energy consumption while producing superior uncertainty estimations and preserving at least the same model accuracy compared with other state-of-the-art methods. 
RDPD  In many situations, we have both rich and poor data environments: in a rich-data environment (e.g., intensive care units), we have high-quality multi-modality data. On the other hand, in a poor-data environment (e.g., at home), we often only have access to a single data modality with low quality. How can we learn an accurate and efficient model for the poor-data environment by leveraging multi-modality data from the rich-data environment? In this work, we propose a knowledge distillation model RDPD to enhance a small model trained on poor data with a complex model trained on rich data. In an end-to-end fashion, RDPD trains a student model built on a single modality data (poor data) to imitate the behavior and performance of a teacher model from multimodal data (rich data) via jointly optimizing the combined loss of attention imitation and target imitation. We evaluated RDPD on three real-world datasets. RDPD consistently outperformed all baselines across all three datasets, especially achieving the greatest performance improvement over a standard neural network model trained on the common features (Direct model) by 24.56% on PR-AUC and 12.21% on ROC-AUC, and over the standard knowledge distillation model by 5.91% on PR-AUC and 4.44% on ROC-AUC. 
ReabsNet  Though deep neural networks have achieved huge success in recent studies and applications, they still remain vulnerable to adversarial perturbations which are imperceptible to humans. To address this problem, we propose a novel network called ReabsNet to achieve high classification accuracy in the face of various attacks. The approach is to augment an existing classification network with a guardian network to detect if a sample is natural or has been adversarially perturbed. Critically, instead of simply rejecting adversarial examples, we revise them to get their true labels. We exploit the observation that a sample containing adversarial perturbations has a possibility of returning to its true class after revision. We demonstrate that our ReabsNet outperforms the state-of-the-art defense method under various adversarial attacks. 
React.js  React (sometimes styled React.js or ReactJS) is an open-source JavaScript library for creating user interfaces that aims to address challenges encountered in developing single-page applications. It is maintained by Facebook, Instagram and a community of individual developers and corporations. React is intended to help developers build large applications that use data that changes over time. Its goal is to be simple, declarative and composable. React only handles the user interface in an app; it is considered to only be the view in the model-view-controller (MVC) software pattern, and can be used in conjunction with other JavaScript libraries or larger MVC frameworks such as AngularJS. It can also be used with React-based add-ons that take care of the non-UI parts of building a web application. According to JavaScript analytics service Libscore, React is currently being used on the homepages of Imgur, Bleacher Report, Feedly, Airbnb, SeatGeek, HelloSign, and others. 
Reactive Application  A Reactive Application is an application that reacts to its changing environment by design. It’s constructed from the beginning to react to load, react to failure and react to users. This is achieved by the underlying notion of reacting to messages. 
Reactive Programming  In computing, reactive programming is a programming paradigm oriented around data flows and the propagation of change. This means that it should be possible to express static or dynamic data flows with ease in the programming languages used, and that the underlying execution model will automatically propagate changes through the data flow. For example, in an imperative programming setting, a:=b+c would mean that a is being assigned the result of b+c in the instant the expression is evaluated. Later, the values of b and c can be changed with no effect on the value of a. In reactive programming, the value of a would be automatically updated based on the new values. 
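The a:=b+c example above can be sketched with a tiny dependency-tracking cell. This is a minimal illustration of change propagation under assumed names (Cell, Computed), not the API of any particular reactive library.

```python
class Cell:
    """A value that notifies dependents when it changes."""

    def __init__(self, value=None):
        self._value = value
        self._dependents = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        for dependent in self._dependents:
            dependent.recompute()


class Computed(Cell):
    """A cell whose value is derived from other cells."""

    def __init__(self, func, *inputs):
        super().__init__()
        self._func = func
        self._inputs = inputs
        for cell in inputs:
            cell._dependents.append(self)
        self.recompute()

    def recompute(self):
        self.set(self._func(*(cell.value for cell in self._inputs)))


b = Cell(1)
c = Cell(2)
a = Computed(lambda x, y: x + y, b, c)   # reactive a := b + c
before = a.value
b.set(10)                                # a updates without reassignment
after = a.value
```

In the imperative reading, a would stay at 3 after b changes; here the change to b propagates and a becomes 12.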
readPTU  readPTU is a Python package designed to analyze time-correlated single-photon counting data. The use of the library promotes the storage of the complete time arrival information of the photons and full flexibility in post-processing data for analysis. The library supports the computation of time-resolved signal with external triggers, and second-order autocorrelation function analysis can be performed using multiple algorithms that provide the user with different trade-offs with regards to speed and accuracy. Additionally, a thresholding algorithm to perform time post-selection is also available. The library has been designed with performance and extensibility in mind to allow future users to implement support for additional file extensions and algorithms without having to deal with low-level details. We demonstrate the performance of readPTU by analyzing the second-order autocorrelation function of the resonance fluorescence from a single quantum dot in a two-dimensional semiconductor. 
Real log Canonical Threshold (RLCT) 
➘ “Widely Applicable Bayesian Information Criterion” 
Real Logic  We propose real logic: a uniform framework for integrating automatic learning and reasoning. Real logic is defined on a full first-order language where formulas have truth-value in the interval [0,1] and semantics defined concretely on the domain of real numbers. Logical constants are interpreted as (feature) vectors of real numbers. Real logic promotes a well-founded integration of deductive reasoning on knowledge-bases with efficient, data-driven relational machine learning. We show how Real Logic can be implemented in deep Tensor Neural Networks with the use of Google’s TensorFlow primitives. The paper concludes with experiments on a simple but representative example of knowledge completion. 
REalistic Single Image DEhazing (RESIDE) 
In this paper, we present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE). RESIDE highlights diverse data sources and image contents, and is divided into five subsets, each serving different training or evaluation purposes. We further provide a rich variety of criteria for dehazing algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on RESIDE shed light on the comparisons and limitations of state-of-the-art dehazing algorithms, and suggest promising future directions. (PDF) RESIDE: A Benchmark for Single Image Dehazing. Available from: https://…IDE_A_Benchmark_for_Single_Image_Dehazing [accessed Jul 03 2018]. 
Real-Time Anomaly Detection System (RADS) 
Cybersecurity attacks in Cloud data centres are increasing alongside the growth of the Cloud services market. Existing research proposes a number of anomaly detection systems for detecting such attacks. However, these systems encounter a number of challenges, specifically due to the unknown behaviour of the attacks and the occurrence of genuine Cloud workload spikes, which must be distinguished from attacks. In this paper, we discuss these challenges and investigate the issues with the existing Cloud anomaly detection approaches. Then, we propose a Real-time Anomaly Detection System (RADS) for Cloud data centres, which uses a one-class classification algorithm and a window-based time series analysis to address the challenges. Specifically, RADS can detect VM-level anomalies occurring due to DDoS and cryptomining attacks. We evaluate the performance of RADS by running lab-based experiments and by using real-world Cloud workload traces. Evaluation results demonstrate that RADS can achieve 90-95% accuracy with a low false positive rate of 0-3%. The results further reveal that RADS experiences fewer false positives when using its window-based time series analysis in comparison to using state-of-the-art average or entropy-based analysis. 
Real-time Automated Photometric IDentification (RAPID) 
We present RAPID (Real-time Automated Photometric IDentification), a novel time-series classification tool capable of automatically identifying transients from within a day of the initial alert, to the full lifetime of a light curve. Using a deep recurrent neural network with Gated Recurrent Units (GRUs), we present the first method specifically designed to provide early classifications of astronomical time-series data, typing 12 different transient classes. Our classifier can process light curves with any phase coverage, and it does not rely on deriving computationally expensive features from the data, making RAPID well-suited for processing the millions of alerts that ongoing and upcoming wide-field surveys such as the Zwicky Transient Facility (ZTF), and the Large Synoptic Survey Telescope (LSST) will produce. The classification accuracy improves over the lifetime of the transient as more photometric data becomes available, and across the 12 transient classes, we obtain an average area under the receiver operating characteristic curve of 0.95 and 0.98 at early and late epochs, respectively. We demonstrate RAPID’s ability to effectively provide early classifications of transients from the ZTF data stream. We have made RAPID available as an open-source software package (https://astrorapid.readthedocs.io ) for machine learning-based alert-brokers to use for the autonomous and quick classification of several thousand light curves within a few seconds. 
Real-Time Intelligent Computing  ➘ “Real-Time Intelligent Systems” 
Real-Time Intelligent Systems  Intelligent computing refers largely to artificial intelligence whose aim is to make computers act like humans. The newly developed area of real-time intelligent computing combines dynamic environments with human intelligence. Book: Lecture Notes in Real-Time Intelligent Systems 
Real-time IoT Benchmark for Distributed Stream Processing Platforms (RIoTBench) 
The Internet of Things (IoT) is an emerging technology paradigm where millions of sensors and actuators help monitor and manage physical, environmental and human systems in real time. The inherent closed-loop responsiveness and decision making of IoT applications make them ideal candidates for using low-latency and scalable stream processing platforms. Distributed Stream Processing Systems (DSPS) hosted on Cloud datacenters are becoming the vital engine for real-time data processing and analytics in any IoT software architecture. But the efficacy and performance of contemporary DSPS have not been rigorously studied for IoT applications and data streams. Here, we develop RIoTBench, a Real-time IoT Benchmark suite, along with performance metrics, to evaluate DSPS for streaming IoT applications. The benchmark includes 27 common IoT tasks classified across various functional categories and implemented as reusable micro-benchmarks. Further, we propose four IoT application benchmarks composed from these tasks that leverage various dataflow semantics of DSPS. The applications are based on common IoT patterns for data pre-processing, statistical summarization and predictive analytics. These are coupled with four stream workloads sourced from real IoT observations on smart cities and fitness, with peak stream rates that range from 500 to 10000 messages/sec and diverse frequency distributions. We validate the RIoTBench suite for the popular Apache Storm DSPS on the Microsoft Azure public Cloud, and present empirical observations. This suite can be used by DSPS researchers for performance analysis and resource scheduling, and by IoT practitioners to evaluate DSPS platforms. 
Real-Time Predictive Analytics  Real-time predictive analytics is the deployment of a predictive model (built/fitted on a set of aggregated data) to perform run-time prediction on a continuous stream of event data, enabling decision making in real time. Two aspects are involved. First, the predictive model built by a Data Scientist via a standalone tool (R, SAS, SPSS, etc.) has to be exported in a consumable format (PMML is a preferred method across machine learning environments these days; we have done this and also via other formats). Second, a streaming operational analytics platform has to consume the model (PMML or other format) and translate it into the necessary predictive function (via open-source jPMML or Cascading Pattern or Zementis’ commercially licensed UPPI or other interfaces), and also feed the processed streaming event data (via a stream processing component in CEP or similar) to compute the predicted outcome. This deployment of a complex predictive model, from its parent machine learning environment to an operational analytics environment, is one possible route to continuous run-time prediction on streaming event data in real time. 
Reblur2Deblur  Motion blur is a fundamental problem in computer vision as it impacts image quality and hinders inference. Traditional deblurring algorithms leverage the physics of the image formation model and use hand-crafted priors: they usually produce results that better reflect the underlying scene, but present artifacts. Recent learning-based methods implicitly extract the distribution of natural images directly from the data and use it to synthesize plausible images. Their results are impressive, but they are not always faithful to the content of the latent image. We present an approach that bridges the two. Our method fine-tunes existing deblurring neural networks in a self-supervised fashion by enforcing that the output, when blurred based on the optical flow between subsequent frames, matches the input blurry image. We show that our method significantly improves the performance of existing methods on several datasets both visually and in terms of image quality metrics. The supplementary material is available at https://goo.gl/nYPjEQ 
Recall  In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. Suppose a program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program’s precision is 4/7 while its recall is 4/9. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positive) and maximum recall (no false negative). The above pattern recognition example contained 7 – 4 = 3 type I errors and 9 – 4 = 5 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant, while high recall means that an algorithm returned most of the relevant results. 
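The two worked examples in the entry above can be checked with a few lines of Python. This is only a minimal sketch: the function names and the use of exact fractions are illustrative choices, not part of any particular library.

```python
from fractions import Fraction


def precision(tp: int, fp: int) -> Fraction:
    """Fraction of retrieved instances that are relevant: TP / (TP + FP)."""
    # Undefined when nothing was retrieved (tp + fp == 0); not handled in this sketch.
    return Fraction(tp, tp + fp)


def recall(tp: int, fn: int) -> Fraction:
    """Fraction of relevant instances that are retrieved: TP / (TP + FN)."""
    # Undefined when there are no relevant instances (tp + fn == 0); not handled here.
    return Fraction(tp, tp + fn)


# Dog example: 7 identifications, 4 correct, 9 dogs in the scene.
dog_precision = precision(tp=4, fp=7 - 4)   # 3 false positives (type I errors)
dog_recall = recall(tp=4, fn=9 - 4)         # 5 false negatives (type II errors)

# Search engine example: 30 pages returned, 20 relevant, 40 relevant pages missed.
search_precision = precision(tp=20, fp=30 - 20)
search_recall = recall(tp=20, fn=40)

print(dog_precision, dog_recall)          # 4/7 4/9
print(search_precision, search_recall)    # 2/3 1/3
```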
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) 
ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference or a set of reference (human-produced) summaries or translations. 
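As a rough illustration of the recall-oriented comparison, the sketch below computes an n-gram recall score between one candidate and one reference, in the spirit of the ROUGE-N family. It is a simplified assumption-laden version (whitespace tokenization, lowercasing, a single reference, no stemming) and is not the official ROUGE package; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Share of reference n-grams that also occur in the candidate (clipped counts)."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram, i.e. clipped matches.
    overlap = sum((cand_counts & ref_counts).values())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
unigram_recall = rouge_n_recall(candidate, reference, n=1)  # 5 of 6 unigrams matched
bigram_recall = rouge_n_recall(candidate, reference, n=2)   # 3 of 5 bigrams matched
print(unigram_recall, bigram_recall)
```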
Receding Horizon Accelerated Gradient (RHAG) 
This paper studies an online optimization problem with switching costs and a finite prediction window. We propose two computationally efficient algorithms: Receding Horizon Gradient Descent (RHGD), and Receding Horizon Accelerated Gradient (RHAG). Both algorithms only require a finite number of gradient evaluations at each time. We show that both the dynamic regret and the competitive ratio of the proposed algorithms decay exponentially fast with the length of the prediction window, and the decay rate of RHAG is larger than that of RHGD. Moreover, we provide a fundamental lower bound on the dynamic regret for general online algorithms with a finite prediction window. The lower bound matches the dynamic regret of our RHAG, meaning that the performance cannot improve significantly even with more computation. Lastly, we present simulation results to test our algorithms numerically. 
Receding Horizon Gradient Descent (RHGD) 
This paper studies an online optimization problem with switching costs and a finite prediction window. We propose two computationally efficient algorithms: Receding Horizon Gradient Descent (RHGD), and Receding Horizon Accelerated Gradient (RHAG). Both algorithms only require a finite number of gradient evaluations at each time. We show that both the dynamic regret and the competitive ratio of the proposed algorithms decay exponentially fast with the length of the prediction window, and the decay rate of RHAG is larger than that of RHGD. Moreover, we provide a fundamental lower bound on the dynamic regret for general online algorithms with a finite prediction window. The lower bound matches the dynamic regret of our RHAG, meaning that the performance cannot improve significantly even with more computation. Lastly, we present simulation results to test our algorithms numerically. 
Receiver Operating Characteristic (ROC Curve) 
In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the total actual positives (TPR = true positive rate) vs. the fraction of false positives out of the total actual negatives (FPR = false positive rate), at various threshold settings. TPR is also known as sensitivity or recall in machine learning. The FPR is also known as the fall-out and can be calculated as one minus the better-known specificity. The ROC curve is then the sensitivity as a function of fall-out. In general, if both of the probability distributions for detection and false alarm are known, the ROC curve can be generated by plotting the Cumulative Distribution Function of the detection probability on the y-axis versus the Cumulative Distribution Function of the false alarm probability on the x-axis. https://rocr.bioinf.mpisb.mpg.de ROCR 
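The threshold sweep described above can be sketched in plain Python. The scores and labels below are made-up illustration data, and `roc_points` and `trapezoid_auc` are hypothetical helper names rather than functions from ROCR or any other package.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold from strict to lenient.

    An instance is predicted positive when its score is >= the threshold.
    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Start above every score (nothing predicted positive), then lower the threshold.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.4]   # classifier scores (illustrative)
labels = [1, 0, 1, 0]           # ground truth: 1 = positive, 0 = negative
points = roc_points(scores, labels)
area = trapezoid_auc(points)
print(points)
print(area)
```

Each point pairs one fall-out value with the sensitivity reached at the same threshold, so the first point is always (0, 0) and the last is (1, 1).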
RecLab  Different software tools have been developed with the purpose of performing offline evaluations of recommender systems. However, the results obtained with these tools may not be directly comparable because of subtle differences in the experimental protocols and metrics. Furthermore, it is difficult to analyze several algorithms under the same experimental conditions without disclosing their implementation details. For these reasons, we introduce RecLab, an open source software for evaluating recommender systems in a distributed fashion. By relying on consolidated web protocols, we created RESTful APIs for training and querying recommenders remotely. In this way, it is possible to easily integrate into the same toolkit algorithms realized with different technologies. In detail, the experimenter can perform an evaluation by simply visiting a web interface provided by RecLab. The framework will then interact with all the selected recommenders and it will compute and display a comprehensive set of measures, each representing a different metric. The results of all experiments are permanently stored and publicly available in order to support accountability and comparative analyses. 
ReCode  In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. First, we retrieve sentences that are similar to input sentences using a dynamic-programming-based sentence similarity scoring method. Next, we extract n-grams of action sequences that build the associated abstract syntax tree. Finally, we increase the probability of actions that cause the retrieved n-gram action subtree to be in the predicted code. We show that our approach improves the performance on two code generation tasks by up to +2.6 BLEU. 
RecoGym  Recommender Systems are becoming ubiquitous in many settings and take many forms, from product recommendation in e-commerce stores, to query suggestions in search engines, to friend recommendation in social networks. Current research directions, which are largely based upon supervised learning from historical data, appear to be showing diminishing returns, with many practitioners reporting a discrepancy between improvements in offline metrics for supervised learning and the online performance of the newly proposed models. One possible reason is that we are using the wrong paradigm: when looking at the long-term cycle of collecting historical performance data, creating a new version of the recommendation model, A/B testing it and then rolling it out, we see that there are a lot of commonalities with the reinforcement learning (RL) setup, where the agent observes the environment and acts upon it in order to change its state towards better states (states with higher rewards). To this end we introduce RecoGym, an RL environment for recommendation, which is defined by a model of user traffic patterns on e-commerce and the users’ response to recommendations on the publisher websites. We believe that this is an important step forward for the field of recommendation systems research that could open up an avenue of collaboration between the recommender systems and reinforcement learning communities and lead to better alignment between offline and online performance metrics. 
Recombinator-k-Means  We present a heuristic algorithm, called recombinator-k-means, that can substantially improve the results of k-means optimization. Instead of using simple independent restarts and returning the best result, our scheme performs restarts in batches, using the results of a previous batch as a reservoir of candidates for the new initial starting values (seeds), exploiting the popular k-means++ seeding algorithm to piece them together into new promising initial configurations. Our scheme is general (it only affects the seeding part of the optimization, thus it could be applied even to k-medians or k-medoids, for example), it has no additional costs and it is trivially parallelizable across the restarts of each batch. In some circumstances, it can systematically find better configurations than the best one obtained after 10^4 restarts of a standard scheme. Our implementation is publicly available at https://…/RecombinatorKMeans.jl. 
Recommendation Engine of Multilayers (REM) 
Recommender systems have been widely adopted by electronic commerce and entertainment industries for individualized prediction and recommendation, which benefit consumers and improve business intelligence. In this article, we propose an innovative method, namely the recommendation engine of multilayers (REM), for tensor recommender systems. The proposed method utilizes the structure of a tensor response to integrate information from multiple modes, and creates an additional layer of nested latent factors to accommodate between-subjects dependency. One major advantage is that the proposed method is able to address the ‘cold-start’ issue in the absence of information from new customers, new products or new contexts. Specifically, it provides more effective recommendations through sub-group information. To achieve scalable computation, we develop a new algorithm for the proposed method, which incorporates a maximum block improvement strategy into the cyclic blockwise-coordinate-descent algorithm. In theory, we investigate both algorithmic properties for global and local convergence, along with the asymptotic consistency of estimated parameters. Finally, the proposed method is applied in simulations and IRI marketing data with 116 million observations of product sales. Numerical studies demonstrate that the proposed method outperforms existing competitors in the literature. 
Recommender System  Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as platform or engine) are a subclass of information filtering systems that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item. recosystem 
Reconciled Polynomial Machine  In this paper, we aim at introducing a new machine learning model, namely reconciled polynomial machine, which can provide a unified representation of existing shallow and deep machine learning models. Reconciled polynomial machine predicts the output by computing the inner product of the feature kernel function and variable reconciling function. Analysis of several concrete models, including Linear Models, FM, MVM, Perceptron, MLP and Deep Neural Networks, will be provided in this paper, which can all be reduced to the reconciled polynomial machine representations. Detailed analysis of the learning error by these models will also be illustrated in this paper based on their reduced representations from the function approximation perspective. 
Reconciliation k-Median  We propose a new variant of the k-median problem, where the objective function models not only the cost of assigning data points to cluster representatives, but also a penalty term for disagreement among the representatives. We motivate this novel problem by applications where we are interested in clustering data while avoiding selecting representatives that are too far from each other. For example, we may want to summarize a set of news sources, but avoid selecting ideologically-extreme articles in order to reduce polarization. To solve the proposed k-median formulation we adopt the local-search algorithm of Arya et al. We show that the algorithm provides a provable approximation guarantee, which becomes constant under a mild assumption on the minimum number of points for each cluster. We experimentally evaluate our problem formulation and proposed algorithm on datasets inspired by the motivating applications. In particular, we experiment with data extracted from Twitter, the US Congress voting records, and popular news sources. The results show that our objective can lead to choosing less polarized groups of representatives without significant loss in representation fidelity. 
Reconfigurable Inverted Index (RII) 
Existing approximate nearest neighbor search systems suffer from two fundamental problems that are of practical importance but have not received sufficient attention from the research community. First, although existing systems perform well for the whole database, it is difficult to run a search over a subset of the database. Second, there has been no discussion concerning the performance degradation after many items have been newly added to a system. We develop a reconfigurable inverted index (Rii) to resolve these two issues. Based on the standard IVFADC system, we design a data layout such that items are stored linearly. This enables us to efficiently run a subset search by switching the search method to a linear PQ scan if the size of a subset is small. Owing to the linear layout, the data structure can be dynamically adjusted after new items are added, maintaining the fast speed of the system. Extensive comparisons show that Rii achieves performance comparable to state-of-the-art systems such as Faiss. 
ReCoRD  We present a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning. Experiments on this dataset demonstrate that the performance of state-of-the-art MRC systems falls far behind human performance. ReCoRD represents a challenge for future research to bridge the gap between human and machine commonsense reading comprehension. ReCoRD is available at http://…/record. 
Record Linkage (RL) 
Record linkage (RL) refers to the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, databases). Record linkage is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), as may be the case due to differences in record shape, storage location, and/or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked. Record Linkage is called Data Linkage in many jurisdictions, but is the same process. 
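When two sources share no common identifier, one simple way to link records is approximate string matching on normalized fields. The sketch below uses Python's standard `difflib.SequenceMatcher`; the records, field names, the 0.85 similarity threshold and the rule of requiring an exact city match are all illustrative assumptions, not a recommended production method.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences vanish."""
    return " ".join(text.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(source_a, source_b, threshold=0.85):
    """Return (id_a, id_b) pairs judged to refer to the same entity.

    Two records are linked when their cities match after normalization
    and their names are at least `threshold` similar.
    """
    links = []
    for rec_a in source_a:
        for rec_b in source_b:
            same_city = normalize(rec_a["city"]) == normalize(rec_b["city"])
            if same_city and name_similarity(rec_a["name"], rec_b["name"]) >= threshold:
                links.append((rec_a["id"], rec_b["id"]))
    return links


# Hypothetical records from two sources with different identifier schemes.
source_a = [
    {"id": "a1", "name": "Jon Smith", "city": "Boston"},
    {"id": "a2", "name": "Mary Jones", "city": "Denver"},
]
source_b = [
    {"id": "b1", "name": "John  Smith", "city": "boston"},
    {"id": "b2", "name": "Maria Lopez", "city": "Denver"},
]

links = link_records(source_a, source_b)
print(links)
```

Comparing every pair is quadratic in the number of records; larger data sets usually need some way of restricting the comparison to plausible candidate pairs first.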
RecSysDAN  Data sparsity and data imbalance are practical and challenging issues in cross-domain recommender systems. This paper addresses those problems by leveraging concepts derived from representation learning, adversarial learning and transfer learning (particularly, domain adaptation). Although various transfer learning methods have shown promising performance in this context, our proposed novel method RecSysDAN focuses on alleviating the cross-domain and within-domain data sparsity and data imbalance and learns transferable latent representations for users, items and their interactions. Different from existing approaches, the proposed method transfers the latent representations from a source domain to a target domain in an adversarial way. The mapping functions in the target domain are learned by playing a min-max game with an adversarial loss, aiming to generate domain-indistinguishable representations for a discriminator. Four neural architectural instances of RecSysDAN are proposed and explored. Empirical results on real-world Amazon data show that, even without using labeled data (i.e., ratings) in the target domain, RecSysDAN achieves competitive performance as compared to the state-of-the-art supervised methods. More importantly, RecSysDAN is highly flexible to both unimodal and multimodal scenarios, and thus it is more robust to the cold-start recommendation which is difficult for previous methods. 
Rectangular Bounding Process (RBP) 
Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model, the Rectangular Bounding Process (RBP), to efficiently partition multi-dimensional spaces by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods. 
Rectified Decision Tree (ReDT) 
How to obtain a model with good interpretability and performance has always been an important research topic. In this paper, we propose rectified decision trees (ReDT), a knowledge-distillation-based rectification of decision trees with high interpretability, small model size, and empirical soundness. Specifically, we extend the impurity calculation and the pure ending condition of the classical decision tree to propose a decision tree extension that allows the use of soft labels generated by a well-trained teacher model in the training and prediction process. It is worth noting that for the acquisition of soft labels, we propose a new multiple cross-validation based method to reduce the effects of randomness and overfitting. These approaches ensure that ReDT retains excellent interpretability and even achieves fewer nodes than the decision tree in the aspect of compression while having relatively good performance. Besides, in contrast to traditional knowledge distillation, back propagation of the student model is not necessarily required in ReDT, which is an attempt at a new knowledge distillation approach. Extensive experiments are conducted, which demonstrate the superiority of ReDT in interpretability, compression, and empirical soundness. 
Rectified Factor Networks (RFN) 
We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means. We prove convergence and correctness of the RFN learning algorithm. On benchmarks, RFNs are compared to other unsupervised methods like autoencoders, RBMs, factor analysis, ICA, and PCA. In contrast to previous sparse coding methods, RFNs yield sparser codes, capture the data’s covariance structure more precisely, and have a significantly smaller reconstruction error. We test RFNs as a pretraining technique for deep networks on different vision datasets, where RFNs were superior to RBMs and autoencoders. On gene expression data from two pharmaceutical drug discovery studies, RFNs detected small and rare gene modules that revealed highly relevant new biological insights which were so far missed by other unsupervised methods. 
Rectified Linear Unit (ReLU) 
➘ “Rectifier” 
Rectified Local Phase Volume (ReLPV) 
Traditional 3D Convolutional Neural Networks (CNNs) are computationally expensive, memory intensive, and prone to overfitting, and, most importantly, there is a need to improve their feature learning capabilities. To address these issues, we propose the Rectified Local Phase Volume (ReLPV) block, an efficient alternative to the standard 3D convolutional layer. The ReLPV block extracts the phase in a 3D local neighborhood (e.g., 3x3x3) of each position of the input map to obtain the feature maps. The phase is extracted by computing the 3D Short Term Fourier Transform (STFT) at multiple fixed low frequency points in the 3D local neighborhood of each position. These feature maps at different frequency points are then linearly combined after passing them through an activation function. The ReLPV block provides significant parameter savings of at least 3^3 to 13^3 times compared to the standard 3D convolutional layer with the filter sizes 3x3x3 to 13x13x13, respectively. We show that the feature learning capabilities of the ReLPV block are significantly better than the standard 3D convolutional layer. Furthermore, it produces consistently better results across different 3D data representations. We achieve state-of-the-art accuracy on the volumetric ModelNet10 and ModelNet40 datasets while utilizing only 11% of the parameters of the current state-of-the-art. We also improve the state-of-the-art on the UCF-101 split-1 action recognition dataset by 5.68% (when trained from scratch) while using only 15% of the parameters of the state-of-the-art. The project webpage is available at https://…/home. 
Rectified Wire Network  We introduce a new neural network model, together with a tractable and monotone online learning algorithm. Our model describes feed-forward networks for classification, with one output node for each class. The only nonlinear operation is rectification using a ReLU function with a bias. However, there is a rectifier on every edge rather than at the nodes of the network. There are also weights, but these are positive, static, and associated with the nodes. Our rectified wire networks are able to represent arbitrary Boolean functions. Only the bias parameters, on the edges of the network, are learned. Another departure in our approach, from standard neural networks, is that the loss function is replaced by a constraint. This constraint is simply that the value of the output node associated with the correct class should be zero. Our model has the property that the exact norm-minimizing parameter update, required to correctly classify a training item, is the solution to a quadratic program that can be computed with a few passes through the network. We demonstrate a training algorithm using this update, called sequential deactivation (SDA), on MNIST and some synthetic datasets. Upon adopting a natural choice for the nodal weights, SDA has no hyperparameters other than those describing the network structure. Our experiments explore behavior with respect to network size and depth in a family of sparse expander networks. 
Rectifier  In the context of artificial neural networks, the rectifier is an activation function defined as f(x) = max(0, x) where x is the input to a neuron. This activation function has been argued to be more biologically plausible (cortical neurons are rarely in their maximum saturation regime) than the widely used logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. A unit employing the rectifier is also called a rectified linear unit (ReLU). 
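The definition f(x) = max(0, x) translates directly into code. The following is a minimal pure-Python sketch; the function names are illustrative, and numerical libraries typically apply the same operation element-wise to whole arrays.

```python
def relu(x: float) -> float:
    """Rectifier activation: pass positive inputs through, clamp negatives to zero."""
    return max(0.0, x)


def relu_layer(inputs):
    """Apply the rectifier element-wise to a list of pre-activation values."""
    return [relu(x) for x in inputs]


pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = relu_layer(pre_activations)
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```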
RecurJac  The Jacobian matrix (or the gradient for single-output networks) is directly related to many important properties of neural networks, such as the function landscape, stationary points, (local) Lipschitz constants and robustness to adversarial attacks. In this paper, we propose a recursive algorithm, RecurJac, to compute both upper and lower bounds for each element in the Jacobian matrix of a neural network with respect to the network’s input, and the network can contain a wide range of activation functions. As a byproduct, we can efficiently obtain a (local) Lipschitz constant, which plays a crucial role in neural network robustness verification, as well as the training stability of GANs. Experiments show that the (local) Lipschitz constants produced by our method are of better quality than those of previous approaches, thus providing better robustness verification results. Our algorithm has polynomial time complexity, and its computation time is reasonable even for relatively large networks. Additionally, we use our bounds of the Jacobian matrix to characterize the landscape of the neural network, for example, to determine whether there exist stationary points in a local neighborhood. Source code available at https://…/RecurJacJacobianBounds. 
Recurrence Plot (RP) 
In descriptive statistics and chaos theory, a recurrence plot (RP) is a plot showing, for a given moment in time, the times at which a phase space trajectory visits roughly the same area in the phase space. 
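A recurrence plot is usually built from a binary recurrence matrix. The sketch below assumes a one-dimensional state (so distance is an absolute difference) and a hand-picked threshold `eps`; both are simplifying assumptions, and the name `recurrence_matrix` is illustrative.

```python
def recurrence_matrix(states, eps):
    """R[i][j] = 1 when the states at times i and j lie within eps of each other."""
    n = len(states)
    return [
        [1 if abs(states[i] - states[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy trajectory that alternates between two regions.
series = [0.0, 1.0, 0.1, 1.05]
for row in recurrence_matrix(series, eps=0.2):
    # Each printed row marks with '#' the times at which the state recurs.
    print("".join("#" if cell else "." for cell in row))
```

For a real phase space trajectory the absolute difference would be replaced by a vector norm, typically after a time-delay embedding.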
Recurrence Quantification Analysis (RQA) 
Recurrence quantification analysis (RQA) is a method of nonlinear data analysis (cf. chaos theory) for the investigation of dynamical systems. It quantifies the number and duration of recurrences of a dynamical system presented by its phase space trajectory. Dynamic Natural Language Processing with Recurrence Quantification Analysis 
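Two common RQA measures can be sketched directly from a recurrence matrix: the recurrence rate (how many recurrences there are) and the lengths of diagonal lines (how long they last). The code below is a simplified sketch for a scalar series; counting the main diagonal in the recurrence rate and the default `l_min=2` are assumptions, and all function names are illustrative.

```python
def recurrence_matrix(states, eps):
    """R[i][j] = 1 when states i and j lie within eps of each other."""
    n = len(states)
    return [
        [1 if abs(states[i] - states[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(R):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def diagonal_line_lengths(R):
    """Lengths of diagonal runs of 1s above the main diagonal."""
    n = len(R)
    lengths = []
    for k in range(1, n):          # k-th diagonal above the main one
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def determinism(lengths, l_min=2):
    """Share of recurrent points that belong to diagonal lines of length >= l_min."""
    total = sum(lengths)
    return sum(l for l in lengths if l >= l_min) / total if total else 0.0


R = recurrence_matrix([0, 1, 0, 1, 0, 1], eps=0.1)  # strictly periodic toy series
lengths = diagonal_line_lengths(R)
print(recurrence_rate(R), lengths, determinism(lengths))
```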
Recurrent Additive Networks (RAN) 
We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates. At every time step, the new state is computed as a gated componentwise sum of the input and the previous state, without any of the nonlinearities commonly used in RNN transition dynamics. We formally show that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums. Despite this relatively simple functional form, experiments demonstrate that RANs outperform both LSTMs and GRUs on benchmark language modeling problems. This result shows that many of the nonlinear computations in LSTMs and related networks are not essential, at least for the problems we consider, and suggests that the gates are doing more of the computational work than previously understood. 
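The additive update can be illustrated with a scalar toy model. The sketch below assumes a one-dimensional state, sigmoid gates computed from the previous state and the current input, and made-up gate parameters `wi` and `wf`; none of these specifics come from the text above. It checks numerically that the final state equals a weighted sum of the inputs, with weights determined only by the gates.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ran_run(xs, wi=(0.5, 0.3, 0.0), wf=(-0.4, 0.2, 1.0)):
    """Run a scalar additive recurrence; return the final state and the gate values."""
    c = 0.0
    gates = []
    for x in xs:
        i = sigmoid(wi[0] * c + wi[1] * x + wi[2])  # input gate
        f = sigmoid(wf[0] * c + wf[1] * x + wf[2])  # forget gate
        c = i * x + f * c                            # purely additive state update
        gates.append((i, f))
    return c, gates


def weighted_sum(xs, gates):
    """Recompute the state as sum_j (i_j * prod_{k>j} f_k) * x_j."""
    total = 0.0
    for j, x in enumerate(xs):
        weight = gates[j][0]
        for k in range(j + 1, len(xs)):
            weight *= gates[k][1]
        total += weight * x
    return total


xs = [1.0, -2.0, 0.5, 3.0]
state, gates = ran_run(xs)
print(abs(state - weighted_sum(xs, gates)) < 1e-12)
```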
Recurrent Attention Unit (RAU) 
Recurrent Neural Networks (RNNs) have been successfully applied to many sequence learning problems, such as handwriting recognition, image description, natural language processing and video motion analysis. After years of development, researchers have improved the internal structure of the RNN and introduced many variants. Among these, the Gated Recurrent Unit (GRU) is one of the most widely used RNN models. However, GRU lacks the capability of adaptively paying attention to certain regions or locations, which may cause information redundancy or loss during learning. In this paper, we propose an RNN model, called Recurrent Attention Unit (RAU), which seamlessly integrates the attention mechanism into the interior of GRU by adding an attention gate. The attention gate can enhance GRU’s ability to retain long-term memory and help memory cells quickly discard unimportant content. RAU is capable of extracting information from sequential data by adaptively selecting a sequence of regions or locations and paying more attention to the selected regions during learning. Extensive experiments on image classification, sentiment classification and language modeling show that RAU consistently outperforms GRU and other baseline methods. 
Recurrent Attentive and Intensive Model (RAIM) 
With the improvement of medical data capturing, vast amounts of continuous patient monitoring data, e.g., electrocardiogram (ECG), real-time vital signs and medications, have become available for clinical decision support at intensive care units (ICUs). However, it is becoming increasingly challenging to model such data, due to the high density of the monitoring data, heterogeneous data types and the requirement for interpretable models. Integration of these high-density monitoring data with discrete clinical events (including diagnosis, medications, labs) is challenging but potentially rewarding, since the richness and granularity of such multimodal data increase the possibilities for accurate detection of complex problems and prediction of outcomes (e.g., length of stay and mortality). We propose the Recurrent Attentive and Intensive Model (RAIM) for jointly analyzing continuous monitoring data and discrete clinical events. RAIM introduces an efficient attention mechanism for continuous monitoring data (e.g., ECG), which is guided by discrete clinical events (e.g., medication usage). We apply RAIM to predicting physiological decompensation and length of stay for critically ill patients in the ICU. With evaluations on the MIMIC-III Waveform Database Matched Subset, we obtain an AUC-ROC score of 90.18% for predicting decompensation and an accuracy of 86.82% for forecasting length of stay with our final model, which outperforms our six baseline models. 
Recurrent Collective Classification (RCC) 
We propose a new method for training iterative collective classifiers for labeling nodes in network data. The iterative classification algorithm (ICA) is a canonical method for incorporating relational information into classification. Yet, existing methods for training ICA models rely on the assumption that relational features reflect the true labels of the nodes. This unrealistic assumption introduces a bias that is inconsistent with the actual prediction algorithm. In this paper, we introduce recurrent collective classification (RCC), a variant of ICA analogous to recurrent neural network prediction. RCC accommodates any differentiable local classifier and relational feature functions. We provide gradientbased strategies for optimizing over model parameters to more directly minimize the loss function. In our experiments, this direct loss minimization translates to improved accuracy and robustness on real network data. We demonstrate the robustness of RCC in settings where local classification is very noisy, settings that are particularly challenging for ICA. 
Recurrent Control Net (RCN) 
Central Pattern Generators (CPGs) are biological neural circuits capable of producing coordinated rhythmic outputs in the absence of rhythmic input. As a result, they are responsible for most rhythmic motion in living organisms. This rhythmic control is broadly applicable to fields such as locomotive robotics and medical devices. In this paper, we explore the possibility of creating a self-sustaining CPG network for reinforcement learning that learns rhythmic motion more efficiently and across more general environments than the current multilayer perceptron (MLP) baseline models. Recent work introduces the Structured Control Net (SCN), which maintains linear and nonlinear modules for local and global control, respectively. Here, we show that time-sequence architectures such as Recurrent Neural Networks (RNNs) model CPGs effectively. Combining previous work with RNNs and SCNs, we introduce the Recurrent Control Net (RCN), which adds a linear component to a recurrent network. RCNs match and exceed the performance of baseline MLPs and SCNs across all environment tasks. Our findings confirm existing intuitions for RNNs on reinforcement learning tasks, and demonstrate the promise of SCN-like structures in reinforcement learning. 
Recurrent Convolutional Network (RCN) 
Recently, three-dimensional (3D) convolutional neural networks (CNNs) have emerged as dominant methods to capture spatiotemporal representations, by adding to pre-existing 2D CNNs a third, temporal dimension. Such 3D CNNs, however, are anti-causal (i.e., they exploit information from both the past and the future to produce feature representations, thus preventing their use in online settings), constrain the temporal reasoning horizon to the size of the temporal convolution kernel, and are not temporal resolution-preserving for video sequence-to-sequence modelling, as, e.g., in spatiotemporal action detection. To address these serious limitations, we present a new architecture for the causal/online spatiotemporal representation of videos. Namely, we propose a recurrent convolutional network (RCN), which relies on recurrence to capture the temporal context across frames at every level of network depth. Our network decomposes 3D convolutions into (1) a 2D spatial convolution component, and (2) an additional hidden state $1\times 1$ convolution applied across time. The hidden state at any time $t$ is assumed to depend on the hidden state at $t-1$ and on the current output of the spatial convolution component. As a result, the proposed network: (i) provides flexible temporal reasoning, (ii) produces causal outputs, and (iii) preserves temporal resolution. Our experiments on the large-scale ‘Kinetics’ dataset show that the proposed method achieves superior performance compared to 3D CNNs, while being causal and using fewer parameters. 
Recurrent Distribution Regression Network (RDRN) 
While deep neural networks have achieved groundbreaking prediction results in many tasks, there is a class of data where existing architectures are not optimal — sequences of probability distributions. Performing forward prediction on sequences of distributions has many important applications. However, there are two main challenges in designing a network model for this task. First, neural networks are unable to encode distributions compactly as each node encodes just a real value. A recent work of Distribution Regression Network (DRN) solved this problem with a novel network that encodes an entire distribution in a single node, resulting in improved accuracies while using much fewer parameters than neural networks. However, despite its compact distribution representation, DRN does not address the second challenge, which is the need to model time dependencies in a sequence of distributions. In this paper, we propose our Recurrent Distribution Regression Network (RDRN) which adopts a recurrent architecture for DRN. The combination of compact distribution representation and shared weights architecture across time steps makes RDRN suitable for modeling the time dependencies in a distribution sequence. Compared to neural networks and DRN, RDRN achieves the best prediction performance while keeping the network compact. 
Recurrent Embedding Dialogue Policy (REDP) 
Machinelearning based dialogue managers are able to learn complex behaviors in order to complete a task, but it is not straightforward to extend their capabilities to new domains. We investigate different policies’ ability to handle uncooperative user behavior, and how well expertise in completing one task (such as restaurant reservations) can be reapplied when learning a new one (e.g. booking a hotel). We introduce the Recurrent Embedding Dialogue Policy (REDP), which embeds system actions and dialogue states in the same vector space. REDP contains a memory component and attention mechanism based on a modified Neural Turing Machine, and significantly outperforms a baseline LSTM classifier on this task. We also show that both our architecture and baseline solve the bAbI dialogue task, achieving 100% test accuracy. 
Recurrent Entity Network (EntNet) 
We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic longterm memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason onthefly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhbaatar et al., 2015). Like a Neural Turing Machine or Differentiable Neural Computer (Graves et al., 2014; 2016) it maintains a fixed size memory and can learn to perform location and contentbased read and write operations. However, unlike those models it has a simple parallel architecture in which several memory locations can be updated simultaneously. The EntNet sets a new stateoftheart on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting. We also demonstrate that it can solve a reasoning task which requires a large number of supporting facts, which other methods are not able to solve, and can generalize past its training horizon. It can also be practically used on large scale datasets such as Children’s Book Test, where it obtains competitive performance, reading the story in a single pass. 
Recurrent Event Network  Recently, there has been a surge of interest in learning representations of graph-structured data that are dynamically evolving. However, current dynamic graph learning methods lack a principled way of modeling temporal, multi-relational, and concurrent interactions between nodes, a limitation that is especially problematic for the task of temporal knowledge graph reasoning, where the goal is to predict unseen entity relationships (i.e., events) over time. Here we present the Recurrent Event Network (RE-Net), an architecture for modeling complex event sequences, which consists of a recurrent event encoder and a neighborhood aggregator. The event encoder employs an RNN to capture (subject, relation)-specific patterns from historical entity interactions, while the neighborhood aggregator summarizes concurrent interactions within each time stamp. An output layer is designed for predicting forthcoming, multi-relational events. Experiments on temporal link prediction over two knowledge graph datasets demonstrate the effectiveness of our method, especially on multi-step inference over time. 
Recurrent Gaussian Processes (RGP) 
We define Recurrent Gaussian Processes (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states, distinct inference methods and be extended with deep structures. In this context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce an RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising results obtained indicate that our RGP model maintains its high flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available. 
Recurrent Graph Neural Network  In this paper, we study the problem of node representation learning with graph neural networks. We present a graph neural network class named recurrent graph neural network (RGNN), which addresses the shortcomings of prior methods. By using recurrent units to capture the long-term dependency across layers, our methods can successfully identify important information during recursive neighborhood expansion. In our experiments, we show that our model class achieves state-of-the-art results on three benchmarks: the Pubmed, Reddit, and PPI network datasets. Our in-depth analyses also demonstrate that incorporating recurrent units is a simple yet effective method to prevent noisy information in graphs, which enables a deeper graph neural network. 
Recurrent Iterative Gating Network (RIGNet) 
In this paper, we present an approach for Recurrent Iterative Gating called RIGNet. The core elements of RIGNet involve recurrent connections that control the flow of information in neural networks in a topdown manner, and different variants on the core structure are considered. The iterative nature of this mechanism allows for gating to spread in both spatial extent and feature space. This is revealed to be a powerful mechanism with broad compatibility with common existing networks. Analysis shows how gating interacts with different network characteristics, and we also show that more shallow networks with gating may be made to perform better than much deeper networks that do not include RIGNet modules. 
Recurrent Kalman Network (RKN) 
In order to integrate uncertainty estimates into deep timeseries modelling, Kalman Filters (KFs) (Kalman et al., 1960) have been integrated with deep learning models, however, such approaches typically rely on approximate inference techniques such as variational inference which makes learning more complex and often less scalable due to approximation errors. We propose a new deep approach to Kalman filtering which can be learned directly in an endtoend manner using backpropagation without additional approximations. Our approach uses a highdimensional factorized latent state representation for which the Kalman updates simplify to scalar operations and thus avoids hard to backpropagate, computationally heavy and potentially unstable matrix inversions. Moreover, we use locally linear dynamic models to efficiently propagate the latent state to the next time step. The resulting network architecture, which we call Recurrent Kalman Network (RKN), can be used for any timeseries data, similar to a LSTM (Hochreiter & Schmidhuber, 1997) but uses an explicit representation of uncertainty. As shown by our experiments, the RKN obtains much more accurate uncertainty estimates than an LSTM or Gated Recurrent Units (GRUs) (Cho et al., 2014) while also showing a slightly improved prediction performance and outperforms various recent generative models on an image imputation task. 
Recurrent Knowledge Distillation  Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy. 
Recurrent Ladder Network  In this paper we address the problem of electing a committee among a set of $m$ candidates and on the basis of the preferences of a set of $n$ voters. We consider the approval voting method in which each voter can approve as many candidates as she/he likes by expressing a preference profile (boolean $m$vector). In order to elect a committee, a voting rule must be established to `transform’ the $n$ voters’ profiles into a winning committee. The problem is widely studied in voting theory; for a variety of voting rules the problem was shown to be computationally difficult and approximation algorithms and heuristic techniques were proposed in the literature. In this paper we follow an Ordered Weighted Averaging approach and study the $k$sum approval voting (optimization) problem in the general case $1 \leq k <n$. For this problem we provide different mathematical programming formulations that allow us to solve it in an exact solution framework. We provide computational results showing that our approach is efficient for mediumsize test problems ($n$ up to 200, $m$ up to 60) since in all tested cases it was able to find the exact optimal solution in very short computational times. Recurrent Ladder Networks 
Recurrent Memory Network  Recurrent Neural Networks (RNNs) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only amplifies the power of RNNs but also facilitates our understanding of their internal functioning and allows us to discover underlying patterns in data. We demonstrate the power of RMN on language modeling and sentence completion tasks. On language modeling, RMN outperforms the Long Short-Term Memory (LSTM) network on three large German, Italian, and English datasets. Additionally, we perform an in-depth analysis of various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state-of-the-art by a large margin. 
Recurrent Neural Filter (RNF) 
Despite the recent popularity of deep generative state space models, few comparisons have been made between network architectures and the inference steps of the Bayesian filtering framework — with most models simultaneously approximating both state transition and update steps with a single recurrent neural network (RNN). In this paper, we introduce the Recurrent Neural Filter (RNF), a novel recurrent variational autoencoder architecture that learns distinct representations for each Bayesian filtering step, captured by a series of encoders and decoders. Testing this on three realworld time series datasets, we demonstrate that decoupling representations not only improves the accuracy of onestepahead forecasts while providing realistic uncertainty estimates, but also facilitates multistep prediction through the separation of encoder stages. 
Recurrent Neural Network (RNN) 
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition, where they have achieved the best known results. rnn 
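The role of the internal state can be shown with a scalar Elman-style recurrence. This is a minimal sketch under stated assumptions: one input, one hidden unit, a tanh nonlinearity, and hand-picked weights `w_in`, `w_rec`, and `bias`, none of which come from the text above.

```python
import math


def rnn_step(x, h_prev, w_in=1.0, w_rec=0.5, bias=0.0):
    """One recurrent update: the new state mixes the input with the previous state."""
    return math.tanh(w_in * x + w_rec * h_prev + bias)


def run_rnn(inputs, h0=0.0):
    """Process a sequence of arbitrary length and return the final hidden state."""
    h = h0
    for x in inputs:
        h = rnn_step(x, h)
    return h


# The same inputs in a different order leave the network in a different state,
# which is what lets an RNN model temporal structure.
print(run_rnn([1.0, 0.0, 0.0]), run_rnn([0.0, 0.0, 1.0]))
```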
Recurrent Neural Network Language Model (RNNLM) 
The recurrent neural network based language model has been proposed to overcome certain limitations of the feedforward NNLM, such as the need to specify the context length (the order of the model N), and because theoretically RNNs can efficiently represent more complex patterns than shallow neural networks. The RNN model does not have a projection layer; it has only input, hidden and output layers. What is special about this type of model is the recurrent matrix that connects the hidden layer to itself, using time-delayed connections. This allows the recurrent model to form a kind of short-term memory, as information from the past can be represented by the hidden layer state, which is updated based on the current input and the state of the hidden layer in the previous time step. The complexity per training example of the RNN model is $Q = H \times H + H \times V$, where the word representations D have the same dimensionality as the hidden layer H. The term $H \times V$ can be efficiently reduced to $H \times \log_2(V)$ by using hierarchical softmax. Most of the complexity then comes from $H \times H$. http://rnnlm.org Gated Word-Character Recurrent Language Model 
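The effect of hierarchical softmax on the per-example complexity can be checked with a small calculation. The sketch below simply evaluates the two expressions given above (H·H + H·V, and H·H + H·log2(V) with hierarchical softmax); the function name and the example sizes H = 500 and V = 100000 are hypothetical.

```python
import math


def rnnlm_complexity(hidden, vocab, hierarchical_softmax=False):
    """Per-example cost Q: recurrent term H*H plus an output term."""
    output_term = hidden * math.log2(vocab) if hierarchical_softmax else hidden * vocab
    return hidden * hidden + output_term


H, V = 500, 100_000  # illustrative sizes only
full = rnnlm_complexity(H, V)
reduced = rnnlm_complexity(H, V, hierarchical_softmax=True)
# With the reduced output term, the H*H recurrent term dominates the total.
print(full, reduced, (H * H) / reduced)
```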
Recurrent Neural Network With Residual Attention (RRA) 
In this paper, we propose a recurrent neural network (RNN) with residual attention (RRA) to learn longrange dependencies from sequential data. We propose to add residual connections across timesteps to RNN, which explicitly enhances the interaction between current state and hidden states that are several timesteps apart. This also allows training errors to be directly backpropagated through residual connections and effectively alleviates gradient vanishing problem. We further reformulate an attention mechanism over residual connections. An attention gate is defined to summarize the individual contribution from multiple previous hidden states in computing the current state. We evaluate RRA on three tasks: the adding problem, pixelbypixel MNIST classification and sentiment analysis on the IMDB dataset. Our experiments demonstrate that RRA yields better performance, faster convergence and more stable training compared to a standard LSTM network. Furthermore, RRA shows highly competitive performance to the stateoftheart methods. 
Recurrent Neural Network with Tensor Train  Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems. However, most state-of-the-art RNNs have millions of parameters and require substantial computational resources for training and for predicting on new data. This paper proposes an alternative RNN model that reduces the number of parameters significantly by representing the weight parameters in Tensor Train (TT) format. We implement the TT-format representation for several RNN architectures, such as the simple RNN and the Gated Recurrent Unit (GRU). We compare and evaluate our proposed RNN model against an uncompressed RNN model on sequence classification and sequence prediction tasks. Our proposed RNNs with TT-format are able to preserve performance while reducing the number of RNN parameters by up to a factor of 40. 
Recurrent Predictive State Policy Network (RPSP) 
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz and Gordon, 2004; Sun et al., 2016) by modeling predictive state– a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) to initialize the recursive filter. Predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSPnetwork can be purely reactive, simplifying training while still allowing optimal behaviour. Moreover, we use the PSR interpretation during training as well, by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) is still differentiable and can be trained using gradient based methods. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We show the efficacy of RPSPnetworks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSPnetworks perform well compared with memorypreserving networks such as GRUs, as well as finite memory models, being the overall best performing method. 
Recurrent Relational Network  Humans possess an ability to abstractly reason about objects and their interactions, an ability not shared with state-of-the-art deep learning models. Relational networks, introduced by Santoro et al. (2017), add the capacity for relational reasoning to deep neural networks, but are limited in the complexity of the reasoning tasks they can address. We introduce recurrent relational networks, which increase the suite of solvable tasks to those that require an order of magnitude more steps of relational reasoning. We use recurrent relational networks to solve Sudoku puzzles and achieve state-of-the-art results by solving 96.6% of the hardest Sudoku puzzles, where relational networks fail to solve any. We also apply our model to the bAbI textual QA dataset, solving 19/20 tasks, which is competitive with state-of-the-art sparse differentiable neural computers. The recurrent relational network is a general-purpose module that can augment any neural network model with the capacity to do many-step relational reasoning. 
Recurrent Spatial Transformer Networks (RNN-SPN) 
We integrate the recently proposed spatial transformer network (SPN) into a recurrent neural network (RNN) to form an RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST sequences. The proposed model achieves a single-digit error of 1.5%, compared to 2.9% for a convolutional network and 2.0% for convolutional networks with SPN layers. The SPN outputs a zoomed, rotated and skewed version of the input image. We investigate different down-sampling factors (ratio of pixels in input and output) for the SPN and show that the RNN-SPN model is able to down-sample the input images without deteriorating performance. The down-sampling in RNN-SPN can be thought of as adaptive down-sampling that minimizes the information loss in the regions of interest. We attribute the superior performance of the RNN-SPN to the fact that it can attend to a sequence of regions of interest. GitXiv 
Recurrent Transformer Network (RTN) 
We present recurrent transformer networks (RTNs) for obtaining dense correspondences between semantically similar images. Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations. By directly estimating the transformations between an image pair, rather than employing spatial transformer networks to independently normalize each individual image, we show that greater accuracy can be achieved. This process is conducted in a recursive manner to refine both the transformation estimates and the feature representations. In addition, a technique is presented for weaklysupervised training of RTNs that is based on a proposed classification loss. With RTNs, stateoftheart performance is attained on several benchmarks for semantic correspondence. 
Recurrent Value Function (RVF) 
Despite recent successes in Reinforcement Learning, valuebased methods often suffer from high variance hindering performance. In this paper, we illustrate this in a continuous control setting where state of the art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative to estimate the value function of a state. We propose to estimate the value function of the current state using the value function of past states visited along the trajectory. Due to the nature of their formulation, RVFs have a natural way of learning an emphasis function that selectively emphasizes important states. First, we establish RVF’s asymptotic convergence properties in tabular settings. We then demonstrate their robustness on a partially observable domain and continuous control tasks. Finally, we provide a qualitative interpretation of the learned emphasis function. 
Recurrently Controlled Recurrent Network (RCRN) 
Recurrent neural networks (RNNs) such as long shortterm memory and gated recurrent units are pivotal building blocks across a broad spectrum of sequence modeling problems. This paper proposes a recurrently controlled recurrent network (RCRN) for expressive and powerful sequence encoding. More concretely, the key idea behind our approach is to learn the recurrent gating functions using recurrent networks. Our architecture is split into two components – a controller cell and a listener cell whereby the recurrent controller actively influences the compositionality of the listener cell. We conduct extensive experiments on a myriad of tasks in the NLP domain such as sentiment analysis (SST, IMDb, Amazon reviews, etc.), question classification (TREC), entailment classification (SNLI, SciTail), answer selection (WikiQA, TrecQA) and reading comprehension (NarrativeQA). Across all 26 datasets, our results demonstrate that RCRN not only consistently outperforms BiLSTMs but also stacked BiLSTMs, suggesting that our controller architecture might be a suitable replacement for the widely adopted stacked architecture. 
Recursive Bayesian Estimation  Recursive Bayesian estimation, also known as a Bayes filter, is a general probabilistic approach for estimating an unknown probability density function recursively over time using incoming measurements and a mathematical process model. 
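The predict/update recursion can be sketched for a discrete state space, where the density becomes a probability vector. Discretising the state is a simplifying assumption (the definition above speaks of a density), and the transition and likelihood values below are invented for illustration.

```python
def predict(belief, transition):
    """Process model step: transition[i][j] = P(next state = j | current state = i)."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(belief, likelihood):
    """Measurement step: weight by P(measurement | state), then renormalise."""
    unnormalised = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def bayes_filter(prior, transition, likelihoods):
    """Apply predict and update once per incoming measurement."""
    belief = list(prior)
    for likelihood in likelihoods:
        belief = update(predict(belief, transition), likelihood)
    return belief


# Hypothetical two-state example with two measurements that both favour state 0.
transition = [[0.9, 0.1], [0.2, 0.8]]
posterior = bayes_filter([0.5, 0.5], transition, [[0.7, 0.1], [0.7, 0.1]])
print(posterior)
```

The Kalman filter is the best-known special case of this recursion, for linear models with Gaussian noise.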
Recursive Bayesian Pruning (RBP) 
Recently, compression and acceleration of deep neural networks have been in critical need. The Bayesian generalization of structured pruning represents an important research direction for solving this problem. However, existing Bayesian methods ignore the dependency among neurons and filters for computational simplicity. In this study, we explore, under a Bayesian framework, a structured pruning method with layer-wise sequential dependency assumed, a more general learning setting. Based on the property of the Dirac distribution, we further derive a new dropout noise, which makes it possible to approximate the posterior of the dropout noise given that of the previous layer. With the Dirac-like dropout noise, we further propose a recursive strategy, named Recursive Bayesian Pruning (RBP), to train and prune networks in a layer-by-layer fashion. The unimportant neurons and filters are directly targeted and removed, taking into account the influence of the previous layer. Experiments on the typical neural networks LeNet-300-100, LeNet-5 and VGG-16 demonstrate that the proposed method is competitive with or even outperforms the state-of-the-art methods on several compression and acceleration metrics. 
Recursive Feature Elimination (RFE) 
Recursive feature elimination (RFE) is a feature-selection strategy. It operates in two nested levels of cross-validation. First it divides the training set into N folds. RFE puts one fold aside for testing the generalization and then trains itself with the remaining data. http://…efeatureeliminationcoupledtosvminr http://…/recursivefeatureeliminationrfe http://…0Selection%20from%20Microarray%20Data.pdf pathClass 
Recursive Neural Network (RNN) 
A recursive neural network (RNN) is a kind of deep neural network created by applying the same set of weights recursively over a structure, to produce a structured prediction over variable-length input, or a scalar prediction on it, by traversing a given structure in topological order. RNNs have been successful in learning sequence and tree structures in natural language processing, mainly phrase and sentence continuous representations based on word embedding. RNNs were first introduced to learn distributed representations of structure, such as logical terms. 
Recursive Partitioning  Recursive partitioning is a statistical method for multivariable analysis. Recursive partitioning creates a decision tree that strives to correctly classify members of the population based on several dichotomous independent variables. A variation is ‘Cox linear recursive partitioning’. 
Recursively Decomposing the function into locally Independent Subspaces (RDIS) 
Continuous optimization is an important problem in many areas of AI, including vision, robotics, probabilistic inference, and machine learning. Unfortunately, most real-world optimization problems are nonconvex, causing standard convex techniques to find only local optima, even with extensions like random restarts and simulated annealing. We observe that, in many cases, the local modes of the objective function have combinatorial structure, and thus ideas from combinatorial optimization can be brought to bear. Based on this, we propose a problem-decomposition approach to nonconvex optimization. Similarly to DPLL-style SAT solvers and recursive conditioning in probabilistic inference, our algorithm, RDIS, recursively sets variables so as to simplify and decompose the objective function into approximately independent subfunctions, until the remaining functions are simple enough to be optimized by standard techniques like gradient descent. The variables to set are chosen by graph partitioning, ensuring decomposition whenever possible. We show analytically that RDIS can solve a broad class of nonconvex optimization problems exponentially faster than gradient descent with random restarts. Experimentally, RDIS outperforms standard techniques on problems like structure from motion and protein folding. GitXiv 
Recycled Alternating Direction Method of Multiplier (RADMM) 
Alternating direction method of multiplier (ADMM) is a powerful method to solve decentralized convex optimization problems. In distributed settings, each node performs computation with its local data and the local results are exchanged among neighboring nodes in an iterative fashion. During this iterative process the leakage of data privacy arises and can accumulate significantly over many iterations, making it difficult to balance the privacy-utility tradeoff. In this study we propose Recycled ADMM (RADMM), where a linear approximation is applied to every even iteration, its solution directly calculated using only results from the previous, odd iteration. It turns out that under such a scheme, half of the updates incur no privacy loss and require much less computation compared to the conventional ADMM. We obtain a sufficient condition for the convergence of RADMM and provide the privacy analysis based on objective perturbation. 
Redescription Mining  In many real-world data analysis tasks, we have different types of data over the same objects or entities, perhaps because the data originate from distinct sources or are based on different terminologies. In order to understand such data, an intuitive approach is to identify the correspondences that exist between these different aspects. This is the motivating principle behind redescription mining, a data analysis task that aims at finding distinct common characterizations of the same objects. 
ReDial  There has been growing interest in using neural networks and deep learning techniques to create dialogue systems. Conversational recommendation is an interesting setting for the scientific exploration of dialogue with natural language as the associated discourse involves goal-driven dialogue that often transforms naturally into more free-form chat. This paper provides two contributions. First, until now there has been no publicly available large-scale dataset consisting of real-world dialogues centered around recommendations. To address this issue and to facilitate our exploration here, we have collected ReDial, a dataset consisting of over 10,000 conversations centered around the theme of providing movie recommendations. We make this data available to the community for further research. Second, we use this dataset to explore multiple facets of conversational recommendations. In particular we explore new neural architectures, mechanisms, and methods suitable for composing conversational recommendation systems. Our dataset allows us to systematically probe model subcomponents addressing different parts of the overall problem domain, ranging from sentiment analysis and cold-start recommendation generation to detailed aspects of how natural language is used in this setting in the real world. We combine such subcomponents into a full-blown dialogue system and examine its behavior. 
Redis  Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs. You can run atomic operations on these types, like appending to a string; incrementing the value in a hash; pushing an element to a list; computing set intersection, union and difference; or getting the member with highest ranking in a sorted set. In order to achieve its outstanding performance, Redis works with an in-memory dataset. Depending on your use case, you can persist it either by dumping the dataset to disk every once in a while, or by appending each command to a log. Persistence can be optionally disabled, if you just need a feature-rich, networked, in-memory cache. Redis also supports trivial-to-setup master-slave asynchronous replication, with very fast non-blocking first synchronization and auto-reconnection with partial resynchronization on net split. RcppRedis 
ReDMark  Due to the rapid growth of machine learning tools and specifically deep networks in various computer vision and image processing areas, applications of Convolutional Neural Networks for watermarking have recently emerged. In this paper, we propose a deep end-to-end diffusion watermarking framework (ReDMark) which can be adapted for any desired transform space. The framework is composed of two Fully Convolutional Neural Networks with the residual structure for embedding and extraction. The whole deep network is trained end-to-end to conduct a blind secure watermarking. The framework is customizable for the level of robustness vs. imperceptibility. It is also adjustable for the tradeoff between capacity and robustness. The proposed framework simulates various attacks as a differentiable network layer to facilitate end-to-end training. For the JPEG attack, a differentiable approximation is utilized, which drastically improves the watermarking robustness to this attack. Another important characteristic of the proposed framework, which leads to improved security and robustness, is its capability to diffuse watermark information among a relatively wide area of the image. Comparative results versus recent state-of-the-art research highlight the superiority of the proposed framework in terms of imperceptibility and robustness. 
RedNet  Indoor semantic segmentation has always been a difficult task in computer vision. In this paper, we propose an RGB-D residual encoder-decoder architecture, named RedNet, for indoor RGB-D semantic segmentation. In RedNet, the residual module is applied to both the encoder and decoder as the basic building block, and the skip-connection is used to bypass the spatial feature between the encoder and decoder. In order to incorporate the depth information of the scene, a fusion structure is constructed, which makes inference on the RGB image and the depth image separately, and fuses their features over several layers. In order to efficiently optimize the network’s parameters, we propose a ‘pyramid supervision’ training scheme, which applies supervised learning over different layers in the decoder, to cope with the problem of vanishing gradients. Experiment results show that the proposed RedNet (ResNet-50) achieves a state-of-the-art mIoU accuracy of 47.8% on the SUN RGB-D benchmark dataset. 
RedSync  Data parallelism has already become a dominant method to scale Deep Neural Network (DNN) training to multiple computation nodes. Considering that the synchronization of local models or gradients between iterations can be a bottleneck for large-scale distributed training, compressing communication traffic has gained widespread attention recently. Among several recently proposed compression algorithms, Residual Gradient Compression (RGC) is one of the most successful approaches: it can significantly compress the message size (0.1% of the original size) and still preserve accuracy. However, the literature on compressing deep networks focuses almost exclusively on finding a good compression rate, while the efficiency of RGC in real implementations has been less investigated. In this paper, we explore the potential of applying the RGC method in real distributed systems. Targeting the widely adopted multi-GPU system, we propose an RGC system design called RedSync, which includes a set of optimizations to reduce communication bandwidth while introducing limited overhead. We examine the performance of RedSync on two different multiple GPU platforms, including a supercomputer and a multi-card server. Our test cases include image classification and language modeling tasks on Cifar10, ImageNet, Penn Treebank and Wiki2 datasets. For DNNs with a high communication-to-computation ratio, which have long been considered to scale poorly, RedSync shows significant performance improvement. 
Reduced Dynamic Chain Event Graph (RDCEG) 
In this paper we introduce a new class of probabilistic graphical models called the Reduced Dynamic Chain Event Graph (RDCEG) which is a novel mixture of a Chain Event Graph (CEG) and a semi-Markov process (SMP). It has been demonstrated that many real-world scenarios, particularly in the domain of public health and security, can be modelled as an unfolding of events in the life histories of individuals. Our interest not only lies in the future trajectories of an individual with a specified history and set of characteristics but also in the timescale associated with these developments. Such information is critical in developing suitable interventions and informs the prioritisation of policy decisions. The RDCEG was born out of the need for such a model. It is a coloured graph which inherits useful properties like fast conjugate model selection, conditional independence interrogations and a support for causal interventions from the family of probabilistic graphical models. Its novelty lies in its underlying semi-Markov structure which offers the flexibility of the holding time at each state being any arbitrary distribution. We demonstrate this new decision support system with a simulated intervention to reduce falls in the elderly. 
Reduced-Order Model (ROM) 
Statistical closure modeling for reduced-order models of stationary systems by the ROMES method 
Reduced-Order-Model Error Surrogates (ROMES) 
Statistical closure modeling for reduced-order models of stationary systems by the ROMES method 
Reduced-Rank Regression  The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum likelihood estimator. rrr 
reductus  The online data reduction service reductus transforms measurements in experimental science from laboratory coordinates into physically meaningful quantities with accurate estimation of uncertainties based on instrumental settings and properties. This reduction process is based on a few well-known transformations, but flexibility in the application of the transforms and algorithms supports flexibility in experiment design, supporting a broader range of measurements than a rigid reduction scheme for data. The user interface allows easy construction of arbitrary pipelines from well-known data transforms using a visual dataflow diagram. Source data is drawn from a networked, open data repository. The Python backend uses intelligent caching to store intermediate results of calculations for a highly responsive user experience. The reference implementation allows immediate reduction of measurements as they are recorded for the three neutron reflectometry instruments at the NIST Center for Neutron Research (NCNR), without the need for visiting scientists to install additional software on their own computers. 
Redundancy Analysis (RDA) 
Redundancy analysis (RDA) is a form of constrained ordination that examines how much of the variation in one set of variables explains the variation in another set of variables. It is the multivariate analog of simple linear regression. Redundancy analysis is based on similar principles as principal components analysis and thus makes similar assumptions about the data. It is appropriate when the expected relationship between dependent and independent variables is linear (e.g. climate and allele frequency). 
Reed’s Law  Reed’s law is the assertion of David P. Reed that the utility of large networks, particularly social networks, can scale exponentially with the size of the network. The reason for this is that the number of possible subgroups of network participants is 2^N − N − 1, where N is the number of participants. This grows much more rapidly than either · the number of participants, N, or · the number of possible pair connections, N(N − 1)/2 (which follows Metcalfe’s law), so that even if the utility of groups available to be joined is very small on a peer-group basis, eventually the network effect of potential group membership can dominate the overall economics of the system. 
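To make the growth rates concrete, the sketch below tabulates the subgroup count 2^N − N − 1 (all subsets with at least two members) next to the pair count N(N − 1)/2. The function names are illustrative only.

```python
from math import comb


def reed_subgroups(n):
    """Possible subgroups with at least two members: 2^n - n - 1."""
    return 2 ** n - n - 1


def metcalfe_pairs(n):
    """Possible pair connections: n(n - 1)/2."""
    return n * (n - 1) // 2


for n in (2, 5, 10, 20):
    print(n, metcalfe_pairs(n), reed_subgroups(n))
```

The closed form agrees with summing the binomial coefficients C(N, k) for k from 2 to N, which is where the "minus N, minus 1" comes from: the N singletons and the empty set are excluded.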
RefCurv  In medicine, reference curves serve as an important tool for everyday clinical practice. Pediatricians assess the growth process of children with the help of percentile curves serving as norm references. The mathematical methods for the construction of these reference curves are sophisticated and often require technical knowledge beyond the scope of physicians. Easy-to-use software for life scientists and physicians is missing. As a consequence, most medical publications do not document the construction properly. This project aims to develop software that enables non-technical users to apply modern statistical methods to create and analyze reference curves. In this paper, we present RefCurv, software that facilitates the construction of reference curves. The software comprises functionalities to select and visualize data. Users can fit models to the data and graphically present them as percentile curves. Furthermore, the software provides features to highlight possible outliers, perform model selection, and analyze the sensitivity. RefCurv is open-source software with a graphical user interface (GUI) written in Python. It uses R and the gamlss add-on package (Rigby and Stasinopoulos (2005)) as the underlying statistical engine. In summary, RefCurv is the first software based on the gamlss package which enables practitioners to construct and analyze reference curves in a user-friendly GUI. In broader terms, the software brings together the fields of statistical learning and medical application. Consequently, RefCurv can help to establish the construction of reference curves in other medical fields. 
Reference Measure  We consider settings in which the distribution of a multivariate random variable is partly ambiguous. We assume the ambiguity lies on the level of dependence structure, and that the marginal distributions are known. Furthermore, a current best guess for the distribution, called reference measure, is available. We work with the set of distributions that are both close to the given reference measure in a transportation distance (e.g. the Wasserstein distance), and additionally have the correct marginal structure. The goal is to find upper and lower bounds for integrals of interest with respect to distributions in this set. The described problem appears naturally in the context of risk aggregation. When aggregating different risks, the marginal distributions of these risks are known and the task is to quantify their joint effect on a given system. This is typically done by applying a meaningful risk measure to the sum of the individual risks. For this purpose, the stochastic interdependencies between the risks need to be specified. In practice the models of this dependence structure are however subject to relatively high model ambiguity. The contribution of this paper is twofold: Firstly, we derive a dual representation of the considered problem and prove that strong duality holds. Secondly, we propose a generally applicable and computationally feasible method, which relies on neural networks, in order to numerically solve the derived dual problem. The latter method is tested on a number of toy examples, before it is finally applied to perform robust risk aggregation in a real world instance. 
Referenced Metric and Unreferenced Metric Blended Evaluation Routine (RUBER) 
Open-domain human-computer conversation has been attracting increasing attention over the past few years. However, there does not exist a standard automatic evaluation metric for open-domain dialog systems; researchers usually resort to human annotation for model evaluation, which is time- and labor-intensive. In this paper, we propose RUBER, a Referenced metric and Unreferenced metric Blended Evaluation Routine, which evaluates a reply by taking into consideration both a ground-truth reply and a query (previous user utterance). Our metric is learnable, but its training does not require labels of human satisfaction. Hence, RUBER is flexible and extensible to different datasets and languages. Experiments on both retrieval and generative dialog systems show that RUBER has high correlation with human annotation. 
Refinery  Refinery is an open source platform for the massive analysis of large unstructured document collections using the latest state-of-the-art topic models. The goal of Refinery is to simplify this process within an intuitive web-based interface. What makes Refinery unique is that it is meant to be run locally, thus bypassing the need for securing document collections over the internet. Refinery was developed by myself and Ben Swanson at MIT Media Lab. It was also the recipient of the Knight Prototype Award in 2014. 
Reflection Principle  In the theory of probability for stochastic processes, the reflection principle for a Wiener process states that if the path of a Wiener process f(t) reaches a value f(s) = a at time t = s, then the subsequent path after time s has the same distribution as the reflection of the subsequent path about the value a. More formally, the reflection principle refers to a lemma concerning the distribution of the supremum of the Wiener process, or Brownian motion. The result relates the distribution of the supremum of Brownian motion up to time t to the distribution of the process at time t. It is a corollary of the strong Markov property of Brownian motion. A Direct Proof of the Reflection Principle for Brownian Motion 
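For Brownian motion the lemma is usually written P(sup_{0≤s≤t} W_s ≥ a) = 2 P(W_t ≥ a) for a > 0. As a hedged illustration, the sketch below checks the discrete analogue for a simple symmetric ±1 random walk by brute-force enumeration; swapping in the random walk is an assumption made here so the check is exact, and it is not the continuous-time statement itself. For an integer level a ≥ 1 the reflection argument gives #{max ≥ a} = #{S_n ≥ a} + #{S_n > a}.

```python
from itertools import product


def reflection_counts(n_steps, level):
    """Enumerate every +/-1 path of length n_steps and count
    (a) paths whose running maximum reaches `level`,
    (b) paths that end at or above `level`,
    (c) paths that end strictly above `level`."""
    hit = end_at_or_above = end_above = 0
    for steps in product((-1, 1), repeat=n_steps):
        position = 0
        maximum = 0
        for step in steps:
            position += step
            maximum = max(maximum, position)
        hit += maximum >= level
        end_at_or_above += position >= level
        end_above += position > level
    return hit, end_at_or_above, end_above


hit, at_or_above, above = reflection_counts(10, 3)
print(hit, at_or_above + above)  # the two counts coincide
```

Paths that touch the level but finish below it are matched one-to-one with paths that finish above it by reflecting the tail after the first hit, which is why the count of endpoints above the level appears twice.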
Reflective Oracles  Classical game theory treats players as special – a description of a game contains a full, explicit enumeration of all players – even though in the real world, ‘players’ are no more fundamentally special than rocks or clouds. It isn’t trivial to find a decision-theoretic foundation for game theory in which an agent’s coplayers are a non-distinguished part of the agent’s environment. Attempts to model both players and the environment as Turing machines, for example, fail for standard diagonalization reasons. In this paper, we introduce a ‘reflective’ type of oracle, which is able to answer questions about the outputs of oracle machines with access to the same oracle. These oracles avoid diagonalization by answering some queries randomly. We show that machines with access to a reflective oracle can be used to define rational agents using causal decision theory. These agents model their environment as a probabilistic oracle machine, which may contain other agents as a non-distinguished part. We show that if such agents interact, they will play a Nash equilibrium, with the randomization in mixed strategies coming from the randomization in the oracle’s answers. This can be seen as providing a foundation for classical game theory in which players aren’t special. 
Reforwarding  Deep Neural Networks (DNNs) require huge GPU memory when training on modern image/video databases. Unfortunately, the GPU memory is always finite, which limits the image resolution, batch size, and learning rate that could be tuned for better performance. In this paper, we propose a novel approach, called Reforwarding, that substantially reduces memory usage in training. Our approach only saves the tensors at a subset of layers during the first forward, and conducts extra local forwards (the Reforwarding process) to compute the missing tensors needed during backward. The total memory cost becomes the sum of (1) the cost at the subset of layers and (2) the maximum cost of the reforwarding processes. We propose theories and algorithms that achieve the optimal memory solutions for DNNs with either linear or arbitrary optimization graphs. Experiments show that Reforwarding cuts down a huge amount of training memory on all popular DNNs such as AlexNet, VGG net, ResNet, DenseNet and Inception net. 
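The cost rule stated above (stored tensors plus the largest single re-forward) can be illustrated for a linear chain of layers. This is a simplified sketch under assumptions that are not from the paper: each layer produces one tensor of a known size, and the cost of a local re-forward is taken to be the total size of the unsaved tensors between two consecutive saved ones. The sizes are made-up numbers.

```python
def reforwarding_memory(sizes, checkpoints):
    """Memory cost of a linear chain under the 'stored + worst re-forward' rule.

    sizes:       sizes[i] is the memory of the tensor produced by layer i
    checkpoints: indices of layers whose tensors are kept after the first forward
    """
    kept = set(checkpoints)
    stored = sum(sizes[i] for i in kept)
    worst_segment = 0
    current = 0
    for i, size in enumerate(sizes):
        if i in kept:
            # A saved tensor closes the current run of unsaved tensors.
            worst_segment = max(worst_segment, current)
            current = 0
        else:
            current += size
    worst_segment = max(worst_segment, current)
    return stored + worst_segment


sizes = [4] * 9                                            # nine equal tensors (hypothetical)
baseline = reforwarding_memory(sizes, range(len(sizes)))   # keep every tensor
reduced = reforwarding_memory(sizes, [2, 5, 8])            # keep every third tensor
print(baseline, reduced)
```

With every tensor kept there is nothing to recompute and the cost is the full sum; keeping every third tensor trades extra forward computation for a smaller footprint, since only one segment is rematerialised at a time.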
Refutation Complexity  The sample complexity of learning a Boolean-valued function class is precisely characterized by its Rademacher complexity. This has little bearing, however, on the sample complexity of efficient agnostic learning. We introduce refutation complexity, a natural computational analog of Rademacher complexity of a Boolean concept class, and show that it exactly characterizes the sample complexity of efficient agnostic learning. Informally, refutation complexity of a class C is the minimum number of example-label pairs required to efficiently distinguish between the case that the labels correlate with the evaluation of some member of C (structure) and the case where the labels are i.i.d. Rademacher random variables (noise). The easy direction of this relationship was implicitly used in the recent framework for improper PAC learning lower bounds of Daniely and coauthors via connections to the hardness of refuting random constraint satisfaction problems. Our work can be seen as making the relationship between agnostic learning and refutation implicit in their work into an explicit equivalence. In a recent, independent work, Salil Vadhan discovered a similar relationship between refutation and PAC-learning in the realizable (i.e. noiseless) case. 
Regime Shift  In ecology, regime shifts are large, abrupt, persistent changes in the structure and function of a system. A regime is a characteristic behaviour of a system which is maintained by mutually reinforced processes or feedbacks. Regimes are considered persistent relative to the time period over which the shift occurs. The change of regimes, or the shift, usually occurs when a smooth change in an internal process (feedback) or a single disturbance (external shocks) triggers a completely different system behavior. Although such nonlinear changes have been widely studied in different disciplines ranging from atoms to climate dynamics, regime shifts have gained importance in ecology because they can substantially affect the flow of ecosystem services that societies rely upon, such as provision of food, clean water or climate regulation. Moreover, regime shift occurrence is expected to increase as human influence on the planet increases – the Anthropocene – including current trends on human induced climate change and biodiversity loss. 
Regime Switching Model (RSM) 
Many time series data, such as stock market conditions, government policy changes, weather patterns, and so on, follow different dynamics in different time periods; this behavior is called structural change or regime switching. One type of model for this kind of behavior is the regime-switching model (RSM). RSMs enable you to assign different sets of parameter values to different regimes and model the transition probabilities between regimes. They have been powerful tools for sequential data analysis (including time series analysis) in finance, economics, science, and engineering for several decades. An Introduction to Regime Switching Time Series Models 
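A minimal sketch of the two ingredients named above, regime-specific parameters and transition probabilities, for a two-regime Markov-switching series. All numbers (stay probabilities, means, standard deviations) are hypothetical, and Gaussian observations are an assumption made for the example.

```python
import random


def stationary_two_state(p_stay_0, p_stay_1):
    """Long-run share of time spent in each regime of a two-state Markov chain."""
    leave_0 = 1.0 - p_stay_0
    leave_1 = 1.0 - p_stay_1
    pi_0 = leave_1 / (leave_0 + leave_1)
    return pi_0, 1.0 - pi_0


def expected_duration(p_stay):
    """Expected number of consecutive periods spent in a regime."""
    return 1.0 / (1.0 - p_stay)


def simulate(n, p_stay, means, sds, seed=0):
    """Draw (regime, value) pairs; each regime has its own mean and spread."""
    rng = random.Random(seed)
    regime = 0
    path = []
    for _ in range(n):
        path.append((regime, rng.gauss(means[regime], sds[regime])))
        if rng.random() > p_stay[regime]:
            regime = 1 - regime  # switch to the other regime
    return path


# Hypothetical "calm" (0) and "turbulent" (1) regimes.
p_stay = (0.95, 0.80)
print(stationary_two_state(*p_stay))
print(expected_duration(p_stay[0]), expected_duration(p_stay[1]))
path = simulate(50, p_stay, means=(0.5, -1.0), sds=(1.0, 3.0))
```

Fitting such a model to data runs the other way: the regimes are unobserved, and the parameters and transition probabilities are estimated jointly, which this sketch does not attempt.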
Region of Practical Equivalence (ROPE) 
Connecting Bayes factor and the Region of Practical Equivalence (ROPE) Procedure for testing interval null hypothesis 
Register Match Automata (RMA) 
We propose an automaton model which is a combination of symbolic and register automata, i.e., we enrich symbolic automata with memory. We call such automata Register Match Automata (RMA). RMA extend the expressive power of symbolic automata, by allowing Boolean formulas to be applied not only to the last element read from the input string, but to multiple elements, stored in their registers. RMA also extend register automata, by allowing arbitrary Boolean formulas, besides equality predicates. We study the closure properties of RMA under union, concatenation, Kleene closure, complement and determinization and show that RMA, contrary to symbolic automata, are not in general closed under determinization, but they are when a window operator, quintessential in Complex Event Processing, is used. We present detailed algorithms for constructing deterministic RMA from regular expressions extended with Boolean constraints, when windowing is used. We show how RMA can be used in Complex Event Processing in order to detect patterns upon streams of events, using a framework that provides denotational and compositional semantics, and that allows for a systematic treatment of such automata. 
Regressing Word Embedding (ReWE) 
Regularization of neural machine translation is still a significant problem, especially in low-resource settings. To mitigate this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value). Such joint training allows the proposed system to learn the distributional properties represented by the word embeddings, empirically improving the generalization to unseen sentences. Experiments over three translation datasets have shown a consistent improvement over a strong baseline, ranging between 0.91 and 2.54 BLEU points, and also a marked improvement over a state-of-the-art system. 
Regression Analysis  In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or ‘criterion variable’) changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution. 
Regression Calibration (RC) 
Medical studies that depend on electronic health records (EHR) data are often subject to measurement error as the data are not collected to support research questions under study. Methodology to address covariate measurement error has been well developed; however, time-to-event error has also been shown to cause significant bias but methods to address it are relatively underdeveloped. More generally, it is possible to observe errors in both the covariate and the time-to-event outcome that are correlated. We propose regression calibration (RC) estimators to simultaneously address correlated error in the covariates and the censored event time. Although RC can perform well in many settings with covariate measurement error, it is biased for nonlinear regression models, such as the Cox model. Thus, we additionally propose raking estimators which are consistent estimators of the parameter defined by the population estimating equations, can improve upon RC in certain settings with failure-time data, require no explicit modeling of the error structure, and can be utilized under outcome-dependent sampling designs. We discuss features of the underlying estimation problem that affect the degree of improvement the raking estimator has over the RC approach. Detailed simulation studies are presented to examine the performance of the proposed estimators under varying levels of signal, error, and censoring. The methodology is illustrated on observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic. 
Regression Discontinuity Design (RDD) 
In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest-posttest design that elicits the causal effects of interventions by assigning a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying closely on either side of the threshold, it is possible to estimate the local average treatment effect in environments in which randomization was unfeasible. First applied by Donald Thistlethwaite and Donald Campbell to the evaluation of scholarship programs, the RDD has become increasingly popular in recent years. rddtools 
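The comparison of observations on either side of the threshold can be sketched as a sharp RDD estimate: fit a separate straight line on each side within a bandwidth and take the gap between the two fitted values at the cutoff. This is one simple estimator chosen for illustration, not the method of any particular package; the data, cutoff and bandwidth are made up, and the data are noise-free so the result is easy to verify.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd_effect(running, outcome, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff, using only observations
    within `bandwidth` of it and a separate line on each side."""
    left = [(x - cutoff, y) for x, y in zip(running, outcome)
            if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    # The running variable is centred at the cutoff, so each intercept
    # is the fitted outcome exactly at the threshold.
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left


# Hypothetical scholarship example: students scoring 60 or more are treated.
scores = list(range(40, 81))
outcomes = [0.5 * s + (3.0 if s >= 60 else 0.0) for s in scores]
effect = sharp_rdd_effect(scores, outcomes, cutoff=60, bandwidth=10)
print(effect)
```

The estimate is local: it describes the effect for units near the cutoff, and in practice the choice of bandwidth trades bias against variance.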
Regression Nomogram Plot  regplot 
Regression toward the mean / Regression to the mean  In statistics, regression toward (or to) the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement and, paradoxically, if it is extreme on its second measurement, it will tend to have been closer to the average on its first. To avoid making incorrect inferences, regression toward the mean must be considered when designing scientific experiments and interpreting data. 
Regression Tree  A data-analysis method that recursively partitions data into sets, each of which is modeled simply using regression methods.
Regression Tsetlin Machine (RTM) 
The recently introduced Tsetlin Machine (TM) has provided competitive pattern classification accuracy in several benchmarks, composing patterns with easy-to-interpret conjunctive clauses in propositional logic. In this paper, we go beyond pattern classification by introducing a new type of TM, namely, the Regression Tsetlin Machine (RTM). In brief, we modify the inner inference mechanism of the TM so that input patterns are transformed into a single continuous output, rather than into distinct categories. We achieve this by: (1) using the conjunctive clauses of the TM to capture arbitrarily complex patterns; (2) mapping these patterns to a continuous output through a novel voting and normalization mechanism; and (3) employing a feedback scheme that updates the TM clauses to minimize the regression error. The feedback scheme uses a new activation probability function that stabilizes the updating of clauses, while the overall system converges towards an accurate input-output mapping. The performance of the proposed approach is evaluated using six different artificial datasets with and without noise. The performance of the RTM is compared with the Classical Tsetlin Machine (CTM) and the Multiclass Tsetlin Machine (MTM). Our empirical results indicate that the RTM obtains the best training and testing results for both noisy and noise-free datasets, with a smaller number of clauses. This, in turn, translates to higher regression accuracy, using significantly fewer computational resources.
Regression-Enhanced Random Forest (RERF)
Random forest (RF) methodology is one of the most popular machine learning techniques for prediction problems. In this article, we discuss some cases where random forests may suffer and propose a novel generalized RF method, namely regression-enhanced random forests (RERFs), that can improve on RFs by borrowing the strength of penalized parametric regression. The algorithm for constructing RERFs and selecting their tuning parameters is described. Both a simulation study and real data examples show that RERFs have better predictive performance than RFs in important situations often encountered in practice. Moreover, RERFs may incorporate known relationships between the response and the predictors, and may give reliable predictions in extrapolation problems where predictions are required at points outside the domain of the training dataset. Strategies analogous to those described here can be used to improve other machine learning methods via combination with penalized parametric regression techniques.
Regression-via-Classification  Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one.
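One common way to carry out such a conversion is sketched below, under stated assumptions: the numeric target is cut into equal-width bins, any classifier is trained on the bin labels, and a predicted label is mapped back to the midpoint of its bin. The equal-width binning, the midpoint rule, and the tiny 1-nearest-neighbour classifier are illustrative choices, not part of the definition.

```python
from bisect import bisect_right


def bin_edges(targets, n_bins):
    """Equal-width bin edges spanning the observed target range."""
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def to_class(y, edges):
    """Index of the bin containing y; the maximum value falls in the last bin."""
    return min(bisect_right(edges, y) - 1, len(edges) - 2)


def to_value(label, edges):
    """Map a class label back to a number: the midpoint of its bin."""
    return (edges[label] + edges[label + 1]) / 2.0


def predict_1nn(train_x, train_labels, query_x):
    """Stand-in classifier: label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query_x))
    return train_labels[nearest]


train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
edges = bin_edges(train_y, n_bins=5)
labels = [to_class(y, edges) for y in train_y]
predicted_label = predict_1nn(train_x, labels, query_x=2.2)
prediction = to_value(predicted_label, edges)
print(labels, prediction)
```

The number of bins controls the trade-off: more bins give finer numeric predictions but fewer training examples per class.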
Regret  Regret is the negative emotion experienced when learning that an alternative course of action would have resulted in a more favorable outcome. The theory of regret aversion or anticipated regret proposes that when facing a decision, individuals may anticipate the possibility of feeling regret after the uncertainty is resolved and thus incorporate in their choice their desire to eliminate or reduce this possibility. 
Regret Minimizing Set  A regret minimizing set Q is a small-size representation of a much larger database P so that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item’s attributes with a user’s weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-complete for all dimensions d >= 3. This settles an open problem from Chester et al. [VLDB 2014], and resolves the complexity status of the problem for all d: the problem is known to have a polynomial-time solution for d <= 2. In addition, we propose two new approximation schemes for regret minimization, both with provable guarantees, one based on coresets and another based on hitting sets. We also carry out extensive experimental evaluation, and show that our schemes compute regret-minimizing sets comparable in size to the greedy algorithm proposed in [VLDB 14] but our schemes are significantly faster and scalable to large data sets.
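The regret ratio for one fixed weight vector can be sketched as below. This assumes the usual form of the ratio, namely the shortfall of the best score in Q relative to the k-th best score in P, clipped at zero, and it assumes positive scores. The function names and the example data are hypothetical, and the sketch evaluates a given Q rather than searching for a minimizing one.

```python
def score(item, weights):
    """Inner product of an item's attributes with a user's weight vector."""
    return sum(a * w for a, w in zip(item, weights))


def regret_ratio(full_set, subset, weights, k=1):
    """Relative shortfall of the best item in `subset` versus the k-th best in `full_set`."""
    kth_best_full = sorted((score(p, weights) for p in full_set), reverse=True)[k - 1]
    best_subset = max(score(q, weights) for q in subset)
    return max(0.0, kth_best_full - best_subset) / kth_best_full


P = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8), (0.5, 0.5)]
Q = [(1.0, 0.0), (0.0, 1.0)]
w = (0.5, 0.5)

ratio_k1 = regret_ratio(P, Q, w, k=1)  # compares against the best item in P
ratio_k2 = regret_ratio(P, Q, w, k=2)  # compares against the 2nd best item in P
print(ratio_k1, ratio_k2)
```

A k-regret minimizing set would be chosen to keep the worst case of this quantity small over all weight vectors, which is the part the entry reports as NP-complete for d >= 3.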
Regret With Rolling Window  Nowadays, online learning is an appealing learning paradigm, which is of great interest in practice due to the recent emergence of large scale applications such as online advertising placement and online web ranking. Standard online learning assumes a finite number of samples while in practice data is streamed infinitely. In such a setting gradient descent with a diminishing learning rate does not work. We first introduce regret with rolling window, a new performance metric for online streaming learning, which measures the performance of an algorithm on every fixed number of contiguous samples. At the same time, we propose a family of algorithms based on gradient descent with a constant or adaptive learning rate and provide very technical analyses establishing regret bound properties of the algorithms. We cover the convex setting showing the regret of the order of the square root of the size of the window in the constant and dynamic learning rate scenarios. Our proof is applicable also to the standard online setting where we provide the first analysis of the same regret order (the previous proofs have flaws). We also study a two layer neural network setting with ReLU activation. In this case we establish that if initial weights are close to a stationary point, the same square root regret bound is attainable. We conduct computational experiments demonstrating a superior performance of the proposed algorithms. 
Regularization  Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an illposed problem or to prevent overfitting. This information is usually of the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm. 
Regularization Approach for Instance-Based Superset Label Learning (RegISL)
Unlike traditional supervised learning, in which each training example has only one explicit label, superset label learning (SLL) refers to the problem in which a training example can be associated with a set of candidate labels, only one of which is correct. Existing SLL methods are either regularization-based or instance-based, and the latter have achieved state-of-the-art performance. This is because the latest instance-based methods contain an explicit disambiguation operation that accurately picks the ground-truth label of each training example from its ambiguous candidate labels. However, such a disambiguation operation does not fully consider the mutually exclusive relationship among different candidate labels, so the disambiguated labels are usually generated in a non-discriminative way, which makes it hard for the instance-based methods to obtain satisfactory performance. To address this defect, we develop a novel regularization approach for instance-based superset label (RegISL) learning so that our instance-based method also inherits the good discriminative ability possessed by the regularization scheme. Specifically, we employ a graph to represent the training set, and require the examples that are adjacent on the graph to obtain similar labels. More importantly, a discrimination term is proposed to enlarge the gap of values between possible labels and unlikely labels for every training example. As a result, the intrinsic constraints among different candidate labels are deployed, and the disambiguated labels generated by RegISL are more discriminative and accurate than those output by existing instance-based algorithms. The experimental results on various tasks convincingly demonstrate the superiority of our RegISL to other typical SLL methods in terms of both training accuracy and test accuracy.
Regularization by Denoising (RED) 
Regularization by Denoising (RED), proposed by Romano, Elad, and Milanfar, is a powerful new image-recovery framework that aims to construct an explicit regularization objective from a plug-in image-denoising function. Evidence suggests that the RED algorithms are, indeed, state-of-the-art. However, a closer inspection suggests that explicit regularization may not explain the workings of these algorithms. Regularization by Denoising: Clarifications and New Interpretations
Regularization Learning Network (RLN) 
Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. We propose that applying a different regularization coefficient to each weight might boost the performance of DNNs by allowing them to make more use of the more relevant inputs. However, this will lead to an intractable number of hyperparameters. Here, we introduce Regularization Learning Networks (RLNs), which overcome this challenge by introducing an efficient hyperparameter tuning scheme that minimizes a new Counterfactual Loss. Our results show that RLNs significantly improve DNNs on tabular datasets, and achieve comparable results to GBTs, with the best performance achieved with an ensemble that combines GBTs and RLNs. RLNs produce extremely sparse networks, eliminating up to 99.8% of the network edges and 82% of the input features, thus providing more interpretable models and revealing the importance that the network assigns to different inputs. RLNs could efficiently learn a single network in datasets that comprise both tabular and unstructured data, such as in the setting of medical imaging accompanied by electronic health records.
Regularization Methods  Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an illposed problem or to prevent overfitting. This information is usually of the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm. A theoretical justification for regularization is that it attempts to impose Occam’s razor on the solution. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters. 
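As a minimal sketch of a complexity penalty, the example below fits a single slope through the origin with a squared-norm (ridge) penalty, for which the minimizer has a closed form. The function name and data are illustrative. In the Bayesian reading mentioned above, this particular penalty corresponds to a zero-mean Gaussian prior on the slope.

```python
def ridge_slope(xs, ys, penalty):
    """Minimise sum((y - w * x)^2) + penalty * w^2 over a single slope w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sxy / (sxx + penalty).
    return sxy / (sxx + penalty)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unpenalised = ridge_slope(xs, ys, penalty=0.0)  # ordinary least squares
shrunk = ridge_slope(xs, ys, penalty=14.0)      # penalty pulls the slope toward 0
print(unpenalised, shrunk)
```

Larger penalties shrink the estimate further toward zero, trading some bias for lower variance.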
Regularize, Expand and Compress (REC) 
Lifelong learning, the problem of continual learning where tasks arrive in sequence, has lately been attracting more attention in the computer vision community. The aim of lifelong learning is to develop a system that can learn new tasks while maintaining the performance on the previously learned tasks. However, there are two obstacles for lifelong learning of deep neural networks: catastrophic forgetting and capacity limitation. To solve the above issues, inspired by the recent breakthroughs in automatically learning good neural network architectures, we develop a multi-task based lifelong learning via non-expansive AutoML framework termed Regularize, Expand and Compress (REC). REC is composed of three stages: 1) continually learns the sequential tasks without the learned tasks’ data via a newly proposed multi-task weight consolidation (MWC) algorithm; 2) expands the network to help the lifelong learning with potentially improved model capability and performance by network-transformation-based AutoML; 3) compresses the expanded model after learning every new task to maintain model efficiency and performance. The proposed MWC and REC algorithms achieve superior performance over other lifelong learning algorithms on four different datasets.
Regularized Artificial Neural Network (RANN) 
A regularized artificial neural network (RANN) is proposed for interval-valued data prediction. The ANN model is selected due to its powerful capability in fitting linear and nonlinear functions. To meet the mathematical coherence requirement for an interval (i.e., the predicted lower bounds should not cross over their upper bounds), a soft non-crossing regularizer is introduced to the interval-valued ANN model. We conduct extensive experiments based on both simulation datasets and real-life datasets, and compare the proposed RANN method with multiple traditional models, including the linear constrained center and range method (CCRM), the least absolute shrinkage and selection operator-based interval-valued regression method (Lasso-IR), the nonlinear interval kernel regression (IKR), the interval multi-layer perceptron (iMLP) and the multi-output support vector regression (MSVR). Experimental results show that the proposed RANN model is an effective tool for interval-valued prediction tasks with high prediction accuracy.
Regularized Contextual Bandits  We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the policy of the agent must be close to some baseline policy which is known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm splitting the context space into bins, and solving simultaneously, and independently, regularized multi-armed bandit instances on each bin. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also consider a new relevant margin condition to get problem-independent convergence rates, ending up in intermediate convergence rates interpolating between the aforementioned slow and fast rates.
Regularized Deep Embedding Clustering (RDEC) 
Clustering is a fundamental machine learning task and can be used in many applications. With the development of deep neural networks (DNNs), combining techniques from DNNs with clustering has become a new research direction and achieved some success. However, few studies have focused on the imbalanced-data problem which commonly occurs in real-world applications. In this paper, we propose a clustering method, regularized deep embedding clustering (RDEC), that integrates virtual adversarial training (VAT), a network regularization technique, with a clustering method called deep embedding clustering (DEC). DEC optimizes cluster assignments by pushing data more densely around centroids in latent space, but it is sometimes sensitive to the initial location of centroids, especially in the case of imbalanced data, where the minor class has less chance to be assigned a good centroid. RDEC introduces regularization using VAT to ensure the model’s robustness to local perturbations of data. VAT pushes data that are similar in the original space closer together in the latent space, bunching together data from minor classes and thereby facilitating cluster identification by RDEC. Combining the advantages of DEC and VAT, RDEC attains state-of-the-art performance on both balanced and imbalanced benchmark/real-world datasets. For example, accuracies are as high as 98.41% on the MNIST dataset and 85.45% on a highly imbalanced dataset derived from MNIST, which is nearly 8% higher than the current best result.
Regularized Determinantal Point Process (RDPP) 
Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). To that end, we propose a new determinantal point process algorithm which has the following two properties, both of which are novel: (1) a preprocessing step which runs in time $O(\text{number-of-non-zeros}(\mathbf{X})\cdot\log n)+\text{poly}(d)$, and (2) a sampling step which runs in $\text{poly}(d)$ time, independent of the number of rows $n$. We achieve this by introducing a new regularized determinantal point process (RDPP), which serves as an intermediate distribution in the sampling procedure by reducing the number of rows from $n$ to $\text{poly}(d)$. Crucially, this intermediate distribution does not distort the probabilities of the target sample. Our key novelty in defining the RDPP is the use of a Poisson random variable for controlling the probabilities of different subset sizes, leading to new determinantal formulas such as the normalization constant for this distribution. Our algorithm has applications in many diverse areas where determinantal point processes have been used, such as machine learning, stochastic optimization, data summarization and low-rank matrix reconstruction.
Regularized Discriminant Analysis (RDA) 
Regularized discriminant analysis (RDA) is a generalization of linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). Both LDA and QDA are special cases of RDA. If the alpha parameter is set to 1, this operator performs LDA. Similarly, if the alpha parameter is set to 0, this operator performs QDA. For more information about LDA and QDA, see the documentation of the corresponding operators. Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups. For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) NOT to go to college. For that purpose the researcher could collect data on numerous variables prior to students’ graduation. After graduation, most students will naturally fall into one of the two categories. Discriminant Analysis could then be used to determine which variable(s) are the best predictors of students’ subsequent educational choice. Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA). For example, consider the same student graduation scenario. We could have measured students’ stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counselors to provide the appropriate guidance to the respective students). The basic idea underlying discriminant analysis is to determine whether groups differ with regard to the mean of a variable, and then to use that variable to predict group membership (e.g., of new cases).
Discriminant Analysis may be used for two objectives: either we want to assess the adequacy of classification, given the group memberships of the objects under study; or we wish to assign objects to one of a number of (known) groups of objects. Discriminant Analysis may thus have a descriptive or a predictive objective. In both cases, some group assignments must be known before carrying out the Discriminant Analysis. Such group assignments, or labeling, may be arrived at in any way. Hence Discriminant Analysis can be employed as a useful complement to Cluster Analysis (in order to judge the results of the latter) or Principal Components Analysis. http://…/ESLII_print10.pdf http://…/slacpub4389.pdf http://…/citation.cfm?id=1658388 rda 
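The role of the alpha parameter in the RDA entry above can be sketched as a blend between a class-specific covariance matrix (as in QDA) and the pooled covariance matrix (as in LDA). The linear blend and the function name below are assumptions made for illustration. They follow this entry's convention that alpha = 1 gives LDA and alpha = 0 gives QDA; other references parameterize the blend the other way round.

```python
def blend_covariance(class_cov, pooled_cov, alpha):
    """Assumed blend: alpha * pooled + (1 - alpha) * class-specific, element by element."""
    return [
        [alpha * p + (1.0 - alpha) * c for c, p in zip(class_row, pooled_row)]
        for class_row, pooled_row in zip(class_cov, pooled_cov)
    ]


class_cov = [[4.0, 0.0], [0.0, 1.0]]   # covariance estimated from one class only
pooled_cov = [[2.0, 0.0], [0.0, 2.0]]  # covariance pooled over all classes

lda_like = blend_covariance(class_cov, pooled_cov, alpha=1.0)  # equals pooled_cov
qda_like = blend_covariance(class_cov, pooled_cov, alpha=0.0)  # equals class_cov
halfway = blend_covariance(class_cov, pooled_cov, alpha=0.5)
print(halfway)
```

Intermediate values of alpha shrink each class covariance toward the pooled estimate, which can stabilize the classifier when some classes have few observations.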
Regularized Empirical Risk Minimization (RERM) 
Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of learning algorithms and is used to give theoretical bounds on the performance of learning algorithms. Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction 
Regularized Greedy Forest (RGF) 
Regularized Greedy Forest wrapper of the ‘Regularized Greedy Forest’ <https://…/rgf_python> ‘python’ package, which also includes a Multicore implementation (FastRGF) <https://…/fast_rgf>. RGF 
Regularized Kernel  We introduce Regularized Kernel and Neural Sobolev Descent for transporting a source distribution to a target distribution along smooth paths of minimum kinetic energy (defined by the Sobolev discrepancy), related to dynamic optimal transport. In the kernel version, we give a simple algorithm to perform the descent along gradients of the Sobolev critic, and show that it converges asymptotically to the target distribution in the MMD sense. In the neural version, we parametrize the Sobolev critic with a neural network with input gradient norm constrained in expectation. We show in theory and experiments that regularization has an important role in favoring smooth transitions between distributions, avoiding large discrete jumps. Our analysis could provide a new perspective on the impact of critic updates (early stopping) on the paths to equilibrium in the GAN setting. 
Regularized Multi-Embedding (RME)
Following recent successes in exploiting both latent factor and word embedding models in recommendation, we propose a novel Regularized Multi-Embedding (RME) based recommendation model that simultaneously encapsulates the following ideas via decomposition: (1) which items a user likes, (2) which two users co-like the same items, (3) which two items users often co-liked, and (4) which two items users often co-disliked. In experimental validation, the RME outperforms competing state-of-the-art models in both explicit and implicit feedback datasets, significantly improving Recall@5 by 5.9~7.0%, NDCG@20 by 4.3~5.6%, and MAP@10 by 7.9~8.9%. In addition, under the cold-start scenario for users with the lowest number of interactions, against the competing models, the RME outperforms NDCG@5 by 20.2% and 29.4% in MovieLens-10M and MovieLens-20M datasets, respectively. Our datasets and source code are available at: https://…/RME.git.
Regularized Nonlinear Acceleration (RNA) 
Nonlinear Acceleration of CNNs 
Regularized Opponent Model With Maximum Entropy Objective (ROMMEO) 
In a single-agent setting, reinforcement learning (RL) tasks can be cast into an inference problem by introducing a binary random variable o, which stands for the ‘optimality’. In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound of the likelihood of achieving the optimality and name it Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show how it can improve the performance of training agents theoretically and empirically in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method ROMMEO-Q with proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on the challenging iterated matrix game and differential game respectively and show that they can outperform strong MARL baselines.
Regularized Optimal Scaling Regression (ROS Regression) 
In this paper we combine two important extensions of ordinary least squares regression: regularization and optimal scaling. Optimal scaling (sometimes also called optimal scoring) was originally developed for categorical data, and the process finds quantifications for the categories that are optimal for the regression model in the sense that they maximize the multiple correlation. Although the optimal scaling method was developed initially for variables with a limited number of categories, optimal transformations of continuous variables are a special case. We will consider a variety of transformation types; typically we use step functions for categorical variables, and smooth (spline) functions for continuous variables. Both types of functions can be restricted to be monotonic, preserving the ordinal information in the data. In addition to optimal scaling, three regularization methods will be considered: Ridge regression, the Lasso, and the Elastic Net. The resulting method will be called ROS Regression (Regularized Optimal Scaling Regression). We will show that the basic OS algorithm provides straightforward and efficient estimation of the regularized regression coefficients, automatically gives the Group Lasso and Blockwise Sparse Regression, and extends them with monotonicity properties. We will show that Optimal Scaling linearizes nonlinear relationships between predictors and outcome, and improves upon the condition of the predictor correlation matrix, increasing (on average) the conditional independence of the predictors. Alternative options for regularization of either regression coefficients or category quantifications are mentioned. Extended examples are provided. Keywords: Categorical Data, Optimal Scaling, Conditional Independence, Step Functions, Splines, Monotonic Transformations, Regularization, Lasso, Elastic Net, Group Lasso, Blockwise Sparse Regression.
REINFORCE  Industrial recommender systems deal with extremely large action spaces, with many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from the logged feedback is, however, subject to biases caused by only observing feedback on recommendations selected by the previous versions of the recommender. In this work, we present a general recipe for addressing such biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e. REINFORCE. The contributions of the paper are: (1) scaling REINFORCE to a production recommender system with an action space on the order of millions; (2) applying off-policy correction to address data biases in learning from logged feedback collected from multiple behavior policies; (3) proposing a novel top-K off-policy correction to account for our policy recommending multiple items at a time; (4) showcasing the value of exploration. We demonstrate the efficacy of our approaches through a series of simulations and multiple live experiments on YouTube.
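For orientation, the basic on-policy REINFORCE update can be sketched on a toy two-armed bandit with a softmax policy. This is not the recommender system described in the entry: it has no user state, no off-policy correction, and no top-K correction. The function names, learning rate, and rewards are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=200, learning_rate=0.5, seed=0):
    """Run REINFORCE on a bandit whose arm i always pays rewards[i]."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        # Gradient of log pi(action) w.r.t. preference i is (1[i == action] - probs[i]);
        # REINFORCE scales that gradient by the observed reward.
        for i in range(len(prefs)):
            indicator = 1.0 if i == action else 0.0
            prefs[i] += learning_rate * reward * (indicator - probs[i])
    return softmax(prefs)


final_probs = reinforce_bandit(rewards=[0.0, 1.0])
print(final_probs)
```

The policy shifts probability toward the arm that pays a reward. In the logged-feedback setting of the entry, the sampled actions come from earlier policies rather than the current one, which is why a correction is needed there.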
REINFORCE-based Genetic Algorithm Learning (REGAL)
We present a deep reinforcement learning approach to optimizing the execution cost of computation graphs in a static compiler. The key idea is to combine a neural network policy with a genetic algorithm, the Biased Random-Key Genetic Algorithm (BRKGA). The policy is trained to predict, given an input graph to be optimized, the node-level probability distributions for sampling mutations and crossovers in BRKGA. Our approach, ‘REINFORCE-based Genetic Algorithm Learning’ (REGAL), uses the policy’s ability to transfer to new graphs to significantly improve the solution quality of the genetic algorithm for the same objective evaluation budget. As a concrete application, we show results for minimizing peak memory in TensorFlow graphs by jointly optimizing device placement and scheduling. REGAL achieves on average 3.56% lower peak memory than BRKGA on previously unseen graphs, outperforming all the algorithms we compare to, and giving 4.4x bigger improvement than the next best algorithm. We also evaluate REGAL on a production compiler team’s performance benchmark of XLA graphs and achieve on average 3.74% lower peak memory than BRKGA, again outperforming all others. Our approach and analysis are made possible by collecting a dataset of 372 unique real-world TensorFlow graphs, more than an order of magnitude more data than previous work.
Reinforced Continual Learning  Most artificial intelligence models have limited ability to solve new tasks faster without forgetting previously acquired knowledge. The recently emerging paradigm of continual learning, in which the model learns various tasks in a sequential fashion, aims to solve this issue. In this work, a novel approach for continual learning is proposed, which searches for the best neural architecture for each incoming task via carefully designed reinforcement learning strategies. We name it Reinforced Continual Learning. Our method not only performs well at preventing catastrophic forgetting but also fits new tasks well. The experiments on sequential classification tasks for variants of the MNIST and CIFAR-100 datasets demonstrate that the proposed approach outperforms existing continual learning alternatives for deep networks.
Reinforced Co-Training  Co-training is a popular semi-supervised learning framework to utilize a large amount of unlabeled data in addition to a small labeled set. Co-training methods exploit predicted labels on the unlabeled data and select samples based on prediction confidence to augment the training. However, the selection of samples in existing co-training methods is based on a predetermined policy, which ignores the sampling bias between the unlabeled and the labeled subsets, and fails to explore the data space. In this paper, we propose a novel method, Reinforced Co-Training, to select high-quality unlabeled samples to better co-train on. More specifically, our approach uses Q-learning to learn a data selection policy with a small labeled dataset, and then exploits this policy to train the co-training classifiers automatically. Experimental results on clickbait detection and generic text classification tasks demonstrate that our proposed method can obtain more accurate text classification results.
Reinforced Encoder-Decoder (RED)
Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance are related to this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations to actions. However, anticipation is based on a single past frame’s representation, which ignores the history trend. Besides, it can only anticipate a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all datasets.
Reinforced Evolutionary Neural Architecture Search (RENAS) 
Neural architecture search (NAS) is an important task in network design, but it remains challenging due to high computational consumption in most methods and low stability in evolutionary algorithm (EA)-based NAS. In this paper, we propose the Reinforced Evolutionary Neural Architecture Search (RENAS), an evolutionary method with reinforced mutation for NAS to address these two issues. Specifically, we integrate reinforced mutation into an EA-based NAS method by adopting a mutation controller to learn the effects of slight modifications and make mutation actions. For this reason, the proposed method is more like the process of model design by human experts than typical RL-based NAS methods that construct networks sequentially. Furthermore, as the models are trained by fine-tuning rather than from scratch in model evaluation, the cell-wise search process becomes much more efficient and takes less than 1.5 days using 4 GPUs (Titan Xp). Experimental results demonstrate the effectiveness and efficiency of our method. Moreover, the architecture searched on CIFAR-10 sets a new state-of-the-art on ImageNet in the mobile setting (top-1/5 accuracy = 75.7%/92.6%).
Reinforced Neural Extractive Summarization (RNES) 
Coherence plays a critical role in producing a high-quality summary from a document. In recent years, neural extractive summarization has become increasingly attractive. However, most such methods ignore the coherence of summaries when extracting sentences. As an effort towards extracting coherent summaries, we propose a neural coherence model to capture the cross-sentence semantic and syntactic coherence patterns. The proposed neural coherence model obviates the need for feature engineering and can be trained in an end-to-end fashion using unlabeled data. Empirical results show that the proposed neural coherence model can efficiently capture the cross-sentence coherence patterns. Using the combined output of the neural coherence model and the ROUGE package as the reward, we design a reinforcement learning method to train the proposed neural extractive summarizer, which is named the Reinforced Neural Extractive Summarization (RNES) model. The RNES model learns to optimize coherence and informative importance of the summary simultaneously. Experimental results show that the proposed RNES outperforms existing baselines and achieves state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset. The qualitative evaluation indicates that summaries produced by RNES are more coherent and readable. 
Reinforced Self-Attention (ReSA) 
Many natural language processing tasks rely solely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are neither effective nor efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, ‘reinforced self-attention (ReSA)’, for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called ‘reinforced sequence sampling (RSS)’, selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, ‘reinforced self-attention network (ReSAN)’, solely based on ReSA. It achieves state-of-the-art performance on both the Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets. 
Reinforced Self-Attention Network (ReSAN) 
➘ “Reinforced Self-Attention” 
REINFORCEjs  REINFORCEjs is a Reinforcement Learning library that implements several common RL algorithms, all with web demos. In particular, the library currently includes: · Dynamic Programming methods · (Tabular) Temporal Difference Learning (SARSA/Q-Learning) · Deep Q-Learning for Q-Learning with function approximation with Neural Networks · Stochastic/Deterministic Policy Gradients and Actor-Critic architectures for dealing with continuous action spaces (very alpha, likely buggy or at the very least finicky and inconsistent). GitHub REINFORCEjs 
Reinforcement Learning (RL) 
Reinforcement learning (RL) is learning by interacting with an environment. An RL agent learns from the consequences of its actions rather than from being explicitly taught, and it selects its actions on the basis of its past experiences (exploitation) and also by trying new choices (exploration), which is essentially trial-and-error learning. The reinforcement signal that the RL agent receives is a numerical reward, which encodes the success of an action’s outcome, and the agent seeks to learn to select actions that maximize the accumulated reward over time. (The term reward is used here in a neutral fashion and does not imply any pleasure, hedonic impact or other psychological interpretations.) 
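The exploration/exploitation loop described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the five-state chain environment, the reward of 1 at the last state, the hyperparameters, and the choice of tabular Q-learning with an epsilon-greedy rule, which is only one of many RL algorithms.

```python
import random

def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Exploration: random action with probability epsilon
            # (also used to break ties between equal estimates).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                # Exploitation: action with the higher current estimate.
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Move the estimate towards reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train_q_learning()
# Greedy action for every non-terminal state after training.
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

The numerical reward is the only feedback the agent receives; the preference for moving right emerges from the accumulated, discounted reward rather than from explicit instruction.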
Reinforcement Learning and Bayesian Optimization (ReinBo) 
A machine learning pipeline potentially consists of several stages of operations such as data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyperparameters, which can become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyperparameter space. To optimize this mixed continuous and discrete conditional hierarchical hyperparameter space, we propose an efficient pipeline search and configuration algorithm which combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state-of-the-art methods like Auto-sklearn, TPOT, Tree Parzen Window, and Random Search. 
Reinforcement Learning with Parameterized Actions (Q-PAMDP) 
We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions, that is, discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with this action. This models domains where there are distinct actions which can be adjusted to a particular state. We introduce the Q-PAMDP algorithm for learning in these domains. We show that Q-PAMDP converges to a local optimum, and compare different approaches in a robot soccer goal-scoring domain and a platformer domain. 
ReinforceWalk  Learning to walk over a graph towards a target node for a given input query and a source node is an important problem in applications such as knowledge graph reasoning. It can be formulated as a reinforcement learning (RL) problem that has a known state transition model, but with partial observability and sparse reward. To overcome these challenges, we develop a graph walking agent called ReinforceWalk, which consists of a deep recurrent neural network (RNN) and a Monte Carlo Tree Search (MCTS). To address partial observability, the RNN encodes the history of observations and maps it into the Q-value, the policy and the state value. In order to effectively train the agent from sparse reward, we combine MCTS with the RNN policy to generate trajectories with more positive rewards. From these trajectories, we update the network in an off-policy manner using Q-learning and improve the RNN policy. Our proposed RL algorithm repeatedly applies this policy improvement step to learn the entire model. At the testing stage, the MCTS is also combined with the RNN to predict the target node with higher accuracy. Experimental results on several graph-walking benchmarks show that we are able to learn better policies from fewer rollouts compared to other baseline methods, which are mainly based on policy gradient methods. 
Reingold-Tilford Tree  Various algorithms have been proposed for producing tidy drawings of trees, that is, drawings that are aesthetically pleasing and use minimum drawing space. We show that these algorithms contain some difficulties that lead to aesthetically unpleasing, wider than necessary drawings. We then present a new algorithm with comparable time and storage requirements that produces tidier drawings. Generalizations to forests and m-ary trees are discussed, as are some problems in discretization when alphanumeric output devices are used. collapsibleTree 
Rejection Sampling  In mathematics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or “accept-reject algorithm” and is a type of Monte Carlo method. The method works for any distribution in R^m with a density. Rejection sampling is based on the observation that to sample a random variable one can sample uniformly from the region under the graph of its density function. AR 
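The idea of sampling uniformly from the region under the density can be sketched as follows. The target density f(x) = 6x(1 - x) on [0, 1], the uniform proposal, and the envelope constant 1.5 (the maximum of f) are assumptions chosen for illustration; they are not part of the entry above.

```python
import random

def target_density(x):
    """Illustrative target: f(x) = 6x(1 - x) on [0, 1], with maximum 1.5."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, rng, bound=1.5):
    """Draw n samples from target_density using a Uniform(0, 1) proposal.

    bound must satisfy target_density(x) <= bound for all x in [0, 1].
    """
    samples = []
    while len(samples) < n:
        x = rng.random()          # candidate from the proposal
        u = rng.random() * bound  # uniform height under the envelope
        if u <= target_density(x):
            samples.append(x)     # point lies under the density: accept
    return samples

rng = random.Random(42)
draws = rejection_sample(10_000, rng)
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2))
```

Each accepted pair (x, u) is uniformly distributed over the area under f, so the accepted x values follow the target distribution. With this envelope about two in three candidates are accepted on average, since the area under f is 1 and the area under the envelope is 1.5.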
ReKopedia  Very important breakthroughs in data-centric machine learning algorithms led to impressive performance in transactional point applications such as detecting anger in speech, alerts from a Face Recognition system, or EKG interpretation. Non-transactional applications, e.g. medical diagnosis beyond the EKG results, require AI algorithms that integrate deeper and broader knowledge in their problem-solving capabilities, e.g. integrating knowledge about anatomy and physiology of the heart with EKG results and additional patient findings. The same holds for military aerial interpretation, where knowledge about enemy doctrines on force composition and spread helps immensely in situation assessment beyond image recognition of individual objects. The Double Deep Learning approach advocates integrating data-centric machine self-learning techniques with machine-teaching techniques to leverage the power of both and overcome their corresponding limitations. To take AI to the next level, it is essential that we rebalance the roles of data and knowledge. Data is important, but knowledge, both deep and commonsense, is equally important. An initiative is proposed to build a Wikipedia for Smart Machines, meaning that the target readers are not humans but rather smart machines. Named ReKopedia, the goal is to develop methodologies, tools, and automatic algorithms to convert humanity’s knowledge that we all learn in schools, universities and during our professional life into Reusable Knowledge structures that smart machines can use in their inference algorithms. Ideally, ReKopedia would be an open source shared knowledge repository similar to the well-known shared open source software code repositories. Examples in the article are based on or inspired by real-life non-transactional AI systems I deployed over a decades-long AI career that benefit hundreds of millions of people around the globe. 
RELARM  Following the concept of relative attributes, which is widely used in visual recognition, the article defines relative PCA attributes for a class of objects described by vectors of their parameters. A new rating model (RELARM) is built using relative PCA attribute ranking functions for describing rating objects and the k-means clustering algorithm. The assignment of each rating object to a rating category is derived by projecting the cluster centers onto a specially selected rating vector. An empirical study has shown a high level of approximation to the existing S&P, Moody’s and Fitch ratings. 
Relation Extraction (RE) 
With the advent of the Internet, a large amount of digital text is generated every day in the form of news articles, research publications, blogs, question answering forums and social media. It is important to develop techniques for extracting information automatically from these documents, as a lot of important information is hidden within them. This extracted information can be used to improve access and management of knowledge hidden in large text corpora. Several applications such as Question Answering and Information Retrieval would benefit from this information. Entities like persons and organizations form the most basic unit of the information. Occurrences of entities in a sentence are often linked through well-defined relations; e.g., occurrences of a person and an organization in a sentence may be linked through relations such as employed at. The task of Relation Extraction (RE) is to identify such relations automatically. In this paper, we survey several important supervised, semi-supervised and unsupervised RE techniques. We also cover the paradigms of Open Information Extraction (OIE) and Distant Supervision. Finally, we describe some of the recent trends in RE techniques and possible future research directions. This survey would be useful for three kinds of readers: i) newcomers in the field who want to quickly learn about RE; ii) researchers who want to know how the various RE techniques evolved over time and what the possible future research directions are; and iii) practitioners who just need to know which RE technique works best in various settings. 
Relation Network (RN) 
We present a conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only few examples from each. Our method, called the Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting. Once trained, an RN is able to classify images of new classes by computing relation scores between query images and the few examples of each new class without further updating the network. Besides providing improved performance on few-shot learning, our framework is easily extended to zero-shot learning. Extensive experiments on four datasets demonstrate that our simple approach provides a unified and effective approach for both of these tasks. 
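The classification step, in which a query is compared with the few examples of each new class through relation scores, can be sketched as below. This is only an illustration under strong assumptions: in the method described above the relation function is a learned network, whereas `relation_score` here is a fixed, hypothetical stand-in, the inputs are assumed to be already-embedded feature vectors, and averaging the scores per class is a simplification.

```python
import math

def relation_score(query, example):
    """Hypothetical stand-in for a learned relation module.

    Returns a score in (0, 1]; higher means the pair is more related.
    """
    return math.exp(-math.dist(query, example) ** 2)

def classify(query, support_set):
    """Pick the class whose support examples relate most strongly to query.

    support_set maps a class label to a list of embedded examples.
    """
    scores = {
        label: sum(relation_score(query, ex) for ex in examples) / len(examples)
        for label, examples in support_set.items()
    }
    return max(scores, key=scores.get), scores

# Two new classes with two examples each (a 2-way, 2-shot episode).
support = {
    "cat": [[0.0, 0.0], [0.2, 0.1]],
    "dog": [[3.0, 3.0], [2.8, 3.1]],
}
label, scores = classify([0.1, 0.0], support)
print(label)
```

No parameters are updated when new classes arrive; only the support set changes, which mirrors the statement that a trained network classifies new classes without further updating.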
Relation Structure-Aware Heterogeneous Information Network Embedding Model (RHINE) 
Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ one single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring real-world networks with thorough mathematical analysis, we present two structure-related measures which can consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of relations, in our RHINE, we propose different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification. 
Relational Captioning  ➚ “Dense Relational Captioning” 
Relational Class Analysis  RCA 
Relational Collaborative Filtering  Existing item-based collaborative filtering (ICF) methods leverage only the relation of collaborative similarity. Nevertheless, there exist multiple relations between items in real-world scenarios. Distinct from the collaborative similarity that implies co-interact patterns from the user perspective, these relations reveal fine-grained knowledge on items from different perspectives of metadata, functionality, etc. However, how to incorporate multiple item relations is less explored in recommendation research. In this work, we propose Relational Collaborative Filtering (RCF), a general framework to exploit multiple relations between items in recommender systems. We find that both the relation type and the relation value are crucial in inferring user preference. To this end, we develop a two-level hierarchical attention mechanism to model user preference. The first-level attention discriminates which types of relations are more important, and the second-level attention considers the specific relation values to estimate the contribution of a historical item in recommending the target item. To make the item embeddings reflect the relational structure between items, we further formulate a task to preserve the item relations, and jointly train it with the recommendation task of preference modeling. Empirical results on two real datasets demonstrate the strong performance of RCF. Furthermore, we also conduct qualitative analyses to show the benefits of explanations brought by the modeling of multiple item relations. 
Relational Concept Analysis (RCA) 
The processing of complex data is admittedly among the major concerns of knowledge discovery from data (KDD). Indeed, a major part of the data worth analyzing is stored in relational databases and, more recently, on the Web of Data. This clearly underscores the need for Entity-Relationship and RDF compliant data mining (DM) tools. We are studying an approach to the underlying multi-relational data mining (MRDM) problem, which relies on formal concept analysis (FCA) as a framework for clustering and classification. Our relational concept analysis (RCA) extends FCA to the processing of multi-relational datasets, i.e., with multiple sorts of individuals, each provided with its own set of attributes, and relationships among those. Given such a dataset, RCA constructs a set of concept lattices, one per object sort, through an iterative analysis process that is bound towards a fixed point. In doing that, it abstracts the links between objects into attributes akin to role restrictions from description logics (DLs). We address here key aspects of the iterative calculation such as evolution in data description along the iterations and process termination. We describe implementations of RCA and list applications to problems from software and knowledge engineering. On-demand Relational Concept Analysis 
Relational Event Models (REM) 
Sequences of relational events underlie much empirical research on organizational relations. Yet relational event data are typically aggregated and dichotomized to derive networks that can be analyzed with specialized statistical methods. Transforming sequences of relational events into binary network ties entails two main limitations: the loss of information about the order and number of events that compose each tie and the inability to account for compositional changes in the set of actors and/or recipients. rem, relevent 
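The information loss caused by aggregating and dichotomizing can be made concrete with a small sketch. The event logs, actor names, and helper functions are hypothetical and exist only for illustration; they are not taken from any of the packages named above.

```python
from collections import Counter

# Hypothetical event log: (time, sender, receiver).
events = [
    (1, "A", "B"),
    (2, "B", "A"),
    (3, "A", "B"),
    (4, "A", "C"),
]

# A different history: other order, and only one A -> B event.
other_history = [
    (1, "A", "C"),
    (2, "A", "B"),
    (3, "B", "A"),
]

def dichotomize(event_log):
    """Collapse an event sequence into a set of binary directed ties."""
    return {(sender, receiver) for _, sender, receiver in event_log}

def tie_counts(event_log):
    """Count how many events make up each directed tie."""
    return Counter((sender, receiver) for _, sender, receiver in event_log)

same_network = dichotomize(events) == dichotomize(other_history)
print(same_network)                    # both histories give the same ties
print(tie_counts(events)[("A", "B")])  # the count is visible only in the log
```

Both histories yield an identical binary network even though they differ in the order and number of events, which is the first limitation named in the entry.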
Relational Forward Model (RFM) 
The behavioral dynamics of multi-agent systems have a rich and orderly structure, which can be leveraged to understand these systems, and to improve how artificial agents learn to operate in them. Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents’ future behavior in multi-agent environments. Because these models operate on the discrete entities and relations present in the environment, they produce interpretable intermediate representations which offer insights into what drives agents’ behavior, and what events mediate the intensity and valence of social interactions. Furthermore, we show that embedding RFM modules inside agents results in faster learning systems compared to non-augmented baselines. As more and more of the autonomous systems we develop and interact with become multi-agent in nature, developing richer analysis tools for characterizing how and why agents make decisions is increasingly necessary. Moreover, developing artificial agents that quickly and safely learn to coordinate with one another, and with humans in shared environments, is crucial. 
Relational Graph Attention Network  We investigate Relational Graph Attention Networks, a class of models that extends non-relational graph attention mechanisms to incorporate relational information, opening up these methods to a wider variety of problems. A thorough evaluation of these models is performed, and comparisons are made against established benchmarks. To provide a meaningful comparison, we retrain Relational Graph Convolutional Networks, the spectral counterpart of Relational Graph Attention Networks, and evaluate them under the same conditions. We find that Relational Graph Attention Networks perform worse than anticipated, although some configurations are marginally beneficial for modelling molecular properties. We provide insights as to why this may be, and suggest both modifications to evaluation strategies and directions to investigate in future work. 
Relational Induction Neural Network (RINN) 
The automated design of microwave integrated circuits (MWIC) has long been viewed as a fundamental challenge for artificial intelligence owing to a solution space and structural complexity larger than those of Go. Here, we developed a novel artificial agent, termed Relational Induction Neural Network, that can lead to an automated design of MWIC and avoid brute-force computing to examine every possible solution, which is a significant breakthrough in the field of electronics. In experiments on microwave transmission line circuit, filter circuit and antenna circuit design tasks, strongly competitive results are obtained. Compared with the traditional reinforcement learning method, the learning curve shows that the proposed architecture is able to quickly converge to the pre-designed MWIC model, with the convergence rate improved by up to four orders of magnitude. This is the first study to show that an agent can, through training or learning, automatically induce the relationships between MWIC structures without incorporating any additional prior knowledge. Notably, the relationships can be explained in terms of MWIC theory and electromagnetic field distribution. Our work bridges the divide between artificial intelligence and MWIC and can extend to mechanical waves, mechanics and other related fields. 
Relational Knowledge Distillation (RKD) 
Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular for metric learning, it allows students to outperform their teachers’ performance, achieving the state of the art on standard benchmark datasets. 
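A distance-wise loss that penalizes structural differences rather than differences in individual outputs might look like the sketch below. The specific choices are assumptions for illustration and are not stated in the entry above: pairwise Euclidean distances are normalized by their mean within each model, and teacher and student are compared with a Huber penalty.

```python
import math

def normalized_pairwise_distances(embeddings):
    """Pairwise Euclidean distances, divided by their mean (assumed form)."""
    n = len(embeddings)
    dists = {
        (i, j): math.dist(embeddings[i], embeddings[j])
        for i in range(n) for j in range(i + 1, n)
    }
    mean = sum(dists.values()) / len(dists)
    return {pair: d / mean for pair, d in dists.items()}

def huber(error, delta=1.0):
    """Huber penalty: quadratic for small errors, linear for large ones."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def distance_wise_loss(teacher, student):
    """Mean penalty on the mismatch between teacher and student relations."""
    t = normalized_pairwise_distances(teacher)
    s = normalized_pairwise_distances(student)
    return sum(huber(t[pair] - s[pair]) for pair in t) / len(t)

teacher = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
# Same geometry at half the scale: relations are preserved.
scaled_student = [[0.5 * v for v in row] for row in teacher]
# Lower-dimensional student with a different geometry.
distorted_student = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]

loss_scaled = distance_wise_loss(teacher, scaled_student)
loss_distorted = distance_wise_loss(teacher, distorted_student)
print(loss_scaled < loss_distorted)
```

Because only relations between examples are compared, the student embedding may have a different dimension or scale from the teacher's, which individual-output matching would not allow directly.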
Relational Memory Core (RMC) 
Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods. It is unclear, however, whether they also have an ability to perform complex relational reasoning with the information they remember. Here, we first confirm our intuitions that standard memory architectures may struggle at tasks that heavily involve an understanding of the ways in which entities are connected, i.e., tasks involving relational reasoning. We then improve upon these deficits by using a new memory module, a Relational Memory Core (RMC), which employs multi-head dot product attention to allow memories to interact. Finally, we test the RMC on a suite of tasks that may profit from more capable relational reasoning across sequential information, and show large gains in RL domains (e.g. Mini PacMan), program evaluation, and language modeling, achieving state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets. 
Relational Network  Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations. ➚ “Recurrent Relational Network” Recurrent Relational Networks for Complex Relational Reasoning 
Relational Pooling  This work generalizes graph neural networks (GNNs) beyond those based on the Weisfeiler-Lehman (WL) algorithm, graph Laplacians, and graph diffusion kernels. Our approach, denoted Relational Pooling (RP), draws from the theory of finite partial exchangeability to provide a framework with maximal representation power for graphs. RP can work with existing graph representation models, and somewhat counterintuitively, can make them even more powerful than the original WL isomorphism test. Additionally, RP is the first theoretically sound framework to use architectures like Recurrent Neural Networks and Convolutional Neural Networks for graph classification. RP also has graph kernels as a special case. We demonstrate improved performance of novel RP-based graph representations over current state-of-the-art methods on a number of tasks. 
Relational Proposal Graph Network (RepGN) 
Region-based object detectors achieve state-of-the-art performance, but few consider modeling the relations among proposals. In this paper, we explore the idea of modeling the relationships among the proposals for object detection from the graph learning perspective. Specifically, we present the relational proposal graph network (RepGN), which is defined on object proposals, with the semantic and spatial relations modeled as edges. By integrating our RepGN module into object detectors, relation and context constraints are introduced into the feature extraction of regions and into bounding box regression and classification. Besides, we propose a novel graph-cut based pooling layer for hierarchical coarsening of the graph, which empowers the RepGN module to exploit the inter-regional correlation and scene description in a hierarchical manner. We perform extensive experiments on the COCO object detection dataset and show promising results. 
Relational Reasoning Network (RRN) 
Accurately identifying anatomical landmarks is a crucial step in deformation analysis and surgical planning for craniomaxillofacial (CMF) bones. Available methods require segmentation of the object of interest for precise landmarking. Unlike those, our purpose in this study is to perform anatomical landmarking using the inherent relation of CMF bones without explicitly segmenting them. We propose a new deep network architecture, called relational reasoning network (RRN), to accurately learn the local and the global relations of the landmarks. Specifically, we are interested in learning landmarks in the CMF region: mandible, maxilla, and nasal bones. The proposed RRN works in an end-to-end manner, utilizing learned relations of the landmarks based on dense-block units and without the need for segmentation. Given a few landmarks as input, the proposed system accurately and efficiently localizes the remaining landmarks on the aforementioned bones. For a comprehensive evaluation of RRN, we used cone-beam computed tomography (CBCT) scans of 250 patients. The proposed system identifies the landmark locations very accurately even when there are severe pathologies or deformations in the bones. The proposed RRN has also revealed unique relationships among the landmarks that help us draw several inferences about the informativeness of the landmark points. RRN is invariant to the order of landmarks, and it allowed us to discover the optimal configurations (number and location) for landmarks to be localized within the object of interest (mandible) or nearby objects (maxilla and nasal). To the best of our knowledge, this is the first-of-its-kind algorithm for finding anatomical relations of objects using deep learning. 
Relational Recurrent Neural Network  Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods. It is unclear, however, whether they also have an ability to perform complex relational reasoning with the information they remember. Here, we first confirm our intuitions that standard memory architectures may struggle at tasks that heavily involve an understanding of the ways in which entities are connected, i.e., tasks involving relational reasoning. We then improve upon these deficits by using a new memory module, a Relational Memory Core (RMC), which employs multi-head dot product attention to allow memories to interact. Finally, we test the RMC on a suite of tasks that may profit from more capable relational reasoning across sequential information, and show large gains in RL domains (e.g. Mini PacMan), program evaluation, and language modeling, achieving state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets. 
Relational Rules  Research in cooperative games often assumes that agents know the coalitional values with certainty, and that they can belong to one coalition only. By contrast, this work assumes that the value of a coalition is based on an underlying collaboration structure emerging due to existing but unknown relations among the agents; and that agents can form overlapping coalitions. Specifically, we first propose Relational Rules, a novel representation scheme for cooperative games with overlapping coalitions, which encodes the aforementioned relations, and which extends the well-known MC-nets representation to this setting. We then present a novel decision-making method for decentralized overlapping coalition formation, which exploits probabilistic topic modeling, and in particular, online Latent Dirichlet Allocation. By interpreting formed coalitions as documents, agents can effectively learn topics that correspond to profitable collaboration structures. 
Relational Similarity Machines (RSM) 
This paper proposes Relational Similarity Machines (RSM): a fast, accurate, and flexible relational learning framework for supervised and semi-supervised learning tasks. Despite the importance of relational learning, most existing methods are hard to adapt to different settings, due to issues with efficiency, scalability, accuracy, and flexibility for handling a wide variety of classification problems, data, constraints, and tasks. For instance, many existing methods perform poorly for multi-class classification problems, graphs that are sparsely labeled or network data with low relational autocorrelation. In contrast, the proposed relational learning framework is designed to be (i) fast for learning and inference at real-time interactive rates, and (ii) flexible for a variety of learning settings (multi-class problems), constraints (few labeled instances), and application domains. The experiments demonstrate the effectiveness of RSM for a variety of tasks and data. 
Relation-Aware Global Attention (RGA) 
Attention mechanisms aim to increase the representation power by focusing on important features and suppressing unnecessary ones. For convolutional neural networks (CNNs), attention is typically learned with local convolutions, which ignores the global information and the hidden relations. How to efficiently exploit the long-range context to globally learn attention is underexplored. In this paper, we propose an effective Relation-Aware Global Attention (RGA) module for CNNs to fully exploit the global correlations to infer the attention. Specifically, when computing the attention at a feature position, in order to grasp information of global scope, we propose to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions, and the feature itself together for learning the attention with convolutional operations. Given an intermediate feature map, we have validated the effectiveness of this design across both the spatial and channel dimensions. When applied to the task of person re-identification, our model achieves state-of-the-art performance. Extensive ablation studies demonstrate that our RGA can significantly enhance the feature representation power. We further demonstrate the general applicability of RGA to vision tasks by applying it to the scene segmentation and image classification tasks, resulting in consistent performance improvement. 
Relation-aware Graph Attention Network (ReGAT) 
In order to answer semantically complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects. We propose a Relation-aware Graph Attention Network (ReGAT), which encodes each image into a graph and models multi-type inter-object relations via a graph attention mechanism, to learn question-adaptive relation representations. Two types of visual object relations are explored: (i) Explicit Relations that represent geometric positions and semantic interactions between objects; and (ii) Implicit Relations that capture the hidden dynamics between image regions. Experiments demonstrate that ReGAT outperforms prior state-of-the-art approaches on both the VQA 2.0 and VQA-CP v2 datasets. We further show that ReGAT is compatible with existing VQA architectures, and can be used as a generic relation encoder to boost model performance for VQA. 
Relationship Extraction  A Relationship Extraction (Relation Extraction) task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships. http://…ingannlpproblemwithoutusingatonof 
relative Age of Information (rAoI) 
In this paper, we introduce a new data freshness metric, relative Age of Information (rAoI), and examine it in a single server system with various packet management schemes. The (classical) AoI metric was introduced to measure the staleness of status updates at the receiving end with respect to their generation at the source. This metric addresses systems where the timings of update generation at the source are absolute and can be designed separately or jointly with the transmission schedules. In many decentralized applications, transmission schedules are blind to update generation timing, and the transmitter can know the timing of an update packet only after it arrives. As such, an update becomes stale after a new one arrives. The rAoI metric measures how fresh the data is at the receiver with respect to the data at the transmitter. It introduces a particularly explicit dependence on the arrival process in the evaluation of age. We investigate several queuing disciplines and provide closed form expressions for rAoI and numerical comparisons. 
Relative Attributing Propagation (RAP) 
As Deep Neural Networks (DNNs) have demonstrated superhuman performance in many computer vision tasks, there is an increasing interest in revealing the complex internal mechanisms of DNNs. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs with a new perspective that precisely separates the positive and negative attributions. By identifying the fundamental causes of activation and the proper inversion of relevance, RAP allows each neuron to be assigned an actual contribution to the output. Furthermore, we devise pragmatic methods to handle the effect of bias and batch normalization properly in the attributing procedures. Therefore, our method makes it possible to interpret various kinds of very deep neural network models with clear and attentive visualizations of positive and negative attributions. Using the region perturbation method and a comparison of the distributions of attributions as a quantitative evaluation, we verify that the positive and negative attributions produced by RAP correctly account for their respective meanings. The positive and negative attributions propagated by RAP show the characteristics of vulnerability and robustness to the distortion of the corresponding pixels, respectively. We apply RAP to the DNN models VGG-16, ResNet-50 and Inception-V3, demonstrating that it generates more intuitive and improved interpretations compared to existing attribution methods. 
RelATive cEntrality (RATE) 
While the success of deep neural networks (DNNs) is well-established across a variety of domains, our ability to explain and interpret these methods is limited. Unlike previously proposed local methods which try to explain particular classification decisions, we focus on global interpretability and ask a universally applicable question: given a trained model, which features are the most important? In the context of neural networks, a feature is rarely important on its own, so our strategy is specifically designed to leverage partial covariance structures and incorporate variable dependence into feature ranking. Our methodological contributions in this paper are twofold. First, we propose an effect size analogue for DNNs that is appropriate for applications with highly collinear predictors (ubiquitous in computer vision). Second, we extend the recently proposed ‘RelATive cEntrality’ (RATE) measure (Crawford et al., 2019) to the Bayesian deep learning setting. RATE applies an information-theoretic criterion to the posterior distribution of effect sizes to assess feature significance. We apply our framework to three broad application areas: computer vision, natural language processing, and social science. 
Relative Growth Rate (RGR) 
See Birch, L. C. 1948. The intrinsic rate of natural increase of an insect population. – Journal of Animal Ecology 17: 15-26; <doi:10.2307/1605>. petitr 
Relative Likelihood  marl 
Relative Risk  In statistics and epidemiology, relative risk or risk ratio (RR) is the ratio of the probability of an event occurring (for example, developing a disease, being injured) in an exposed group to the probability of the event occurring in a comparison, non-exposed group. Relative risk includes two important features: (i) a comparison of risk between two ‘exposures’ puts risks in context, and (ii) ‘exposure’ is ensured by having proper denominators for each group representing the exposure. prop.comb.RR 
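As a minimal sketch of the definition above, the risk ratio can be computed from the event counts and group sizes of the exposed and non-exposed groups. The function name `relative_risk` and the counts used are illustrative assumptions, not part of any package mentioned here.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk ratio: P(event | exposed) / P(event | non-exposed)."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_exposed = exposed_events / exposed_total        # risk in the exposed group
    risk_unexposed = unexposed_events / unexposed_total  # risk in the comparison group
    if risk_unexposed == 0:
        raise ZeroDivisionError("risk in the comparison group is zero")
    return risk_exposed / risk_unexposed


# Hypothetical counts: 30 of 100 exposed and 10 of 100 non-exposed develop the disease.
rr = relative_risk(30, 100, 10, 100)
print(f"relative risk = {rr:.2f}")
```

A value above 1 indicates a higher risk in the exposed group, a value below 1 a lower risk.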
Relative Survival  In survival analysis, relative survival of a disease is calculated by dividing the overall survival after diagnosis by the survival as observed in a similar population that was not diagnosed with that disease. A similar population is composed of individuals with at least age and gender similar to those diagnosed with the disease. When describing the survival experience of a group of people or patients, typically the method of overall survival is used, and it presents estimates of the proportion of people or patients alive at a certain point in time. The problem with measuring overall survival using Kaplan-Meier or actuarial survival methods is that the estimates include two causes of death: 1) deaths due to the disease of interest; and 2) deaths due to all other causes, which include old age, other cancers, trauma and any other possible cause of death. In general, survival analysis is interested in the deaths due to a disease rather than all causes, and therefore a ‘cause-specific survival analysis’ is employed to measure disease-specific survival. There are two ways of performing a cause-specific survival analysis: ‘competing risks survival analysis’ and ‘relative survival’. 
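The ratio described in the first sentence can be illustrated with a short sketch. The function name and the survival proportions are hypothetical; in practice the expected survival would come from life tables for a population matched on at least age and gender.

```python
def relative_survival(observed_survival, expected_survival):
    """Overall survival of the diagnosed group divided by the survival
    expected in a similar population without the diagnosis."""
    if not 0 < expected_survival <= 1:
        raise ValueError("expected survival must be in (0, 1]")
    if not 0 <= observed_survival <= 1:
        raise ValueError("observed survival must be in [0, 1]")
    return observed_survival / expected_survival


# Hypothetical 5-year figures: 60% of patients alive, 80% expected in the matched population.
print(f"relative survival = {relative_survival(0.60, 0.80):.2f}")
```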
Relaxed Online Maximum Margin Algorithm (ROMMA) 
An incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms are similar to those of the perceptron algorithm, but their generalization is much better. We show that a batch algorithm based on aggressive ROMMA converges to the fixed threshold SVM hypothesis. 
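The relaxation can be sketched as follows, under the assumption that on each mistake the new weight vector is the shortest vector satisfying two constraints with equality: y (w . x) = 1 for the current example, and w_old . w = ||w_old||^2 as a single stand-in for all earlier constraints. The coefficients below come from solving those two linear equations; the function name `romma_update` and the handling of the degenerate case are illustrative choices rather than the authors' reference implementation.

```python
import numpy as np


def romma_update(w, x, y):
    """One mistake-driven ROMMA step on example (x, y), with y in {-1, +1}."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    wx = float(np.dot(w, x))
    if y * wx > 0:
        return w                      # correctly classified: no update
    xx = float(np.dot(x, x))
    ww = float(np.dot(w, w))
    if ww == 0.0:
        return y * x / xx             # first mistake: shortest w with y * (w . x) = 1
    denom = xx * ww - wx ** 2
    if denom == 0.0:
        return w                      # x parallel to w: the two constraints conflict (skipped here)
    c = (xx * ww - y * wx) / denom    # coefficient on the old weight vector
    d = ww * (y - wx) / denom         # coefficient on the current example
    return c * w + d * x


# Two hypothetical examples presented online.
w = np.zeros(2)
w = romma_update(w, [2.0, 0.0], +1)
w = romma_update(w, [1.0, 1.0], -1)
print(w)
```

After each update the current example sits exactly on the margin, which is what distinguishes this step from the perceptron's additive update.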
Relay  Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressiveness, composability, and portability. We present Relay, a new intermediate representation (IR) and compiler framework for DL models. The functional, statically typed Relay IR unifies and generalizes existing DL IRs and can express state-of-the-art models. Relay’s expressive IR required careful design of the type system, automatic differentiation, and optimizations. Relay’s extensible compiler can eliminate abstraction overhead and target new hardware platforms. The design insights from Relay can be applied to existing frameworks to develop IRs that support extension without compromising on expressivity, composability, and portability. Our evaluation demonstrates that the Relay prototype can already provide competitive performance for a broad class of models running on CPUs, GPUs, and FPGAs. 
ReLeQ  Despite numerous state-of-the-art applications of Deep Neural Networks (DNNs) in a wide range of real-world tasks, two major challenges hinder further advances in DNNs: hyperparameter optimization and lack of computing power. Recent efforts show that quantizing the weights and activations of DNN layers to lower bitwidths takes a significant step toward reducing memory bandwidth and power consumption by using limited computing resources. This paper builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. While the use of eight-bit weights and activations during inference maintains the accuracy in most cases, lower bitwidths can achieve the same accuracy while utilizing less power. However, deep quantization (quantizing bitwidths below eight) while maintaining accuracy requires a great deal of trial-and-error, fine-tuning as well as retraining. By formulating quantization bitwidth as a hyperparameter in the optimization problem of selecting the bitwidth, we tackle this issue by leveraging a state-of-the-art policy-gradient-based Reinforcement Learning (RL) algorithm called Proximal Policy Optimization [10] (PPO) to efficiently explore a large design space of DNN quantization. The proposed technique also opens up the possibility of performing heterogeneous quantization of the network (e.g., quantizing each layer to a different bitwidth), as the RL agent learns the sensitivity of each layer with respect to accuracy in order to perform quantization of the entire network. We evaluated our method on several neural networks trained on MNIST, CIFAR-10 and SVHN; the RL agent quantizes these networks to average bitwidths of 2.25, 5 and 4, respectively, with less than 0.3% accuracy loss in all cases. 
Relevance Vector Machine (RVM) 
In mathematics, a relevance vector machine (RVM) is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification. The RVM has an identical functional form to the support vector machine, but provides probabilistic classification. Compared to that of support vector machines (SVM), the Bayesian formulation of the RVM avoids the set of free parameters of the SVM (that usually require cross-validation-based post-optimizations). However, RVMs use an expectation-maximization (EM)-like learning method and are therefore at risk of local minima. This is unlike the standard sequential minimal optimization (SMO)-based algorithms employed by SVMs, which are guaranteed to find a global optimum (of the convex problem). The relevance vector machine is patented in the United States by Microsoft. 
Relevant Component Analysis (RCA) 
Irrelevant data variability often causes difficulties in classification and clustering tasks. For example, when data variability is dominated by environment conditions, such as global illumination, nearest-neighbour classification in the original feature space may be very unreliable. The goal of Relevant Component Analysis (RCA) is to find a transformation that amplifies relevant variability and suppresses irrelevant variability. Relevant Component Analysis tries to find a linear transformation W of the feature space such that the effect of irrelevant variability is reduced in the transformed space. That is, we wish to rescale the feature space and reduce the weights of irrelevant directions. The main premise of RCA is that we can reduce irrelevant variability by reducing the within-class variability. Intuitively, a direction which exhibits high variability among samples of the same class is unlikely to be useful for classification or clustering. RECA 
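One common way to realise such a transformation, shown here as a sketch under stated assumptions rather than as the RECA package's implementation, is to estimate the pooled within-class covariance C and use W = C^(-1/2), which down-weights directions of high within-class variability. The function name `rca_transform` and the simulated data are hypothetical, and C is assumed to be positive definite.

```python
import numpy as np


def rca_transform(X, labels):
    """Return (W, C): the whitening transform W = C^(-1/2) and the
    pooled within-class covariance C it was computed from."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centered = np.empty_like(X)
    for c in np.unique(labels):
        mask = labels == c
        centered[mask] = X[mask] - X[mask].mean(axis=0)  # remove each class mean
    C = centered.T @ centered / len(X)                   # within-class covariance
    eigvals, eigvecs = np.linalg.eigh(C)                 # C is symmetric
    W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # inverse matrix square root
    return W, C


# Hypothetical data: two classes, with feature 1 much noisier than feature 0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], [1, 10], size=(50, 2)),
               rng.normal([3, 0], [1, 10], size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)
W, C = rca_transform(X, labels)
X_new = X @ W.T                                          # data in the transformed space
print(np.round(W @ C @ W.T, 6))                          # within-class covariance after the transform
```

In the transformed space the within-class covariance is the identity, so the noisy direction no longer dominates distance computations.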
Reliability Data Analysis  After you have obtained component or system reliability data, how do you fit life distribution models, reliability growth models, or acceleration models? How do you estimate failure rates or MTBFs and project component or system reliability at use conditions? SPREDA 
Reliability Modelling  Reliability modeling is the process of predicting or understanding the reliability of a component or system prior to its implementation. Two types of analysis that are often used to model a complete system’s availability behavior (including effects from logistics issues like spare part provisioning, transport and manpower) are Fault Tree Analysis and reliability block diagrams. At a component level, the same types of analyses can be used together with others. The input for the models can come from many sources including testing; prior operational experience; field data; as well as data handbooks from similar or related industries. Regardless of source, all model input data must be used with great caution, as predictions are only valid in cases where the same product was used in the same context. As such, predictions are often only used to help compare alternatives. 
Relief-Based Feature Selection  Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems and growing interest in advanced but computationally expensive methodologies able to model complex associations. Specifically, there is a need for feature selection methods that are computationally efficient, yet sensitive to complex patterns of association, e.g. interactions, so that informative features are not mistakenly eliminated prior to downstream modeling. This paper focuses on Relief-based algorithms (RBAs), a unique family of filter-style feature selection algorithms that strike an effective balance between these objectives while flexibly adapting to various data characteristics, e.g. classification vs. regression. First, this work broadly examines types of feature selection and defines RBAs within that context. Next, we introduce the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features. Lastly, we include an expansive review of RBA methodological research beyond Relief and its popular descendant, ReliefF. In particular, we characterize branches of RBA research, and provide comparative summaries of RBA algorithms including contributions, strategies, functionality, time complexity, adaptation to key data characteristics, and software availability. 
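The intuition behind the original Relief algorithm can be sketched as follows: for each instance, find its nearest neighbour of the same class (the hit) and of the other class (the miss), then reward features that differ on the miss and penalise features that differ on the hit. This is a simplified illustration that assumes a binary class, numeric features, at least two instances per class, and a pass over every instance instead of random sampling; the function name `relief` is hypothetical.

```python
import numpy as np


def relief(X, y):
    """Relief feature weights for numeric features and a binary class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span                  # scale every feature to [0, 1]
    n, p = Xs.shape
    weights = np.zeros(p)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)        # Manhattan distance to instance i
        dist[i] = np.inf                             # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        # Differences to the miss raise a weight, differences to the hit lower it.
        weights += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.00, 0.1], [0.10, 0.9], [0.05, 0.5],
     [1.00, 0.2], [0.90, 0.8], [0.95, 0.4]]
y = [0, 0, 0, 1, 1, 1]
print(relief(X, y))
```

Because each weight is updated from whole-instance neighbours, a feature can score well through its interaction with others even though no feature combinations are evaluated explicitly.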
Reluplex  Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counterexamples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods. 
REMIX  Outlier detection is the identification of points in a dataset that do not conform to the norm. Outlier detection is highly sensitive to the choice of the detection algorithm and the feature subspace used by the algorithm. Extracting domain-relevant insights from outliers needs systematic exploration of these choices since diverse outlier sets could lead to complementary insights. This challenge is especially acute in an interactive setting, where the choices must be explored in a time-constrained manner. In this work, we present REMIX, the first system to address the problem of outlier detection in an interactive setting. REMIX uses a novel mixed integer programming (MIP) formulation for automatically selecting and executing a diverse set of outlier detectors within a time limit. This formulation incorporates multiple aspects such as (i) an upper limit on the total execution time of detectors, (ii) diversity in the space of algorithms and features, and (iii) meta-learning for evaluating the cost and utility of detectors. REMIX provides two distinct ways for the analyst to consume its results: (i) a partitioning of the detectors explored by REMIX into perspectives through low-rank non-negative matrix factorization, where each perspective can be easily visualized as an intuitive heatmap of experiments versus outliers, and (ii) an ensembled set of outliers which combines outlier scores from all detectors. We demonstrate the benefits of REMIX through extensive empirical validation on real-world data. 
Remove Unwanted Variation, 2-step (RUV2) 
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method “Remove Unwanted Variation, 2-step” (RUV2). ruv 
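The two steps can be sketched as follows, with a truncated SVD standing in for the factor analysis (an assumption made for brevity; the ruv package's own routines are not reproduced here). Step 1 estimates the unwanted factors from the negative control genes only; step 2 regresses every gene on the factor of interest together with the estimated unwanted factors. The function name `ruv2` and the simulated data are hypothetical.

```python
import numpy as np


def ruv2(Y, X, control_idx, k):
    """Y: samples x genes, X: samples x factors of interest,
    control_idx: columns of Y that are negative controls, k: number of unwanted factors."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Step 1: factor analysis restricted to the negative control genes.
    U, s, _ = np.linalg.svd(Y[:, control_idx], full_matrices=False)
    W_hat = U[:, :k] * s[:k]                     # estimated unwanted factors
    # Step 2: ordinary least squares of all genes on [X, W_hat].
    design = np.hstack([X, W_hat])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[: X.shape[1]], W_hat             # effects of interest, unwanted factors


# Noise-free simulation: 12 samples, 30 genes, the first 10 genes are negative controls.
rng = np.random.default_rng(0)
X = np.repeat([0.0, 1.0], 6).reshape(-1, 1)      # two biological groups
W = rng.normal(size=(12, 2))                     # two unwanted factors (e.g. batch)
beta = rng.normal(size=(1, 30))
beta[:, :10] = 0.0                               # controls are unaffected by X
Y = X @ beta + W @ rng.normal(size=(2, 30))
beta_hat, _ = ruv2(Y, X, np.arange(10), k=2)
print(np.abs(beta_hat - beta).max())             # close to zero in this idealised setting
```

With real data the recovery is only approximate, and the result depends on how well the chosen controls satisfy the negative-control assumption.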
Remove Unwanted Variation, 4-step (RUV4) 
High-dimensional data suffer from unwanted variation, such as the batch effects common in microarray data. Unwanted variation complicates the analysis of high-dimensional data, leading to high rates of false discoveries, high rates of missed discoveries, or both. In many cases the factors causing the unwanted variation are unknown and must be inferred from the data. In such cases, negative controls may be used to identify the unwanted variation and separate it from the wanted variation. We present a new method, RUV4, to adjust for unwanted variation in high-dimensional data with negative controls. RUV4 may be used when the goal of the analysis is to determine which of the features are truly associated with a given factor of interest. One nice property of RUV4 is that it is relatively insensitive to the number of unwanted factors included in the model; this makes estimating the number of factors less critical. We also present a novel method for estimating the features’ variances that may be used even when a large number of unwanted factors are included in the model and the design matrix is full rank. We name this the “inverse method for estimating variances.” By combining RUV4 with the inverse method, it is no longer necessary to estimate the number of unwanted factors at all. Using both real and simulated data we compare the performance of RUV4 with that of other adjustment methods such as SVA, LEAPP, ICE, and RUV2. We find that RUV4 and its variants perform as well as or better than other methods. ruv 
Renewal Hawkes Process  RHawkes 
Renewal Monte Carlo (RMC) 
In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation, while circumventing their key drawbacks of high variance and delayed (end of episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic. So, by renewal theory, the performance of a policy is equal to the ratio of expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for the performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and discounted time over a regenerative cycle and their gradients. We propose two unbiased estimators for evaluating performance gradients (a likelihood-ratio-based estimator and a simultaneous-perturbation-based estimator) and show that for both estimators, RMC converges to a locally optimal policy. We generalize the RMC algorithm to post-decision state models and also present a variant that converges faster to an approximately optimal policy. We conclude by presenting numerical experiments on a randomly generated MDP, event-triggered communication, and inventory management. 
Renyi Entropy  In information theory, the Rényi entropy generalizes the Hartley entropy, the Shannon entropy, the collision entropy and the min-entropy. Entropies quantify the diversity, uncertainty, or randomness of a system. The Rényi entropy is named after Alfréd Rényi. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions. The Rényi entropy is important in ecology and statistics as an index of diversity. The Rényi entropy is also important in quantum information, where it can be used as a measure of entanglement. In the Heisenberg XY spin chain model, the Rényi entropy as a function of α can be calculated explicitly by virtue of the fact that it is an automorphic function with respect to a particular subgroup of the modular group. In theoretical computer science, the min-entropy is used in the context of randomness extractors. 
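For a discrete distribution p, the Rényi entropy of order α (α ≥ 0, α ≠ 1) is H_α(p) = log(Σ p_i^α) / (1 − α). The sketch below evaluates this formula and treats the special cases named above as limits: α = 0 (Hartley), α → 1 (Shannon), α = 2 (collision) and α → ∞ (min-entropy). The function name and the use of natural logarithms are illustrative choices.

```python
import math


def renyi_entropy(p, alpha):
    """Rényi entropy (in nats) of a discrete distribution p for order alpha >= 0."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if any(q < 0 for q in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("p must be a probability distribution")
    support = [q for q in p if q > 0]
    if alpha == 0:                       # Hartley entropy: log of the support size
        return math.log(len(support))
    if alpha == 1:                       # Shannon entropy, the limit as alpha -> 1
        return -sum(q * math.log(q) for q in support)
    if math.isinf(alpha):                # min-entropy, the limit as alpha -> infinity
        return -math.log(max(support))
    return math.log(sum(q ** alpha for q in support)) / (1 - alpha)


p = [0.5, 0.25, 0.25]
for a in (0, 1, 2, math.inf):
    print(a, round(renyi_entropy(p, a), 4))
```

For a fixed distribution the value does not increase as α grows, and for a uniform distribution every order gives the same value, the logarithm of the number of outcomes.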
Renyi Entropy Actor-Critic (RAC) 
We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model-free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Renyi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function-based mechanism for high-level action selection. Empirically we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness. 
REorders and/or REflects FACTors (REREFACT) 
Executes a post-rotation algorithm that REorders and/or REflects FACTors (REREFACT) for each replication of a simulation study with exploratory factor analysis. 
RE-PACRR  Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and n-grams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results. 
Repeated Measures  Repeated measures design uses the same subjects with every branch of research, including the control. For instance, repeated measurements are collected in a longitudinal study in which change over time is assessed. Other (non-repeated measures) studies compare the same measure under two or more different conditions. For instance, to test the effects of caffeine on cognitive function, a subject’s math ability might be tested once after they consume caffeine and another time when they consume a placebo. Book: Analysis of Repeated Measures Data 
RepeatNet  Recurrent neural networks for session-based recommendation have attracted a lot of attention recently because of their promising performance. Repeat consumption is a common phenomenon in many recommendation scenarios (e.g., e-commerce, music, and TV program recommendations), where the same item is re-consumed repeatedly over time. However, no previous studies have emphasized repeat consumption with neural networks. An effective neural approach is needed to decide when to perform repeat recommendation. In this paper, we incorporate a repeat-explore mechanism into neural networks and propose a new model, called RepeatNet, with an encoder-decoder structure. RepeatNet integrates a regular neural recommendation approach in the decoder with a new repeat recommendation mechanism that can choose items from a user’s history and recommend them at the right time. We report on extensive experiments on three benchmark datasets. RepeatNet outperforms state-of-the-art baselines on all three datasets in terms of MRR and Recall. Furthermore, as the dataset size and the repeat ratio increase, the improvements of RepeatNet over the baselines also increase, which demonstrates its advantage in handling repeat recommendation scenarios. 
Repertoire Dissimilarity Index  In this paper, we present a non-parametric method for directly comparing sequencing repertoires, with the goal of rigorously quantifying differences in V, D, and J gene segment utilization. This method, referred to as the Repertoire Dissimilarity Index (RDI), uses a bootstrapped subsampling approach to account for variance in sequencing depth, and, coupled with a data simulation approach, allows for direct quantification of the average variation between repertoires. We use the RDI method to recapitulate known differences in the formation of the CD4+ and CD8+ T cell repertoires, and further show that antigen-driven activation of naïve CD8+ T cells is more selective than in the CD4+ repertoire, resulting in a more specialized CD8+ memory repertoire. rdi 
Repetition Based Pattern (RBP) 
In this paper, we show that standard feed-forward and recurrent neural networks fail to learn abstract patterns based on identity rules. We propose Repetition Based Pattern (RBP) extensions to neural network structures that solve this problem and answer, as well as raise, questions about integrating structures for inductive bias into neural networks. Examples of abstract patterns are the sequence patterns ABA and ABB, where A or B can be any object. These were introduced by Marcus et al. (1999), who also found that 7-month-old infants recognise these patterns in sequences that use an unfamiliar vocabulary while simple recurrent neural networks do not. This result has been contested in the literature but it is confirmed by our experiments. We also show that the inability to generalise extends to different, previously untested, settings. We propose a new approach to modify standard neural network architectures, called Repetition Based Patterns (RBP), with different variants for classification and prediction. Our experiments show that neural networks with the appropriate RBP structure achieve perfect classification and prediction performance on synthetic data, including mixed concrete and abstract patterns. RBP also improves neural network performance in experiments with real-world sequence prediction tasks. We discuss these findings in terms of challenges for neural network models and identify consequences of this result for developing inductive biases for neural network learning. 
Replacement AutoEncoder  An increasing number of sensors on mobile, Internet of things (IoT), and wearable devices generate time-series measurements of physical activities. Though access to the sensory data is critical to the success of many beneficial applications such as health monitoring or activity recognition, a wide range of potentially sensitive information about the individuals can also be discovered through these datasets, and this cannot easily be protected using traditional privacy approaches. In this paper, we propose an integrated sensing framework for managing access to personal time-series data in order to provide utility while protecting individuals’ privacy. We introduce Replacement AutoEncoder, a novel feature-learning algorithm which learns how to transform discriminative features of multidimensional time-series that correspond to sensitive inferences into features that are more commonly observed in non-sensitive inferences, to protect users’ privacy. The main advantage of Replacement AutoEncoder is its ability to keep important features of desired inferences unchanged to preserve the utility of the data. We evaluate the efficacy of the algorithm with an activity recognition task in a multi-sensing environment using extensive experiments on three benchmark datasets. We show that it can retain the recognition accuracy of state-of-the-art techniques while simultaneously preserving the privacy of sensitive information. We use a Generative Adversarial Network to attempt to detect the replacement of sensitive data with fake non-sensitive data. We show that this approach does not detect the replacement unless the network can train using the users’ original unmodified data. 
Replicator Neural Network (RNN) 
Replicator neural networks self-organize by using their inputs as desired outputs; they internally form a compressed representation for the input data. A theorem shows that a class of replicator networks can, through the minimization of mean squared reconstruction error (for instance, by training on raw data examples), carry out optimal data compression for arbitrary data vector sources. Data manifolds, a new general model of data sources, are then introduced and a second theorem shows that, in a practically important limiting case, optimal-compression replicator networks operate by creating an essentially unique natural coordinate system for the manifold. Anomaly Detection Using Replicator Neural Networks Trained on Examples of One Class 
Reporting  
RePr  A welltrained Convolutional Neural Network can easily be pruned without significant loss of performance. This is because of unnecessary overlap in the features captured by the network’s filters. Innovations in network architecture such as skip/dense connections and Inception units have mitigated this problem to some extent, but these improvements come with increased computation and memory requirements at runtime. We attempt to address this problem from another angle – not by changing the network structure but by altering the training method. We show that by temporarily pruning and then restoring a subset of the model’s filters, and repeating this process cyclically, overlap in the learned features is reduced, producing improved generalization. We show that the existing modelpruning criteria are not optimal for selecting filters to prune in this context and introduce interfilter orthogonality as the ranking criteria to determine underexpressive filters. Our method is applicable both to vanilla convolutional networks and more complex modern architectures, and improves the performance across a variety of tasks, especially when applied to smaller networks. 
Representation Adversarial Learning Network (RepGAN) 
A good representation for arbitrarily complicated data should have the capability of semantic generation, clustering and reconstruction. Previous research has already achieved impressive performance on either one. This paper aims at learning a disentangled representation effective for all of them in an unsupervised way. To achieve all the three tasks together, we learn the forward and inverse mapping between data and representation on the basis of a symmetric adversarial process. In theory, we minimize the upper bound of the two conditional entropy loss between the latent variables and the observations together to achieve the cycle consistency. The newly proposed RepGAN is tested on MNIST, fashionMNIST, CelebA, and SVHN datasets to perform unsupervised or semisupervised classification, generation and reconstruction tasks. The result demonstrates that RepGAN is able to learn a useful and competitive representation. To the author’s knowledge, our work is the first one to achieve both a high unsupervised classification accuracy and low reconstruction error on MNIST. 
Representation Learning  Feature learning or representation learning is a set of techniques that learn a transformation of raw data input to a representation that can be effectively exploited in machine learning tasks. Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensor measurements are usually complex, redundant, and highly variable. Thus, it is necessary to discover useful features or representations from raw data. Traditional handcrafted features often require expensive human labor and often rely on expert knowledge. Also, they normally do not generalize well. This motivates the design of efficient feature learning techniques. Feature learning can be divided into two categories, supervised and unsupervised feature learning: · In supervised feature learning, features are learned with labeled input data. Examples include neural networks, multilayer perceptron, and (supervised) dictionary learning. · In unsupervised feature learning, features are learned with unlabeled input data. Examples include dictionary learning, independent component analysis, autoencoders, matrix factorization, and various forms of clustering. 
Representational Distance Learning (RDL) 
We propose representational distance learning (RDL), a technique that allows transferring knowledge from a model of arbitrary type to a deep neural network (DNN). This method seeks to maximize the similarity between the representational dissimilarity, or distance, matrices (RDMs) of a model with desired knowledge, the teacher, and a DNN currently being trained, the student. This knowledge transfer is performed using auxiliary error functions. This allows DNNs to simultaneously learn from a teacher model and learn to perform some task within the framework of backpropagation. We test the use of RDL for knowledge distillation, also known as model compression, from a large teacher DNN to a small student DNN using the MNIST and CIFAR10 datasets. Also, we test the use of RDL for knowledge transfer between tasks using the CIFAR10 and CIFAR100 datasets. For each test, RDL significantly improves performance when compared to traditional backpropagation alone and performs similarly to, or better than, recently proposed methods for model compression and knowledge transfer. 
Representative Approach  We propose a fast and efficient strategy, called the representative approach, for big data analysis with linear models and generalized linear models. With a given partition of big dataset, this approach constructs a representative data point for each data block and fits the target model using the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature. As for efficiency, its accuracy in estimating parameters is better than the divideandconquer method. With comprehensive simulation studies and theoretical justifications, we recommend two representative approaches. For linear models or generalized linear models with a flat inverse link function and moderate coefficients of continuous variables, we recommend mean representatives (MR). For other cases, we recommend scorematching representatives (SMR). As an illustrative application to the Airline ontime performance data, MR and SMR are as good as the full data estimate when available. Furthermore, the proposed representative strategy is ideal for analyzing massive data dispersed over a network of interconnected computers. 
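The mean-representative (MR) idea can be illustrated with a small sketch. It assumes a one-dimensional linear model, a partition into contiguous blocks of equal size, and a fit that weights each representative by its block size; the weighting, the block layout, and the function names `mean_representatives` and `weighted_line_fit` are assumptions of this example, not details taken from the paper.

```python
def mean_representatives(xs, ys, block_size):
    """Collapse each block of the data into one (mean_x, mean_y, count) point."""
    reps = []
    for start in range(0, len(xs), block_size):
        bx = xs[start:start + block_size]
        by = ys[start:start + block_size]
        reps.append((sum(bx) / len(bx), sum(by) / len(by), len(bx)))
    return reps


def weighted_line_fit(reps):
    """Weighted least-squares line through the representatives.

    Weighting by block size is an assumption made for this sketch.
    """
    total = sum(w for _, _, w in reps)
    mean_x = sum(w * x for x, _, w in reps) / total
    mean_y = sum(w * y for _, y, w in reps) / total
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in reps)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in reps)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Noise-free toy data y = 1 + 2x, so the fit on 20 representatives
# should recover the same line as a fit on all 1000 points.
xs = [float(i) for i in range(1000)]
ys = [1.0 + 2.0 * x for x in xs]
reps = mean_representatives(xs, ys, block_size=50)
intercept, slope = weighted_line_fit(reps)
```

The model is fitted on 20 points instead of 1000, which is where the speed-up in the representative approach comes from; how well this works on noisy data depends on how the partition is chosen.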
Representer Values  We propose to explain the predictions of a deep neural network, by pointing to the set of what we call representer points in the training set, for a given test point prediction. Specifically, we show that we can decompose the preactivation prediction of a neural network into a linear combination of activations of training points, with the weights corresponding to what we call representer values, which thus capture the importance of that training point on the learned parameters of the network. But it provides a deeper understanding of the network than simply training point influence: with positive representer values corresponding to excitatory training points, and negative values corresponding to inhibitory points, which as we show provides considerably more insight. Our method is also much more scalable, allowing for realtime feedback in a manner not feasible with influence functions. 
RepSet  In several domains, data objects can be decomposed into sets of simpler objects. It is then natural to represent each object as the set of its components or parts. Many conventional machine learning algorithms are unable to process this kind of representations, since sets may vary in cardinality and elements lack a meaningful ordering. In this paper, we present a new neural network architecture, called RepSet, that can handle examples that are represented as sets of vectors. The proposed model computes the correspondences between an input set and some hidden sets by solving a series of network flow problems. This representation is then fed to a standard neural network architecture to produce the output. The architecture allows endtoend gradientbased learning. We demonstrate RepSet on classification tasks, including text categorization, and graph classification, and we show that the proposed neural network achieves performance better or comparable to stateoftheart algorithms. 
REPT  Recently, considerable efforts have been devoted to approximately computing the global and local (i.e., incident to each node) triangle counts of a large graph stream represented as a sequence of edges. Existing approximate triangle counting algorithms rely on sampling techniques to reduce the computational cost. However, their estimation errors are significantly determined by the covariance between sampled triangles. Moreover, little attention has been paid to developing parallel onepass streaming algorithms that can be used to fast and approximately count triangles on a multicore machine or a cluster of machines. To solve these problems, we develop a novel parallel method REPT to significantly reduce the covariance (even completely eliminate the covariance for some cases) between sampled triangles. We theoretically prove that REPT is more accurate than parallelizing existing triangle count estimation algorithms in a direct manner. In addition, we also conduct extensive experiments on a variety of realworld graphs, and the results demonstrate that our method REPT is several times more accurate than stateoftheart methods. 
Reptile  This paper considers metalearning problems, where there is a distribution of tasks, and we would like to obtain an agent that performs well (i.e., learns quickly) when presented with a previously unseen task sampled from this distribution. We present a remarkably simple metalearning algorithm called Reptile, which learns a parameter initialization that can be finetuned quickly on a new task. Reptile works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task. Unlike MAML, which also learns an initialization, Reptile doesn’t require differentiating through the optimization process, making it more suitable for optimization problems where many update steps are required. We show that Reptile performs well on some wellestablished benchmarks for fewshot classification. We provide some theoretical analysis aimed at understanding why Reptile works. 
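The update described above (sample a task, train on it, move the initialization towards the trained weights) can be sketched on a toy problem. The one-dimensional quadratic tasks, the step sizes, and the names `inner_sgd` and `reptile` are illustrative assumptions, not the setup used in the paper.

```python
import random


def inner_sgd(theta, center, steps=5, lr=0.1):
    """Run a few gradient steps on the task loss (phi - center)^2, starting at theta."""
    phi = theta
    for _ in range(steps):
        grad = 2.0 * (phi - center)
        phi -= lr * grad
    return phi


def reptile(task_centers, meta_steps=2000, meta_lr=0.05, rng=None):
    """Toy Reptile loop: each task is a quadratic with its own minimum."""
    rng = rng or random.Random()
    theta = 0.0  # the shared initialization being learned
    for _ in range(meta_steps):
        center = rng.choice(task_centers)   # sample a task
        phi = inner_sgd(theta, center)      # train on it
        theta += meta_lr * (phi - theta)    # move towards the trained weights
    return theta


theta = reptile([1.0, 2.0, 3.0], rng=random.Random(0))
```

Only the trained weights `phi` enter the outer update, so nothing is differentiated through the inner optimization. In this toy setting the learned initialization settles between the task minima.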
Repulsion Loss  Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in realworld scenarios. In this paper, we first explore how a stateoftheart pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by target, and the repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects thus leading to more crowdrobust localization. Our detector trained by repulsion loss outperforms all the stateoftheart methods with a significant improvement in occlusion cases. 
Reputation System  A reputation system computes and publishes reputation scores for a set of objects (e.g. service providers, services, goods or entities) within a community or domain, based on a collection of opinions that other entities hold about the objects. The opinions are typically passed as ratings to a central place where all perceptions, opinions and ratings are accumulated. A reputation center then uses a specific reputation algorithm to dynamically compute the reputation scores based on the received ratings. Reputation is a sign of trustworthiness manifested as testimony by other people. New expectations and realities about the transparency, availability, and privacy of people and institutions are emerging. Reputation management – the selective exposure of personal information and activities – is an important element of how people function in networks as they establish credentials, build trust with others, and gather information to deal with problems or make decisions. 
Res2Net  Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multiscale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multiscale features in a layerwise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residuallike connections within one single residual block. The Res2Net represents multiscale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the stateoftheart backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widelyused datasets, e.g., CIFAR100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the stateoftheart baseline methods. The source code and trained models will be made publicly available. 
Resampling  In statistics, resampling is any of a variety of methods for doing one of the following: 1. Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping); 2. Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests); 3. Validating models by using random subsets (bootstrapping, cross validation). Common resampling techniques include bootstrapping, jackknifing and permutation tests. 
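Two of the techniques listed above can be sketched with the Python standard library: a bootstrap estimate of the standard error of a statistic, and a Monte Carlo permutation test for a difference in means. The function names, the sample data, and the choice of 2000 resamples are assumptions of this example; the permutation test samples random relabelings instead of enumerating all of them.

```python
import random
import statistics


def bootstrap_se(data, stat, n_boot=2000, rng=None):
    """Standard error of `stat`, estimated by resampling with replacement."""
    rng = rng or random.Random()
    reps = [stat(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return statistics.stdev(reps)


def permutation_p_value(a, b, n_perm=2000, rng=None):
    """Two-sided p-value for a difference in means, by shuffling group labels."""
    rng = rng or random.Random()
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [11.0, 12.0, 13.0, 14.0, 15.0]
se_median = bootstrap_se(group_a, statistics.median, rng=random.Random(1))
p_value = permutation_p_value(group_a, group_b, rng=random.Random(2))
```

Because the two groups do not overlap, only a small fraction of random relabelings reproduce a difference as large as the observed one, so `p_value` comes out small.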
Resampling Uncertainty Estimation (RUE) 
To use machine learning in high stakes applications (e.g. medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability are often applied at train time (e.g. using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a fixed model subsequent to training. In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. Intuitively, RUE estimates the amount that a single prediction would change if the model had been fit on different training data drawn from the same distribution by using the gradient and Hessian of the model’s loss on training data. Experimentally, we show that RUE more effectively detects inaccurate predictions than existing tools for auditing reliability subsequent to training. We also show that RUE can create predictive distributions that are competitive with stateoftheart methods like Monte Carlo dropout, probabilistic backpropagation, and deep ensembles, but does not depend on specific algorithms at traintime like these methods do. 
ResBinNet  Recent efforts on training lightweight binary neural networks offer promising execution/memory efficiency. This paper introduces ResBinNet, which is a composition of two interlinked methodologies aiming to address the slow convergence speed and limited accuracy of binary convolutional neural networks. The first method, called residual binarization, learns a multilevel binary representation for the features within a certain neural network layer. The second method, called temperature adjustment, gradually binarizes the weights of a particular layer. The two methods jointly learn a set of softbinarized parameters that improve the convergence rate and accuracy of binary neural networks. We corroborate the applicability and scalability of ResBinNet by implementing a prototype hardware accelerator. The accelerator is reconfigurable in terms of the numerical precision of the binarized features, offering a tradeoff between runtime and inference accuracy. 
Rescaled Gradient Descent (RGD) 
The connection between continuoustime dynamics and discretetime algorithms has led to the introduction of many methods in optimization. We add to this body of work by introducing a family of descent dynamics and descent algorithms with matching nonasymptotic convergence guarantees. We use this framework to introduce a new firstorder algorithm, called rescaled gradient descent (RGD), and show that RGD achieves a faster convergence rate than gradient descent provided the function is strongly smooth – a natural generalization of the standard smoothness assumption on the objective function. When the objective function is convex, we present two frameworks for accelerating descent methods, one in the style of Nesterov and the other in the style of Monteiro and Svaiter. Rescaled gradient descent can be accelerated under the same strong smoothness assumption using both frameworks. We provide several examples of strongly smooth loss functions in machine learning and numerical experiments that verify our theoretical findings. 
Reservoir Computing  Dynamic spectrum access (DSA) is regarded as an effective and efficient technology to share radio spectrum among different networks. As a secondary user (SU), a DSA device will face two critical problems: avoiding causing harmful interference to primary users (PUs), and conducting effective interference coordination with other secondary users. These two problems become even more challenging for a distributed DSA network where there is no centralized controllers for SUs. In this paper, we investigate communication strategies of a distributive DSA network under the presence of spectrum sensing errors. To be specific, we apply the powerful machine learning tool, deep reinforcement learning (DRL), for SUs to learn ‘appropriate’ spectrum access strategies in a distributed fashion assuming NO knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called the reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learningbased strategy, SUs could make spectrum access decisions distributedly relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RCbased spectrum access strategy can help the SU to significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method which assumes the knowledge of system statistics, and converges faster than the Qlearning method when the number of channels is large. 
Reservoir Sampling  Reservoir sampling is randomly pulling out a known number of examples from an unknown (or very large) pool of streaming items. 
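A common way to do this in one pass is to keep the first k items and then replace a random slot with decreasing probability. The sketch below follows that scheme (often called Algorithm R); the function name `reservoir_sample` and the example stream are assumptions of this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(iter(range(10_000)), k=5, rng=random.Random(0))
```

The stream is consumed once and only k items are held in memory, which is why the method suits pools whose size is unknown or too large to store. If the stream has fewer than k items, all of them are returned.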
ReSet  Neural Network is a powerful Machine Learning tool that shows outstanding performance in Computer Vision, Natural Language Processing, and Artificial Intelligence. In particular, recently proposed ResNet architecture and its modifications produce stateoftheart results in image classification problems. ResNet and most of the previously proposed architectures have a fixed structure and apply the same transformation to all input images. In this work, we develop a ResNetbased model that dynamically selects Computational Units (CU) for each input object from a learned set of transformations. Dynamic selection allows the network to learn a sequence of useful transformations and apply only required units to predict the image label. We compare our model to ResNet38 architecture and achieve better results than the original ResNet on CIFAR10.1 test set. While examining the produced paths, we discovered that the network learned different routes for images from different classes and similar routes for similar images. 
Residual Analysis  The analysis of residuals plays an important role in validating the regression model. If the error term in the regression model satisfies the four assumptions noted earlier, then the model is considered valid. Since the statistical tests for significance are also based on these assumptions, the conclusions resulting from these significance tests are called into question if the assumptions regarding epsilon are not satisfied. 
Residual Classification Flow (RCF) 
➚ “Multilevel Wavelet Decomposition Network” 
Residual Gated Graph ConvNet  Graph-structured data such as functional brain networks, social networks, gene regulatory networks, and communications networks have brought the interest in generalizing neural networks to graph domains. In this paper, we are interested in designing efficient neural network architectures for graphs with variable length. Several existing works such as Scarselli et al. (2009); Li et al. (2016) have focused on recurrent neural networks (RNNs) to solve this task. A recent different approach was proposed in Sukhbaatar et al. (2016), where a vanilla graph convolutional neural network (ConvNets) was introduced. We believe the latter approach to be a better paradigm to solve graph learning problems because ConvNets are more pruned to deep networks than RNNs. For this reason, we propose the most generic class of residual multi-layer graph ConvNets that make use of an edge gating mechanism, as proposed in Marcheggiani & Titov (2017). Gated edges appear to be a natural property in the context of graph learning tasks, as the system has the ability to learn which edges are important or not for the task to solve. We apply several graph neural models to two basic network science tasks: subgraph matching and semi-supervised clustering for graphs with variable length. Numerical results show the performances of the new model. 
Residual Hourglass Recurrent Neural Network (RHRNet) 
Most current speech enhancement models use spectrogram features that require an expensive transformation and result in phase information loss. Previous work has overcome these issues by using convolutional networks to learn longrange temporal correlations across highresolution waveforms. These models, however, are limited by memoryintensive dilated convolution and aliasing artifacts from upsampling. We introduce an endtoend fullyrecurrent hourglassshaped neural network architecture with residual connections for waveformbased singlechannel speech enhancement. Our model can efficiently capture longrange temporal dependencies by reducing the features resolution without information loss. Experimental results show that our model outperforms stateoftheart approaches in six evaluation metrics. 
Residual Memory Neural Network (RMN) 
Training deep recurrent neural network (RNN) architectures is complicated due to the increased network complexity. This disrupts the learning of higher-order abstractions using deep RNNs. In the case of feedforward networks, training deep structures is simpler and faster, but learning long-term temporal information is not possible. In this paper we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feedforward layers having residual and time-delayed connections. The residual connection paves the way to construct deeper networks by enabling unhindered flow of gradients, and the time-delay units capture temporal information with shared weights. The number of layers in RMN signifies both the hierarchical processing depth and temporal depth. The computational complexity in training RMN is significantly less when compared to deep recurrent networks. RMN is further extended as bidirectional RMN (BRMN) to capture both past and future information. Experimental analysis is done on the AMI corpus to substantiate the capability of RMN in learning long-term information and hierarchical information. Recognition performance of RMN trained with 300 hours of the Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN gain 6 % and 3.8 % relative improvement over LSTM and BLSTM networks. 
Residual Network (ResNet) 

Residual Policy Learning (RPL) 
We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using modelfree deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains datainefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvement. We study RPL in five challenging MuJoCo tasks involving partial observability, sensor noise, model misspecification, and controller miscalibration. By combining learning with control algorithms, RPL can perform longhorizon, sparsereward tasks for which reinforcement learning alone fails. Moreover, we find that RPL consistently and substantially improves on the initial controllers. We argue that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently. 
Residual Ratio Thresholding Greedy Algorithm for Robust deNoising (RRTGARD) 
Linear regression models contaminated by Gaussian noise (inlier) and possibly unbounded sparse outliers are common in many signal processing applications. Sparse recovery inspired robust regression (SRIRR) techniques are shown to deliver high quality estimation performance in such regression models. Unfortunately, most SRIRR techniques assume a priori knowledge of noise statistics like inlier noise variance or outlier statistics like number of outliers. Both inlier and outlier noise statistics are rarely known a priori and this limits the efficient operation of many SRIRR algorithms. This article proposes a novel noise statistics oblivious algorithm called residual ratio thresholding GARD (RRTGARD) for robust regression in the presence of sparse outliers. RRTGARD is developed by modifying the recently proposed noise statistics dependent greedy algorithm for robust denoising (GARD). Both finite sample and asymptotic analytical results indicate that RRTGARD performs nearly similar to GARD with a priori knowledge of noise statistics. Numerical simulations in real and synthetic data sets also point to the highly competitive performance of RRTGARD. 
Residual RNN (R2N2) 
Multivariate time-series modeling and forecasting is an important problem with numerous applications. Traditional approaches such as VAR (vector autoregressive) models and more recent approaches such as RNNs (recurrent neural networks) are indispensable tools in modeling time-series data. In many multivariate time series modeling problems, there is usually a significant linear dependency component, for which VARs are suitable, and a nonlinear component, for which RNNs are suitable. Modeling such time series with only VAR or only RNNs can lead to poor predictive performance or complex models with large training times. In this work, we propose a hybrid model called R2N2 (Residual RNN), which first models the time series with a simple linear model (like VAR) and then models its residual errors using RNNs. R2N2s can be trained using existing algorithms for VARs and RNNs. Through an extensive empirical evaluation on two real-world datasets (aviation and climate domains), we show that R2N2 is competitive with, and usually better than, VAR or RNN used alone. We also show that R2N2 is faster to train than an RNN, while requiring fewer hidden units. 
Residual Sum of Squares (RSS, SSR, SSE) 
In statistics, the residual sum of squares (RSS) is the sum of squares of residuals. It is also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE). It is a measure of the discrepancy between the data and an estimation model. A small RSS indicates a tight fit of the model to the data. In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate ordinary least squares (OLS) case, see partitioning in the general OLS model. 
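The decomposition can be checked numerically. The sketch below fits a simple least-squares line with an intercept, the setting in which the identity holds, and computes the three sums of squares; the data values and the function names `fit_line` and `sums_of_squares` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sums_of_squares(xs, ys):
    """Return (total, explained, residual) sums of squares for the fitted line."""
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    ess = sum((f - mean_y) ** 2 for f in fitted)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return tss, ess, rss


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
tss, ess, rss = sums_of_squares(xs, ys)
```

Here `tss` equals `ess + rss` up to floating-point rounding, and `rss` is small relative to `tss` because the points lie close to a line.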
Residual Transfer Network (RTN) 
The recent success of deep neural networks relies on massive amounts of labeled data. For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. In this paper, we propose a new approach to domain adaptation in deep networks that can jointly learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain. We relax a sharedclassifier assumption made by previous methods and assume that the source classifier and target classifier differ by a residual function. We enable classifier adaptation by plugging several layers into deep network to explicitly learn the residual function with reference to the target classifier. We fuse features of multiple layers with tensor product and embed them into reproducing kernel Hilbert spaces to match distributions for feature adaptation. The adaptation can be achieved in most feedforward models by extending them with new residual layers and loss functions, which can be trained efficiently via backpropagation. Empirical evidence shows that the new approach outperforms state of the art methods on standard domain adaptation benchmarks. 
Residualbased Predictiveness Curve (RBP Curve) 
A visual tool, the RBP curve, to assess the performance of prediction models. RBPcurve 
ResidualinResidual Dense Block (RRDB) 
In this paper we propose a deep residual autoencoder exploiting Residual-in-Residual Dense Blocks (RRDB) to remove artifacts in JPEG compressed images that is independent from the Quality Factor (QF) used. The proposed approach leverages both the learning capacity of deep residual networks and prior knowledge of the JPEG compression pipeline. The proposed model operates in the YCbCr color space and performs JPEG artifact restoration in two phases using two different autoencoders: the first one restores the luma channel exploiting 2D convolutions; the second one, using the restored luma channel as a guide, restores the chroma channels exploiting 3D convolutions. Extensive experimental results on three widely used benchmark datasets (i.e. LIVE1, BDS500, and CLASSIC5) show that our model is able to outperform the state of the art with respect to all the evaluation metrics considered (i.e. PSNR, PSNRB, and SSIM). This result is remarkable since the approaches in the state of the art use a different set of weights for each compression quality, while the proposed model uses the same weights for all of them, making it applicable to images in the wild where the QF used for compression is unknown. Furthermore, the proposed model shows a greater robustness than state-of-the-art methods when applied to compression qualities not seen during training. 
ReSIFT  This paper presents a fullreference image quality estimator based on SIFT descriptor matching over reliabilityweighted feature maps. Reliability assignment includes a smoothing operation, a transformation to perceptual color domain, a local normalization stage, and a spectral residual computation with global normalization. The proposed method ReSIFT is tested on the LIVE and the LIVE Multiply Distorted databases and compared with 11 stateoftheart fullreference quality estimators. In terms of the Pearson and the Spearman correlation, ReSIFT is the best performing quality estimator in the overall databases. Moreover, ReSIFT is the best performing quality estimator in at least one distortion group in compression, noise, and blur category. 
Resilience  We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings, including the previously unstudied problem of robust mean estimation in $\ell_p$-norms. 
Resilient Computing  The term resiliency has been used in many fields like child psychology, ecology, business, and several others, with the common meaning of expressing the ability to successfully accommodate unforeseen environmental perturbations or disturbances. The adjective resilient has been in use for decades in the field of dependable computing systems, however essentially as a synonym of fault-tolerant, thus generally ignoring the unexpected aspect of the phenomena the systems may have to face. These phenomena become of primary relevance when moving to systems like the future large, networked, evolving systems constituting complex information infrastructures – perhaps involving everything from supercomputers and huge server ‘farms’ to myriads of small mobile computers and tiny embedded devices, with humans being a central part of the operation of such systems. Such systems are in fact the dawning of the ubiquitous systems that will support Ambient Intelligence. With such ubiquitous systems, what is at stake is to maintain dependability, i.e., the ability to deliver service that can justifiably be trusted, in spite of continuous changes. Therefore the terms resilience and resilient computing can be applied to the design of ubiquitous systems and defined as the search for the following property: the persistence of service delivery that can justifiably be trusted, when facing changes. 
Resilient Distributed Dataset (RDD,RDDS) 
Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. Formally, an RDD is a read-only, partitioned collection of records. RDDs can only be created through deterministic operations on either (1) data in stable storage or (2) other RDDs. We call these operations transformations to differentiate them from other operations on RDDs. Examples of transformations include map, filter, and join. RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage. This is a powerful property: in essence, a program cannot reference an RDD that it cannot reconstruct after a failure. Finally, users can control two other aspects of RDDs: persistence and partitioning. Users can indicate which RDDs they will reuse and choose a storage strategy for them (e.g., in-memory storage). They can also ask that an RDD’s elements be partitioned across machines based on a key in each record. This is useful for placement optimizations, such as ensuring that two datasets that will be joined together are hash-partitioned in the same way. 
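The lineage idea can be illustrated with a toy, single-machine sketch in Python. This is not Spark's API: the class name ToyRDD, its methods, and the stable_storage list are hypothetical and exist only for illustration. Each dataset is read-only and stores just its parent and the transformation that derived it, so its contents can be recomputed from stable storage at any time.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyRDD:
    """Read-only dataset that remembers how it was derived (its lineage)."""

    def __init__(self,
                 source: Optional[Callable[[], Iterable[Any]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 op: Optional[Callable[[Iterable[Any]], Iterable[Any]]] = None,
                 name: str = "source") -> None:
        self._source = source   # how to read stable storage (root only)
        self._parent = parent   # dataset this one was derived from
        self._op = op           # deterministic transformation applied to the parent
        self.name = name

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # A transformation returns a new dataset; nothing is computed yet.
        return ToyRDD(parent=self, op=lambda rows: (f(r) for r in rows), name="map")

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, op=lambda rows: (r for r in rows if pred(r)), name="filter")

    def lineage(self) -> List[str]:
        # Chain of transformations from stable storage to this dataset.
        chain = [] if self._parent is None else self._parent.lineage()
        return chain + [self.name]

    def collect(self) -> List[Any]:
        # Materialize by replaying the lineage from stable storage.
        if self._parent is None:
            return list(self._source())
        return list(self._op(self._parent.collect()))


stable_storage = [1, 2, 3, 4, 5, 6]          # stands in for data on disk
base = ToyRDD(source=lambda: stable_storage)
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(evens_squared.lineage())   # ['source', 'filter', 'map']
print(evens_squared.collect())   # [4, 16, 36]
```

Because nothing is cached, calling collect() again after a simulated failure simply replays the same lineage. A real system would add partitioning and optional persistence on top of this.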
Resilient Linear Classification  Data-driven techniques are used in cyber-physical systems (CPS) for controlling autonomous vehicles, handling demand responses for energy management, and modeling human physiology for medical devices. These data-driven techniques extract models from training data, where their performance is often analyzed with respect to random errors in the training data. However, if the training data is maliciously altered by attackers, the effect of these attacks on the learning algorithms underpinning data-driven CPS has yet to be considered. In this paper, we analyze the resilience of classification algorithms to training data attacks. Specifically, a generic metric is proposed that is tailored to measure resilience of classification algorithms with respect to worst-case tampering of the training data. Using the metric, we show that traditional linear classification algorithms are resilient under restricted conditions. To overcome these limitations, we propose a linear classification algorithm with a majority constraint and prove that it is strictly more resilient than the traditional algorithms. Evaluations on both synthetic data and a real-world retrospective arrhythmia medical case study show that the traditional algorithms are vulnerable to tampered training data, whereas the proposed algorithm is more resilient (as measured by worst-case tampering). 
ResNet of ResNet (RoR) 
Review: RoR – ResNet of ResNet / Multilevel ResNet (Image Classification) 
Resource Description Framework (RDF) 
RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations. 
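The triple model can be sketched in a few lines of plain Python, without any RDF library. All URIs below are made-up examples under example.org, and the helper name objects_of is hypothetical. Each triple is a (subject, predicate, object) tuple, i.e. one labeled edge of the graph, and merging two sources is just a set union even though they use different vocabularies.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Two independent sources describing overlapping resources (illustrative URIs).
source_a: Set[Triple] = {
    ("http://example.org/alice", "http://example.org/vocab/knows", "http://example.org/bob"),
    ("http://example.org/alice", "http://example.org/vocab/name", "Alice"),
}
source_b: Set[Triple] = {
    ("http://example.org/bob", "http://example.org/hr/worksFor", "http://example.org/acme"),
    ("http://example.org/alice", "http://example.org/vocab/knows", "http://example.org/bob"),
}

# Merging needs no schema alignment: duplicates collapse, new edges are added.
graph: Set[Triple] = source_a | source_b


def objects_of(g: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Follow the edges labeled `predicate` that leave the node `subject`."""
    return {o for s, p, o in g if s == subject and p == predicate}


print(len(graph))  # 3 distinct triples after the merge
print(objects_of(graph, "http://example.org/alice", "http://example.org/vocab/knows"))
```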
Resource-Aware Storm (R-Storm) 
The era of big data has led to the emergence of new systems for real-time distributed stream processing; for example, Apache Storm is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed in Storm disregards resource demands and availability, and can therefore be inefficient at times. We present R-Storm (Resource-Aware Storm), a system that implements resource-aware scheduling within Storm. R-Storm is designed to increase overall throughput by maximizing resource utilization while minimizing network latency. When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimizing network distance between components that communicate with each other. We evaluate R-Storm on a set of micro-benchmark Storm applications as well as Storm applications used in production at Yahoo! Inc. From our experimental results we conclude that R-Storm achieves 30-47% higher throughput and 69-350% better CPU utilization than default Storm for the micro-benchmarks. For the Yahoo! Storm applications, R-Storm outperforms default Storm by around 50% based on overall throughput. We also demonstrate that R-Storm performs much better when scheduling multiple Storm applications than default Storm. 
Resource-Efficient Neural Architect (RENA) 
Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS mainly targets improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate new configurations. We demonstrate RENA on image recognition and keyword spotting (KWS) problems. RENA can find novel architectures that achieve high performance even with tight resource constraints. For CIFAR-10, it achieves 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size is less than 3M parameters. For Google Speech Commands Dataset, RENA achieves the state-of-the-art accuracy without resource constraints, and it outperforms the optimized architectures with tight resource constraints. 
RESPCA  Robust principal component analysis (RPCA) has drawn significant attention due to its powerful capability in recovering low-rank matrices as well as successful applications in various real world problems. The current state-of-the-art algorithms usually need to solve singular value decomposition of large matrices, which generally has at least a quadratic or even cubic complexity. This drawback has limited the application of RPCA in solving real world problems. To combat this drawback, in this paper we propose a new type of RPCA method, RESPCA, which is linearly efficient and scalable in both data size and dimension. For comparison purposes, AltProj, an existing scalable approach to RPCA, requires precise knowledge of the true rank; otherwise, it may fail to recover low-rank matrices. By contrast, our method works with or without knowing the true rank; even when both methods work, our method is faster. Extensive experiments have been performed and testify to the effectiveness of the proposed method quantitatively and in visual quality, which suggests that our method is suitable to be employed as a lightweight, scalable component for RPCA in any application pipeline. 
Respondent Driven Sampling (RDS) 
Respondent-driven sampling (RDS) combines “snowball sampling” (getting individuals to refer those they know, these individuals in turn refer those they know, and so on) with a mathematical model that weights the sample to compensate for the fact that the sample was collected in a non-random way. RDS represents an advance in sampling methodology because it resolves what had previously been an intractable dilemma, a dilemma that is especially severe when sampling hard-to-reach groups, that is, groups that are small relative to the general population, and for which no exhaustive list of population members is available. This includes groups relevant to public health, such as drug injectors, prostitutes, and gay men, groups relevant to public policy such as street youth and the homeless, and groups relevant to arts and culture such as jazz musicians and other performance and expressive artists. The dilemma is that if a study focuses only on the most accessible part of the target population, standard probability sampling methods can be used but coverage of the target population is limited. For example, drug injectors can be sampled from needle exchanges and from the streets on which drugs are sold, but this approach misses many women, youth, and those who only recently started injecting. Therefore, a statistically representative sample is drawn of an unrepresentative part of the target population, so conclusions cannot be validly made about the entirety of the target population…. RDS 
Response Surface Method (RSM) 
In statistics, response surface methodology (RSM) explores the relationships between several explanatory variables and one or more response variables. The method was introduced by G. E. P. Box and K. B. Wilson in 1951. The main idea of RSM is to use a sequence of designed experiments to obtain an optimal response. Box and Wilson suggest using a second-degree polynomial model to do this. They acknowledge that this model is only an approximation, but use it because such a model is easy to estimate and apply, even when little is known about the process. https://…/NBradley_thesis.pdf http://…/9783662462133 
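To make the second-degree polynomial idea concrete, the following minimal sketch fits y = b0 + b1·x + b2·x² to one factor by ordinary least squares and reads off the stationary point x* = -b1 / (2·b2). The design points and responses are invented for illustration, and the helper names solve_linear and fit_quadratic are hypothetical; a real study would use several factors and a proper experimental design.

```python
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, b2] via the normal equations."""
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear(xtx, xty)


# Invented responses at five settings of a single factor.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [6.0, 9.0, 10.0, 9.0, 6.0]

b0, b1, b2 = fit_quadratic(xs, ys)
x_opt = -b1 / (2 * b2)  # stationary point; a maximum because b2 < 0
print(round(b0, 6), round(b1, 6), round(b2, 6), round(x_opt, 6))
```

In a sequential RSM workflow, the next batch of experiments would be placed around x_opt and the fit repeated.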
ResSENet  One of the ways to train deep neural networks effectively is to use residual connections. Residual connections can be classified as being either identity connections or bridge-connections with a reshaping convolution. Empirical observations on CIFAR-10 and CIFAR-100 datasets using a baseline Resnet model, with bridge-connections removed, have shown a significant reduction in accuracy. This reduction is due to lack of contribution, in the form of feature maps, by the bridge-connections. Hence bridge-connections are vital for Resnet. However, all feature maps in the bridge-connections are considered to be equally important. In this work, an upgraded architecture ‘ResSENet’ is proposed to further strengthen the contribution from the bridge-connections by quantifying the importance of each feature map and weighting them accordingly using a Squeeze-and-Excitation (SE) block. It is demonstrated that ResSENet generalizes much better than Resnet and SE-Resnet on the benchmark CIFAR-10 and CIFAR-100 datasets. 
Restless Bandit  
ReStoCNet  In this work, we propose ReStoCNet, a residual stochastic multilayer convolutional Spiking Neural Network (SNN) composed of binary kernels, to reduce the synaptic memory footprint and enhance the computational efficiency of SNNs for complex pattern recognition tasks. ReStoCNet consists of an input layer followed by stacked convolutional layers for hierarchical input feature extraction, pooling layers for dimensionality reduction, and a fully-connected layer for inference. In addition, we introduce residual connections between the stacked convolutional layers to improve the hierarchical feature learning capability of deep SNNs. We propose a Spike Timing Dependent Plasticity (STDP) based probabilistic learning algorithm, referred to as Hybrid-STDP (HB-STDP), incorporating Hebbian and anti-Hebbian learning mechanisms, to train the binary kernels forming ReStoCNet in a layer-wise unsupervised manner. We demonstrate the efficacy of ReStoCNet and the presented HB-STDP based unsupervised training methodology on the MNIST and CIFAR-10 datasets. We show that residual connections enable the deeper convolutional layers to self-learn useful high-level input features and mitigate the accuracy loss observed in deep SNNs devoid of residual connections. The proposed ReStoCNet offers >20x kernel memory compression compared to a full-precision (32-bit) SNN while yielding high enough classification accuracy on the chosen pattern recognition tasks. 
RESTORE  In data mining, the data in various business cases (e.g., sales, marketing, and demography) gets refreshed periodically. During the refresh, the old dataset is replaced by a new one. Confirming the quality of the new dataset can be challenging because changes are inevitable. How do analysts distinguish reasonable real-world changes from errors related to data capture or data transformation? While some of the errors are easy to spot, others may be more subtle. In order to detect such types of errors, an analyst will typically have to examine the data manually and assess if the data produced are ‘believable’. Due to the scale of data, such examination is tedious and laborious. Thus, to save the analyst’s time, it is important to detect these errors automatically. However, both the literature and the industry are still lacking methods to assess the difference between old and new versions of a dataset during the refresh process. In this paper, we present a comprehensive set of tests for the detection of abnormalities in a refreshed dataset, based on the information obtained from a previous vintage of the dataset. We implement these tests in an automated test harness made available as an open-source package, called RESTORE, for the R language. The harness accepts flat or hierarchical numeric datasets. We also present a validation case study, where we apply our test harness to hierarchical demographic datasets. The results of the study and feedback from data scientists using the package suggest that RESTORE enables fast and efficient detection of errors in the data as well as decreases the cost of testing. 
Restricted Boltzmann Machine (RBM) 
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs were initially invented under the name Harmonium by Paul Smolensky in 1986, but only rose to prominence after Geoffrey Hinton and collaborators invented fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction, classification, collaborative filtering, feature learning and topic modelling. They can be trained in either supervised or unsupervised ways, depending on the task. 
Restricted Connection Orthogonal Matching Pursuit for Sparse Subspace Clustering (RCOMPSSC) 
Sparse Subspace Clustering (SSC) is one of the most popular methods for clustering data points into their underlying subspaces. However, SSC may suffer from a heavy computational burden. Orthogonal Matching Pursuit (OMP) applied to SSC accelerates the computation, but the trade-off is a loss of clustering accuracy. In this paper, we propose a noise-robust algorithm, Restricted Connection Orthogonal Matching Pursuit for Sparse Subspace Clustering (RCOMPSSC), to improve the clustering accuracy and maintain the low computational time by restricting the number of connections of each data point during the iteration of OMP. We also develop a control matrix framework to realize RCOMPSSC; the framework is scalable to other data point selection strategies. Our analysis and experiments on synthetic data and two real-world databases (EYaleB & Usps) demonstrate the superiority of our algorithm compared with other clustering methods in terms of accuracy and computational time. 
Restricted Maximum Likelihood (REML) 
In statistics, the restricted (or residual, or reduced) maximum likelihood (REML) approach is a particular form of maximum likelihood estimation which does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance parameters have no effect. In the case of variance component estimation, the original data set is replaced by a set of contrasts calculated from the data, and the likelihood function is calculated from the probability distribution of these contrasts, according to the model for the complete data set. In particular, REML is used as a method for fitting linear mixed models. In contrast to the earlier maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters. The idea underlying REML estimation was put forward by M. S. Bartlett in 1937. The first description of the approach applied to estimating components of variance in unbalanced data was by Desmond Patterson and Robin Thompson of the University of Edinburgh, although they did not use the term REML. A review of the early literature was given by Harville. REML estimation is available in a number of general-purpose statistical software packages, including Genstat (the REML directive), SAS (the MIXED procedure), SPSS (the MIXED command), Stata (the mixed command), and R (the lme4 and older nlme packages), as well as in more specialist packages such as MLwiN, HLM, ASReml, Statistical Parametric Mapping and CropStat. 
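The simplest illustration of the difference is the model with a single unknown mean and independent normal errors. There the ML estimate of the variance divides the residual sum of squares by n, whereas the REML estimate, which works with contrasts that remove the mean, divides by n - 1 and is unbiased. The sketch below uses made-up numbers and covers only this special case; it is not a general mixed-model fitter, and the function names are hypothetical.

```python
from typing import Sequence


def residual_sum_of_squares(y: Sequence[float]) -> float:
    """Sum of squared deviations from the sample mean."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def variance_ml(y: Sequence[float]) -> float:
    # Maximum likelihood: ignores the degree of freedom spent on the mean.
    return residual_sum_of_squares(y) / len(y)


def variance_reml(y: Sequence[float]) -> float:
    # REML for the one-mean model: the likelihood of the n - 1 contrasts.
    return residual_sum_of_squares(y) / (len(y) - 1)


y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
print(variance_ml(y))    # 4.0
print(variance_reml(y))  # about 4.571
```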
Restricted Mean Survival Time (RMST) 
RMST = area under the survival curve up to t* · Can think of it as the ‘t*-year life expectancy’ · A patient might be told that ‘your life expectancy with Z disease on X treatment over the next 18 months is 9 months’ · Or, ‘treatment A increases your life expectancy during the next 18 months by 2 months, compared with treatment B’ http://…tomerchurnrestrictedmeansurvivaltime survRM2 
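As a rough sketch of the "area under the survival curve" definition, the code below builds a Kaplan-Meier step function and integrates it up to t*. The survival times are invented and uncensored, and the function names are hypothetical; in R, the survRM2 package mentioned above would normally be used instead.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct event time; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(curve: Sequence[Tuple[float, float]], t_star: float) -> float:
    """Area under the step function S(t) from 0 to t_star."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= t_star:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (t_star - prev_t)


# Five patients, times in months, all events observed (illustrative data).
times = [3, 6, 9, 12, 18]
events = [1, 1, 1, 1, 1]
curve = kaplan_meier(times, events)
print(round(rmst(curve, 18), 6))  # 9.6 months over an 18-month horizon
```

With no censoring, the result equals the mean of min(T, t*), here (3 + 6 + 9 + 12 + 18) / 5 = 9.6, which is a convenient sanity check.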
Restricted Mean Survivor Average Causal Effect (RMSACE) 
In semi-competing risks problems, non-terminal time-to-event outcomes such as time to hospital readmission are subject to truncation by death. These settings are often modeled with illness-death models for the hazards of the terminal and non-terminal events, but evaluating causal treatment effects with hazard models is problematic due to conditioning on survival (a post-treatment outcome) that is embedded in the definition of a hazard. Extending an existing survivor average causal effect (SACE) estimand, we frame the evaluation of treatment effects in the context of semi-competing risks with principal stratification and introduce two new causal estimands: the time-varying survivor average causal effect (TV-SACE) and the restricted mean survivor average causal effect (RM-SACE). These principal causal effects are defined among units that would survive regardless of assigned treatment. We adopt a Bayesian estimation procedure that parameterizes illness-death models for both treatment arms. We outline a frailty specification that can accommodate within-person correlation between non-terminal and terminal event times, and we discuss potential avenues for adding model flexibility. The method is demonstrated in the context of hospital readmission among late-stage pancreatic cancer patients. 
Restricted Recurrent Neural Tensor Networks (RNTN) 
Increasing the capacity of recurrent neural networks (RNN) usually involves augmenting the size of the hidden layer, resulting in a significant increase of computational cost. An alternative is the recurrent neural tensor network (RNTN), which increases capacity by employing distinct hidden layer weights for each vocabulary word. The disadvantage of RNTNs is that memory usage scales linearly with vocabulary size, which can reach millions for word-level language models. In this paper, we introduce restricted recurrent neural tensor networks (r-RNTN) which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. Perplexity evaluations using the Penn Treebank corpus show that r-RNTNs improve language model performance over standard RNNs using only a small fraction of the parameters of unrestricted RNTNs. 
Restrictive Federated Model Selection (RFMS) 
A novel machine learning optimization process coined Restrictive Federated Model Selection (RFMS) is proposed for scenarios in which, for example, data from healthcare units cannot leave the site where it is located and it is forbidden to carry out training algorithms on remote data sites due to either technical or privacy and trust concerns. To carry out clinical research under this scenario, an analyst could train a machine learning model only on the local data site, but it is still possible to execute a statistical query at a certain cost, in the form of sending a machine learning model to some of the remote data sites and getting the performance measures as feedback, possibly because prediction is usually much cheaper than training. Compared to federated learning, which optimizes the model parameters directly by carrying out training across all data sites, RFMS trains model parameters only on one local data site but optimizes hyperparameters across other data sites jointly, since hyperparameters play an important role in machine learning performance. The aim is to get a Pareto optimal model with respect to both local and remote unseen prediction losses, which could generalize well across data sites. In this work, we specifically consider high dimensional data with shifted distributions over data sites. As an initial investigation, Bayesian Optimization, especially multi-objective Bayesian Optimization, is used to guide an adaptive hyperparameter optimization process to select models under the RFMS scenario. Empirical results show that solely using the local data site to tune hyperparameters generalizes poorly across data sites, compared to methods that utilize the local and remote performances. Furthermore, in terms of dominated hypervolumes, multi-objective Bayesian Optimization algorithms show increased performance across multiple data sites among other candidates. 
ResUNet-a  Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state-of-the-art performance for pixel level classification of objects. Here we present a novel deep learning architecture, ResUNet-a, that combines ideas from various state-of-the-art modules used in computer vision for semantic segmentation tasks. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function for semantic segmentation of objects that has better convergence properties and behaves well even under the presence of highly imbalanced classes. The performance of our modelling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance with an average F1 score of 92.1% over all classes for our best model. 
Retainable Evaluator Execution Framework (REEF) 
REEF (Retainable Evaluator Execution Framework) is our approach to simplify and unify the lower layers of big data systems on modern resource managers. For managers like Apache YARN, Apache Mesos, Google Omega, and Facebook Corona, REEF provides a centralized control plane abstraction that can be used to build a decentralized data plane for supporting big data systems. Special consideration is given to graph computation and machine learning applications, both of which require data retention on allocated resources to execute multiple passes over the data. More broadly, applications that run on YARN will have the need for a variety of data-processing tasks, e.g., data shuffle, group communication, aggregation, checkpointing, and many more. Rather than reimplement these for each application, REEF aims to provide them in a library form, so that they can be reused by higher-level applications and tuned for a specific domain problem, e.g., Machine Learning. In that sense, our long-term vision is that REEF will mature into a Big Data Application Server that will host a variety of tool kits and applications on modern resource managers. 
Retecs  Testing in Continuous Integration (CI) involves test case prioritization, selection, and execution at each cycle. Selecting the most promising test cases to detect bugs is hard if there are uncertainties on the impact of committed code changes or if traceability links between code and tests are not available. This paper introduces Retecs, a new method for automatically learning test case selection and prioritization in CI with the goal to minimize the round-trip time between code commits and developer feedback on failed test cases. The Retecs method uses reinforcement learning to select and prioritize test cases according to their duration, previous last execution and failure history. In a constantly changing environment, where new test cases are created and obsolete test cases are deleted, the Retecs method learns to prioritize error-prone test cases higher under guidance of a reward function and by observing previous CI cycles. By applying Retecs on data extracted from three industrial case studies, we show for the first time that reinforcement learning enables fruitful automatic adaptive test case selection and prioritization in CI and regression testing. 
RethinkDB  RethinkDB is an open source NoSQL database that stores JSON documents. This can be great for open ended data analytics. The company officially provides drivers for Ruby, Python and NodeJS, and community supported drivers and ORMs are available in around a dozen languages. The production ready version 2.0 was released very recently on April 14, 2015 after 5 years of development. RethinkDB is a boon when it comes to writing real time applications. Traditionally, applications had to poll databases to get the updated data, which made them slow and hard to maintain. RethinkDB’s architecture solves this problem by pushing the updated results of a query when they are available. Apart from solving the real time data push problem, RethinkDB offers many advantages such as: · Its advanced query language, ReQL, supports table joins and subqueries. The monitoring API also integrates with the query language, which makes scaling distributed databases very easy. · Unlike some previous NoSQL systems, RethinkDB never acknowledges a write until it’s safely written to the disk. · Additionally, the database supports MapReduce functionality out of the box and would not need additional Hadoop-type software to run the analysis. http://…/rethinkdbforyouradvancedanalytics 
Retrieval-Enhanced Adversarial Training (REAT) 
Dialogue systems are usually built on either generation-based or retrieval-based approaches, yet they do not benefit from the advantages of different models. In this paper, we propose a Retrieval-Enhanced Adversarial Training (REAT) method for neural response generation. Distinct from existing approaches, the REAT method leverages an encoder-decoder framework in terms of an adversarial training paradigm, while taking advantage of N-best response candidates from a retrieval-based system to construct the discriminator. An empirical study on a large scale publicly available benchmark dataset shows that the REAT method significantly outperforms the vanilla Seq2Seq model as well as the conventional adversarial training approach. 
REtrospective and PRospective Inference SchEme (REPRISE) 
We introduce a dynamic artificial neural network-based (ANN) adaptive inference process, which learns temporal predictive models of dynamical systems. We term the process REPRISE, a REtrospective and PRospective Inference SchEme. REPRISE infers the unobservable contextual state that best explains its recently encountered sensorimotor experiences as well as accompanying, context-dependent temporal predictive models retrospectively. Meanwhile, it executes prospective inference, optimizing upcoming motor activities in a goal-directed manner. In a first implementation, a recurrent neural network (RNN) is trained to learn a temporal forward model, which predicts the sensorimotor contingencies of different simulated dynamic vehicles. The RNN is augmented with contextual neurons, which enable the compact encoding of distinct, but related sensorimotor dynamics. We show that REPRISE is able to concurrently learn to separate and approximate the encountered sensorimotor dynamics. Moreover, we show that REPRISE can exploit the learned model to induce goal-directed, model-predictive control, that is, approximate active inference: Given a goal state, the system imagines a motor command sequence optimizing it with the prospective objective to minimize the distance to a given goal. Meanwhile, the system evaluates the encountered sensorimotor contingencies retrospectively, adapting its neural hidden states for maintaining model coherence. The RNN activities thus continuously imagine the upcoming future and reflect on the recent past, optimizing both hidden state and motor activities. In conclusion, the combination of temporal predictive structures with modulatory, generative encodings offers a way to develop compact event codes, which selectively activate particular types of sensorimotor event-specific dynamics. 
Retrospective Convolution  Change detection has been a challenging visual task due to the dynamic nature of real-world scenes. Good performance of existing methods depends largely on prior background images or a long-term observation. These methods, however, suffer severe degradation when they are applied to detection of instantaneously occurring changes with only a few preceding frames provided. In this paper, we exploit spatio-temporal convolutional networks to address this challenge, and propose a novel retrospective convolution, which features efficient change information extraction between the current frame and frames from historical observation. To address the problem of foreground-specific overfitting in learning-based methods, we further propose a data augmentation method, named static sample synthesis, to guide the network to focus on learning change-cued information rather than specific spatial features of foreground. Trained end-to-end with complex scenarios, our framework proves to be accurate in detecting instantaneous changes and robust in combating diverse noises. Extensive experiments demonstrate that our proposed method significantly outperforms existing methods. 
Return Decomposition for Delayed Rewards (RUDDER) 
We propose a novel reinforcement learning approach for finite Markov decision processes (MDPs) with delayed rewards. In this work, biases of temporal difference (TD) estimates are proved to be corrected only exponentially slowly in the number of delay steps. Furthermore, variances of Monte Carlo (MC) estimates are proved to increase the variance of other estimates, the number of which can grow exponentially in the number of delay steps. We introduce RUDDER, a return decomposition method, which creates a new MDP with the same optimal policies as the original MDP but with redistributed rewards that have largely reduced delays. If the return decomposition is optimal, then the new MDP does not have delayed rewards and TD estimates are unbiased. In this case, the rewards track Q-values so that the future expected reward is always zero. We experimentally confirm our theoretical results on bias and variance of TD and MC estimates. On artificial tasks with different lengths of reward delays, we show that RUDDER is exponentially faster than TD, MC, and MC Tree Search (MCTS). RUDDER outperforms Rainbow, A3C, DDQN, Distributional DQN, Dueling DDQN, Noisy DQN, and Prioritized DDQN on the delayed-reward Atari game Venture in only a fraction of the learning time. RUDDER considerably improves the state of the art on the delayed-reward Atari game Bowling in much less learning time. Source code is available at https://…/baselinesrudder, with demonstration videos at https://goo.gl/EQerZV.
Return on Data Assets (RDA) 
The return on data assets is a measure of how efficiently an organization is able to generate profits from its inventory of data. Creating visual representations of the data is one emerging technique to help company owners make sense of the immense volumes of raw data within their organization. With data properly represented, company owners can make better business decisions, such as which revenue lines can be leveraged, which costs can be eliminated, or which divisions should be shut down. All of this creates value, and ultimately leads to higher returns and a higher sales multiple when selling a company.
RetweetBuster (RTbust) 
Within online social networks (OSNs), many of our supposedly online friends may instead be fake accounts called social bots, part of large groups that purposely reshare targeted content. Here, we study retweeting behaviors on Twitter, with the ultimate goal of detecting retweeting social bots. We collect a dataset of 10M retweets. We design a novel visualization that we leverage to highlight benign and malicious patterns of retweeting activity. In this way, we uncover a ‘normal’ retweeting pattern that is peculiar to human-operated accounts, and 3 suspicious patterns related to bot activities. Then, we propose a bot detection technique that stems from the previous exploration of retweeting behaviors. Our technique, called RetweetBuster (RTbust), leverages unsupervised feature extraction and clustering. An LSTM autoencoder converts the retweet time series into compact and informative latent feature vectors, which are then clustered with a hierarchical density-based algorithm. Accounts belonging to large clusters characterized by malicious retweeting patterns are labeled as bots. RTbust obtains excellent detection results, with F1 = 0.87, whereas competitors achieve F1 < 0.76. Finally, we apply RTbust to a large dataset of retweets, uncovering 2 previously unknown active botnets with hundreds of accounts.
Reverse Cuthill-McKee (RCM) 
Ordering vertices of a graph is key to minimizing fill-in and data structure size in sparse direct solvers, maximizing locality in iterative solvers, and improving performance in graph algorithms. Except for naturally parallelizable ordering methods such as nested dissection, many important ordering methods have not been efficiently mapped to distributed-memory architectures. In this paper, we present the first-ever distributed-memory implementation of the reverse Cuthill-McKee (RCM) algorithm for reducing the profile of a sparse matrix. Our parallelization uses a two-dimensional sparse matrix decomposition. We achieve high performance by decomposing the problem into a small number of primitives and utilizing optimized implementations of these primitives. Our implementation shows strong scaling up to 1024 cores for smaller matrices and up to 4096 cores for larger matrices.
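For orientation, the following is a minimal serial sketch of the classic reverse Cuthill-McKee ordering, not the distributed-memory implementation described above. It assumes an undirected graph given as an adjacency dictionary with sortable vertex labels, starts each component from a minimum-degree vertex (a simple heuristic; pseudo-peripheral starting vertices are a common alternative), and reports bandwidth as an easy-to-check proxy for the profile. The function names are illustrative only.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return an RCM ordering of the vertices of an undirected graph.

    adj maps each vertex to an iterable of its neighbours.
    """
    degree = {v: len(set(adj[v])) for v in adj}
    visited = set()
    order = []
    # Handle every connected component, starting from a low-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting unvisited neighbours
            # in order of increasing degree.
            unvisited = set(adj[v]) - visited
            for w in sorted(unvisited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" in reverse Cuthill-McKee
    return order


def bandwidth(adj, order):
    """Largest distance between two adjacent vertices in the ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


# A path graph 0-3-1-4-2 whose labels are scrambled.
path = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
rcm_order = reverse_cuthill_mckee(path)
```

On this small path graph the natural labelling has bandwidth 3, while the RCM ordering places neighbours next to each other.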
Reversible Data Hiding (RDH) 
Reversible data hiding (RDH) is a special type of information hiding in which both the host sequence and the embedded data can be restored from the marked sequence without loss. Besides media annotation and integrity authentication, some scholars have recently begun to apply RDH innovatively in many other fields.
Reversible Neural Network (RevNet) 
Generative models with an encoding component, such as autoencoders, currently receive great interest. However, training of autoencoders is typically complicated by the need to train a separate encoder and decoder model that must be constrained to be reciprocal to each other. Here, we propose to use the by-design reversible neural networks (RevNets) as a new class of generative models. We investigate the generative performance of RevNets on the CelebA dataset, showing that generative RevNets can indeed generate coherent faces with similar quality to Variational Autoencoders. This first attempt to use RevNets as a generative model still slightly underperformed relative to recent advanced generative models using an autoencoder component on CelebA, but this gap may diminish with further optimization of the training setup of generative RevNets. In addition to the experiments on CelebA, we show a proof-of-principle experiment on the MNIST dataset suggesting that adversary-free trained RevNets can discover meaningful dimensions without pre-specifying the number of latent dimensions of the sampling distribution. In summary, this study shows that RevNets enable generative applications with an encoding component while overcoming the need to train separate encoder and decoder models.
Reversible Recurrent Neural Network  Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. Reversible RNNs (RNNs for which the hidden-to-hidden transition can be reversed) offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation. We first show that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited because they cannot forget information from their hidden state. We then provide a scheme for storing a small number of bits in order to allow perfect reversal with forgetting. Our method achieves comparable performance to traditional models while reducing the activation memory cost by a factor of 10–15. We extend our technique to attention-based sequence-to-sequence models, where it maintains performance while reducing activation memory cost by a factor of 5–10 in the encoder, and a factor of 10–15 in the decoder.
Review Reading Comprehension (RRC) 
Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions. We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the performance of fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective. The datasets and code are available at https://…/.
revisit  In recent years there has been widespread concern in the scientific community over a reproducibility crisis. One of the major causes that have been identified is statistical: in much scientific research, the statistical analysis (including data preparation) suffers from a lack of transparency and from methodological problems, both major obstructions to reproducibility. The revisit package aims to remedy this problem by generating a ‘software paper trail’ of the statistical operations applied to a dataset. This record can be ‘replayed’ for verification purposes, and can also be modified to enable alternative analyses. The software also issues warnings about certain kinds of potential errors in statistical methodology, again related to the reproducibility issue.
Reward Augmented Maximum Likelihood (RAML) 

RGPM  In this work we propose RGPM, a parallel computing framework for graph pattern mining (GPM) through a user-defined subgraph relation. More specifically, we enable the computation of statistics of patterns through their subgraph classes, generalizing traditional GPM methods. RGPM provides efficient estimators for these statistics by employing an MCMC sampling algorithm combined with several optimizations. We provide both theoretical guarantees and empirical evaluations of our estimators in application scenarios such as stochastic optimization of deep high-order graph neural network models and pattern (motif) counting. We also propose and evaluate optimizations that improve the accuracy of our estimators while reducing their computational costs by up to 3 orders of magnitude. Finally, we show that RGPM is scalable, providing near-linear speedups on 44 cores in all of our tests.
RGrams  This paper introduces a novel type of data-driven segmented unit that we call r-grams. We illustrate one algorithm for calculating r-grams, and discuss its properties and impact on the frequency distribution of text representations. The proposed approach is evaluated by demonstrating its viability in embedding techniques, both in monolingual and multilingual test settings. We also provide a number of qualitative examples of the proposed methodology, demonstrating its viability as a language-invariant segmentation procedure.
Rheem  Today, organizations typically perform tedious and costly tasks to juggle their code and data across different data processing platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging because it requires considerable expertise in all the available data processing platforms. In this report, we present Rheem, a general-purpose cross-platform data processing system that relieves users of the pain of finding the most efficient data processing platform for a given task. It also splits a task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). To offer cross-platform functionality, it features (i) a robust interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Rheem is released under an open source license.
RHEEMix  In pursuit of efficient and scalable data analytics, the insight that ‘one size does not fit all’ has given rise to a plethora of specialized data processing platforms, and today’s complex data analytics are moving beyond the limits of a single platform. To cope with these new requirements, we present a cross-platform optimizer that allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to efficiently plan data movement among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. The results show that our optimizer is capable of selecting the most efficient platform combination for a given task, freeing data analysts from the need to choose and orchestrate platforms. In particular, our optimizer allows certain tasks to run more than one order of magnitude faster than on state-of-the-art platforms, such as Spark.
RHIPE  RHIPE is an R package which provides an API to use Hadoop, similar to RHadoop. RHIPE
RHub  The infrastructure available for developing, building, testing, and validating R packages is of critical importance to the R community. CRAN and R-Forge have traditionally met these needs; however, the maintenance and enhancement of R-Forge has significant costs in both money and time. This proposal outlines r-hub, a service that is complementary to CRAN and R-Forge, that would add capabilities, improve extensibility, and create a platform for community contributions to r-hub itself.
RICE  By their nature, the composition of black box models is opaque. This makes it challenging to generate explanations for their response to stimuli. Explaining black box models has become increasingly important given the prevalence of AI and ML systems and the need to build legal and regulatory frameworks around them. Such explanations can also increase trust in these uncertain systems. In our paper we present RICE, a method for generating explanations of the behaviour of black box models by (1) probing a model to extract model output examples using sensitivity analysis; (2) applying CNPInduce, a method for inductive logic program synthesis, to generate logic programs based on critical input-output pairs; and (3) interpreting the target program as a human-readable explanation. We demonstrate the application of our method by generating explanations of an artificial neural network trained to follow simple traffic rules in a hypothetical self-driving car simulation. We conclude with a discussion on the scalability and usability of our approach and its potential applications to explanation-critical scenarios.
Rich Component Analysis (RCA) 
In many settings, we have multiple data sets (also called views) that capture different and overlapping aspects of the same phenomenon. We are often interested in finding patterns that are unique to one or to a subset of the views. For example, we might have one set of molecular observations and one set of physiological observations on the same group of individuals, and we want to quantify molecular patterns that are uncorrelated with physiology. Despite being a common problem, this is highly challenging when the correlations come from complex distributions. In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution. We introduce algorithms based on cumulant extraction that provably learn each of the components without having to model the other components. We show how to integrate RCA with stochastic gradient descent into a meta-algorithm for learning general models, and demonstrate substantial improvement in accuracy on several synthetic and real datasets in both supervised and unsupervised tasks. Our method makes it possible to learn latent variable models when we don’t have samples from the true model but only samples after complex perturbations.
Ridge Polynomial Neural Network with Error-Output Feedback (RPNN-EOF) 
Time series forecasting receives much attention due to its impact on many practical applications. A higher-order neural network with recurrent feedback is a powerful technique that has been used successfully for forecasting. It maintains fast learning and the ability to learn the dynamics of the series over time. In this paper, we propose a novel model called the Ridge Polynomial Neural Network with Error-Output Feedbacks (RPNN-EOFs), which combines the properties of higher-order networks and error-output feedbacks. The well-known Mackey-Glass time series is used to test the forecasting capability of RPNN-EOFs. Simulation results showed that the proposed RPNN-EOFs provides a better understanding of the Mackey-Glass time series, with a root mean square error equal to 0.00416. This error is smaller than that of other models in the literature. Therefore, we can conclude that RPNN-EOFs can be applied successfully to time series forecasting.
Ridge Regression  Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, and, with multiple independent discoveries, it is also variously known as the Tikhonov-Miller method, the Phillips-Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg-Marquardt algorithm for nonlinear least-squares problems. bigRR
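As a small illustration of how the regularization acts, the sketch below solves the ridge normal equations (X^T X + lam * I) beta = X^T y in plain Python. It is a teaching sketch under simple assumptions (dense lists, no intercept, a single penalty lam applied to every coefficient) and is unrelated to the bigRR package; the function names are illustrative.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, p))
        x[i] = (M[i][p] - tail) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge coefficients: solve (X^T X + lam * I) beta = X^T y."""
    n, p = len(X), len(X[0])
    gram = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    return solve_linear_system(gram, rhs)


# Two perfectly collinear columns: ordinary least squares (lam = 0) is
# ill-posed here, but any lam > 0 makes the system solvable.
X_collinear = [[1.0, 1.0], [2.0, 2.0]]
y_collinear = [1.0, 2.0]
beta = ridge_fit(X_collinear, y_collinear, lam=1.0)
```

With lam = 0 the collinear example has a singular Gram matrix; adding lam to the diagonal is what turns the ill-posed problem into a well-posed one, at the cost of shrinking the coefficients.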
Ridge Regularized Linear Models (RRLM) 
Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. 
Ridgeline Plot  Ridgeline plots provide a convenient way of visualizing changes in distributions over time or space. ggridges 
Ridit Analysis  Fleiss (1981, ISBN:0471064289) ridittools 
Riemann-Theta Boltzmann Machine  A general Boltzmann machine with continuous visible and discrete integer-valued hidden states is introduced. Under mild assumptions about the connection matrices, the probability density function of the visible units can be solved for analytically, yielding a novel parametric density function involving a ratio of Riemann-Theta functions. The conditional expectation of a hidden state for given visible states can also be calculated analytically, yielding a derivative of the logarithmic Riemann-Theta function. The conditional expectation can be used as an activation function in a feedforward neural network, thereby increasing the modelling capacity of the network. Both the Boltzmann machine and the derived feedforward neural network can be successfully trained via standard gradient-based and non-gradient-based optimization techniques.
RInClose_CVC2  RInClose_CVC is an efficient (takes polynomial time per bicluster), complete (finds all maximal biclusters), correct (all biclusters meet the user-defined level of consistency) and non-redundant (all the obtained biclusters are maximal and the same bicluster is not enumerated more than once) enumerative algorithm for mining maximal biclusters with constant values on columns in numerical datasets. Although RInClose_CVC has all these outstanding properties, it has a high computational cost in terms of memory usage because it must keep a symbol table in memory to prevent a maximal bicluster from being found more than once. In this paper, we propose a new version of RInClose_CVC, named RInClose_CVC2, that does not use a symbol table to prevent redundant biclusters, and keeps all four of these properties. We also prove that these algorithms actually possess these properties. Experiments are carried out with synthetic and real-world datasets to compare RInClose_CVC and RInClose_CVC2 in terms of memory usage and runtime. The experimental results show that RInClose_CVC2 brings a large reduction in memory usage and, on average, a significant runtime gain when compared to its predecessor.
Ripple Network  To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user’s potential interests along links in the knowledge graph. The multiple ‘ripples’ activated by a user’s historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.
RISE Analysis  Described in Bodily, Nyland, and Wiley (2017) <doi:10.19173/irrodl.v18i2.2952>. Automates the process of identifying learning materials that are not effectively supporting student learning in technology-mediated courses by synthesizing information about access to course content and performance on assessments. The RISE (Resource Inspection, Selection, and Enhancement) Framework is a framework supporting the continuous improvement of open educational resources (OER). The framework is an automated process that identifies learning resources that should be evaluated and either eliminated or improved. This is particularly useful in OER contexts where the copyright permissions of resources allow for remixing, editing, and improving content. The RISE Framework presents a scatterplot with resource usage on the x-axis and grade on the assessments associated with that resource on the y-axis. This scatterplot is broken down into four different quadrants (the mean of each variable being the origin) to find resources that are candidates for improvement. Resources that reside deep within their respective quadrant (farthest from the origin) should be further analyzed for continuous course improvement. We present a case study applying our framework with an Introduction to Business course. Aggregate resource use data was collected from Google Analytics and aggregate assessment data was collected from an online assessment system. Using the RISE Framework, we successfully identified resources, time periods, and modules in the course that should be further evaluated for improvement. rise
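The quadrant step described above can be sketched in a few lines. This is an illustrative sketch only, not the interface of the rise package: the function name, the tuple layout, and the quadrant labels are hypothetical, and the standardization of both variables (so that distances from the origin are comparable across axes) is an assumption made here for the example.

```python
from statistics import mean, pstdev


def rise_quadrants(resources):
    """Classify resources by usage and grade relative to the means.

    resources: list of (name, usage, grade) tuples.
    Returns (name, quadrant, distance) rows, farthest from the origin first.
    """
    usages = [usage for _, usage, _ in resources]
    grades = [grade for _, _, grade in resources]
    mean_usage, mean_grade = mean(usages), mean(grades)
    # Assumption: scale each axis by its standard deviation; fall back
    # to 1.0 when a variable is constant to avoid dividing by zero.
    sd_usage = pstdev(usages) or 1.0
    sd_grade = pstdev(grades) or 1.0
    rows = []
    for name, usage, grade in resources:
        z_usage = (usage - mean_usage) / sd_usage
        z_grade = (grade - mean_grade) / sd_grade
        quadrant = "{} use / {} grade".format(
            "high" if z_usage >= 0 else "low",
            "high" if z_grade >= 0 else "low",
        )
        distance = (z_usage ** 2 + z_grade ** 2) ** 0.5
        rows.append((name, quadrant, distance))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up illustrative numbers, not data from the case study.
example = [("a", 10, 90), ("b", 10, 50), ("c", 2, 90), ("d", 2, 50), ("e", 6, 70)]
ranked = rise_quadrants(example)
```

Resources at the top of the returned list sit deepest in their quadrant and would be the first candidates for review under the framework.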
Risk-Averse Imitation Learning (RAIL) 
Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. We evaluate in terms of the expert’s cost function and observe that the distribution of trajectory costs is often more heavy-tailed for GAIL agents than for the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL agents than by the expert. This makes the reliability of GAIL agents questionable when it comes to deployment in safety-critical applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in safety-critical applications.
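To make the tail-risk measure concrete, the sketch below computes a simple empirical Conditional Value-at-Risk of a sample of trajectory costs: the mean of the worst (1 - alpha) fraction. This is one of several common empirical estimators and only illustrates the quantity; it is not the RAIL training procedure, and the function name and sample numbers are made up for the example.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of the costs.

    Higher cost is worse. At least one sample is always included.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)  # worst outcomes first
    # Rounding guards against floating-point noise such as 3.0000000000000004.
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail_size]) / tail_size


# Two cost samples with the same mean (10.0) but different tails.
light_tail = [9, 10, 11, 10, 10, 9, 11, 10, 10, 10]
heavy_tail = [5, 5, 5, 5, 5, 5, 5, 5, 5, 55]
```

Both samples have the same average cost, but at alpha = 0.8 the heavy-tailed sample has a much larger CVaR, which is the kind of difference a mean-based evaluation would miss.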
Risk-Averse Robust Adversarial Reinforcement Learning (RARARL) 
Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary.
Risk-Averse Tree-Search (RATS) 
This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously and its evolution rate is bounded, 2) a current model is known at each decision epoch but not its evolution. Our contribution can be presented in four points. First, we define this specific class of MDPs that we call Non-Stationary MDPs (NSMDPs). We introduce the notion of regular evolution by making a hypothesis of Lipschitz continuity on the transition and reward functions w.r.t. time. Second, we consider a planning agent using the current model of the environment, but unaware of its future evolution. This leads us to consider a worst-case method where the environment is seen as an adversarial agent. Third, following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm. This is a zero-shot Model-Based method similar to Minimax search. Finally, we illustrate the benefits brought by RATS empirically and compare its performance with reference Model-Based algorithms.
Risk-Constrained RL Framework  The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, the percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions.
Risk-Sensitive GAIL (RS-GAIL) 
We study risk-sensitive imitation learning where the agent’s goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). We then derive two different versions of our RS-GAIL optimization problem that aim at matching the risk profiles of the agent and the expert w.r.t. Jensen-Shannon (JS) divergence and Wasserstein distance, and develop risk-sensitive generative adversarial imitation learning algorithms based on these optimization problems. We evaluate the performance of our JS-based algorithm and compare it with GAIL and the risk-averse imitation learning (RAIL) algorithm in two MuJoCo tasks.
Ristretto  Ristretto is a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and outputs of resource-intense layers, which reduces the chip area for multiplication units significantly. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in an adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
River Definition Language (RDL) 
The primary goal of the River development model (and language, RDL) is to significantly improve the development experience of business software applications. This includes writing the application code, testing it and evolving/maintaining it over time. We also target various development scenarios with a range of variables: on-demand/on-premise, one-off projects vs. products, mobile/web, extensions of core, etc.
RL-GAN-Net  We present RL-GAN-Net, where a reinforcement learning (RL) agent provides fast and robust control of a generative adversarial network (GAN). Our framework is applied to point cloud shape completion that converts noisy, partial point cloud data into a high-fidelity completed shape by controlling the GAN. While a GAN is unstable and hard to train, we circumvent the problem by (1) training the GAN on the latent space representation whose dimension is reduced compared to the raw point cloud input and (2) using an RL agent to find the correct input to the GAN to generate the latent space representation of the shape that best fits the current input of incomplete point cloud. The suggested pipeline robustly completes point clouds with large missing regions. To the best of our knowledge, this is the first attempt to train an RL agent to control the GAN, which effectively learns the highly nonlinear mapping from the input noise of the GAN to the latent space of the point cloud. The RL agent replaces the need for complex optimization and consequently makes our technique real time. Additionally, we demonstrate that our pipelines can be used to enhance the classification accuracy of point clouds with missing data.
RLgraph  Reinforcement learning (RL) tasks are challenging to implement, execute and test due to algorithmic instability, hyperparameter sensitivity, and heterogeneous distributed communication patterns. We argue for the separation of logical component composition, backend graph definition, and distributed execution. To this end, we introduce RLgraph, a library for designing and executing high performance RL computation graphs in both static graph and define-by-run paradigms. The resulting implementations yield high performance across different deep learning frameworks and distributed backends. 
RLLChatbot  Current conversational systems can follow simple commands and answer basic questions, but they have difficulty maintaining coherent and open-ended conversations about specific topics. Competitions like the Conversational Intelligence (ConvAI) challenge are being organized to push the research development towards that goal. This article presents in detail the RLLChatbot that participated in the 2017 ConvAI challenge. The goal of this research is to better understand how current deep learning and reinforcement learning tools can be used to build a robust yet flexible open domain conversational agent. We provide a thorough description of how a dialog system can be built and trained from mostly public-domain datasets using an ensemble model. The first contribution of this work is a detailed description and analysis of different text generation models in addition to novel message ranking and selection methods. Moreover, a new open-source conversational dataset is presented. Training on this data significantly improves the Recall@k score of the ranking and selection mechanisms compared to our baseline model responsible for selecting the message returned at each interaction. 
RMSProp  RMSProp is an adaptive learning rate method. It divides the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight. It is the mini-batch version of using only the sign of the gradient. http://…/lecture_slides_lec6.pdf https://…/neuralnets http://…/rmsprop.html#tieleman2012rmsprop 
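The update rule described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `rmsprop_step`, the default hyperparameters (`lr`, `decay`, `eps`) and the quadratic test objective are assumptions chosen for the example.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update (hypothetical helper, assumed default hyperparameters)."""
    # Running average of squared gradients, kept separately for every weight.
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
    # Divide the learning rate by the root of that running average.
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

# Toy objective f(w) = sum(w ** 2), whose gradient is 2 * w.
w = np.array([5.0, -3.0])
avg_sq = np.zeros_like(w)
for _ in range(2000):
    grad = 2.0 * w
    w, avg_sq = rmsprop_step(w, grad, avg_sq)
print(w)
```

Because the gradient is normalised by its own recent magnitude, each step has a size of roughly `lr` regardless of how large the raw gradient is, which is why the method behaves like a smoothed version of using only the gradient's sign.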
RMSProp+AF  Source localization is of pivotal importance in several areas such as wireless sensor networks and Internet of Things (IoT), where the location information can be used for a variety of purposes, e.g. surveillance, monitoring, tracking, etc. Time Difference of Arrival (TDOA) is one of the well-known localization approaches where the source broadcasts a signal and a number of receivers record the arrival time of the transmitted signal. By computing the time difference between various receivers, the source location can be estimated. On the other hand, in recent years novel optimization algorithms have appeared in the literature for (i) processing big data and for (ii) training deep neural networks. Most of these techniques are enhanced variants of the classical stochastic gradient descent (SGD) but with additional features that promote faster convergence. In this paper, we compare the performance of the classical SGD with the novel techniques mentioned above. In addition, we propose an optimization procedure called RMSProp+AF, which is based on the RMSProp algorithm but with the advantage of incorporating adaptation of the decaying factor. We show through simulations that all of these techniques, which are commonly used in the machine learning domain, can also be successfully applied to signal processing problems and are capable of attaining improved convergence and stability. Finally, it is also shown through simulations that the proposed method can outperform other competing approaches as both its convergence and stability are superior. 
RNNSecureNet  Recurrent neural network (RNN) is an effective neural network for solving very complex supervised and unsupervised tasks. There has been significant improvement in RNN applications in fields such as natural language processing, speech processing, computer vision and multiple other domains. This paper deals with RNN applications to different use cases such as Incident Detection, Fraud Detection, and Android Malware Classification. The best performing neural network architecture is chosen by conducting different chains of experiments for different network parameters and structures. The network is run for up to 1000 epochs with the learning rate set in the range of 0.01 to 0.5. The RNN performed very well when compared to classical machine learning algorithms. This is mainly because RNNs implicitly extract the underlying features and also identify the characteristics of the data, which helps to achieve better accuracy. 
Robbins-Monro Algorithm  The Robbins-Monro algorithm, introduced in 1951 by Herbert Robbins and Sutton Monro, presented a methodology for solving a root-finding problem, where the function is represented as an expected value. Assume that we have a function M(x), and a constant \alpha, such that the equation M(x) = \alpha has a unique root at x=\theta. It is assumed that while we cannot directly observe the function M(x), we can instead obtain measurements of the random variable N(x) where \mathbb E[N(x)] = M(x). 
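The entry above states the problem but not the iteration. A minimal sketch of the standard stochastic-approximation recursion x_{n+1} = x_n - a_n (N(x_n) - \alpha), with step sizes a_n = a/n and an increasing M, is given below. The function name `robbins_monro`, the linear test function M(x) = 2x + 1 and the Gaussian noise are assumptions made for the example.

```python
import numpy as np

def robbins_monro(noisy_measurement, alpha, x0, n_steps, a=1.0):
    """Estimate the root theta of M(x) = alpha from noisy measurements N(x).

    Hypothetical helper; assumes M is increasing, so stepping against the sign
    of N(x) - alpha moves the iterate towards the root.
    """
    x = x0
    for n in range(1, n_steps + 1):
        step = a / n  # sum of steps diverges, sum of squared steps converges
        x = x - step * (noisy_measurement(x) - alpha)
    return x

rng = np.random.default_rng(0)

def N(x):
    # Noisy observation with E[N(x)] = M(x) = 2 * x + 1.
    return 2.0 * x + 1.0 + rng.normal()

# Solve M(x) = 5, whose exact root is theta = 2.
theta_hat = robbins_monro(N, alpha=5.0, x0=0.0, n_steps=20000)
print(theta_hat)
```

The decaying step size is the key design choice: it is large enough early on to move away from the starting point, and small enough later to average out the measurement noise.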
Robinsonian Matrix  A Robinson (dis)similarity matrix is a symmetric matrix whose entries (increase) decrease monotonically along rows and columns when moving away from the diagonal, and such matrices arise in the classical seriation problem. 
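As an illustration of the definition, the following sketch checks whether a given matrix is a Robinson dissimilarity, i.e. symmetric with entries that never decrease when moving away from the diagonal. The function name `is_robinson_dissimilarity` and the two small test matrices are assumptions made for the example.

```python
import numpy as np

def is_robinson_dissimilarity(D):
    """Return True if D is symmetric and non-decreasing away from the diagonal.

    Hypothetical helper; by symmetry, checking every row also covers the columns.
    """
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1] or not np.allclose(D, D.T):
        return False
    for i in range(D.shape[0]):
        right = D[i, i:]             # moving right from the diagonal
        left = D[i, :i + 1][::-1]    # moving left from the diagonal
        if np.any(np.diff(right) < 0) or np.any(np.diff(left) < 0):
            return False
    return True

# Distances between the points 0, 1, 3 on a line, listed in sorted order.
ordered = np.array([[0, 1, 3],
                    [1, 0, 2],
                    [3, 2, 0]])
# The same distances with the last two points swapped.
shuffled = np.array([[0, 3, 1],
                     [3, 0, 2],
                     [1, 2, 0]])
print(is_robinson_dissimilarity(ordered), is_robinson_dissimilarity(shuffled))
```

The seriation problem mentioned above asks for a simultaneous row and column permutation that turns a matrix such as `shuffled` back into Robinson form.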
Robotic Process Automation (RPA) 
Robotic process automation (or RPA) is an emerging form of clerical process automation technology based on the notion of software robots or artificial intelligence (AI) workers. A software ‘robot’ is a software application that replicates the actions of a human being interacting with the user interface of a computer system. For example, the execution of data entry into an ERP system – or indeed a full end-to-end business process – would be a typical activity for a software robot. The software robot operates on the user interface (UI) in the same way that a human would; this is a significant departure from traditional forms of IT integration which have historically been based on Application Programming Interfaces (or APIs) – that is to say, machine-to-machine forms of communication based on data layers which operate at an architectural layer beneath the UI. 
Robotic Processing Automation (RPA) 
➘ “Robotic Process Automation” 
RoboTurk  Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification. However, research in this area has been limited to modest-sized datasets due to the difficulty of collecting large quantities of task demonstrations through existing mechanisms. This work introduces RoboTurk to address this challenge. RoboTurk is a crowdsourcing platform for high quality 6-DoF trajectory based teleoperation through the use of widely available mobile devices (e.g. iPhone). We evaluate RoboTurk on three manipulation tasks of varying timescales (15-120s) and observe that our user interface is statistically similar to special purpose hardware such as virtual reality controllers in terms of task completion times. Furthermore, we observe that poor network conditions, such as low bandwidth and high delay links, do not substantially affect the remote users’ ability to perform task demonstrations successfully on RoboTurk. Lastly, we demonstrate the efficacy of RoboTurk through the collection of a pilot dataset; using RoboTurk, we collected 137.5 hours of manipulation data from remote workers, amounting to over 2200 successful task demonstrations in 22 hours of total system usage. We show that the data obtained through RoboTurk enables policy learning on multi-step manipulation tasks with sparse rewards and that using larger quantities of demonstrations during policy learning provides benefits in terms of both learning consistency and final performance. For additional results, videos, and to download our pilot dataset, visit roboturk.stanford.edu 
Robust Accelerated Gradient  We study the trade-off between rate of convergence and robustness to gradient errors in designing a first-order algorithm. In particular, we focus on gradient descent (GD) and Nesterov’s accelerated gradient (AG) method for strongly convex quadratic objectives when the gradient has random errors in the form of additive white noise. To characterize robustness, we consider the asymptotic normalized variance of the centered iterate sequence which measures the asymptotic accuracy of the iterates. Using tools from robust control theory, we develop a tractable algorithm that allows us to set the parameters of each algorithm to achieve a particular trade-off between these two performance objectives. Our results show that there is a fundamental lower bound on the robustness level of an algorithm for any achievable rate. For the same achievable rate, we show that AG with tuned parameters is always more robust than GD to gradient errors. Similarly, for the same robustness level, we show that AG can be tuned to be always faster than GD. Our results show that AG can achieve acceleration while being more robust to random gradient errors. This behavior is quite different from that previously reported in the deterministic gradient noise setting. 
Robust Adversarial Perturbation (RAP) 
Adversarial noises are useful tools to probe the weaknesses of deep learning based computer vision algorithms. In this paper, we describe a robust adversarial perturbation (RAP) method to attack deep proposal-based object detectors and instance segmentation algorithms. Our method focuses on attacking the common component in these algorithms, namely the Region Proposal Network (RPN), to universally degrade their performance in a black-box fashion. To do so, we design a loss function that combines a label loss and a novel shape loss, and optimize it with respect to the image using a gradient based iterative algorithm. Evaluations are performed on the MS COCO 2014 dataset for the adversarial attacking of 6 state-of-the-art object detectors and 2 instance segmentation algorithms. Experimental results demonstrate the efficacy of the proposed method. 
Robust Adversarial Reinforcement Learning (RARL) 
Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and real world is so large that policy-learning approaches fail to transfer; (b) even if policy learning is done in the real world, the data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired by H-infinity control methods, we note that both modeling errors and differences in training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes the idea of robust adversarial reinforcement learning (RARL), where we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy. We formulate the policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary. 
Robust Anomaly Detection (RAD) 
Outlier detection can be a pain point for all data driven companies, especially as data volumes grow. At Netflix we have multiple datasets growing by 10B+ records/day and so there’s a need for automated anomaly detection tools ensuring data quality and identifying suspicious anomalies. Today we are open-sourcing our outlier detection function, called Robust Anomaly Detection (RAD), as part of our Surus project. As we built RAD we identified four generic challenges that are ubiquitous in outlier detection on “big data.” · High cardinality dimensions: High cardinality data sets – especially those with large combinatorial permutations of column groupings – make human inspection impractical. · Minimizing False Positives: A successful anomaly detection tool must minimize false positives. In our experience there are many alerting platforms that “sound an alarm” that ultimately goes unresolved. The goal is to create alerting mechanisms that can be tuned to appropriately balance noise and information. · Seasonality: Hourly/Weekly/Biweekly/Monthly seasonal effects are common and can be misidentified as outliers deserving attention if not handled properly. Seasonal variability needs to be ignored. · Data is not always normally distributed: This has been a particular challenge since Netflix has been growing over the last 24 months. Generally though, an outlier tool must be robust so that it works on data that is not normally distributed. In addition to addressing the challenges above, we wanted a solution with a generic interface (supporting application development). We met these objectives with a novel algorithm encased in a wrapper for easy deployment in our ETL environment. 
Robust Compound Regression (RCR) 
The errors-in-variables (EIV) regression model, being more realistic by accounting for measurement errors in both the dependent and the independent variables, is widely adopted in applied sciences. The traditional EIV model estimators, however, can be highly biased by outliers and other departures from the underlying assumptions. In this paper, we develop a novel nonparametric regression approach – the robust compound regression (RCR) analysis method for the robust estimation of EIV models. We first introduce a robust and efficient estimator called least sine squares (LSS). Taking full advantage of both the new LSS method and the compound regression analysis method developed in our own group, we subsequently propose the RCR approach as a generalization of those two, which provides a robust counterpart of the entire class of the maximum likelihood estimation (MLE) solutions of the EIV model, in a 1-1 mapping. Technically, our approach gives users the flexibility to select from a class of RCR estimates the optimal one with a predefined regression efficiency criterion satisfied. Simulation studies and real-life examples are provided to illustrate the effectiveness of the RCR approach. 
Robust Conditional GAN (RCGAN) 
We study the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise. A standard training of conditional GANs will not only produce samples with wrong labels, but also generate poor quality samples. We consider two scenarios, depending on whether the noise model is known or not. When the distribution of the noise is known, we introduce a novel architecture which we call Robust Conditional GAN (RCGAN). The main idea is to corrupt the label of the generated sample before feeding it to the adversarial discriminator, forcing the generator to produce samples with clean labels. This approach of passing through a matching noisy channel is justified by corresponding multiplicative approximation bounds between the loss of the RCGAN and the distance between the clean real distribution and the generator distribution. This shows that the proposed approach is robust, when used with a carefully chosen discriminator architecture, known as projection discriminator. When the distribution of the noise is not known, we provide an extension of our architecture, which we call RCGAN-U, that learns the noise model simultaneously while training the generator. We show experimentally on MNIST and CIFAR-10 datasets that both the approaches consistently improve upon baseline approaches, and RCGAN-U closely matches the performance of RCGAN. 
Robust Conditional Generative Adversarial Network (RoCGAN) 
Conditional generative adversarial networks (cGAN) have led to large improvements in the task of conditional image generation, which lies at the heart of computer vision. The major focus so far has been on performance improvement, while there has been little effort in making cGAN more robust to noise or leveraging structure in the output space of the model. The end-to-end regression (of the generator) might lead to arbitrarily large errors in the output, which is unsuitable for the application of such networks to real-world systems. In this work, we introduce a novel conditional GAN, called RoCGAN, which adds implicit constraints to address the issue. Our proposed model augments the generator with an unsupervised pathway, which encourages the outputs of the generator to span the target manifold even in the presence of large amounts of noise. We prove that RoCGAN shares similar theoretical properties with GAN and experimentally verify that the proposed model outperforms existing state-of-the-art cGAN architectures by a large margin in a variety of domains including images from natural scenes and faces. 
Robust Continuous Co-Clustering (ROCCO) 
Clustering consists of grouping together samples given their similar properties. The problem of modeling simultaneously groups of samples and features is known as Co-Clustering. This paper introduces ROCCO – a Robust Continuous Co-Clustering algorithm. ROCCO is a scalable, hyperparameter-free, easy and ready to use algorithm to address Co-Clustering problems in practice over massive cross-domain datasets. It operates by learning a graph-based two-sided representation of the input matrix. The underlying proposed optimization problem is non-convex, which assures a flexible pool of solutions. Moreover, we prove that it can be solved with a near linear time complexity on the input size. An exhaustive large-scale experimental testbed conducted with both synthetic and real-world datasets demonstrates ROCCO’s properties in practice: (i) state-of-the-art performance in cross-domain real-world problems including Biomedicine and Text Mining; (ii) very low sensitivity to hyperparameter settings; (iii) robustness to noise and (iv) a linear empirical scalability in practice. These results highlight ROCCO as a powerful general-purpose co-clustering algorithm for cross-domain practitioners, regardless of their technical background. 
Robust Decision Making (RDM) 
Robust decision-making is an iterative decision analytic framework that helps identify potential robust strategies, characterize the vulnerabilities of such strategies, and evaluate the trade-offs among them. RDM focuses on informing decisions under conditions of what is called ‘deep uncertainty,’ that is, conditions where the parties to a decision do not know or do not agree on the system model(s) relating actions to consequences or the prior probability distributions for the key input parameters to those model(s). 
Robust Dynamic Programming  This paper presents a new theory, known as robust dynamic programming, for a class of continuous-time dynamical systems. Different from traditional dynamic programming (DP) methods, this new theory serves as a fundamental tool to analyze the robustness of DP algorithms, and in particular, to develop novel adaptive optimal control and reinforcement learning methods. In order to demonstrate the potential of this new framework, four illustrative applications in the fields of stochastic optimal control and adaptive DP are presented. Three numerical examples arising from both finance and engineering industries are also given, along with several possible extensions of the proposed framework. 
Robust Elastic Net (REN) 
We construct rich vector spaces of continuous functions with prescribed curved or linear pathwise quadratic variations. We also construct a class of functions whose quadratic variation may depend in a local and nonlinear way on the function value. These functions can then be used as integrators in Föllmer’s pathwise Itô calculus. Our construction of the latter class of functions relies on an extension of the Doss–Sussman method to a class of nonlinear Itô differential equations for the Föllmer integral. As an application, we provide a deterministic variant of the support theorem for diffusions. We also establish that many of the constructed functions are nowhere differentiable. 
Robust Frequent Directions (RFD) 
The frequent directions (FD) technique is a deterministic approach for online sketching that has many applications in machine learning. The conventional FD is a heuristic procedure that often outputs rank deficient matrices. To overcome the rank deficiency problem, we propose a new sketching strategy called robust frequent directions (RFD) by introducing a regularization term. RFD can be derived from an optimization problem. It updates the sketch matrix and the regularization term adaptively and jointly. RFD reduces the approximation error of FD without increasing the computational cost. We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm. We derive a regret bound for our online Newton algorithm based on RFD, which guarantees the robustness of the algorithm. The experimental studies demonstrate that the proposed method outperforms state-of-the-art second order online learning algorithms. 
Robust Graphical Lasso (RGLasso) 
Anomalies and outliers are common in real-world data, and they can arise from many sources, such as sensor faults. Accordingly, anomaly detection is important both for analyzing the anomalies themselves and for cleaning the data for further analysis of its ambient structure. Nonetheless, a precise definition of anomalies is important for automated detection and herein we approach such problems from the perspective of detecting sparse latent effects embedded in large collections of noisy data. Standard Graphical Lasso-based techniques can identify the conditional dependency structure of a collection of random variables based on their sample covariance matrix. However, classic Graphical Lasso is sensitive to outliers in the sample covariance matrix. In particular, several outliers in a sample covariance matrix can destroy the sparsity of its inverse. Accordingly, we propose a novel optimization problem that is similar in spirit to Robust Principal Component Analysis (RPCA) and splits the sample covariance matrix $M$ into two parts, $M=F+S$, where $F$ is the cleaned sample covariance whose inverse is sparse and computable by Graphical Lasso, and $S$ contains the outliers in $M$. We accomplish this decomposition by adding an additional $\ell_1$ penalty to classic Graphical Lasso, and name it ‘Robust Graphical Lasso (Rglasso)’. Moreover, we propose an Alternating Direction Method of Multipliers (ADMM) solution to the optimization problem which scales to large numbers of unknowns. We evaluate our algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming the standard robust Minimum Covariance Determinant (MCD) method and Robust Principal Component Analysis (RPCA) regarding both accuracy and speed. 
Robust Intelligence (RI) 

Robust Kernel Principal Component Analysis (RKPCA) 
We propose a novel method called robust kernel principal component analysis (RKPCA) to decompose a partially corrupted matrix as a sparse matrix plus a high- or full-rank matrix whose columns are drawn from a nonlinear low-dimensional latent variable model. RKPCA can be applied to many problems such as noise removal and subspace clustering and is so far the only unsupervised nonlinear method robust to sparse noises. We also provide theoretical guarantees for RKPCA. The optimization of RKPCA is challenging because it involves nonconvex and nondifferentiable problems simultaneously. We propose two nonconvex optimization algorithms for RKPCA: alternating direction method of multipliers with backtracking line search and proximal linearized minimization with adaptive step size. Comparative studies on synthetic data and natural images corroborate the effectiveness and superiority of RKPCA in noise removal and robust subspace clustering. 
Robust Matrix Elastic net Based Canonical Correlation Analysis (RMENCCA) 
This paper presents a robust matrix elastic net based canonical correlation analysis (RMENCCA) for multiple view unsupervised learning problems, which emphasizes the combination of CCA and the robust matrix elastic net (RMEN) used as coupled feature selection. The RMENCCA leverages the strength of the RMEN to distill naturally meaningful features without any prior assumption and to measure effectively correlations between different ‘views’. We can further employ directly the kernel trick to extend the RMENCCA to the kernel scenario with theoretical guarantees, which takes advantage of the kernel trick for highly complicated nonlinear feature learning. Rather than simply incorporating existing regularization minimization terms into CCA, this paper provides a new learning paradigm for CCA and is the first to derive a coupled feature selection based CCA algorithm that guarantees convergence. More significantly, for CCA, the newly derived RMENCCA bridges the gap between measurement of relevance and coupled feature selection. Moreover, it is nontrivial to tackle directly the RMENCCA by previous optimization approaches derived from its sophisticated model architecture. Therefore, this paper further offers a bridge between a new optimization problem and an existing efficient iterative approach. As a consequence, the RMENCCA can overcome the limitation of CCA and address large-scale and streaming data problems. Experimental results on four popular competing datasets illustrate that the RMENCCA performs more effectively and efficiently than do state-of-the-art approaches. 
Robust Mixture Discriminant Analysis (RMDA) 
robustDA 
Robust Model Predictive Control (MPC) 
In this paper, we present Robust Model Predictive Control (MPC) problems with adjustable uncertainty sets. In contrast to standard Robust MPC problems with known uncertainty sets, we treat the uncertainty sets in our problems as additional decision variables. In particular, given a metric for adjusting the uncertainty sets, we address the question of determining the optimal size and shape of those uncertainty sets, while ensuring robust constraint satisfaction. The focus of this paper is on ensuring constraint satisfaction over an infinite horizon, also known as persistent feasibility. We show that, as in standard Robust MPC, persistent feasibility can be guaranteed if the terminal set is an invariant set with respect to both the state of the system and the adjustable uncertainty set. We also present an algorithm for computing such invariant sets, and illustrate the effectiveness of our approach in a cooperative adaptive cruise control application. 
Robust Multiple Signal Classification (MUSIC) 
In this paper, we introduce a new framework for robust multiple signal classification (MUSIC). The proposed framework, called robust measure-transformed (MT) MUSIC, is based on applying a transform to the probability distribution of the received signals, i.e., transformation of the probability measure defined on the observation space. In robust MT-MUSIC, the sample covariance is replaced by the empirical MT-covariance. By judicious choice of the transform we show that: 1) the resulting empirical MT-covariance is B-robust, with bounded influence function that takes negligible values for large norm outliers, and 2) under the assumption of spherically contoured noise distribution, the noise subspace can be determined from the eigendecomposition of the MT-covariance. Furthermore, we derive a new robust measure-transformed minimum description length (MDL) criterion for estimating the number of signals, and extend the MT-MUSIC framework to the case of coherent signals. The proposed approach is illustrated in simulation examples that show its advantages as compared to other robust MUSIC and MDL generalizations. 
Robust Optimization  Robust optimization is a field of optimization theory that deals with optimization problems in which a certain measure of robustness is sought against uncertainty that can be represented as deterministic variability in the value of the parameters of the problem itself and/or its solution. 
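A small worked case may help. For a single linear constraint a·x ≤ b in which each coefficient of a can deviate from a nominal value by at most a known amount, the constraint holds for every admissible a exactly when the nominal value plus the worst-case deviation, a_nominal·x + deviation·|x|, stays below b. The sketch below checks this and compares it against enumeration of the corners of the uncertainty box; the function names and the numbers are assumptions made for the example.

```python
import itertools
import numpy as np

def robust_feasible(x, a_nominal, deviation, b):
    """Check a @ x <= b for every a with |a - a_nominal| <= deviation elementwise."""
    # The worst case pushes each coefficient in the direction of the sign of x.
    worst_case = a_nominal @ x + deviation @ np.abs(x)
    return bool(worst_case <= b)

def brute_force_feasible(x, a_nominal, deviation, b):
    """Same check by enumerating box corners (exponential cost, for checking only)."""
    for signs in itertools.product([-1.0, 1.0], repeat=len(x)):
        a = a_nominal + np.array(signs) * deviation
        if a @ x > b:
            return False
    return True

a_nominal = np.array([1.0, 2.0])
deviation = np.array([0.5, 0.5])
b = 5.0
x_safe = np.array([1.0, 1.0])    # nominal value 3, worst case 4
x_risky = np.array([2.0, 1.0])   # nominal value 4, worst case 5.5

print(robust_feasible(x_safe, a_nominal, deviation, b))
print(robust_feasible(x_risky, a_nominal, deviation, b))
```

Here `x_risky` satisfies the constraint at the nominal coefficients but not for every coefficient vector in the box, which is the kind of solution a robust formulation is meant to exclude.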
Robust Options Deep Q Network  Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RODQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration. 
Robust Options Policy Iteration  Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RODQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration. 
Robust Principal Component Analysis (RPCA) 
Robust Principal Component Analysis (RPCA) is a modification of the widely used statistical procedure Principal Component Analysis (PCA) that works well in the presence of grossly corrupted observations. A number of different approaches exist for Robust PCA, including an idealized version of Robust PCA, which aims to recover a low-rank matrix L0 from highly corrupted measurements M = L0 + S0. This decomposition into low-rank and sparse matrices can be achieved by techniques such as the Principal Component Pursuit method (PCP), Stable PCP, Quantized PCP, Block based PCP, and Local PCP. Optimization methods such as the Augmented Lagrange Multiplier Method (ALM), Alternating Direction Method (ADM), Fast Alternating Minimization (FAM) or Iteratively Reweighted Least Squares are then used to solve the resulting problem. Bouwmans and Zahzah published a comprehensive survey in 2014. 
Robust Principal Component Analysis (ROBPCA) 
We introduce a new method for robust principal component analysis (PCA). Classical PCA is based on the empirical covariance matrix of the data and hence is highly sensitive to outlying observations. Two robust approaches have been developed to date. The first approach is based on the eigenvectors of a robust scatter matrix such as the minimum covariance determinant or an S-estimator and is limited to relatively low-dimensional data. The second approach is based on projection pursuit and can handle high-dimensional data. Here we propose the ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation. ROBPCA yields more accurate estimates at non-contaminated datasets and more robust estimates at contaminated data. ROBPCA can be computed rapidly, and is able to detect exact-fit situations. As a by-product, ROBPCA produces a diagnostic plot that displays and classifies the outliers. We apply the algorithm to several datasets from chemometrics and engineering. 
Robust Regression  In robust statistics, robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and nonparametric methods. Regression analysis seeks to find the relationship between one or more independent variables and a dependent variable. Certain widely used methods of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results if those assumptions are not true; thus ordinary least squares is said to be not robust to violations of its assumptions. Robust regression methods are designed to be not overly affected by violations of assumptions by the underlying datagenerating process. In particular, least squares estimates for regression models are highly sensitive to (not robust against) outliers. While there is no precise definition of an outlier, outliers are observations which do not follow the pattern of the other observations. This is not normally a problem if the outlier is simply an extreme observation drawn from the tail of a normal distribution, but if the outlier results from nonnormal measurement error or some other violation of standard ordinary least squares assumptions, then it compromises the validity of the regression results if a nonrobust regression technique is used. 
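To make the sensitivity of least squares to outliers concrete, the following minimal sketch compares an ordinary least squares fit with one robust alternative on data containing a single corrupted point. The Theil-Sen estimator (median of pairwise slopes) is used here only as one illustrative robust method; the entry above does not single out any particular technique, and the function names and toy data are assumptions.

```python
from statistics import median

def ols_fit(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def theil_sen_fit(xs, ys):
    """Theil-Sen estimator: median of all pairwise slopes, then median residual offset."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept

# Toy data on the line y = 2x + 1, with the last observation grossly corrupted.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[-1] = 100

ols_slope, ols_intercept = ols_fit(xs, ys)
ts_slope, ts_intercept = theil_sen_fit(xs, ys)
print("OLS:      ", ols_slope, ols_intercept)
print("Theil-Sen:", ts_slope, ts_intercept)
```

The single outlier pulls the least squares slope far from 2, while the median-based fit is unaffected because most pairwise slopes come from uncorrupted points.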
RObust regression algorithm via Online Feature Selection (RoOFS) 
The presence of data corruption in user-generated streaming data, such as social media, motivates a new fundamental problem: learning reliable regression coefficients when features are not entirely accessible at one time. Until now, several important challenges still cannot be handled concurrently: 1) corrupted data estimation when only partial features are accessible; 2) online feature selection when data contains adversarial corruption; and 3) scaling to a massive dataset. This paper proposes a novel RObust regression algorithm via Online Feature Selection (RoOFS) that concurrently addresses all the above challenges. Specifically, the algorithm iteratively updates the regression coefficients and the uncorrupted set via a robust online feature substitution method. We also prove that our algorithm has a restricted error bound compared to the optimal solution. Extensive empirical experiments on both synthetic and real-world datasets demonstrate that our new method is more effective than existing methods at recovering both the feature selection and the regression coefficients, with very competitive efficiency. 
Robust Regression Extended with Ensemble Loss Function (RELF) 
Ensemble techniques are powerful approaches that combine several weak learners to build a stronger one. As a meta-learning framework, ensemble techniques can easily be applied to many machine learning methods. Inspired by ensemble techniques, in this paper we propose an ensemble loss function applied to a simple regressor. We then propose a half-quadratic learning algorithm in order to find the parameter of the regressor and the optimal weights associated with each loss function. Moreover, we show that our proposed loss function is robust in noisy environments. For a particular class of loss functions, we show that our proposed ensemble loss function is Bayes consistent and robust. Experimental evaluations on several datasets demonstrate that our proposed ensemble loss function significantly improves the performance of a simple regressor in comparison with state-of-the-art methods. 
Robust Representation Learning  Book: Robust Representation for Data Analytics 
Robust Sparse Principal Component Analysis (ROSPCA) 
A new sparse PCA algorithm is presented, which is robust against outliers. The approach is based on the ROBPCA algorithm that generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present. In comparison with a projection pursuitbased algorithm, ROSPCA demonstrates superior robustness properties and comparable sparsity estimation capability, as well as significantly faster computation time. rospca 
Robust Statistics  Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distributions. For example, robust methods work well for mixtures of two normal distributions with different standarddeviations, for example, one and three; under this model, nonrobust methods like a ttest work badly. 
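As a small illustration of estimators that are "not unduly affected by outliers", the sketch below contrasts the mean and standard deviation with the median and the median absolute deviation (MAD) on a sample containing one extreme value. The data and the helper name `mad` are assumptions chosen for illustration.

```python
from statistics import mean, median, stdev

def mad(values):
    """Median absolute deviation: a robust measure of scale."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Four ordinary observations and one gross outlier.
data = [1, 2, 3, 4, 100]

print("mean   =", mean(data))    # dragged toward the outlier
print("stdev  =", stdev(data))   # inflated by the outlier
print("median =", median(data))  # stays near the bulk of the data
print("MAD    =", mad(data))     # reflects the spread of the bulk
```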
Robust Stochastic Optimization (DRO) 
A common goal in statistics and machine learning is to learn models that can perform well against distributional shifts, such as latent heterogeneous subpopulations, unknown covariate shifts, or unmodeled temporal effects. We develop and analyze a distributionally robust stochastic optimization (DRO) framework that learns a model that provides good performance against perturbations to the data-generating distribution. We give a convex optimization formulation for the problem, providing several convergence guarantees. We prove finite-sample minimax upper and lower bounds, showing that distributional robustness sometimes comes at a cost in convergence rates. We give limit theorems for the learned parameters, where we fully specify the limiting distribution so that confidence intervals can be computed. On real tasks including generalizing to unknown subpopulations, fine-grained recognition, and providing good tail performance, the distributionally robust approach often exhibits improved performance. 
Robust Student Network Learning  Deep neural networks achieve impressive accuracy in various applications, but their success often relies on heavy network architectures. Taking well-trained heavy networks as teachers, the classical teacher-student learning paradigm aims to learn a student network that is lightweight yet accurate. In this way, a portable student network with significantly fewer parameters can achieve accuracy comparable to that of the teacher network. However, beyond accuracy, robustness of the learned student network against perturbation is also essential for practical uses. Existing teacher-student learning frameworks mainly focus on accuracy and compression ratios, but ignore robustness. In this paper, we make the student network produce more confident predictions with the help of the teacher network, and analyze the lower bound of the perturbation that will destroy the confidence of the student network. Two important objectives regarding prediction scores and gradients of examples are developed to maximize this lower bound, so as to enhance the robustness of the student network without sacrificing performance. Experiments on benchmark datasets demonstrate the efficiency of the proposed approach in learning robust student networks that have satisfactory accuracy and compact sizes. 
Robust Subspace Recovery Layer (RSR Layer) 
We propose a neural network for unsupervised anomaly detection with a novel robust subspace recovery layer (RSR layer). This layer seeks to extract the underlying subspace from a latent representation of the given data and remove outliers that lie away from this subspace. It is used together with an encoder and a decoder. The encoder maps the data into the latent space, from which the RSR layer extracts the subspace. The decoder then smoothly maps back the underlying subspace to a “manifold” close to the original data. We illustrate algorithmic choices and performance for artificial data with corrupted manifold structure. We also demonstrate competitive precision and recall for image datasets. 
Robust Trimmed Clustering (RTC) 
tclust 
Robust Variable Power Fractional LMS Algorithm (RVPFLMS) 
In this paper, we propose an adaptive framework for the variable power of the fractional least mean square (FLMS) algorithm. The proposed algorithm named as robust variable power FLMS (RVPFLMS) dynamically adapts the fractional power of the FLMS to achieve high convergence rate with low steady state error. For the evaluation purpose, the problems of system identification and channel equalization are considered. The experiments clearly show that the proposed approach achieves better convergence rate and lower steadystate error compared to the FLMS. The MATLAB code for the related simulation is available online at https://goo.gl/dGTGmP. 
Robust Variable Step Size – Fractional Least Mean Square (RVSSFLMS) 
In this paper, we propose an adaptive framework for the variable step size of the fractional least mean square (FLMS) algorithm. The proposed algorithm, named the robust variable step size FLMS (RVSSFLMS), dynamically updates the step size of the FLMS to achieve a high convergence rate with low steady-state error. For evaluation purposes, the problem of system identification is considered. The experiments clearly show that the proposed approach achieves a better convergence rate than the FLMS and the adaptive step-size modified FLMS (AMFLMS). 
Robust Variational Autoencoder  Machine learning methods often need a large amount of labeled training data. Since the training data is assumed to be the ground truth, outliers can severely degrade learned representations and performance of trained models. Here we apply concepts from robust statistics to derive a novel variational autoencoder that is robust to outliers in the training data. Variational autoencoders (VAEs) extract a lower-dimensional encoded feature representation from which we can generate new data samples. Robustness of autoencoders to outliers is critical for generating a reliable representation of particular data types in the encoded space when using corrupted training data. Our robust VAE is based on beta-divergence rather than the standard Kullback-Leibler (KL) divergence. Our proposed model has the same computational complexity as the VAE, and contains a single tuning parameter to control the degree of robustness. We demonstrate performance of the beta-divergence based autoencoder for a range of image data types, showing improved robustness to outliers both qualitatively and quantitatively. We also illustrate the use of the robust VAE for outlier detection. 
Robustness  In computer science, robustness is the ability of a computer system to cope with errors during execution. Robustness can also be defined as the ability of an algorithm to continue operating despite abnormalities in input, calculations, etc. Robustness can encompass many areas of computer science, such as robust programming, robust machine learning, and Robust Security Network. Formal techniques, such as fuzz testing, are essential to showing robustness since this type of testing involves invalid or unexpected inputs. Alternatively, fault injection can be used to test robustness. Various commercial products perform robustness testing of software systems, which is a process of failure assessment analysis. 
RobustSTL  Decomposing complex time series into trend, seasonality, and remainder components is an important task to facilitate time series anomaly detection and forecasting. Although numerous methods have been proposed, there are still many time series characteristics exhibited in real-world data which are not addressed properly, including 1) the ability to handle seasonality fluctuation and shift, and abrupt changes in trend and remainder; 2) robustness on data with anomalies; 3) applicability to time series with long seasonality periods. In the paper, we propose a novel and generic time series decomposition algorithm to address these challenges. Specifically, we extract the trend component robustly by solving a regression problem using the least absolute deviations loss with sparse regularization. Based on the extracted trend, we apply non-local seasonal filtering to extract the seasonality component. This process is repeated until an accurate decomposition is obtained. Experiments on different synthetic and real-world time series datasets demonstrate that our method outperforms existing solutions. 
Rocker Project  Docker Containers for the R Environment. 
Rodeo  Rodeo is a data centric IDE for Python. You can think of it as an alternative UI to the notebook for the IPython Kernel. It’s heavily inspired by great projects like Sublime Text and Eclipse. http://…/introducingrodeo.html 
ROI regularization (ROIreg) 
We propose ROI regularization (ROIreg) as a semi-supervised learning method for image classification. ROIreg focuses on the maximum probability of a posterior probability distribution g(x) obtained when inputting an unlabeled data sample x into a convolutional neural network (CNN). ROIreg divides the pixel set of x into multiple blocks and evaluates, for each block, its contribution to the maximum probability. A masked data sample x_ROI is generated by replacing blocks with relatively small degrees of contribution with random images. Then, ROIreg trains the CNN so that g(x_ROI) changes as little as possible from g(x). Therefore, ROIreg can be said to further refine the classification ability of the CNN. On the other hand, Virtual Adversarial Training (VAT), which is an excellent semi-supervised learning method, generates a data sample x_VAT by perturbing x in the direction in which g(x) changes most. Then, VAT trains the CNN so that g(x_VAT) changes as little as possible from g(x). Therefore, VAT can be said to be a method to improve the CNN's weaknesses. Thus, ROIreg and VAT have complementary training effects. In fact, the combination of VAT and ROIreg improves the results obtained when using VAT or ROIreg alone. This combination also improves the state of the art on ‘SVHN with and without data augmentation’ and ‘CIFAR10 without data augmentation’. We also propose a method called ROI augmentation (ROIaug) as a way to apply ROIreg to data augmentation in supervised learning. However, the evaluation function used there is different from the standard cross-entropy. ROIaug improves the performance of supervised learning for both SVHN and CIFAR10. Finally, we investigate the performance degradation of VAT and VAT+ROIreg when data samples not belonging to classification classes are included in unlabeled data. 
Role-Relevance Algorithm  Personalized search provides a potentially powerful tool; however, it is limited by the large number of roles that a person has: parent, employee, consumer, etc. We present the role-relevance algorithm: a search technique that favors search results relevant to the user’s current role. The role-relevance algorithm uses three factors to score documents: (1) the number of keywords each document contains; (2) each document’s geographic relevance to the user’s role (if applicable); and (3) each document’s topical relevance to the user’s role (if applicable). Topical relevance is assessed using a novel extension to Latent Dirichlet Allocation (LDA) that allows standard LDA to score document relevance to user-defined topics. Overall results on a pre-labeled corpus show an average improvement in search precision of approximately 20% compared to keyword search alone. 
Rolling Entry Matching  rollmatch 
Rolling Forecast  With a rolling forecast, the number of periods in the forecast remains constant. For example, if the forecast covers 12 monthly periods, then as each month is traded it drops out of the forecast and another month is added onto the end, so you are always forecasting 12 monthly periods into the future. 
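The fixed-length window can be sketched with a bounded queue: each time a month passes, the oldest period is dropped and a new one is appended. This is only an illustration of the bookkeeping, not a forecasting method; the function names and the (year, month) representation are assumptions.

```python
from collections import deque

def next_month(year, month):
    """Return the (year, month) pair that follows the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)

def start_window(year, month, periods=12):
    """Build the initial forecast horizon of `periods` consecutive months."""
    window = deque(maxlen=periods)
    current = (year, month)
    for _ in range(periods):
        window.append(current)
        current = next_month(*current)
    return window

def roll(window):
    """Advance one period: maxlen makes the deque drop the oldest month."""
    window.append(next_month(*window[-1]))
    return window

window = start_window(2024, 1)   # January 2024 through December 2024
roll(window)                     # now February 2024 through January 2025
print(list(window))
```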
RONA  The soaring demand for intelligent mobile applications calls for deploying powerful deep neural networks (DNNs) on mobile devices. However, the outstanding performance of DNNs notoriously relies on increasingly complex models, which in turn is associated with an increase in computational expense far surpassing mobile devices’ capacity. What is worse, app service providers need to collect and utilize a large volume of users’ data, which contain sensitive information, to build the sophisticated DNN models. Directly deploying these models on public mobile devices presents prohibitive privacy risk. To benefit from on-device deep learning without the capacity and privacy concerns, we design a private model compression framework RONA. Following the knowledge distillation paradigm, we jointly use hint learning, distillation learning, and self learning to train a compact and fast neural network. The knowledge distilled from the cumbersome model is adaptively bounded and carefully perturbed to enforce differential privacy. We further propose an elegant query sample selection method to reduce the number of queries and control the privacy loss. A series of empirical evaluations as well as the implementation on an Android mobile device show that RONA can not only compress cumbersome models efficiently but also provide a strong privacy guarantee. For example, on SVHN, when a meaningful $(9.83, 10^{-6})$-differential privacy is guaranteed, the compact model trained by RONA can obtain 20$\times$ compression ratio and 19$\times$ speed-up with merely 0.97% accuracy loss. 
Root Cause Analysis (RCA) 
The practice of RCA attempts to solve problems by identifying and correcting the root causes of events, as opposed to simply addressing their symptoms. Focusing correction on root causes has the goal of preventing problem recurrence. RCFA (Root Cause Failure Analysis) recognizes that complete prevention of recurrence by one corrective action is not always possible. Conversely, there may be several effective measures (methods) that address the root causes of a problem. Thus, RCA is an iterative process and a tool of continuous improvement. RCA is typically used as a reactive method of identifying the causes of events, revealing problems and solving them. Analysis is done after an event has occurred. Insights from RCA may also make it useful as a preemptive method. In that case, RCA can be used to forecast or predict probable events even before they occur. While one follows the other, RCA is a completely separate process from Incident Management. 
Root Cause Analysis Solver Engine (RCASE) 
Root Cause Analysis Solver Engine (informally RCASE) is a proprietary algorithm developed from research originally at the Warwick Manufacturing Group (WMG) at Warwick University. RCASE development commenced in 2003 to provide an automated version of root cause analysis, the method of problem solving that tries to identify the root causes of faults or problems. 
Root Mean Square Error (RMSE) 
Taking the square root of MSE yields the root-mean-square error or root-mean-square deviation (RMSE or RMSD), which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, known as the standard deviation. 
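A minimal sketch of the computation for a set of predictions: average the squared differences to get the MSE, then take the square root. The function name and argument order are assumptions.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mse)

# Errors of 3 and 4 give MSE = (9 + 16) / 2 = 12.5.
print(rmse([3.0, 4.0], [0.0, 0.0]))
```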
Root Mean Squared Logarithmic Error (RMSLE) 
The evaluation metric that Kaggle uses to rank competing algorithms is the Root Mean Squared Logarithmic Error (RMSLE). 
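The metric is commonly defined as the RMSE computed on log(1 + x) of the predicted and actual values, so that relative rather than absolute errors are penalized. The sketch below follows that common definition; the function name is an assumption, and inputs are assumed to be greater than -1.

```python
import math

def rmsle(predicted, actual):
    """Root mean squared logarithmic error, using log(1 + x) on both inputs."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Predicting 200 for 100 and 20 for 10 gives similar log errors,
# even though the absolute errors differ by a factor of ten.
print(rmsle([200.0], [100.0]), rmsle([20.0], [10.0]))
```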
Rooted Tree  A rooted tree is a tree in which one vertex has been designated the root. The edges of a rooted tree can be assigned a natural orientation, either away from or towards the root, in which case the structure becomes a directed rooted tree. 
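The orientation away from the root can be computed with a breadth-first traversal starting at the designated root: every edge is directed from the endpoint discovered first to the one discovered later. The function name and the edge-list representation below are assumptions for illustration.

```python
from collections import deque

def orient_away_from_root(edges, root):
    """Direct each undirected tree edge so that it points away from the root."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    visited = {root}
    queue = deque([root])
    directed = []
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                directed.append((node, neighbor))  # parent -> child
                queue.append(neighbor)
    return directed

# The same undirected tree yields different orientations for different roots.
edges = [(1, 2), (2, 3), (2, 4)]
directed = orient_away_from_root(edges, root=3)
print(directed)
```

Reversing every pair in the result gives the orientation towards the root.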
RoPAD  For enterprise, personal and societal applications, there is now an increasing demand for automated authentication of identity from images using computer vision. However, current authentication technologies are still vulnerable to presentation attacks. We present RoPAD, an end-to-end deep learning model for presentation attack detection that employs unsupervised adversarial invariance to ignore visual distractors in images for increased robustness and reduced overfitting. Experiments show that the proposed framework exhibits state-of-the-art performance on presentation attack detection on several benchmark datasets. 
RORPack  This document contains the mathematical introduction to RORPack – a Python software library for robust output tracking and disturbance rejection for linear PDE systems. The RORPack library is open-source and freely available at https://…/rorpack. The package contains functionality for automated construction of robust internal model based controllers, simulation of the controlled systems, visualisation of the results, as well as a collection of example cases on robust output regulation of controlled heat and wave equations. 
Rosette  Rosette is an API for multilingual text analysis and information extraction. rosetteApi 
Rosner’s Outlier Test  This test will detect outliers that are either much smaller or much larger than the rest of the data. Rosner’s approach is designed to avoid the problem of masking, where an outlier that is close in value to another outlier can go undetected. Rosner’s test is appropriate only when the data, excluding the suspected outliers, are approximately normally distributed, and when the sample size is greater than or equal to 25. Data should not be excluded from analysis solely on the basis of the results of this or any other statistical test. If any values are flagged as possible outliers, further investigation is recommended to determine whether there is a plausible explanation that justifies removing or replacing them. 
RotatE  We study the problem of learning representations of entities and relations in knowledge graphs for predicting missing links. The success of such a task heavily relies on the ability to model and infer the patterns of (or between) the relations. In this paper, we present a new approach for knowledge graph embedding called RotatE, which is able to model and infer various relation patterns including: symmetry/antisymmetry, inversion, and composition. Specifically, the RotatE model defines each relation as a rotation from the source entity to the target entity in the complex vector space. In addition, we propose a novel self-adversarial negative sampling technique for efficiently and effectively training the RotatE model. Experimental results on multiple benchmark knowledge graphs show that the proposed RotatE model is not only scalable, but also able to infer and model various relation patterns and significantly outperform existing state-of-the-art models for link prediction. 
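The "relation as rotation" idea can be sketched with ordinary complex numbers: each coordinate of the head embedding is multiplied by a unit-modulus complex number, and a triple is scored by how far the rotated head lies from the tail. This is a toy illustration, not the authors' training code; the function name, the use of phase angles as relation parameters, and the sum-of-moduli distance are assumptions.

```python
import cmath
import math

def rotate_distance(head, phases, tail):
    """Distance between the rotated head embedding and the tail embedding.

    head, tail: lists of complex numbers (entity embeddings).
    phases: list of angles; each relation coordinate is exp(i * angle),
            which has modulus 1 and therefore acts as a pure rotation.
    """
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    return sum(abs(r - t) for r, t in zip(rotated, tail))

head = [1 + 0j, 0 + 1j]

# A relation that rotates every coordinate by pi is symmetric:
# applying it twice returns the original entity.
symmetric = [math.pi, math.pi]
tail = [-1 + 0j, 0 - 1j]
forward = rotate_distance(head, symmetric, tail)   # head -> tail
backward = rotate_distance(tail, symmetric, head)  # tail -> head
print(forward, backward)
```

Both distances are numerically zero, showing how a rotation by pi models a symmetric relation; inversion corresponds to negating the phases and composition to adding them.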
Rotated Feature Network (RFN) 
General detectors follow a pipeline in which feature maps extracted from ConvNets are shared between classification and regression tasks. However, there are obviously conflicting requirements in multi-orientation object detection: classification is insensitive to orientations, while regression is quite sensitive. To address this issue, we provide an Encoder-Decoder architecture, called Rotated Feature Network (RFN), which produces rotation-sensitive feature maps (RS) for regression and rotation-invariant feature maps (RI) for classification. Specifically, the Encoder unit assigns weights for rotated feature maps. The Decoder unit extracts RS and RI by performing resuming operator on rotated and reweighed feature maps, respectively. To make the rotation-invariant characteristics more reliable, we adopt a metric to quantitatively evaluate the rotation invariance by adding a constraint term to the loss, yielding promising detection performance. Compared with state-of-the-art methods, our method achieves significant improvement on the NWPU VHR10 and RSOD datasets. We further evaluate the RFN on scene classification in remote sensing images and object detection in natural images, demonstrating its good generalization ability. The proposed RFN can be integrated into an existing framework, leading to great performance with only a slight increase in model complexity. 
Rotated KleeMinty Problem  A Linear Constrained Optimization Benchmark For Probabilistic Search Algorithms: The Rotated KleeMinty Problem 
Rotation Equivariant Vector Field Networks  We propose a method to encode rotation equivariance or invariance into convolutional neural networks (CNNs). Each convolutional filter is applied with several orientations and returns a vector field that represents the magnitude and angle of the highest scoring rotation at the given spatial location. To propagate information about the main orientation of the different features to each layer in the network, we propose an enriched orientation pooling, i.e. max and argmax operators over the orientation space, which keeps the dimensionality of the feature maps low and propagates only useful information. We name this approach RotEqNet. We apply RotEqNet to three datasets: first, a rotation-invariant classification problem, the MNIST-rot benchmark, in which we improve over the state-of-the-art results. Then, a neuron membrane segmentation benchmark, where we show that RotEqNet can be applied successfully to obtain equivariance to rotation with a simple fully convolutional architecture. Finally, we significantly improve the state of the art on the problem of estimating cars’ absolute orientation in aerial images, a problem where the output is required to be covariant with respect to the object’s orientation. 
Rotation Forest  Rotation forest is an ensemble method for generating classifier ensembles based on feature extraction, in which each base classifier (tree) is fit on the principal components of the variables of random partitions of the feature set. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and principal component analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name ‘forest’. Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the rotation forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with bagging, AdaBoost, and random forest. The results were favorable to rotation forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that rotation forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and random forest, and more diverse than those in bagging, sometimes more accurate as well. http://…/01677518.pdf?arnumber=1677518 http://…/Rotation%20Forest.ppt http://…/9K_ANovel.pdf rotationForest 
Rotation Invariance Neural Network  Rotation invariance and translation invariance are of great value in image recognition tasks. In this paper, we introduce a new convolutional neural network (CNN) architecture, the cyclic convolutional layer, to achieve rotation invariance in 2D symbol recognition. The network can also recover the position and orientation of the 2D symbol, enabling detection of multiple non-overlapping targets. Last but not least, this architecture can achieve one-shot learning in some cases by exploiting these invariances. 
Rough Concept Analysis  The theory introduced, presented and developed in this paper is concerned with Rough Concept Analysis. This theory is a synthesis of the theory of Rough Sets pioneered by Zdzislaw Pawlak with the theory of Formal Concept Analysis pioneered by Rudolf Wille. The central notion in this paper of a rough formal concept combines in a natural fashion the notion of a rough set with the notion of a formal concept: ‘rough set + formal concept = rough formal concept’. A follow-up paper will provide a synthesis of the two important data modeling techniques: conceptual scaling of Formal Concept Analysis and Entity-Relationship database modeling. 
Rough Formal Concept  ➘ “Rough Concept Analysis” 
Rough Inclusion Function (RIF) 

Rough Set  In computer science, a rough set, first described by Polish computer scientist Zdzislaw I. Pawlak, is a formal approximation of a crisp set (i.e., conventional set) in terms of a pair of sets which give the lower and the upper approximation of the original set. In the standard version of rough set theory (Pawlak 1991), the lower and upperapproximation sets are crisp sets, but in other variations, the approximating sets may be fuzzy sets. RoughSetKnowledgeReduction 
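In the standard (crisp) version, the two approximations are built from the equivalence classes of an indiscernibility relation: the lower approximation is the union of classes lying entirely inside the target set, and the upper approximation is the union of classes that intersect it. A minimal sketch with Python sets follows; the function name and toy universe are assumptions.

```python
def approximations(equivalence_classes, target):
    """Return (lower, upper) approximations of `target`.

    equivalence_classes: iterable of disjoint sets partitioning the universe.
    target: the crisp set to be approximated.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in equivalence_classes:
        block = set(block)
        if block <= target:   # block certainly belongs to the target
            lower |= block
        if block & target:    # block possibly belongs to the target
            upper |= block
    return lower, upper

# Objects 3 and 4 are indiscernible, but only 3 is in the target set,
# so the class {3, 4} falls into the boundary region.
classes = [{1, 2}, {3, 4}, {5}]
target = {1, 2, 3}
lower, upper = approximations(classes, target)
print(lower, upper, upper - lower)
```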
Route Constrained Optimization (RCO) 
Distillation-based learning boosts the performance of a miniaturized neural network based on the hypothesis that the representation of a teacher model can be used as structured and relatively weak supervision, and thus can be easily learned by a miniaturized model. However, we find that the representation of a converged heavy model is still a strong constraint for training a small student model, which leads to a high lower bound of congruence loss. In this work, inspired by curriculum learning, we consider knowledge distillation from the perspective of curriculum learning by routing. Instead of supervising the student model with a converged teacher model, we supervise it with some anchor points selected from the route in parameter space that the teacher model passed through, which we call route constrained optimization (RCO). We experimentally demonstrate that this simple operation greatly reduces the lower bound of congruence loss for knowledge distillation, hint and mimicking learning. On close-set classification tasks like CIFAR100 and ImageNet, RCO improves knowledge distillation by 2.14% and 1.5% respectively. To evaluate generalization, we also test RCO on the open-set face recognition task MegaFace. 
Row Space Pursuit (RSP) 
Over the past several decades, subspace clustering has been receiving increasing interest and continuous progress. However, due to the lack of scalability and/or robustness, existing methods still have difficulty in dealing with data that simultaneously possesses three characteristics: high-dimensional, massive and grossly corrupted. To tackle the scalability and robustness issues simultaneously, in this paper we suggest considering a problem called compressive robust subspace clustering, which is to perform robust subspace clustering with compressed data generated by projecting the original high-dimensional data onto a lower-dimensional subspace chosen at random. Given these random projections, the proposed method, row space pursuit (RSP), recovers not only the authentic row space, which provably leads to correct clustering results under certain conditions, but also the gross errors possibly existing in the data. The compressive nature of the random projections gives RSP high computational and storage efficiency, and the recovery property enables RSP to deal with grossly corrupted data. Extensive experiments on high-dimensional and/or large-scale datasets show that RSP can maintain accuracies comparable to prevalent methods with significant reductions in computational time. 
ROWL  In our experience, some ontology users find it much easier to convey logical statements using rules rather than OWL (or description logic) axioms. Based on recent theoretical developments on transformations between rules and description logics, we develop ROWL, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule. 
ROWLTab  It has been argued that it is much easier to convey logical statements using rules rather than OWL (or description logic (DL)) axioms. Based on recent theoretical developments on transformations between rules and DLs, we have developed ROWLTab, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL 2 DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule. In this paper, we present ROWLTab, together with a user evaluation of its effectiveness compared to entering axioms using the standard Protege interface. Our evaluation shows that modeling with ROWLTab is much quicker than the standard interface, while at the same time, also less prone to errors for hard modeling tasks. 
RQDA  RQDA is an R package for Qualitative Data Analysis, a free (free as in freedom) qualitative analysis software application (BSD license). It works on Windows, Linux/FreeBSD and Mac OS X platforms. RQDA is an easy-to-use tool to assist in the analysis of textual data. At the moment it only supports plain text formatted data. All the information is stored in a SQLite database via the R package RSQLite. The GUI is based on RGtk2, with the aid of gWidgetsRGtk2. It includes a number of standard Computer-Aided Qualitative Data Analysis features. In addition it seamlessly integrates with R, which means that a) statistical analysis on the coding is possible, and b) functions for data manipulation and analysis can be easily extended by writing R functions. To some extent, RQDA and R make an integrated platform for both quantitative and qualitative data analysis. 
RRDtool  RRDtool (round-robin database tool) aims to handle time series data such as network bandwidth, temperatures or CPU load. The data is stored in a circular-buffer-based database, thus the system storage footprint remains constant over time. RRDtool 
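The constant storage footprint follows from the circular buffer: once every slot is full, each new sample overwrites the oldest one. A minimal Python sketch of that idea is shown here; the class name `RoundRobinSeries` and its methods are hypothetical illustrations and are not part of RRDtool's own interface.

```python
from collections import deque


class RoundRobinSeries:
    """Hypothetical fixed-size time series store (not the RRDtool API)."""

    def __init__(self, slots: int) -> None:
        # A deque with maxlen drops the oldest item once capacity is reached,
        # so memory use stays constant no matter how many samples arrive.
        self._samples = deque(maxlen=slots)

    def update(self, timestamp: int, value: float) -> None:
        self._samples.append((timestamp, value))

    def fetch(self) -> list:
        return list(self._samples)


rrd = RoundRobinSeries(slots=3)
for t, load in enumerate([0.2, 0.5, 0.9, 0.4, 0.1]):
    rrd.update(t, load)

print(rrd.fetch())  # [(2, 0.9), (3, 0.4), (4, 0.1)]
```

Only the three most recent samples survive, which is the trade-off a round-robin store makes: bounded space in exchange for losing old data.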
R-Squared Value (RSQ) 
R-Squared Value (RSQ), described in the paper: Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>. mcca 
RStudio Connect  RStudio Connect is a new publishing platform for the work your teams create in R. Share Shiny applications, R Markdown reports, dashboards, plots, and more in one convenient place. Use push-button publishing from the RStudio IDE, scheduled execution of reports, and flexible security policies to bring the power of data science to your entire enterprise. 
R-Tree  R-trees are tree data structures used for spatial access methods, i.e., for indexing multidimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of (streets, buildings, outlines of lakes, coastlines, etc.) and then find answers quickly to queries such as ‘Find all museums within 2 km of my current location’, ‘retrieve all road segments within 2 km of my location’ (to display them in a navigation system) or ‘find the nearest gas station’ (although not taking roads into account). The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance. 
Rubin Causal Model (RCM) 
The Rubin causal model (RCM), also known as the Neyman-Rubin causal model, is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes, named after Donald Rubin. The name ‘Rubin causal model’ was first coined by Rubin’s graduate school colleague, Paul W. Holland. The potential outcomes framework was first proposed by Jerzy Neyman in his 1923 Master’s thesis, though he discussed it only in the context of completely randomized experiments. Rubin, together with other contemporary statisticians, extended it into a general framework for thinking about causation in both observational and experimental studies. 
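To make the potential outcomes idea concrete, the following Python sketch uses a small made-up table in which both outcomes of every unit are known, something that is never true in practice, where only one outcome per unit is observed. All numbers and the treatment assignment are hypothetical illustration data.

```python
from statistics import mean

# Hypothetical potential outcomes per unit: (outcome if untreated, outcome if treated).
potential = [(3, 5), (2, 4), (6, 7), (4, 4), (5, 8), (1, 3)]

# The true average treatment effect needs both outcomes for every unit.
true_ate = mean(y1 - y0 for y0, y1 in potential)

# In a real study each unit reveals only the outcome matching its assignment.
assignment = [1, 0, 1, 0, 0, 1]  # 1 = treated, 0 = control (assumed, not randomized here)
observed = [y1 if z else y0 for (y0, y1), z in zip(potential, assignment)]

treated = [y for y, z in zip(observed, assignment) if z == 1]
control = [y for y, z in zip(observed, assignment) if z == 0]

# Difference in observed group means: an estimate of the average treatment effect.
estimate = mean(treated) - mean(control)

print(round(true_ate, 3), round(estimate, 3))
```

The gap between the two printed numbers illustrates why the assignment mechanism matters: the estimate only targets the true effect when assignment is unrelated to the potential outcomes, as in a completely randomized experiment.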
Rubner-Tavan Network  
Rucio  Rucio is an open source software framework that provides scientific collaborations with the functionality to organize, manage, and access their volumes of data. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is continuously extended to support the LHC experiments and other diverse scientific communities. In this article we detail the fundamental concepts of Rucio, describe the architecture along with implementation details, and give operational experience from production usage. 
Rufus  Rufus turns your website into a customer conversation. We use a proprietary blend of curated language sets and state-of-the-art machine learning to engage customers more intelligently than ever. Nobody understands your customers like you do; that’s why we work directly with you to create a custom solution that fits your needs. Let Rufus engage your passing web visitors and turn them into highly qualified leads. Rufus can also handle your tier 1 and 2 support cases, and ties into your existing customer support tracking software. Think of Rufus as a customer relationship manager that can chat with 1000 people at a time. It will know every single aspect or detail about your product or service and deliver its messaging in a fun and extremely engaging manner. With custom language sets and purposeful personas that match your brand and your targeted audience, Rufus will truly become your company’s new best friend. 
Rug Plot  A rug plot is a compact way of illustrating the marginal distributions of a variable along x and y. Positions of the data points along x and y are denoted by tick marks, reminiscent of the tassels on a rug. Known Issues: Rug marks are overlaid onto the same axis as the original data. Changing the axis dimensions after calling rug will therefore cause the tick marks to become disassociated from the axes. http://…skerneldensityestimationandrugplots 
Rule Induction  Rule induction is an area of machine learning in which formal rules are extracted from a set of observations. The rules extracted may represent a full scientific model of the data, or merely represent local patterns in the data. 
Rule of Five (RO5) 
There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population. 
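The 93.75% figure can be checked directly. Assuming independent draws from a continuous distribution, each value falls below the median with probability 1/2, so the median is missed only when all five values land on the same side: 1 - 2 × (1/2)^5 = 0.9375. The Python sketch below computes this and confirms it by simulation on a uniform distribution; the function names are illustrative.

```python
import random


def rule_of_five_probability(n: int = 5) -> float:
    """Chance that the population median lies between the sample min and max."""
    # The interval misses the median only if all n draws fall on the same side.
    return 1 - 2 * 0.5 ** n


def simulate(trials: int = 20000, n: int = 5, seed: int = 0) -> float:
    """Monte Carlo check using Uniform(0, 1), whose median is 0.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if min(sample) <= 0.5 <= max(sample):
            hits += 1
    return hits / trials


print(rule_of_five_probability())  # 0.9375
print(simulate())                  # close to 0.9375
```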
Rule-Embedded Neural Network (ReNN) 
The artificial neural network shows a powerful ability of inference, but it is still criticized for its lack of interpretability and its need for big datasets. This paper proposes the Rule-embedded Neural Network (ReNN) to overcome these shortcomings. ReNN first makes local-based inferences to detect local patterns, and then uses rules based on domain knowledge about the local patterns to generate a rule-modulated map. After that, ReNN makes global-based inferences that synthesize the local patterns and the rule-modulated map. To solve the optimization problem caused by rules, we use a two-stage optimization strategy to train the ReNN model. By introducing rules into ReNN, we can strengthen traditional neural networks with long-term dependencies which are difficult to learn from a limited empirical dataset, thus improving inference accuracy. The complexity of neural networks can be reduced since long-term dependencies are not modeled with neural connections, and thus the amount of data needed to optimize the neural networks can be reduced. In addition, inferences from ReNN can be analyzed with both local patterns and rules, and thus have better interpretability. In this paper, ReNN has been validated on a time-series detection problem. 
RuleFit  The RuleFit algorithm from Friedman and Popescu is an interesting regression and classification approach that uses decision rules in a linear model. RuleFit: When disassembled trees meet Lasso Rule based Learning Ensembles 
Rule-Guided Embedding (RUGE) 
Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interactive nature between embedding learning and logical inference. They also focused only on hard rules, which always hold with no exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are going to be predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples, and integrates such newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE in link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://…/RUGE. 
RuleMatrix  With the growing adoption of machine learning techniques, there is a surge of research interest in making machine learning systems more transparent and interpretable. Various visualizations have been developed to help model developers understand, diagnose, and refine machine learning models. However, a large number of potential but neglected users are domain experts who have little knowledge of machine learning but are expected to work with machine learning systems. In this paper, we present an interactive visualization technique to help users with little expertise in machine learning to understand, explore and validate predictive models. By viewing the model as a black box, we extract a standardized rule-based knowledge representation from its input-output behavior. We design RuleMatrix, a matrix-based visualization of rules to help users navigate and verify the rules and the black-box model. We evaluate the effectiveness of RuleMatrix via two use cases and a usability study. 
Rulex  Rulex is a new kind of AI platform, born from advanced government and academic machine learning research, and proven by years of deployment in diverse industries. Rulex’s unique logic-based approach to predictive analytics enables business and process experts to rapidly create and deploy AI applications with no need for math or programming skills. 
Runge-Kutta Convolutional Neural Network (RKNet) 
A convolutional neural network for image classification can be constructed on mathematical principles, since it models the ventral stream in the visual cortex, which is regarded as a multi-period dynamical system. In this paper, a new point of view is proposed for constructing network models as well as providing a direction to get inspiration or explanation for neural networks. If each period in the ventral stream is deemed to be a dynamical system with time as the independent variable, there should be a set of ordinary differential equations (ODEs) for this system. Runge-Kutta methods are a common means of solving ODEs. Thus, network models ought to be built using these methods. Moreover, convolutional networks can be employed to emulate the increments within every time step. The model constructed in this way is named the Runge-Kutta Convolutional Neural Network (RKNet). Following this idea, Dense Convolutional Networks (DenseNets) and Residual Networks (ResNets) were converted to RKNets. To demonstrate the feasibility of RKNets, these variants were evaluated on benchmark datasets, CIFAR and ImageNet. The experimental results show that the RKNets transformed from DenseNets achieved similar or even higher parameter efficiency. The success of the experiments indicates that Runge-Kutta methods can be used to construct convolutional neural networks for image classification efficiently. Furthermore, network models might be structured more rationally in the future based on RKNet and prior knowledge. 
Runtime Neuron Activation Pattern Monitoring  For using neural networks in safety-critical domains, it is important to know if a decision made by a neural network is supported by prior similarities in training. We propose runtime neuron activation pattern monitoring: after the standard training process, one creates a monitor by feeding the training data to the network again in order to store the neuron activation patterns in abstract form. In operation, a classification decision over an input is further supplemented by examining if a pattern similar (measured by Hamming distance) to the generated pattern is contained in the monitor. If the monitor does not contain any pattern similar to the generated pattern, it raises a warning that the decision is not based on the training data. Our experiments show that, by adjusting the similarity threshold for activation patterns, the monitors can report a significant portion of misclassifications as not supported by training, with a small false-positive rate, when evaluated on a test set. 
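The monitoring procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the class `ActivationMonitor`, the on/off binarization of activations (value greater than zero), and the per-class storage in plain sets are assumptions made for the example, and the entry above does not specify how the original work stores patterns.

```python
def hamming(a: tuple, b: tuple) -> int:
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


class ActivationMonitor:
    """Hypothetical monitor storing binarized activation patterns per class."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # maximum Hamming distance counted as "similar"
        self.patterns = {}          # class label -> set of stored patterns

    @staticmethod
    def _abstract(activations) -> tuple:
        # Assumed abstraction: a neuron is "on" when its activation is positive.
        return tuple(int(v > 0) for v in activations)

    def record(self, label: str, activations) -> None:
        """Store the pattern seen for a training input of the given class."""
        self.patterns.setdefault(label, set()).add(self._abstract(activations))

    def is_supported(self, label: str, activations) -> bool:
        """True if a stored pattern of this class is within the threshold."""
        pattern = self._abstract(activations)
        return any(hamming(pattern, p) <= self.threshold
                   for p in self.patterns.get(label, ()))


monitor = ActivationMonitor(threshold=1)
monitor.record("cat", [0.7, 0.0, 1.2, 0.0])
monitor.record("cat", [0.3, 0.0, 0.9, 0.4])

print(monitor.is_supported("cat", [0.5, 0.1, 2.0, 0.0]))  # True: distance 1
print(monitor.is_supported("cat", [0.0, 0.8, 0.0, 0.0]))  # False: raise a warning
```

Raising the threshold makes the monitor more permissive (fewer warnings, more unsupported decisions slipping through); lowering it does the opposite.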
Runtime Verification (RV) 
Runtime verification is a computing system analysis and execution approach based on extracting information from a running system and using it to detect and possibly react to observed behaviors satisfying or violating certain properties. Some very particular properties, such as data-race and deadlock freedom, are typically desired to be satisfied by all systems and may be best implemented algorithmically. Other properties can be more conveniently captured as formal specifications. Runtime verification specifications are typically expressed in trace predicate formalisms, such as finite state machines, regular expressions, context-free patterns, linear temporal logics, etc., or extensions of these. This allows for a less ad hoc approach than normal testing. However, any mechanism for monitoring an executing system is considered runtime verification, including verifying against test oracles and reference implementations. When formal requirements specifications are provided, monitors are synthesized from them and infused within the system by means of instrumentation. Runtime verification can be used for many purposes, such as security or safety policy monitoring, debugging, testing, verification, validation, profiling, fault protection, behavior modification (e.g., recovery), etc. Runtime verification avoids the complexity of traditional formal verification techniques, such as model checking and theorem proving, by analyzing only one or a few execution traces and by working directly with the actual system, thus scaling up relatively well and giving more confidence in the results of the analysis (because it avoids the tedious and error-prone step of formally modelling the system), at the expense of less coverage. Moreover, through its reflective capabilities runtime verification can be made an integral part of the target system, monitoring and guiding its execution during deployment. 
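As an illustration of a specification expressed as a finite state machine, the Python sketch below monitors a trace of events against a made-up property: a file must be opened before it is read, and must not be read after it is closed. The property, the event names and the class `FileUsageMonitor` are hypothetical choices for the example.

```python
class FileUsageMonitor:
    """Finite-state monitor for a hypothetical open/read/close protocol."""

    # Allowed transitions: (current state, event) -> next state.
    TRANSITIONS = {
        ("closed", "open"): "opened",
        ("opened", "read"): "opened",
        ("opened", "close"): "closed",
    }

    def __init__(self) -> None:
        self.state = "closed"
        self.violations = []

    def observe(self, index: int, event: str) -> None:
        """Consume one event from the running system and update the state."""
        next_state = self.TRANSITIONS.get((self.state, event))
        if next_state is None:
            # No transition defined: the observed behavior violates the property.
            self.violations.append((index, event, self.state))
        else:
            self.state = next_state


def check(trace: list) -> list:
    """Run the monitor over a whole execution trace and return violations."""
    monitor = FileUsageMonitor()
    for index, event in enumerate(trace):
        monitor.observe(index, event)
    return monitor.violations


print(check(["open", "read", "close"]))  # []
print(check(["open", "close", "read"]))  # [(2, 'read', 'closed')]
```

The monitor only judges the traces it is given, which mirrors the trade-off described above: it works on the actual execution, but says nothing about behaviors that were never exercised.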
Rupture Detection  There are some graphs that you cannot forget. One graph that I found puzzling was mentioned on Andrew Gelman’s blog, a few years back, and was related to rupture detection. What I remember from this graph is that if you want to get a rupture, you can easily find one… 
ruptures  ruptures is a Python library for offline change point detection. This package provides methods for the analysis and segmentation of non-stationary signals. Implemented algorithms include exact and approximate detection for various parametric and non-parametric models. ruptures focuses on ease of use by providing a well-documented and consistent interface. In addition, thanks to its modular structure, different algorithms and models can be connected and extended within this package. 
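To show what offline change point detection means in the simplest case, the sketch below finds a single change in the mean of a signal by exhaustive search over split positions, minimizing the summed squared error of the two segments. It uses only the Python standard library and deliberately does not call the ruptures API; the function names and the sample signal are made up for illustration.

```python
def squared_error(values: list) -> float:
    """Sum of squared deviations from the segment mean (cost of one segment)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_change_point(signal: list, min_size: int = 2):
    """Index k that best splits the signal into signal[:k] and signal[k:]."""
    best_index, best_cost = None, float("inf")
    for k in range(min_size, len(signal) - min_size + 1):
        cost = squared_error(signal[:k]) + squared_error(signal[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index


# Hypothetical signal whose mean jumps from about 1 to about 5 at index 4.
signal = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1]
print(best_single_change_point(signal))  # 4
```

Libraries such as ruptures generalize this idea to multiple change points, other cost functions and faster search strategies.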
RuSentRel  In this paper we present the RuSentRel corpus, which includes analytical texts in the sphere of international relations. For each document we annotated the author’s sentiment toward the mentioned named entities, and the sentiment of relations between mentioned entities. In the current experiments, we considered the problem of extracting sentiment relations between entities for whole documents as a three-class machine learning task. We experimented with conventional machine-learning methods (Naive Bayes, SVM, Random Forest). 
Russell-Rao Distance  Russell-Rao dissimilarity between two Boolean vectors 
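For two Boolean vectors of length n, the Russell-Rao dissimilarity is (n - c) / n, where c is the number of positions at which both vectors are true. A minimal Python sketch, with an illustrative function name:

```python
def russell_rao(u: list, v: list) -> float:
    """Russell-Rao dissimilarity between two equal-length Boolean vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if not u:
        raise ValueError("vectors must not be empty")
    n = len(u)
    # Count positions where both entries are true.
    both_true = sum(1 for a, b in zip(u, v) if a and b)
    return (n - both_true) / n


u = [True, False, True, True]
v = [True, True, False, True]
print(russell_rao(u, v))  # 0.5
```

Unlike many other Boolean dissimilarities, this one is not zero for identical vectors unless every entry is true, because positions where both entries are false still count against the match.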
Ruuh  Dialogue systems and conversational agents are becoming increasingly popular in modern society, but building an agent capable of holding intelligent conversation with its users is a challenging problem for artificial intelligence. In this demo, we demonstrate a deep learning based conversational social agent called ‘Ruuh’ (facebook.com/Ruuh) designed by a team at Microsoft India to converse on a wide range of topics. Ruuh needs to think beyond the utilitarian notion of merely generating ‘relevant’ responses and meet a wider range of user social needs, like expressing happiness when the user’s favorite team wins, sharing a cute comment when shown pictures of the user’s pet, and so on. The agent also needs to detect and respond to abusive language, sensitive topics and trolling behavior of the users. Many of these problems pose significant research challenges which will be demonstrated in our demo. Our agent has interacted with over 2 million real world users to date, which has generated over 150 million user conversations. 
r-Value  Given a large collection of measurement units, the r-value, r, of a particular unit is a reported percentile that may be interpreted as the smallest percentile at which the unit should be placed in the top r-fraction of units. rvalues 
Rysearch  In our work, we propose to represent HTM as a set of flat models, or layers, and a set of topical hierarchies, or edges. We suggest several quality measures for edges of hierarchical models, resembling those proposed for flat models. We conduct an assessment experiment and show a strong correlation between the proposed measures and human judgement on topical edge quality. We also introduce a heterogeneous algorithm to build hierarchical topic models for heterogeneous data sources. We show how making certain adjustments to the learning process helps to retain the original structure of customized models while allowing for slight coherent modifications for new documents. We evaluate this approach using the proposed measures and show that the proposed heterogeneous algorithm significantly outperforms the baseline concat approach. Finally, we implement our own ESE called Rysearch, which demonstrates the potential of the ARTM approach for visualizing large heterogeneous document collections. 