# WhatIs-R

R R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R Consortium The R Consortium, Inc. is a group of businesses organized under an open source governance and foundation model to provide support to the R community, the R Foundation and groups and individuals using, maintaining and distributing R software. The R language is an open source environment for statistical computing and graphics, and runs on a wide variety of computing platforms. The R language has enjoyed significant growth, and now supports over 2 million users. A broad range of industries has adopted the R language, including biotech, finance, research and high technology industries. The R language is often integrated into third party analysis, visualization and reporting applications.
The central mission of the R Consortium is to work with and provide support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software through the identification, development and implementation of infrastructure projects. From a governance perspective, the business of the consortium is managed by a Board of Directors. The technical aspects of the project, including the development and implementation of infrastructure projects, are overseen by an Infrastructure Steering Committee. While the initial members of the Infrastructure Steering Committee consist of representatives of the founding members of the R Consortium, Inc., project leads of key infrastructure projects will become voting members of the Infrastructure Steering Committee. Potential infrastructure projects include:

- strengthening the R Forge infrastructure;
- assisting the Stanford University group running useR! 2016;
- developing documentation; and
- encouraging increased communication and collaboration among users and developers of the R language.

R Service Bus (RSB) Having the right algorithm is a first big step toward getting advanced analytics to solve your problem and inform your decisions. The next one is to have the algorithm work for you and integrate it in your workflows and business processes. The R Service Bus is a Swiss Army knife that allows you to plug R into your processes independently of the technology used by other software applications involved in the workflow. The prime objective of the R Service Bus is to smoothly integrate into your existing infrastructure and it therefore supports communication using a plethora of protocols such as

- SOAP and RESTful web services
- various e-mail protocols
- folder monitoring, (s)ftp
- messaging protocols such as JMS or STOMP
- …

The R Service Bus is based on mature open source projects and was developed to maximize reliability, flexibility, high availability and scalability of R-based analytics applications.
It is in use at major pharmaceutical and financial institutions to power business-critical modeling activities. The R Service Bus is open source and freely available from our downloads page. The R Service Bus has also been packaged for all current versions of Debian/Ubuntu and is available from our repository.

R.NET R.NET enables the .NET Framework to interoperate with the R statistical language in the same process. R.NET requires .NET Framework 4 and the native R DLLs installed with the R environment. R.NET works on Windows, Linux and MacOS. Enjoy statistics and programming in your special language with R.

R2CNN++ Object detection plays a vital role in natural scenes and aerial scenes and is full of challenges. Although many advanced algorithms have succeeded in the natural scene, the progress in the aerial scene has been slow due to the complexity of the aerial image and the large degree of freedom of remote sensing objects in scale, orientation, and density. In this paper, a novel multi-category rotation detector is proposed, which can efficiently detect small objects, arbitrary direction objects, and dense objects in complex remote sensing images. Specifically, the proposed model adopts a targeted feature fusion strategy called inception fusion network, which fully considers factors such as feature fusion, anchor sampling, and receptive field to improve the ability to handle small objects. Then we combine the pixel attention network and the channel attention network to weaken the noise information and highlight the object features. Finally, the rotational object detection algorithm is realized by redefining the rotating bounding box. Experiments on public datasets including DOTA and NWPU VHR-10 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods. The code and models will be available at https://…/R2CNN-Plus-Plus_Tensorflow.
r2d3 The r2d3 package provides a suite of tools for using D3 visualizations with R, including:

- Translating R objects into D3 friendly data structures
- Rendering D3 scripts within the RStudio Viewer and R Notebooks
- Publishing D3 visualizations to the web
- Incorporating D3 scripts into R Markdown reports, presentations, and dashboards
- Creating interactive D3 applications with Shiny
- Distributing D3 based htmlwidgets in R packages

Rabix An open-source toolkit for developing and running portable workflows based on the Common Workflow Language specification and Docker. liftr

Race Track Concordance Charts One way to help keep track of things from the perspective of a particular driver, rather than the race leader, is to rebase the origin of the x-axis relative to that driver.

RadegastXDB A lot of advances in the processing of XML data have been proposed in the previous decade. There were many approaches focused on the efficient processing of twig pattern queries (TPQ). However, including the TPQ into an XQuery compiler is not a straightforward problem and current XML DBMSs process XQueries without any TPQ detection. In this paper, we demonstrate our prototype of a native XML DBMS called RadegastXDB that uses a TPQ detection to accelerate structural XQueries. Such a detection allows us to utilize state-of-the-art TPQ processing algorithms. Our experiments show that, for the structural queries, these algorithms and state-of-the-art XML indexing techniques make our prototype faster than all of the current XML DBMSs, especially for large data collections. We also show that using the same techniques is also efficient for the processing of queries with value predicates.

Radial Basis Function (RBF) A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin, so that $\Phi(x) = \Phi(\|x\|)$; or alternatively on the distance from some other point $c$, called a center.
Any function $\Phi$ that satisfies this property is a radial function. The norm is usually Euclidean distance, although other distance functions are also possible. For example, using the Lukaszyk-Karmowski metric, it is possible for some radial functions to avoid problems with ill conditioning of the matrix solved to determine coefficients $w_i$, since the $\|x\|$ is always greater than zero. Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally invented, by David Broomhead and David Lowe in 1988. RBFs are also used as a kernel in support vector classification.

Radial Basis Function Kernel (RBF) In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function used in support vector machine classification.

Radial Basis Function Networks (RBF) In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment.

RadialGAN Training complex machine learning models for prediction often requires a large amount of data that is not always readily available. Leveraging these external datasets from related but different sources is therefore an important task if good predictive models are to be built for deployment in settings where data can be rare.
In this paper we propose a novel approach to the problem in which we use multiple GAN architectures to learn to translate from one dataset to another, thereby allowing us to effectively enlarge the target dataset, and therefore learn better predictive models than if we simply used the target dataset. We show the utility of such an approach, demonstrating that our method improves the prediction performance on the target domain over using just the target dataset and also show that our framework outperforms several other benchmarks on a collection of real-world medical datasets.

RadiX-Net The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them. Research over the past few decades has explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology. The resulting neural network is known as a sparse neural network. More recent work has demonstrated the remarkable result that certain sparse DNNs can train to the same precision as dense DNNs at lower runtime and storage cost. An intriguing class of these sparse DNNs is the X-Nets, which are initialized and trained upon a sparse topology with neither reference to a parent dense DNN nor subsequent pruning. We present an algorithm that deterministically generates RadiX-Nets: sparse DNN topologies that, as a whole, are much more diverse than X-Net topologies, while preserving X-Nets’ desired characteristics. We further present a functional-analytic conjecture based on the longstanding observation that sparse neural network topologies can attain the same expressive power as dense counterparts.

RadViz3D This paper develops methodology for 3D radial visualization of high-dimensional datasets. Our display engine is called RadViz3D and extends the classic RadViz that visualizes multivariate data in the 2D plane by mapping every record to a point inside the unit circle.
The classic RadViz display has equally-spaced anchor points on the unit circle, with each of them associated with an attribute or feature of the dataset. RadViz3D obtains equi-spaced anchor points exactly for the five Platonic solids and approximately for the other cases via a Fibonacci grid. We show that distributing anchor points at least approximately uniformly on the 3D unit sphere provides a better visualization than in 2D. We also propose a Max-Ratio Projection (MRP) method that utilizes the group information in high dimensions to provide distinctive lower-dimensional projections that are then displayed using RadViz3D. Our methodology is extended to datasets with discrete and mixed features where a generalized distributional transform is used in conjunction with copula models before applying MRP and RadViz3D visualization.

Rafiki Big data analytics has gained massive momentum in the last few years. Applying machine learning models to big data has become an implicit requirement or an expectation for most analysis tasks, especially on high-stakes applications. Typical applications include sentiment analysis against reviews for analyzing on-line products, image classification in food logging applications for monitoring user’s daily intake and stock movement prediction. Extending traditional database systems to support the above analysis is intriguing but challenging. First, it is almost impossible to implement all machine learning models in the database engines. Second, expert knowledge is required to optimize the training and inference procedures in terms of efficiency and effectiveness, which imposes a heavy burden on the system users. In this paper, we develop and present a system, called Rafiki, to provide the training and inference service of machine learning models, and facilitate complex analytics on top of cloud platforms.
Rafiki provides distributed hyper-parameter tuning for the training service, and online ensemble modeling for the inference service which trades off between latency and accuracy. Experimental results confirm the efficiency, effectiveness, scalability and usability of Rafiki.

Rainforest Plots Research has shown that forest plots are a gold standard in the visualization of meta-analytic results. However, research on the general interpretation of forest plots and the role of researchers’ meta-analysis experience and field of study is still unavailable. Additionally, the traditional display of effect sizes, confidence intervals, and weights has repeatedly been criticized. The current work presents an online statistical cognition experiment in which a total of 279 researchers with experience in meta-analysis from 36 countries evaluated conventional forest plots and two novel versions of forest plots, namely, thick forest plots and rainforest plots. The results indicate certain biases in the interpretation of forest plots, especially with regard to heterogeneity, the distribution of weights, and the theoretical concept of confidence intervals. Although the two novel displays (thick forest plots and rainforest plots) are associated with slightly longer viewing times, they are at least as well-suited and esthetically and perceptively pleasing as the conventional displays while facilitating the correct and exhaustive interpretation of the meta-analytic information. Furthermore, it is advisable to combine conventional forest plots with distribution information of the individual effects, make confidence lines more visually striking, and to display a background grid in the graph. metaviz

Ramer-Douglas-Peucker Algorithm (RDP) The Ramer-Douglas-Peucker algorithm (RDP) is an algorithm for reducing the number of points in a curve that is approximated by a series of points.
The initial form of the algorithm was independently suggested in 1972 by Urs Ramer and 1973 by David Douglas and Thomas Peucker and several others in the following decade. This algorithm is also known under the names Douglas-Peucker algorithm, iterative end-point fit algorithm and split-and-merge algorithm. The purpose of the algorithm is, given a curve composed of line segments, to find a similar curve with fewer points. The algorithm defines ‘dissimilar’ based on the maximum distance between the original curve and the simplified curve. The simplified curve consists of a subset of the points that defined the original curve. http://…/rdp

RAMODO Learning expressive low-dimensional representations of ultrahigh-dimensional data, e.g., data with thousands/millions of features, has been a major way to enable learning methods to address the curse of dimensionality. However, existing unsupervised representation learning methods mainly focus on preserving the data regularity information and learning the representations independently of subsequent outlier detection methods, which can result in suboptimal and unstable performance of detecting irregularities (i.e., outliers). This paper introduces a ranking model-based framework, called RAMODO, to address this issue. RAMODO unifies representation learning and outlier detection to learn low-dimensional representations that are tailored for a state-of-the-art outlier detection approach: the random distance-based approach. This customized learning yields more optimal and stable representations for the targeted outlier detectors. Additionally, RAMODO can leverage little labeled data as prior knowledge to learn more expressive and application-relevant representations. We instantiate RAMODO to an efficient method called REPEN to demonstrate the performance of RAMODO.
Extensive empirical results on eight real-world ultrahigh dimensional data sets show that REPEN (i) enables a random distance-based detector to obtain significantly better AUC performance and two orders of magnitude speedup; (ii) performs substantially better and more stably than four state-of-the-art representation learning methods; and (iii) leverages less than 1% labeled data to achieve up to 32% AUC improvement.

Ramp-Based Twin Support Vector Clustering (RampTWSVC) Traditional plane-based clustering methods measure the cost of within-cluster and between-cluster by quadratic, linear or some other unbounded functions, which may amplify the impact of cost. This letter introduces a ramp cost function into the plane-based clustering to propose a new clustering method, called ramp-based twin support vector clustering (RampTWSVC). RampTWSVC is more robust because of its boundedness, and thus it finds the intrinsic clusters more easily than other plane-based clustering methods. The non-convex programming problem in RampTWSVC is solved efficiently through an alternating iteration algorithm, and its local solution can be obtained in a finite number of iterations theoretically. In addition, the nonlinear manifold-based formation of RampTWSVC is also proposed by kernel trick. Experimental results on several benchmark datasets show the better performance of our RampTWSVC compared with other plane-based clustering methods.

Rand Index The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. From a mathematical standpoint, Rand index is related to the accuracy, but is applicable even when class labels are not used. mri

Rand-Interleaving Ranking functions return ranked lists of items, and users often interact with these items.
How to evaluate ranking functions using historical interaction logs, also known as off-policy evaluation, is an important but challenging problem. The commonly used Inverse Propensity Scores (IPS) approaches work better for the single item case, but suffer from extremely low data efficiency for the ranked list case. In this paper, we study how to improve the data efficiency of IPS approaches in the offline comparison setting. We propose two approaches, Trunc-match and Rand-interleaving, for offline comparison using uniformly randomized data. We show that these methods can improve the data efficiency and also the comparison sensitivity based on one of the largest email search engines.

Random Assignment Random assignment or random placement is an experimental technique for assigning subjects to different treatments (or no treatment). The thinking behind random assignment is that by randomizing treatment assignment, the group attributes for the different treatments will be roughly equivalent and therefore any effect observed between treatment groups can be linked to the treatment effect and is not a characteristic of the individuals in the group. In experimental design, random assignment of participants in experiments or treatment and control groups helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Random assignment does not guarantee that the groups are “matched” or equivalent, only that any differences are due to chance. Random assignment facilitates comparison in experiments by creating similar groups. Example: it compares “Apple to Apple” and “Orange to Orange”. Random assignment:

- Step 1: Begin with a collection of subjects. Example: 20 people.
- Step 2: Devise a method to randomize that is purely mechanical (e.g. flip a coin).
- Step 3: Assign subjects with “Heads” to one group: Control Group.
Assign subjects with “Tails” to the other group: Experimental Group.

Random Average Shifted Histogram (RASH) A new density estimator called RASH, for Random Average Shifted Histogram, obtained by averaging several histograms as proposed in Average Shifted Histograms, is presented. The principal difference between the two methods is that in RASH each histogram is built over a grid with random shifted breakpoints. The asymptotic behavior of this estimator is established and its performance through several simulations is analyzed. RASH is compared to several classic density estimators and to some recent ensemble methods. Although RASH does not always outperform the other methods, it is very simple to implement, being also more intuitive.

Random Boost (RB) Inspired by theoretical readings on randomization techniques in boosting, I developed a new algorithm, that I called Random Boost (RB). In its essence, Random Boost sequentially grows regression trees with random depth. More precisely, the algorithm is almost identical to and has the exact same input arguments as MART. The only difference is the parameter $d_{\max}$. In MART, $d_{\max}$ determines the maximum depth of all trees in the ensemble. In Random Boost, the argument constitutes the upper bound of possible tree sizes. In each boosting iteration $i$, a random number $d_i$ between 1 and $d_{\max}$ is drawn, which then defines the maximum depth of that tree $T_i(d_i)$.

Random Conditional Distribution The need to condition distributional properties such as expectation, variance, and entropy arises in algorithmic fairness, model simplification, robustness and many other areas. At face value however, distributional properties are not random variables, and hence conditioning them is a semantic error and type error in probabilistic programming languages.
On the other hand, distributional properties are contingent on other variables in the model, change in value when we observe more information, and hence in a precise sense are random variables too. In order to capture the uncertainty over distributional properties, we introduce a probability construct, the random conditional distribution, and incorporate it into the probabilistic programming language Omega. A random conditional distribution is a higher-order random variable whose realizations are themselves conditional random variables. In Omega we extend distributional properties of random variables to random conditional distributions, such that for example while the expectation of a real-valued random variable is a real value, the expectation of a random conditional distribution is a distribution over expectations. As a consequence, it requires minimal syntax to encode inference problems over distributional properties, which so far have evaded treatment within probabilistic programming systems and probabilistic modeling in general. We demonstrate our approach with case studies in algorithmic fairness and robustness.

Random Connectivity LSTM Time series prediction can be generalized as a process that extracts useful information from historical records and then determines future values. Learning long-range dependencies that are embedded in time series is often an obstacle for most algorithms, whereas Long Short-Term Memory (LSTM) solutions, as a specific kind of scheme in deep learning, promise to effectively overcome the problem. In this article, we first give a brief introduction to the structure and forward propagation mechanism of the LSTM model. Then, aiming at reducing the considerable computing cost of LSTM, we put forward the Random Connectivity LSTM (RCLSTM) model and test it by predicting traffic and user mobility in telecommunication networks.
Compared to LSTM, RCLSTM is formed via stochastic connectivity between neurons, which achieves a significant breakthrough in the architecture formation of neural networks. In this way, the RCLSTM model exhibits a certain level of sparsity, which leads to an appealing decrease in the computational complexity and makes the RCLSTM model more applicable in latency-stringent application scenarios. In the field of telecommunication networks, the prediction of traffic series and mobility traces could directly benefit from this improvement as we further demonstrate that the prediction accuracy of RCLSTM is comparable to that of the conventional LSTM no matter how we change the number of training samples or the length of input sequences.

Random Cut Forest In this paper we focus on the anomaly detection problem for dynamic data streams through the lens of random cut forests. We investigate a robust random cut data structure that can be used as a sketch or synopsis of the input stream. We provide a plausible definition of non-parametric anomalies based on the influence of an unseen point on the remainder of the data, i.e., the externality imposed by that point. We show how the sketch can be efficiently updated in a dynamic data stream. We demonstrate the viability of the algorithm on publicly available real data.

Random Decision Forests (RDF)

Random Dot Product Graph (RDPG)

Random Effects Model In statistics, a random effect(s) model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy. In econometrics, random effects models are used in the analysis of hierarchical or panel data when one assumes no fixed effects (it allows for individual effects). The random effects model is a special case of the fixed effects model.
Contrast this to the biostatistics definitions, as biostatisticians use ‘fixed’ and ‘random’ effects to respectively refer to the population-average and subject-specific effects (and where the latter are generally assumed to be unknown, latent variables).

Random Energy Model In statistical physics of disordered systems, the random energy model is a toy model of a system with quenched disorder. It concerns the statistics of a system of $N$ particles, such that the number of possible states for the system grows as $2^N$, while the energy of such states is a Gaussian stochastic variable. The model has an exact solution. Its simplicity makes this model suitable for pedagogical introduction of concepts like quenched disorder and replica symmetry. Random Energy Models, Optimal Learning Machines and Beyond

Random Erasing In this paper, we introduce Random Erasing, a simple yet effective data augmentation technique for training the convolutional neural network (CNN). In the training phase, Random Erasing randomly selects a rectangle region in an image, and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduce the risk of network overfitting and make the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated into most of the CNN-based recognition models. Albeit simple, Random Erasing yields consistent improvement in image classification, object detection and person re-identification (re-ID). For image classification, our method improves WRN-28-10: top-1 error rate from 3.72% to 3.08% on CIFAR10, and from 18.68% to 17.65% on CIFAR100. For object detection on PASCAL VOC 2007, Random Erasing improves Fast-RCNN from 74.8% to 76.2% in mAP.
For person re-ID, when using Random Erasing in recent deep models, we achieve the state-of-the-art accuracy: the rank-1 accuracy is 89.13% for Market-1501, 84.02% for DukeMTMC-reID, and 63.93% for CUHK03 under the new evaluation protocol.

Random Ferns Method / Classifier Random ferns is a machine learning algorithm proposed by Ozuysal, Fua, and Lepetit (2007) for matching the same elements between two images of the same scene, allowing one to recognize certain objects or trace them on videos. The original motivation behind this method was to create a simple and efficient algorithm by extending the naive Bayes classifier; still the authors acknowledged its strong connection to decision tree ensembles like the random forest algorithm (Breiman 2001). Since introduction, random ferns have been applied in numerous computer vision applications, like image recognition (Bosch, Zisserman, and Munoz 2007), action recognition (Oshin, Gilbert, Illingworth, and Bowden 2009) or augmented reality (Wagner, Reitmayr, Mulloni, Drummond, and Schmalstieg 2010). However, it has not gathered attention outside this field; thus, this work aims to bring this algorithm to a much wider spectrum of applications. In order to do that, I propose a generalized version of the algorithm, implemented in the R (R Core Team 2014) package rFerns (Kursa 2014) which is available from the Comprehensive R Archive Network (CRAN) at http://…/package=rFerns. rFerns

Random Fields A random field is a generalization of a stochastic process such that the underlying parameter need no longer be a simple real or integer valued “time”, but can instead take values that are multidimensional vectors, or points on some manifold. At its most basic, discrete case, a random field is a list of random numbers whose indices are mapped onto a space (of n dimensions). When used in the natural sciences, values in a random field are often spatially correlated in one way or another.
In its most basic form this might mean that adjacent values (i.e. values with adjacent indices) do not differ as much as values that are further apart. This is an example of a covariance structure, many different types of which may be modeled in a random field. More generally, the values might be defined over a continuous domain, and the random field might be thought of as a “function valued” random variable.

Random Forest Random forests are an ensemble learning method for classification (and regression) that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes output by individual trees. The algorithm for inducing a random forest was developed by Leo Breiman and Adele Cutler, and “Random Forests” is their trademark. The term came from random decision forests, which were first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman’s “bagging” idea and the random selection of features, introduced independently by Ho and by Amit and Geman, in order to construct a collection of decision trees with controlled variance. The selection of a random subset of features is an example of the random subspace method, which, in Ho’s formulation, is a way to implement classification proposed by Eugene Kleinberg. ranger

Random Geometric Graph (RGG) We propose an interdependent random geometric graph (RGG) model for interdependent networks. Based on this model, we study the robustness of two interdependent spatially embedded networks where interdependence exists between geographically nearby nodes in the two networks. We study the emergence of the giant mutual component in two interdependent RGGs as node densities increase, and define the percolation threshold as a pair of node densities above which the giant mutual component first appears.
In contrast to the case for a single RGG, where the percolation threshold is a unique scalar for a given connection distance, for two interdependent RGGs, multiple pairs of percolation thresholds may exist, given that a smaller node density in one RGG may increase the minimum node density in the other RGG in order for a giant mutual component to exist. We derive analytical upper bounds on the percolation thresholds of two interdependent RGGs by discretization, and obtain $99\%$ confidence intervals for the percolation thresholds by simulation. Based on these results, we derive conditions for the interdependent RGGs to be robust under random failures and geographical attacks. Random Image Cropping and Patching(RICAP) Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expression ability risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP) which randomly crops four images and patches them to create a new training image. Moreover, RICAP mixes the class labels of the four images, resulting in an advantage similar to label smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) by comparison with competitive data augmentation techniques such as cutout and mixup. RICAP achieves a new state-of-the-art test error of $2.19\%$ on CIFAR-10. We also confirmed that deep CNNs with RICAP achieve better results on classification tasks using CIFAR-100 and ImageNet and an image-caption retrieval task using Microsoft COCO. 
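The cropping, patching and label mixing that the RICAP entry describes can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the function name `ricap`, the `beta` parameter, drawing the patch boundary from a Beta distribution, and weighting the mixed label by patch area are all assumptions of this sketch that the abstract itself does not state.

```python
import numpy as np

def ricap(images, labels, num_classes, beta=0.3, rng=None):
    """Patch four randomly cropped images into one and mix their labels.

    images: array of shape (4, H, W, C); labels: four integer class ids.
    Returns the patched image and a soft label vector of length num_classes.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, H, W, C = images.shape
    # Boundary position of the 2x2 layout (the Beta draw is an assumption).
    w = int(np.round(W * rng.beta(beta, beta)))
    h = int(np.round(H * rng.beta(beta, beta)))
    widths = [w, W - w, w, W - w]
    heights = [h, h, H - h, H - h]
    corners = [(0, 0), (0, w), (h, 0), (h, w)]  # (row, col) of each patch
    out = np.zeros((H, W, C), dtype=images.dtype)
    target = np.zeros(num_classes)
    for k in range(4):
        ph, pw = heights[k], widths[k]
        # Random top-left corner of the crop inside source image k.
        y = rng.integers(0, H - ph + 1)
        x = rng.integers(0, W - pw + 1)
        r, c = corners[k]
        out[r:r + ph, c:c + pw] = images[k, y:y + ph, x:x + pw]
        # Each source label contributes in proportion to its patch area.
        target[labels[k]] += ph * pw / (H * W)
    return out, target
```

The four patches tile the output exactly, so the soft label always sums to one; a training loop would use it with a cross-entropy loss that accepts soft targets.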
Random KNN(RKNN) Random KNN consists of an ensemble of base k-nearest neighbor models, each constructed from a random subset of the input variables. Random KNN can be used to select important features using the RKNN-FS algorithm. RKNN-FS is an innovative feature selection procedure for ‘small n, large p problems.’ Random KNN (no bootstrapping) is fast and stable compared with Random Forests. The rknn R package implements Random KNN classification, regression and variable selection algorithms. · KNN is stable, no hierarchical structure · Final model can be a single KNN (vs. many trees) · Local method: robust for complex data structure · Automatically re-train, incremental learning · Easy to implement rknn Random KNN Feature Selection(RKNN-FS) We present RKNN-FS, an innovative feature selection procedure for ‘small n, large p problems.’ RKNN-FS is based on Random KNN (RKNN), a novel generalization of traditional nearest-neighbor modeling. RKNN consists of an ensemble of base k-nearest neighbor models, each constructed from a random subset of the input variables. To rank the importance of the variables, we define a criterion on the RKNN framework, using the notion of support. A two-stage backward model selection method is then developed based on this criterion. Empirical results on microarray data sets with thousands of variables and relatively few samples show that RKNN-FS is an effective feature selection approach for high-dimensional data. RKNN is similar to Random Forests in terms of classification accuracy without feature selection. However, RKNN provides much better classification accuracy than RF when each method incorporates a feature-selection step. Our results show that RKNN is significantly more stable and more robust than Random Forests for feature selection when the input data are noisy and/or unbalanced. 
Further, RKNN-FS is much faster than the Random Forests feature selection method (RF-FS), especially for large scale problems, involving thousands of variables and multiple classes. rknn Random Labeled Point Process(RLPP) Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goals are to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values, and then apply the clustering algorithm on the completed data. We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches. Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data, while at the same time possessing smaller clustering errors. Random Linear Feedback Finite Time Adaptive Stabilization of LQ Systems Random Multimodel Deep Learning(RMDL) The exponential growth in the number of complex datasets every year requires more enhancement in machine learning methods to provide robust and accurate data classification. 
Lately, deep learning approaches have surpassed previous machine learning algorithms. However, finding the suitable structure for these models has been a challenge for researchers. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble, deep learning approach for classification. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures. In short, RMDL trains multiple randomly generated models of Deep Neural Network (DNN), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) in parallel and combines their results to produce better results than any of those models individually. In this paper, we describe the RMDL model and compare the results for image and text classification as well as face recognition. We used the MNIST and CIFAR-10 datasets as ground truth datasets for image classification and the WOS, Reuters, IMDB, and 20newsgroup datasets for text classification. Lastly, we used the ORL dataset to compare the model performance on the face recognition task. Random Network Distillation(RND) We’ve developed Random Network Distillation (RND), a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, which for the first time exceeds average human performance on Montezuma’s Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms and solves the first level without using demonstrations or having access to the underlying state of the game. RND incentivizes visiting unfamiliar states by measuring how hard it is to predict the output of a fixed random neural network on visited states. In unfamiliar states it’s hard to guess the output, and hence the reward is high. It can be applied to any reinforcement learning algorithm, is simple to implement and efficient to scale. 
Below we release a reference implementation of RND that can reproduce the results from our paper. Random Projection Random Projection is a foundational research topic that connects a range of machine learning algorithms under a similar mathematical basis. It is used to reduce the dimensionality of the dataset by projecting the data points efficiently to a lower-dimensional space while preserving the original relative distance between the data points. In this paper, we intend to explain the random projection method by explaining its mathematical background and foundation, the applications that are currently adopting it, and an overview of its current research perspective. Random Projection Ensemble Classification The random projection ensemble classifier is a very general method for classification of high-dimensional data, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. The random projections are divided into non-overlapping blocks, and within each block the projection yielding the smallest estimate of the test error is selected. The random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. http://…/randproj.pdf RPEnsemble Random Projection Forest(rpForest) K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests), for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees with each constructed recursively through a series of carefully chosen random projections. 
rpForests achieves a remarkable accuracy in terms of fast decay in the missing rate of kNNs and that of discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points would be separated by ensemble random projection trees when the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. Random Regression Model(RRM) Random regressions are types of hierarchical models in which data are structured in groups and (regression) coefficients can vary by groups. MultiRR Random Sample Consensus(RANSAC) Random sample consensus (RANSAC) is a successful algorithm in model fitting applications. It is vital to have a strong exploration phase when there is an enormous number of outliers within the dataset. Achieving a proper model is guaranteed by the pure exploration strategy of RANSAC. However, finding the optimum result requires exploitation. GASAC is an evolutionary paradigm to add exploitation capability to the algorithm. Although GASAC improves the results of RANSAC, it has a fixed strategy for balancing between exploration and exploitation. In this paper, a new paradigm is proposed based on a genetic algorithm with an adaptive strategy. We utilize an adaptive genetic operator to select high fitness individuals as parents and mutate low fitness ones. In the mutation phase, a training method is used to gradually learn which gene is the best replacement for the mutated gene. The proposed method adaptively balances between exploration and exploitation by learning about genes. 
During the final iterations, the algorithm draws on this information to improve the final results. The proposed method is extensively evaluated on two sets of experiments. In all tests, our method outperformed the other methods in terms of both the number of inliers found and the speed of the algorithm. Random Self-Ensemble(RSE) Recent studies have revealed the vulnerability of deep neural networks: a small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defensive algorithm called Random Self-Ensemble (RSE) by combining two important concepts: randomness and ensemble. To protect a targeted model, RSE adds random noise layers to the neural network to prevent state-of-the-art gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models $f_\epsilon$ without any additional memory overhead, and the proposed training procedure based on noisy stochastic gradient descent can ensure the ensemble model has good predictive capability. Our algorithm significantly outperforms previous defense techniques on real datasets. For instance, on CIFAR-10 with a VGG network (which has $92\%$ accuracy without any attack), under the state-of-the-art C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than $10\%$, the best previous defense technique has $48\%$ accuracy, while our method still has $86\%$ prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network. Random Self-Reducibility Random self-reducibility (RSR) is the rule that a good algorithm for the average case implies a good algorithm for the worst case. 
RSR is the ability to solve all instances of a problem by solving a large fraction of the instances. Random Subsampling Random subsampling, which is also known as Monte Carlo crossvalidation, as multiple holdout or as repeated evaluation set, is based on randomly splitting the data into subsets, whereby the size of the subsets is defined by the user. The random partitioning of the data can be repeated arbitrarily often. In contrast to a full crossvalidation procedure, random subsampling has been shown to be asymptotically consistent resulting in more pessimistic predictions of the test data compared with crossvalidation. The predictions of the test data give a realistic estimation of the predictions of external validation data. Random Swap We formulate a probabilistic clustering method based on a sequence of random swaps of cluster centroids. We show that the algorithm has linear dependency on the number of data vectors, quadratic on the number of clusters, and inverse dependency on the dimensionality. Each halving of the probability of failure (e.g. from 1% to 0.5%) is achieved at the cost of only a linear increase in the processing time. Efficiency of random swap clustering Random Utility Model(RUM) Random Variable In probability and statistics, a random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables. A random variable’s possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, due to imprecise measurements or quantum uncertainty). 
They may also conceptually represent either the results of an ‘objectively’ random process (such as rolling a die) or the ‘subjective’ randomness that results from incomplete knowledge of a quantity. The meaning of the probabilities assigned to the potential values of a random variable is not part of probability theory itself but is instead related to philosophical arguments over the interpretation of probability. The mathematics works the same regardless of the particular interpretation in use. The mathematical function describing the possible values of a random variable and their associated probabilities is known as a probability distribution. Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function, characteristic of a probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable, that is, the results of randomly choosing values according to the variable’s probability distribution function, are called random variates. The formal mathematical treatment of random variables is a topic in probability theory. In that context, a random variable is understood as a function defined on a sample space whose outputs are numerical values. Random Vector Functional Link Network(RVFL+) In school, a teacher plays an important role in various classroom teaching patterns. Likewise to this human learning activity, the learning using privileged information (LUPI) paradigm provides additional information generated by the teacher to ‘teach’ learning algorithms during the training stage. Therefore, this novel learning paradigm is a typical Teacher-Student Interaction mechanism. This paper is the first to present a random vector functional link network based on the LUPI paradigm, called RVFL+. 
Rather than simply combining two existing approaches, the newly-derived RVFL+ fills the gap between neural networks and the LUPI paradigm, which offers an alternative way to train RVFL networks. Moreover, the proposed RVFL+ can perform in conjunction with the kernel trick for highly complicated nonlinear feature learning, which is termed KRVFL+. Furthermore, the statistical property of the proposed RVFL+ is investigated, and we derive a sharp and high-quality generalization error bound based on the Rademacher complexity. Competitive experimental results on 14 real-world datasets illustrate the great effectiveness and efficiency of the novel RVFL+ and KRVFL+, which can achieve better generalization performance than state-of-the-art algorithms. Random Walk Covariance Model rwc Random Warping Series(RWS) Time series data analytics has been a problem of substantial interest for decades, and Dynamic Time Warping (DTW) has been the most widely adopted technique to measure dissimilarity between time series. A number of global-alignment kernels have since been proposed in the spirit of DTW to extend its use to kernel-based estimation methods such as support vector machines. However, those kernels suffer from diagonal dominance of the Gram matrix and a quadratic complexity w.r.t. the sample size. In this work, we study a family of alignment-aware positive definite (p.d.) kernels, with its feature embedding given by a distribution of Random Warping Series (RWS). The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoying a Random Features (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in terms of both the number and the length of time-series. We also study the convergence of the RF approximation for the domain of time series of unbounded length. 
Our extensive experiments on 16 benchmark datasets demonstrate that RWS outperforms or matches state-of-the-art classification and clustering methods in both accuracy and computational time. Our code and data are available at https://…/RandomWarpingSeries. Random Weighting This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementation of the Wald test. Third, we construct a portmanteau test for model checking, and use the RW method to obtain its critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE) is proposed and proved to have the same efficiency as its infeasible counterpart. The importance of our entire methodology based on the feasible ALADE is illustrated by simulation results and the real data analysis on three U.S. economic data sets. Randomised Bayesian Least-Squares Policy Iteration(RBLSPI) We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free, policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. An online variant of BLSPI has also been proposed, called randomised Bayesian least-squares policy iteration (RBLSPI), that improves its policy based on an incomplete policy evaluation step. In the online setting, the exploration-exploitation dilemma should be addressed as we try to discover the optimal policy by using samples collected by ourselves. RBLSPI exploits the advantage of BLSTD to quantify our uncertainty about the value function. 
Inspired by Thompson sampling, RBLSPI first samples a value function from a posterior distribution over value functions, and then selects actions based on the sampled value function. The effectiveness and the exploration abilities of RBLSPI are demonstrated experimentally in several environments. Randomized Adversarial Training(RAT) Since the discovery of adversarial examples in machine learning, researchers have designed several techniques to train neural networks that are robust against different types of attacks (most notably $\ell_\infty$ and $\ell_2$ based attacks). However, it has been observed that the defense mechanisms designed to protect against one type of attack often offer poor performance against the other. In this paper, we introduce Randomized Adversarial Training (RAT), a technique that is efficient both against $\ell_2$ and $\ell_\infty$ attacks. To obtain this result, we build upon adversarial training, a technique that is efficient against $\ell_\infty$ attacks, and demonstrate that adding random noise at training and inference time further improves performance against $\ell_2$ attacks. We then show that RAT is as efficient as adversarial training against $\ell_\infty$ attacks while being robust against strong $\ell_2$ attacks. Our final comparative experiments demonstrate that RAT outperforms all state-of-the-art approaches against $\ell_2$ and $\ell_\infty$ attacks. Randomized Block Cubic Newton(RBCN) We study the problem of minimizing the sum of three convex functions: a differentiable, a twice-differentiable and a non-smooth term in a high dimensional setting. 
To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice differentiable term, and perfect (proximal) model for the nonsmooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression. Randomized Canonical Correlation Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance or the Kernel Canonical Correlation which has been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. 
Here, we propose a couple of alternative novel measures based on randomized features of the samples – the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size and provide a controllable approximation of their Kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error at an order of magnitude faster compared to Kernel-based measures. Randomized Generalized Variance Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance or the Kernel Canonical Correlation which has been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples – the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size and provide a controllable approximation of their Kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error at an order of magnitude faster compared to Kernel-based measures. 
Randomized Gradient Boosting Machine(RGBM) Gradient Boosting Machine (GBM) introduced by Friedman is an extremely powerful supervised learning algorithm that is widely used in practice: it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In spite of the usefulness of GBM in practice, there is a big gap between its theoretical understanding and its success in practice. In this work, we propose Randomized Gradient Boosting Machine (RGBM) which leads to significant computational gains compared to GBM, by using a randomization scheme to reduce the search in the space of weak learners. Our analysis provides a formal justification of commonly used ad hoc heuristics employed by GBM implementations such as XGBoost, and suggests alternatives. In particular, we also provide a principled guideline towards better step-size selection in RGBM that does not require a line search. The analysis of RGBM is inspired by a special variant of coordinate descent that combines the benefits of randomized coordinate descent and greedy coordinate descent; and may be of independent interest as an optimization algorithm. As a special case, our results for RGBM lead to superior computational guarantees for GBM. Our computational guarantees depend upon a curious geometric quantity that we call Minimal Cosine Angle, which relates to the density of weak learners in the prediction space. We demonstrate the effectiveness of RGBM over GBM in terms of obtaining a model with good training/test data fidelity with a fraction of the computational cost, via numerical experiments on several real datasets. Randomized Hierarchical Alternating Least Squares Nonnegative matrix factorization (NMF) is a powerful tool for data mining. However, the emergence of ‘big data’ has severely challenged our ability to compute this fundamental decomposition using deterministic algorithms. 
This paper presents a randomized hierarchical alternating least squares (HALS) algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative input data, a more efficient nonnegative decomposition can be computed. Our algorithm scales to big data applications while attaining a near-optimal factorization, i.e., the algorithm scales with the target rank of the data rather than the ambient dimension of measurement space. The proposed algorithm is evaluated using synthetic and real world data and shows substantial speedups compared to deterministic HALS. Randomized Independent Component Analysis(RICA) Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance or the Kernel Canonical Correlation which has been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples – the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size and provide a controllable approximation of their Kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error at an order of magnitude faster compared to Kernel-based measures. 
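The RICA abstract does not say how its randomized features are built. As a rough illustration only, the sketch below assumes random Fourier features for a Gaussian kernel and a ridge-regularized canonical correlation computed on those features. The function names, `num_features`, `sigma` and `reg` are hypothetical choices for this sketch and should not be read as the authors' definitions. What it does show is why such measures scale linearly with the sample size n: every step works on n-by-D feature matrices and D-by-D covariances, and the n-by-n kernel matrix is never formed.

```python
import numpy as np

def random_fourier_features(x, num_features=200, sigma=1.0, rng=None):
    """Map samples x of shape (n, d) to (n, num_features) random features.

    The inner product of two feature rows approximates the Gaussian kernel
    exp(-||a - b||^2 / (2 * sigma^2)).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[1]
    w = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)

def randomized_canonical_correlation(x, y, num_features=50, sigma=1.0,
                                     reg=1e-3, rng=None):
    """Largest regularized canonical correlation between random features."""
    rng = np.random.default_rng() if rng is None else rng
    zx = random_fourier_features(x, num_features, sigma, rng)
    zy = random_fourier_features(y, num_features, sigma, rng)
    zx = zx - zx.mean(axis=0)
    zy = zy - zy.mean(axis=0)
    n = zx.shape[0]
    eye = np.eye(num_features)
    # Covariances are D x D, so the cost is O(n * D^2): linear in n.
    cxx = zx.T @ zx / n + reg * eye
    cyy = zy.T @ zy / n + reg * eye
    cxy = zx.T @ zy / n
    # Whiten both sides with Cholesky factors; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy)
    m = np.linalg.solve(ly, m.T).T
    return np.linalg.svd(m, compute_uv=False)[0]
```

A value near zero suggests the two signals are close to independent, while nonlinearly related signals (for example s and s squared) give a value near one; an ICA procedure would search for the unmixing transformation that minimizes such a measure.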
Randomized Principal Component Analysis(RPCA) Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy – even on parallel processors – unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer’s RAM. Read More: https://…/100804139 Randomized Response Randomized response is a research method used in structured survey interviews. It was first proposed by S. L. Warner in 1965 and later modified by B. G. Greenberg in 1969. It allows respondents to respond to sensitive issues (such as criminal behavior or sexuality) while maintaining confidentiality. Chance decides, unknown to the interviewer, whether the question is to be answered truthfully, or “yes”, regardless of the truth. For example, social scientists have used it to ask people whether they use drugs, whether they have illegally installed telephones, or whether they have evaded paying taxes. Before abortions were legal, social scientists used the method to ask women whether they had had abortions. rr Randomized Singular Value Decomposition(rSVD) Matrix completion is a widely used technique for image inpainting, personalized recommender systems, etc. In this work, we focus on accelerating matrix completion using faster randomized singular value decomposition (rSVD). Firstly, two fast randomized algorithms (rSVD-PI and rSVD-BKI) are proposed for handling sparse matrices. They make use of an eigSVD procedure and several acceleration techniques. 
Then, with the rSVD-BKI algorithm and a new subspace recycling technique, we accelerate the singular value thresholding (SVT) method in [1] to realize faster matrix completion. Experiments show that the proposed rSVD algorithms can be 6X faster than the basic rSVD algorithm [2] while keeping same accuracy. For image inpainting and movie-rating estimation problems, the proposed accelerated SVT algorithm consumes 15X and 8X less CPU time than the methods using svds and lansvd respectively, without loss of accuracy. Randomized Weighted Majority Algorithm(RWMA) The randomized weighted majority algorithm is an algorithm in machine learning theory. It improves the mistake bound of the weighted majority algorithm. Imagine that every morning before the stock market opens, we get a prediction from each of our ‘experts’ about whether the stock market will go up or down. Our goal is to somehow combine this set of predictions into a single prediction that we then use to make a buy or sell decision for the day. The RWMA gives us a way to do this combination such that our prediction record will be nearly as good as that of the single best expert in hindsight. ➘ “Weighted Majority Algorithm” Range Entropy Sample entropy ($SampEn$) has been accepted as an alternate, and sometimes a replacement, measure to approximate entropy ($ApEn$) for characterizing temporal complexity of time series. However, it still suffers from issues such as inconsistency over short-length signals and its tolerance parameter $r$, susceptibility to signal amplitude changes and insensitivity to self-similarity of time series. We propose modifications to the $ApEn$ and $SampEn$ measures which are defined for 0<$r$<1, are more robust to signal amplitude changes and sensitive to self-similarity property of time series. We modified $ApEn$ and $SampEn$ by redefining the distance function used originally in their definitions. 
We then evaluated the new entropy measures, called range entropies ($RangeEn$) using different random processes and nonlinear deterministic signals. We further applied the proposed entropies to normal and epileptic electroencephalographic (EEG) signals under different states. Our results suggest that, unlike $ApEn$ and $SampEn$, $RangeEn$ measures are robust to stationary and nonstationary signal amplitude variations and that their trajectories in the tolerance r-plane are constrained between 0 (maximum entropy) and 1 (minimum entropy). We also showed that $RangeEn$ have direct relationships with the Hurst exponent; suggesting that the new definitions are sensitive to self-similarity structures of signals. $RangeEn$ analysis of epileptic EEG data showed distinct behaviours in the $r$-domain for extracranial versus intracranial recordings as well as different states of epileptic EEG data. The constrained trajectory of $RangeEn$ in the r-plane makes them a good candidate for studying complex biological signals such as EEG during seizure and non-seizure states. The Python package used to generate the results shown in this paper is publicly available at: https://…/RangeEn. Rank-1 Convolutional Neural Network In this paper, we propose a convolutional neural network(CNN) with 3-D rank-1 filters which are composed by the outer product of 1-D filters. After being trained, the 3-D rank-1 filters can be decomposed into 1-D filters in the test time for fast inference. The reason that we train 3-D rank-1 filters in the training stage instead of consecutive 1-D filters is that a better gradient flow can be obtained with this setting, which makes the training possible even in the case where the network with consecutive 1-D filters cannot be trained. 
The 3-D rank-1 filters are updated by both the gradient flow and the outer product of the 1-D filters in every epoch, where the gradient flow tries to obtain a solution which minimizes the loss function, while the outer product operation tries to make the parameters of the filter to live on a rank-1 sub-space. Furthermore, we show that the convolution with the rank-1 filters results in low rank outputs, constraining the final output of the CNN also to live on a low dimensional subspace. Rank-Aware Factorization Machine(RaFM) Factorization machines (FM) are a popular model class to learn pairwise interactions by a low-rank approximation. Different from existing FM-based approaches which use a fixed rank for all features, this paper proposes a Rank-Aware Factorization machine (RaFM) model which adopts pairwise interactions from embeddings with different ranks. The proposed model achieves a better performance on real-world datasets where different features have significantly varying frequencies of occurrences. Moreover, we prove that the RaFM model can be stored, evaluated, and trained as efficiently as one single FM, and under some reasonable conditions it can be even significantly more efficient than FM. RaFM improves the performance of FMs in both regression tasks and classification tasks while incurring less computational burden, therefore also has attractive potential in industrial applications. Rank-Breaking-Then-Composite-Marginal-Likelihood(RBCML) We propose a novel and flexible rank-breaking-then-composite-marginal-likelihood (RBCML) framework for learning random utility models (RUMs), which include the Plackett-Luce model. We characterize conditions for the objective function of RBCML to be strictly log-concave by proving that strict log-concavity is preserved under convolution and marginalization. We characterize necessary and sufficient conditions for RBCML to satisfy consistency and asymptotic normality. 
Experiments on synthetic data show that RBCML for Gaussian RUMs achieves better statistical efficiency and computational efficiency than the state-of-the-art algorithm and our RBCML for the Plackett-Luce model provides flexible tradeoffs between running time and statistical efficiency. RankCGAN In this paper, we investigate the use of generative adversarial networks in the task of image generation according to subjective measures of semantic attributes. Unlike the standard conditional GAN (CGAN) that generates images from discrete categorical labels, our architecture handles both continuous and discrete scales. Given pairwise comparisons of images, our model, called RankCGAN, performs two tasks: it learns to rank images using a subjective measure; and it learns a generative model that can be controlled by that measure. RankCGAN associates each subjective measure of interest to a distinct dimension of some latent space. We perform experiments on UT-Zap50K, PubFig and OSR datasets and demonstrate that the model is expressive and diverse enough to conduct two-attribute exploration and image editing. Ranked Set Sampling(RSS) Ranked set sampling (RSS) was introduced by McIntyre (1952; reprinted in 2005) as an advanced method for data collection, which is important for statistical and methodological analysis in scientific studies. RSSampling Ranking Distillation(RD) We propose a novel way to train ranking models, such as recommender systems, that are both effective and efficient. Knowledge distillation (KD) was shown to be successful in image recognition to achieve both effectiveness and efficiency. We propose a KD technique for learning-to-rank problems, called ranking distillation (RD). Specifically, we train a smaller student model to learn to rank documents/items from both the training data and the supervision of a larger teacher model. 
The student model achieves a similar ranking performance to that of the large teacher model, but its smaller model size makes the online inference more efficient. RD is flexible because it is orthogonal to the choices of ranking models for the teacher and student. We address the challenges of RD for ranking problems. The experiments on public data sets and state-of-the-art recommendation models showed that RD achieves its design purposes: the student model learnt with RD has a model size less than half of the teacher model while achieving a ranking performance similar to the teacher model and much better than the student model learnt without RD. Ranking Relative Principal Component Attributes Network Model(REL-PCANet) In 2018, a new metric of countries’ economic performance, the Inclusive Development Index (IDI), composed of 12 indicators, was presented at the World Economic Forum in Davos. The new metric implies that countries might need to carry out structural reforms to improve both economic expansion and social inclusion performance. That is why it is vital for the IDI calculation method to have a strong statistical and mathematical basis, so that results are accurate and transparent for public purposes. In the current work, we propose a novel approach for the IDI estimation – the Ranking Relative Principal Component Attributes Network Model (REL-PCANet). The model is based on RELARM and RankNet principles and combines elements of PCA, techniques applied in image recognition and learning to rank mechanisms. Also, we define a new approach for estimating the target probabilities matrix to reflect dynamic changes in countries’ inclusive development. The empirical study showed that REL-PCANet ensures reliable and robust scores and rankings, and thus is recommended for practical implementation. RankLib RankLib is a library of learning to rank algorithms. Currently eight popular algorithms have been implemented: · MART (Multiple Additive Regression Trees, a.k.a. 
Gradient boosted regression tree) · RankNet · RankBoost · AdaRank · Coordinate Ascent · LambdaMART · ListNet · Random Forests · With appropriate parameters for Random Forests, it can also do bagging several MART/LambdaMART rankers. It also implements many retrieval metrics as well as provides many ways to carry out evaluation. Rank-Ordered Logit(RO-Logit) Tan et al. (2017) The control of confounding is an area of extensive epidemiological research, especially in the field of causal inference for observational studies. Matched cohort and case-control study designs are commonly implemented to control for confounding effects without specifying the functional form of the relationship between the outcome and confounders. This paper extends the commonly used regression models in matched designs for binary and survival outcomes (i.e. conditional logistic and stratified Cox proportional hazards) to studies of continuous outcomes through a novel interpretation and application of logit-based regression models from the econometrics and marketing research literature. We compare the performance of the maximum likelihood estimators using simulated data and propose a heuristic argument for obtaining the residuals for model diagnostics. We illustrate our proposed approach with two real data applications. Our simulation studies demonstrate that our stratification approach is robust to model misspecification and that the distribution of the estimated residuals provides a useful diagnostic when the strata are of moderate size. In our applications to real data, we demonstrate that parity and menopausal status are associated with percent mammographic density, and that the mean level and variability of inpatient blood glucose readings vary between medical and surgical wards within a national tertiary hospital. 
Our work highlights how the same class of regression models, available in most statistical software, can be used to adjust for confounding in the study of binary, time-to-event and continuous outcomes. ROlogit RankPL In this paper we introduce RankPL, a modeling language that can be thought of as a qualitative variant of a probabilistic programming language with a semantics based on Spohn’s ranking theory. Broadly speaking, RankPL can be used to represent and reason about processes that exhibit uncertainty expressible by distinguishing ‘normal’ from ‘surprising’ events. RankPL allows (iterated) revision of rankings over alternative program states and supports various types of reasoning, including abduction and causal inference. We present the language, its denotational semantics, and a number of practical examples. We also discuss an implementation of RankPL that is available for download. Rank-Regret Representative(RRR) We propose the rank-regret representative as a way of choosing a small subset of the database guaranteed to contain at least one of the top-k of any linear ranking function. We provide the techniques for finding such a set and conduct experiments on real datasets to confirm the efficiency and effectiveness of our proposal. Rant Rant is an all-purpose procedural text engine that is most simply described as the opposite of Regex. It has been refined to include a dizzying array of features for handling everything from the most basic of string generation tasks to advanced dialogue generation, code templating, automatic formatting, and more. The goal of the project is to enable developers of all kinds to automate repetitive writing tasks with a high degree of creative freedom. Rao-Scott Cochran-Armitage by Slices Trend Test(RSCABS) RSCABS RApache rApache is a project supporting web application development using the R statistical language and environment and the Apache web server. 
The current software distribution runs on UNIX/Linux and Mac OS X operating systems. Apache servers with threaded Multi-Processing Modules are now supported, but the Apache Prefork Multi-Processing Module is still recommended (refer to the Multi-Processing Modules chapter from Apache for more about this). The rApache software distribution provides the Apache module named mod_R that embeds the R interpreter inside the web server. It also comes bundled with libapreq, an Apache module for manipulating client request data. Together, they provide the glue to transform R into a server-side scripting environment. Another important project that’s not bundled with rApache, but plays an important role in server-side scripting, is the R package brew (also available on CRAN). It implements a templating framework for report generation, and it’s perfect for generating HTML on the fly. Its syntax is similar to PHP, Ruby’s erb module, Java Server Pages, and Python’s psp module. brew can be used stand-alone as well, so it’s not part of the distribution. http://…/rscript-as-service-api Rapid Automatic Keyword Extraction(RAKE) Keywords are widely used to define queries within information retrieval (IR) systems as they are easy to define, revise, remember, and share. This chapter describes the rapid automatic keyword extraction (RAKE), an unsupervised, domain-independent, and language-independent method for extracting keywords from individual documents. It provides details of the algorithm and its configuration parameters, and presents results on a benchmark dataset of technical abstracts, showing that RAKE is more computationally efficient than TextRank while achieving higher precision and comparable recall scores. The chapter then describes a novel method for generating stoplists, which is used to configure RAKE for specific domains and corpora. 
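The RAKE entry above names the method but not its steps. A minimal sketch of the commonly described procedure follows: candidate phrases are runs of words between stopwords and punctuation, each word is scored by its degree divided by its frequency, and a phrase scores the sum of its word scores. The tiny stoplist and the function names are hypothetical and only serve the illustration; real use needs a full or generated stoplist.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stoplist for illustration only.
STOPWORDS = {
    "a", "an", "and", "are", "as", "for", "from",
    "in", "is", "of", "on", "the", "to", "with",
}


def candidate_phrases(text):
    """Split text into runs of non-stopwords, broken at punctuation."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9\-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text):
    """Return (phrase, score) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


text = (
    "Keywords are easy to define, revise, remember, and share. "
    "Rapid automatic keyword extraction is an unsupervised method "
    "for extracting keywords from individual documents."
)
ranked = rake(text)
for phrase, score in ranked[:3]:
    print(f"{score:5.1f}  {phrase}")
```

Because scoring depends entirely on where stopwords split the text, the choice of stoplist changes which phrases are even considered as candidates.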
Finally, it applies RAKE to a corpus of news articles and defines metrics for evaluating the exclusivity, essentiality, and generality of extracted keywords, enabling a system to identify keywords that are essential or general to documents in the absence of manual annotations. rapidraker Rapid Orthogonal Approximate Slepian Transform(ROAST) In this paper, we provide a Rapid Orthogonal Approximate Slepian Transform (ROAST) for the discrete vector one obtains when collecting a finite set of uniform samples from a baseband analog signal. The ROAST offers an orthogonal projection which is an approximation to the orthogonal projection onto the leading discrete prolate spheroidal sequence (DPSS) vectors (also known as Slepian basis vectors). As such, the ROAST is guaranteed to accurately and compactly represent not only oversampled bandlimited signals but also the leading DPSS vectors themselves. Moreover, the subspace angle between the ROAST subspace and the corresponding DPSS subspace can be made arbitrarily small. The complexity of computing the representation of a signal using the ROAST is comparable to the FFT, which is much less than the complexity of using the DPSS basis vectors. We also give non-asymptotic results to guarantee that the proposed basis not only provides a very high degree of approximation accuracy in a mean-square error sense for bandlimited sample vectors, but also that it can provide high-quality approximations of all sampled sinusoids within the band of interest. RAPIDNN Classification of very high resolution (VHR) satellite images has three major challenges: 1) inherent low intra-class and high inter-class spectral similarities, 2) mismatching resolution of available bands, and 3) the need to regularize noisy classification maps. Conventional methods have addressed these challenges by adopting separate stages of image fusion, feature extraction, and post-classification map regularization. 
These processing stages, however, are not jointly optimizing the classification task at hand. In this study, we propose a single-stage framework embedding the processing stages in a recurrent multiresolution convolutional network trained in an end-to-end manner. The feedforward version of the network, called FuseNet, aims to match the resolution of the panchromatic and multispectral bands in a VHR image using convolutional layers with corresponding downsampling and upsampling operations. Contextual label information is incorporated into FuseNet by means of a recurrent version called ReuseNet. We compared FuseNet and ReuseNet against the use of separate processing steps for both image fusion, e.g. pansharpening and resampling through interpolation, and map regularization such as conditional random fields. We carried out our experiments on a land cover classification task using a Worldview-03 image of Quezon City, Philippines and the ISPRS 2D semantic labeling benchmark dataset of Vaihingen, Germany. FuseNet and ReuseNet surpass the baseline approaches in both quantitative and qualitative results. Rasch Model The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between (a) the respondent’s abilities, attitudes or personality traits and (b) the item difficulty. For example, they may be used to estimate a student’s reading ability, or the extremity of a person’s attitude to capital punishment from responses on a questionnaire. In addition to psychometrics and educational research, the Rasch model and its extensions are used in other areas, including the health profession and market research because of their general applicability. The mathematical theory underlying Rasch models is a special case of item response theory and, more generally, a special case of a generalized linear model. 
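As a minimal sketch of the trade-off described in the Rasch model entry above, the dichotomous form of the model gives the probability of a correct response as a logistic function of ability minus difficulty. The parameter names `ability` and `difficulty` are conventional labels chosen here, not notation from the entry.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct) = exp(ability - difficulty) / (1 + exp(ability - difficulty))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A respondent whose ability equals the item difficulty succeeds half the time.
p_matched = rasch_probability(ability=1.0, difficulty=1.0)

# The same respondent facing an easier and a harder item.
p_easy = rasch_probability(ability=1.0, difficulty=-1.0)
p_hard = rasch_probability(ability=1.0, difficulty=3.0)

print(f"matched={p_matched:.3f} easy={p_easy:.3f} hard={p_hard:.3f}")
```

Only the difference between the two parameters matters, so adding the same constant to every ability and every difficulty leaves all predicted probabilities unchanged.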
However, there are important differences in the interpretation of the model parameters and its philosophical implications that separate proponents of the Rasch model from the item response modeling tradition. A central aspect of this divide relates to the role of specific objectivity, a defining property of the Rasch model according to Georg Rasch, as a requirement for successful measurement. ➚ “Item Response Theory” mixRasch Raster Time Series The raster model is widely used in Geographic Information Systems to represent data that vary continuously in space, such as temperatures, precipitations, elevation, among other spatial attributes. In applications like weather forecast systems, not just a single raster, but a sequence of rasters covering the same region at different timestamps, known as a raster time series, needs to be stored and queried. Compact data structures have proven successful to provide space-efficient representations of rasters with query capabilities. Hence, a naive approach to save space is to use such a representation for each raster in a time series. Rating Scale A rating scale is a set of categories designed to elicit information about a quantitative or a qualitative attribute. In the social sciences, particularly psychology, common examples are the Likert scale and 1-10 rating scales in which a person selects the number which is considered to reflect the perceived quality of a product. Rationalization We introduce AI rationalization, an approach for generating explanations of autonomous system behavior as if a human had done the behavior. We describe a rationalization technique that uses neural machine translation to translate internal state-action representations of the autonomous agent into natural language. We evaluate our technique in the Frogger game environment. The natural language is collected from human players thinking out loud as they play the game. 
We motivate the use of rationalization as an approach to explanation generation, show the results of experiments on the accuracy of our rationalization technique, and describe future research agenda. Raw Data Raw data (also known as primary data) is a term for data collected from a source. Raw data has not been subjected to processing or any other manipulation, and are also referred to as primary data. Raw data is a relative term. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. Ray The next generation of AI applications will continuously interact with the environment and learn from these interactions. These applications impose new and demanding systems requirements, both in terms of performance and flexibility. In this paper, we consider these requirements and present Ray—a distributed system to address them. Ray implements a dynamic task graph computation model that supports both the task-parallel and the actor programming models. To meet the performance requirements of AI applications, we propose an architecture that logically centralizes the system’s control state using a sharded storage system and a novel bottom-up distributed scheduler. In our experiments, we demonstrate sub-millisecond remote task latencies and linear throughput scaling beyond 1.8 million tasks per second. We empirically validate that Ray speeds up challenging benchmarks and serves as both a natural and performant fit for an emerging class of reinforcement learning applications and algorithms. Ray Ray RLLib Reinforcement learning (RL) algorithms involve the deep nesting of distinct components, where each component typically exhibits opportunities for distributed computation. Current RL libraries offer parallelism at the level of the entire program, coupling all the components together and making existing implementations difficult to extend, combine, and reuse. 
We argue for building composable RL components by encapsulating parallelism and resource requirements within individual components, which can be achieved by building on top of a flexible task-based programming model. We demonstrate this principle by building Ray RLLib on top of Ray and show that we can implement a wide range of state-of-the-art algorithms by composing and reusing a handful of standard components. This composability does not come at the cost of performance — in our experiments, RLLib matches or exceeds the performance of highly optimized reference implementations. Ray RLLib is available as part of Ray at https://…/. RBFI Unit In adversarial attacks to machine-learning classifiers, small perturbations are added to input that is correctly classified. The perturbations yield adversarial examples, which are virtually indistinguishable from the unperturbed input, and yet are misclassified. In standard neural networks used for deep learning, attackers can craft adversarial examples from most input to cause a misclassification of their choice. We introduce a new type of network units, called RBFI units, whose non-linear structure makes them inherently resistant to adversarial attacks. On permutation-invariant MNIST, in absence of adversarial attacks, networks using RBFI units match the performance of networks using sigmoid units, and are slightly below the accuracy of networks with ReLU units. When subjected to adversarial attacks, networks with RBFI units retain accuracies above 90% for attacks that degrade the accuracy of networks with ReLU or sigmoid units to below 2%. RBFI networks trained with regular input are superior in their resistance to adversarial attacks even to ReLU and sigmoid networks trained with the help of adversarial examples. The non-linear structure of RBFI units makes them difficult to train using standard gradient descent. 
We show that networks of RBFI units can be efficiently trained to high accuracies using pseudogradients, computed using functions especially crafted to facilitate learning instead of their true derivatives. We show that the use of pseudogradients makes training deep RBFI networks practical, and we compare several structural alternatives of RBFI networks for their accuracy. RCGAN-U ➘ “Robust Conditional GAN” RDeepSense Recent advances in deep learning have led various applications to unprecedented achievements, which could potentially bring higher intelligence to a broad spectrum of mobile and ubiquitous applications. Although existing studies have demonstrated the effectiveness and feasibility of running deep neural network inference operations on mobile and embedded devices, they overlooked the reliability of mobile computing models. Reliability measurements such as predictive uncertainty estimations are key factors for improving the decision accuracy and user experience. In this work, we propose RDeepSense, the first deep learning model that provides well-calibrated uncertainty estimations for resource-constrained mobile and embedded devices. RDeepSense enables the predictive uncertainty by adopting a tunable proper scoring rule as the training criterion and dropout as the implicit Bayesian approximation, which theoretically proves its correctness. To reduce the computational complexity, RDeepSense employs efficient dropout and predictive distribution estimation instead of model ensemble or sampling-based methods for inference operations. We evaluate RDeepSense with four mobile sensing applications using Intel Edison devices. Results show that RDeepSense can reduce around 90% of the energy consumption while producing superior uncertainty estimations and preserving at least the same model accuracy compared with other state-of-the-art methods. 
RDPD In many situations, we have both rich- and poor-data environments: in a rich-data environment (e.g., intensive care units), we have high-quality multi-modality data. On the other hand, in a poor-data environment (e.g., at home), we often only have access to a single data modality with low quality. How can we learn an accurate and efficient model for the poor-data environment by leveraging multi-modality data from the rich-data environment? In this work, we propose a knowledge distillation model RDPD to enhance a small model trained on poor data with a complex model trained on rich data. In an end-to-end fashion, RDPD trains a student model built on a single modality data (poor data) to imitate the behavior and performance of a teacher model from multimodal data (rich data) via jointly optimizing the combined loss of attention imitation and target imitation. We evaluated RDPD on three real-world datasets. RDPD consistently outperformed all baselines across all three datasets, especially achieving the greatest performance improvement over a standard neural network model trained on the common features (Direct model) by 24.56% on PR-AUC and 12.21% on ROC-AUC, and over the standard knowledge distillation model by 5.91% on PR-AUC and 4.44% on ROC-AUC. ReabsNet Though deep neural networks have achieved huge success in recent studies and applications, they remain vulnerable to adversarial perturbations which are imperceptible to humans. To address this problem, we propose a novel network called ReabsNet to achieve high classification accuracy in the face of various attacks. The approach is to augment an existing classification network with a guardian network to detect if a sample is natural or has been adversarially perturbed. Critically, instead of simply rejecting adversarial examples, we revise them to get their true labels. 
We exploit the observation that a sample containing adversarial perturbations has a possibility of returning to its true class after revision. We demonstrate that our ReabsNet outperforms the state-of-the-art defense method under various adversarial attacks. React.js React (sometimes styled React.js or ReactJS) is an open-source JavaScript library for creating user interfaces that aims to address challenges encountered in developing single-page applications. It is maintained by Facebook, Instagram and a community of individual developers and corporations. React is intended to help developers build large applications that use data that changes over time. Its goal is to be simple, declarative and composable. React only handles the user interface in an app; it is considered to only be the view in the model-view-controller (MVC) software pattern, and can be used in conjunction with other JavaScript libraries or larger MVC frameworks such as AngularJS. It can also be used with React-based add-ons that take care of the non-UI parts of building a web application. According to JavaScript analytics service Libscore, React is currently being used on the homepages of Imgur, Bleacher Report, Feedly, Airbnb, SeatGeek, HelloSign, and others. Reactive Application A Reactive Application is an application that reacts to its changing environment by design. It’s constructed from the beginning to react to load, react to failure and react to users. This is achieved by the underlying notion of reacting to messages. Reactive Programming In computing, reactive programming is a programming paradigm oriented around data flows and the propagation of change. This means that it should be possible to express static or dynamic data flows with ease in the programming languages used, and that the underlying execution model will automatically propagate changes through the data flow. 
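The automatic propagation of changes described in the reactive programming entry can be sketched in a few lines. This is a toy model under simple assumptions (synchronous updates, no cycle detection), with hypothetical class names `Cell` and `Formula`; it is not the API of any particular reactive library.

```python
class Cell:
    """A value that notifies its dependents whenever it changes."""

    def __init__(self, value=None):
        self._value = value
        self._dependents = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        # Push the change through the data flow.
        for dependent in self._dependents:
            dependent.recompute()


class Formula(Cell):
    """A cell whose value is derived from other cells."""

    def __init__(self, func, *inputs):
        super().__init__()
        self._func = func
        self._inputs = inputs
        for cell in inputs:
            cell._dependents.append(self)
        self.recompute()

    def recompute(self):
        self.set(self._func(*(cell.value for cell in self._inputs)))


b = Cell(1)
c = Cell(2)
a = Formula(lambda x, y: x + y, b, c)  # a is defined as b + c

first = a.value   # computed when the formula is created
b.set(10)         # changing b triggers recomputation of a
second = a.value

print(first, second)
```

The dependency is declared once, when `a` is created; afterwards no code has to remember to update `a` when `b` or `c` changes.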
For example, in an imperative programming setting, a:=b+c would mean that a is being assigned the result of b+c in the instant the expression is evaluated. Later, the values of b and c can be changed with no effect on the value of a. In reactive programming, the value of a would be automatically updated based on the new values. readPTU readPTU is a python package designed to analyze time-correlated single-photon counting data. The use of the library promotes the storage of the complete time arrival information of the photons and full flexibility in post-processing data for analysis. The library supports the computation of time resolved signal with external triggers and second order autocorrelation function analysis can be performed using multiple algorithms that provide the user with different trade-offs with regards to speed and accuracy. Additionally, a thresholding algorithm to perform time post-selection is also available. The library has been designed with performance and extensibility in mind to allow future users to implement support for additional file extensions and algorithms without having to deal with low level details. We demonstrate the performance of readPTU by analyzing the second-order autocorrelation function of the resonance fluorescence from a single quantum dot in a two-dimensional semiconductor. Real log Canonical Threshold(RLCT) ➘ “Widely Applicable Bayesian Information Criterion” Real Logic We propose real logic: a uniform framework for integrating automatic learning and reasoning. Real logic is defined on a full first-order language where formulas have truth-value in the interval [0,1] and semantics defined concretely on the domain of real numbers. Logical constants are interpreted as (feature) vectors of real numbers. Real logic promotes a well-founded integration of deductive reasoning on knowledge-bases with efficient, data-driven relational machine learning. 
We show how Real Logic can be implemented in deep Tensor Neural Networks with the use of Google’s TensorFlow primitives. The paper concludes with experiments on a simple but representative example of knowledge completion. REalistic Single Image DEhazing(RESIDE) In this paper, we present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE). RESIDE highlights diverse data sources and image contents, and is divided into five subsets, each serving different training or evaluation purposes. We further provide a rich variety of criteria for dehazing algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on RESIDE shed light on the comparisons and limitations of state-of-the-art dehazing algorithms, and suggest promising future directions. (PDF) RESIDE: A Benchmark for Single Image Dehazing. Available from: https://…IDE_A_Benchmark_for_Single_Image_Dehazing [accessed Jul 03 2018]. Real-Time Anomaly Detection System(RADS) Cybersecurity attacks in Cloud data centres are increasing alongside the growth of the Cloud services market. Existing research proposes a number of anomaly detection systems for detecting such attacks. However, these systems encounter a number of challenges, specifically due to the unknown behaviour of the attacks and the occurrence of genuine Cloud workload spikes, which must be distinguished from attacks. In this paper, we discuss these challenges and investigate the issues with the existing Cloud anomaly detection approaches. Then, we propose a Real-time Anomaly Detection System (RADS) for Cloud data centres, which uses a one class classification algorithm and a window-based time series analysis to address the challenges. 
Specifically, RADS can detect VM-level anomalies occurring due to DDoS and cryptomining attacks. We evaluate the performance of RADS by running lab-based experiments and by using real-world Cloud workload traces. Evaluation results demonstrate that RADS can achieve 90-95% accuracy with a low false positive rate of 0-3%. The results further reveal that RADS experiences fewer false positives when using its window-based time series analysis in comparison to using state-of-the-art average or entropy based analysis. Real-time Automated Photometric IDentification(RAPID) We present RAPID (Real-time Automated Photometric IDentification), a novel time-series classification tool capable of automatically identifying transients from within a day of the initial alert, to the full lifetime of a light curve. Using a deep recurrent neural network with Gated Recurrent Units (GRUs), we present the first method specifically designed to provide early classifications of astronomical time-series data, typing 12 different transient classes. Our classifier can process light curves with any phase coverage, and it does not rely on deriving computationally expensive features from the data, making RAPID well-suited for processing the millions of alerts that ongoing and upcoming wide-field surveys such as the Zwicky Transient Facility (ZTF), and the Large Synoptic Survey Telescope (LSST) will produce. The classification accuracy improves over the lifetime of the transient as more photometric data becomes available, and across the 12 transient classes, we obtain an average area under the receiver operating characteristic curve of 0.95 and 0.98 at early and late epochs, respectively. We demonstrate RAPID’s ability to effectively provide early classifications of transients from the ZTF data stream. 
We have made RAPID available as an open-source software package (https://astrorapid.readthedocs.io) for machine learning-based alert-brokers to use for the autonomous and quick classification of several thousand light curves within a few seconds. Real-Time Intelligent Computing ➘ “Real-Time Intelligent Systems” Real-Time Intelligent Systems Intelligent computing refers largely to artificial intelligence that aims to make computers act like humans. This newly developed area of real-time intelligent computing integrates the aspect of dynamic environments with human intelligence. Book: Lecture Notes in Real-Time Intelligent Systems Real-time IoT Benchmark for Distributed Stream Processing Platforms(RIoTBench) The Internet of Things (IoT) is an emerging technology paradigm where millions of sensors and actuators help monitor and manage physical, environmental and human systems in real-time. The inherent closed-loop responsiveness and decision making of IoT applications make them ideal candidates for using low latency and scalable stream processing platforms. Distributed Stream Processing Systems (DSPS) hosted on Cloud data-centers are becoming the vital engine for real-time data processing and analytics in any IoT software architecture. But the efficacy and performance of contemporary DSPS have not been rigorously studied for IoT applications and data streams. Here, we develop RIoTBench, a Realtime IoT Benchmark suite, along with performance metrics, to evaluate DSPS for streaming IoT applications. The benchmark includes 27 common IoT tasks classified across various functional categories and implemented as reusable micro-benchmarks. Further, we propose four IoT application benchmarks composed from these tasks, and that leverage various dataflow semantics of DSPS. The applications are based on common IoT patterns for data pre-processing, statistical summarization and predictive analytics. 
These are coupled with four stream workloads sourced from real IoT observations on smart cities and fitness, with peak stream rates that range from 500 to 10000 messages/sec and diverse frequency distributions. We validate the RIoTBench suite for the popular Apache Storm DSPS on the Microsoft Azure public Cloud, and present empirical observations. This suite can be used by DSPS researchers for performance analysis and resource scheduling, and by IoT practitioners to evaluate DSPS platforms. Real-Time Predictive Analytics Real-time predictive analytics is when a predictive model (built/fitted on a set of aggregated data) is deployed to perform run-time prediction on a continuous stream of event data to enable decision making in real-time. In order to achieve this, there are two aspects involved. First, the predictive model built by a Data Scientist via a stand-alone tool (R, SAS, SPSS, etc.) has to be exported in a consumable format (PMML is a preferred method across machine learning environments these days; we have done this and also via other formats). Second, a streaming operational analytics platform has to consume the model (PMML or other format) and translate it into the necessary predictive function (via open-source jPMML or Cascading Pattern or Zementis’ commercially licensed UPPI or other interfaces), and also feed the processed streaming event data (via a stream processing component in CEP or similar) to compute the predicted outcome. This deployment of a complex predictive model, from its parent machine learning environment to an operational analytics environment, is one possible route in order to successfully achieve a continuous run-time prediction on streaming event data in real-time. Reblur2Deblur Motion blur is a fundamental problem in computer vision as it impacts image quality and hinders inference. 
Traditional deblurring algorithms leverage the physics of the image formation model and use hand-crafted priors: they usually produce results that better reflect the underlying scene, but present artifacts. Recent learning-based methods implicitly extract the distribution of natural images directly from the data and use it to synthesize plausible images. Their results are impressive, but they are not always faithful to the content of the latent image. We present an approach that bridges the two. Our method fine-tunes existing deblurring neural networks in a self-supervised fashion by enforcing that the output, when blurred based on the optical flow between subsequent frames, matches the input blurry image. We show that our method significantly improves the performance of existing methods on several datasets both visually and in terms of image quality metrics. The supplementary material is https://goo.gl/nYPjEQ Recall In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. Suppose a program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program’s precision is 4/7 while its recall is 4/9. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positive) and maximum recall (no false negative). 
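The two worked examples above (the dog recognizer and the search engine) can be reproduced with a short sketch. The function name `precision_recall` and its parameter names are chosen here for illustration only; exact fractions are used so the results match the figures in the text.

```python
from fractions import Fraction
from typing import Tuple


def precision_recall(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> Tuple[Fraction, Fraction]:
    """Return (precision, recall) as exact fractions."""
    # Precision: share of retrieved instances that are relevant.
    precision = Fraction(true_positives, true_positives + false_positives)
    # Recall: share of relevant instances that were retrieved.
    recall = Fraction(true_positives, true_positives + false_negatives)
    return precision, recall


# Dog recognizer: 7 identifications, 4 correct, 9 dogs in the scene.
dog_precision, dog_recall = precision_recall(
    true_positives=4, false_positives=7 - 4, false_negatives=9 - 4)

# Search engine: 30 pages returned, 20 relevant, 40 relevant pages missed.
search_precision, search_recall = precision_recall(
    true_positives=20, false_positives=30 - 20, false_negatives=40)

print(dog_precision, dog_recall)        # 4/7 4/9
print(search_precision, search_recall)  # 2/3 1/3
```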
The above pattern recognition example contained 7 – 4 = 3 type I errors and 9 – 4 = 5 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant, while high recall means that an algorithm returned most of the relevant results. Recall-Oriented Understudy for Gisting Evaluation(ROUGE) ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference or a set of references (human-produced) summary or translation. Receding Horizon Accelerated Gradient(RHAG) This paper studies an online optimization problem with switching costs and a finite prediction window. We propose two computationally efficient algorithms: Receding Horizon Gradient Descent (RHGD), and Receding Horizon Accelerated Gradient (RHAG). Both algorithms only require a finite number of gradient evaluations at each time. We show that both the dynamic regret and the competitive ratio of the proposed algorithms decay exponentially fast with the length of the prediction window, and the decay rate of RHAG is larger than RHGD. Moreover, we provide a fundamental lower bound on the dynamic regret for general online algorithms with a finite prediction window. The lower bound matches the dynamic regret of our RHAG, meaning that the performance cannot improve significantly even with more computation. Lastly, we present simulation results to test our algorithms numerically. Receding Horizon Gradient Descent(RHGD) This paper studies an online optimization problem with switching costs and a finite prediction window. 
We propose two computationally efficient algorithms: Receding Horizon Gradient Descent (RHGD), and Receding Horizon Accelerated Gradient (RHAG). Both algorithms only require a finite number of gradient evaluations at each time. We show that both the dynamic regret and the competitive ratio of the proposed algorithms decay exponentially fast with the length of the prediction window, and the decay rate of RHAG is larger than RHGD. Moreover, we provide a fundamental lower bound on the dynamic regret for general online algorithms with a finite prediction window. The lower bound matches the dynamic regret of our RHAG, meaning that the performance cannot improve significantly even with more computation. Lastly, we present simulation results to test our algorithms numerically. Receiver Operating Characteristic(ROC Curve) In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the total actual positives (TPR = true positive rate) vs. the fraction of false positives out of the total actual negatives (FPR = false positive rate), at various threshold settings. TPR is also known as sensitivity or recall in machine learning. The FPR is also known as the fall-out and can be calculated as one minus the better-known specificity. The ROC curve is then the sensitivity as a function of fall-out. In general, if both of the probability distributions for detection and false alarm are known, the ROC curve can be generated by plotting the Cumulative Distribution Function of the detection probability in the y-axis versus the Cumulative Distribution Function of the false alarm probability in x-axis. https://rocr.bioinf.mpi-sb.mpg.de ROCR RecLab Different software tools have been developed with the purpose of performing offline evaluations of recommender systems. 
However, the results obtained with these tools may not be directly comparable because of subtle differences in the experimental protocols and metrics. Furthermore, it is difficult to analyze several algorithms under the same experimental conditions without disclosing their implementation details. For these reasons, we introduce RecLab, an open source software for evaluating recommender systems in a distributed fashion. By relying on consolidated web protocols, we created RESTful APIs for training and querying recommenders remotely. In this way, it is possible to easily integrate into the same toolkit algorithms realized with different technologies. In detail, the experimenter can perform an evaluation by simply visiting a web interface provided by RecLab. The framework will then interact with all the selected recommenders and it will compute and display a comprehensive set of measures, each representing a different metric. The results of all experiments are permanently stored and publicly available in order to support accountability and comparative analyses. ReCode In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. First, we retrieve sentences that are similar to input sentences using a dynamic-programming-based sentence similarity scoring method. Next, we extract n-grams of action sequences that build the associated abstract syntax tree. Finally, we increase the probability of actions that cause the retrieved n-gram action subtree to be in the predicted code. We show that our approach improves the performance on two code generation tasks by up to +2.6 BLEU. 
RecoGym Recommender Systems are becoming ubiquitous in many settings and take many forms, from product recommendation in e-commerce stores, to query suggestions in search engines, to friend recommendation in social networks. Current research directions, which are largely based upon supervised learning from historical data, appear to be showing diminishing returns, with many practitioners reporting a discrepancy between improvements in offline metrics for supervised learning and the online performance of the newly proposed models. One possible reason is that we are using the wrong paradigm: when looking at the long-term cycle of collecting historical performance data, creating a new version of the recommendation model, A/B testing it and then rolling it out, we see that there are a lot of commonalities with the reinforcement learning (RL) setup, where the agent observes the environment and acts upon it in order to change its state towards better states (states with higher rewards). To this end we introduce RecoGym, an RL environment for recommendation, which is defined by a model of user traffic patterns on e-commerce and the users’ response to recommendations on the publisher websites. We believe that this is an important step forward for the field of recommendation systems research, that could open up an avenue of collaboration between the recommender systems and reinforcement learning communities and lead to better alignment between offline and online performance metrics. Recombinator-k-Means We present a heuristic algorithm, called recombinator-k-means, that can substantially improve the results of k-means optimization. Instead of using simple independent restarts and returning the best result, our scheme performs restarts in batches, using the results of a previous batch as a reservoir of candidates for the new initial starting values (seeds), exploiting the popular k-means++ seeding algorithm to piece them together into new promising initial configurations. 
Our scheme is general (it only affects the seeding part of the optimization, thus it could be applied even to k-medians or k-medoids, for example), it has no additional costs and it is trivially parallelizable across the restarts of each batch. In some circumstances, it can systematically find better configurations than the best one obtained after 10^4 restarts of a standard scheme. Our implementation is publicly available at https://…/RecombinatorKMeans.jl. Recommendation Engine of Multilayers(REM) Recommender systems have been widely adopted by electronic commerce and entertainment industries for individualized prediction and recommendation, which benefit consumers and improve business intelligence. In this article, we propose an innovative method, namely the recommendation engine of multilayers (REM), for tensor recommender systems. The proposed method utilizes the structure of a tensor response to integrate information from multiple modes, and creates an additional layer of nested latent factors to accommodate between-subjects dependency. One major advantage is that the proposed method is able to address the ‘cold-start’ issue in the absence of information from new customers, new products or new contexts. Specifically, it provides more effective recommendations through sub-group information. To achieve scalable computation, we develop a new algorithm for the proposed method, which incorporates a maximum block improvement strategy into the cyclic blockwise-coordinate-descent algorithm. In theory, we investigate both algorithmic properties for global and local convergence, along with the asymptotic consistency of estimated parameters. Finally, the proposed method is applied in simulations and IRI marketing data with 116 million observations of product sales. Numerical studies demonstrate that the proposed method outperforms existing competitors in the literature. 
Recommender System Recommender systems or recommendation systems (sometimes replacing “system” with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item. recosystem Reconciled Polynomial Machine In this paper, we aim at introducing a new machine learning model, namely reconciled polynomial machine, which can provide a unified representation of existing shallow and deep machine learning models. Reconciled polynomial machine predicts the output by computing the inner product of the feature kernel function and variable reconciling function. Analysis of several concrete models, including Linear Models, FM, MVM, Perceptron, MLP and Deep Neural Networks, will be provided in this paper, which can all be reduced to the reconciled polynomial machine representations. Detailed analysis of the learning error by these models will also be illustrated in this paper based on their reduced representations from the function approximation perspective. Reconciliation k-Median We propose a new variant of the k-median problem, where the objective function models not only the cost of assigning data points to cluster representatives, but also a penalty term for disagreement among the representatives. We motivate this novel problem by applications where we are interested in clustering data while avoiding selecting representatives that are too far from each other. For example, we may want to summarize a set of news sources, but avoid selecting ideologically-extreme articles in order to reduce polarization. To solve the proposed k-median formulation we adopt the local-search algorithm of Arya et al. We show that the algorithm provides a provable approximation guarantee, which becomes constant under a mild assumption on the minimum number of points for each cluster. 
We experimentally evaluate our problem formulation and proposed algorithm on datasets inspired by the motivating applications. In particular, we experiment with data extracted from Twitter, the US Congress voting records, and popular news sources. The results show that our objective can lead to choosing less polarized groups of representatives without significant loss in representation fidelity. Reconfigurable Inverted Index(RII) Existing approximate nearest neighbor search systems suffer from two fundamental problems that are of practical importance but have not received sufficient attention from the research community. First, although existing systems perform well for the whole database, it is difficult to run a search over a subset of the database. Second, there has been no discussion concerning the performance decrement after many items have been newly added to a system. We develop a reconfigurable inverted index (Rii) to resolve these two issues. Based on the standard IVFADC system, we design a data layout such that items are stored linearly. This enables us to efficiently run a subset search by switching the search method to a linear PQ scan if the size of a subset is small. Owing to the linear layout, the data structure can be dynamically adjusted after new items are added, maintaining the fast speed of the system. Extensive comparisons show that Rii achieves a comparable performance with state-of-the-art systems such as Faiss. ReCoRD We present a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning. Experiments on this dataset demonstrate that the performance of state-of-the-art MRC systems falls far behind human performance. ReCoRD represents a challenge for future research to bridge the gap between human and machine commonsense reading comprehension. ReCoRD is available at http://…/record. 
Record Linkage(RL) Record linkage (RL) refers to the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, databases). Record linkage is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), as may be the case due to differences in record shape, storage location, and/or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked. Record Linkage is called Data Linkage in many jurisdictions, but is the same process. RecSys-DAN Data sparsity and data imbalance are practical and challenging issues in cross-domain recommender systems. This paper addresses those problems by leveraging the concepts which derive from representation learning, adversarial learning and transfer learning (particularly, domain adaptation). Although various transfer learning methods have shown promising performance in this context, our proposed novel method RecSys-DAN focuses on alleviating the cross-domain and within-domain data sparsity and data imbalance and learns transferable latent representations for users, items and their interactions. Different from existing approaches, the proposed method transfers the latent representations from a source domain to a target domain in an adversarial way. The mapping functions in the target domain are learned by playing a min-max game with an adversarial loss, aiming to generate domain indistinguishable representations for a discriminator. Four neural architectural instances of RecSys-DAN are proposed and explored. Empirical results on real-world Amazon data show that, even without using labeled data (i.e., ratings) in the target domain, RecSys-DAN achieves competitive performance as compared to the state-of-the-art supervised methods. 
More importantly, RecSys-DAN is highly flexible to both unimodal and multimodal scenarios, and thus it is more robust to the cold-start recommendation which is difficult for previous methods. Rectangular Bounding Process(RBP) Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model, the Rectangular Bounding Process (RBP), to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods. Rectified Decision Tree(ReDT) How to obtain a model with good interpretability and performance has always been an important research topic. In this paper, we propose rectified decision trees (ReDT), a knowledge distillation based decision trees rectification with high interpretability, small model size, and empirical soundness. Specifically, we extend the impurity calculation and the pure ending condition of the classical decision tree to propose a decision tree extension that allows the use of soft labels generated by a well-trained teacher model in training and prediction process. 
It is worth noting that for the acquisition of soft labels, we propose a new multiple cross-validation based method to reduce the effects of randomness and overfitting. These approaches ensure that ReDT retains excellent interpretability and even achieves fewer nodes than the decision tree in the aspect of compression while having relatively good performance. Besides, in contrast to traditional knowledge distillation, back propagation of the student model is not necessarily required in ReDT, which is an attempt at a new knowledge distillation approach. Extensive experiments are conducted, which demonstrate the superiority of ReDT in interpretability, compression, and empirical soundness. Rectified Factor Networks(RFN) We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means. We prove convergence and correctness of the RFN learning algorithm. On benchmarks, RFNs are compared to other unsupervised methods like autoencoders, RBMs, factor analysis, ICA, and PCA. In contrast to previous sparse coding methods, RFNs yield sparser codes, capture the data’s covariance structure more precisely, and have a significantly smaller reconstruction error. We test RFNs as a pretraining technique for deep networks on different vision datasets, where RFNs were superior to RBMs and autoencoders. On gene expression data from two pharmaceutical drug discovery studies, RFNs detected small and rare gene modules that revealed highly relevant new biological insights which were so far missed by other unsupervised methods. 
Rectified Linear Unit(ReLU) ➘ “Rectifier” Rectified Local Phase Volume(ReLPV) Traditional 3D Convolutional Neural Networks (CNNs) are computationally expensive, memory intensive, prone to overfit, and most importantly, there is a need to improve their feature learning capabilities. To address these issues, we propose the Rectified Local Phase Volume (ReLPV) block, an efficient alternative to the standard 3D convolutional layer. The ReLPV block extracts the phase in a 3D local neighborhood (e.g., 3x3x3) of each position of the input map to obtain the feature maps. The phase is extracted by computing 3D Short Term Fourier Transform (STFT) at multiple fixed low frequency points in the 3D local neighborhood of each position. These feature maps at different frequency points are then linearly combined after passing them through an activation function. The ReLPV block provides significant parameter savings of at least 3^3 to 13^3 times compared to the standard 3D convolutional layer with the filter sizes 3x3x3 to 13x13x13, respectively. We show that the feature learning capabilities of the ReLPV block are significantly better than the standard 3D convolutional layer. Furthermore, it produces consistently better results across different 3D data representations. We achieve state-of-the-art accuracy on the volumetric ModelNet10 and ModelNet40 datasets while utilizing only 11% of the parameters of the current state-of-the-art. We also improve the state-of-the-art on the UCF-101 split-1 action recognition dataset by 5.68% (when trained from scratch) while using only 15% of the parameters of the state-of-the-art. The project webpage is available at https://…/home. Rectified Wire Network We introduce a new neural network model, together with a tractable and monotone online learning algorithm. Our model describes feed-forward networks for classification, with one output node for each class. The only nonlinear operation is rectification using a ReLU function with a bias. 
However, there is a rectifier on every edge rather than at the nodes of the network. There are also weights, but these are positive, static, and associated with the nodes. Our rectified wire networks are able to represent arbitrary Boolean functions. Only the bias parameters, on the edges of the network, are learned. Another departure in our approach, from standard neural networks, is that the loss function is replaced by a constraint. This constraint is simply that the value of the output node associated with the correct class should be zero. Our model has the property that the exact norm-minimizing parameter update, required to correctly classify a training item, is the solution to a quadratic program that can be computed with a few passes through the network. We demonstrate a training algorithm using this update, called sequential deactivation (SDA), on MNIST and some synthetic datasets. Upon adopting a natural choice for the nodal weights, SDA has no hyperparameters other than those describing the network structure. Our experiments explore behavior with respect to network size and depth in a family of sparse expander networks. Rectifier In the context of artificial neural networks, the rectifier is an activation function defined as f(x) = max(0, x) where x is the input to a neuron. This activation function has been argued to be more biologically plausible (cortical neurons are rarely in their maximum saturation regime) than the widely used logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. A unit employing the rectifier is also called a rectified linear unit (ReLU). RecurJac The Jacobian matrix (or the gradient for single-output networks) is directly related to many important properties of neural networks, such as the function landscape, stationary points, (local) Lipschitz constants and robustness to adversarial attacks. 
In this paper, we propose a recursive algorithm, RecurJac, to compute both upper and lower bounds for each element in the Jacobian matrix of a neural network with respect to the network’s input, and the network can contain a wide range of activation functions. As a byproduct, we can efficiently obtain a (local) Lipschitz constant, which plays a crucial role in neural network robustness verification, as well as the training stability of GANs. Experiments show that (local) Lipschitz constants produced by our method are of better quality than those of previous approaches, thus providing better robustness verification results. Our algorithm has polynomial time complexity, and its computation time is reasonable even for relatively large networks. Additionally, we use our bounds on the Jacobian matrix to characterize the landscape of the neural network, for example, to determine whether there exist stationary points in a local neighborhood. Source code available at https://…/RecurJac-Jacobian-Bounds. Recurrence Plot(RP) In descriptive statistics and chaos theory, a recurrence plot (RP) is a plot showing, for a given moment in time, the times at which a phase space trajectory visits roughly the same area in the phase space. Recurrence Quantification Analysis(RQA) Recurrence quantification analysis (RQA) is a method of nonlinear data analysis (cf. chaos theory) for the investigation of dynamical systems. It quantifies the number and duration of recurrences of a dynamical system presented by its phase space trajectory. Dynamic Natural Language Processing with Recurrence Quantification Analysis Recurrent Additive Networks(RAN) We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates. At every time step, the new state is computed as a gated component-wise sum of the input and the previous state, without any of the non-linearities commonly used in RNN transition dynamics.
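The additive update just described can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the gate parameterization (sigmoid gates computed from the input and the previous output), the tanh output, the omission of bias terms, and all names such as `ran_step` and `W_cx` are assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def ran_step(p, x, c_prev, h_prev):
    """One step: the new state is a gated component-wise sum of input and previous state."""
    content = matvec(p["W_cx"], x)  # linear projection of the input, no non-linearity
    i_gate = [sigmoid(a + b) for a, b in zip(matvec(p["W_ix"], x), matvec(p["W_ih"], h_prev))]
    f_gate = [sigmoid(a + b) for a, b in zip(matvec(p["W_fx"], x), matvec(p["W_fh"], h_prev))]
    c = [i * u + f * cp for i, u, f, cp in zip(i_gate, content, f_gate, c_prev)]
    h = [math.tanh(v) for v in c]  # assumed output function; the state update stays additive
    return c, h, i_gate, f_gate, content

rng = random.Random(0)
d_in, d = 3, 2  # hypothetical input and state sizes

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

params = {"W_cx": rand_matrix(d, d_in), "W_ix": rand_matrix(d, d_in), "W_ih": rand_matrix(d, d),
          "W_fx": rand_matrix(d, d_in), "W_fh": rand_matrix(d, d)}
inputs = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(4)]

c, h = [0.0] * d, [0.0] * d
history = []
for x in inputs:
    c, h, i_gate, f_gate, content = ran_step(params, x, c, h)
    history.append((i_gate, f_gate, content))

# Unrolling the recurrence: c_T = sum_j (i_j * prod_{k>j} f_k) * content_j
weighted_sum = [0.0] * d
for j, (i_j, _, content_j) in enumerate(history):
    for n in range(d):
        w = i_j[n]
        for _, f_k, _ in history[j + 1:]:
            w *= f_k[n]
        weighted_sum[n] += w * content_j[n]
```

Because the update contains no non-linearity on the state itself, the final state `c` equals `weighted_sum`: a sum of the projected inputs whose weights are built only from gate values.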
We formally show that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums. Despite this relatively simple functional form, experiments demonstrate that RANs outperform both LSTMs and GRUs on benchmark language modeling problems. This result shows that many of the non-linear computations in LSTMs and related networks are not essential, at least for the problems we consider, and suggests that the gates are doing more of the computational work than previously understood. Recurrent Attention Unit(RAU) Recurrent Neural Networks (RNNs) have been successfully applied to many sequence learning problems, such as handwriting recognition, image description, natural language processing and video motion analysis. After years of development, researchers have improved the internal structure of the RNN and introduced many variants. Among others, the Gated Recurrent Unit (GRU) is one of the most widely used RNN models. However, GRU lacks the capability of adaptively paying attention to certain regions or locations, which may cause information redundancy or loss during learning. In this paper, we propose an RNN model, called Recurrent Attention Unit (RAU), which seamlessly integrates the attention mechanism into the interior of GRU by adding an attention gate. The attention gate can enhance GRU’s ability to remember long-term memory and help memory cells quickly discard unimportant content. RAU is capable of extracting information from the sequential data by adaptively selecting a sequence of regions or locations and paying more attention to the selected regions during learning. Extensive experiments on image classification, sentiment classification and language modeling show that RAU consistently outperforms GRU and other baseline methods.
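Returning to the Recurrence Plot and Recurrence Quantification Analysis entries earlier in this section: a recurrence plot is commonly built from a binary matrix whose entry (i, j) is 1 when the trajectory points at times i and j lie within a threshold of each other. The sketch below assumes a Euclidean distance, a hand-picked threshold `eps`, and a toy periodic trajectory; the two summary measures (recurrence rate and longest diagonal line) are simple stand-ins for the "number and duration of recurrences" and the function names are chosen for this example.

```python
import math

def recurrence_matrix(trajectory, eps):
    """R[i][j] = 1 when points i and j of the trajectory are within eps of each other."""
    n = len(trajectory)
    return [[1 if math.dist(trajectory[i], trajectory[j]) <= eps else 0
             for j in range(n)]
            for i in range(n)]

def recurrence_rate(R):
    """Fraction of recurrent points: a measure of how many recurrences occur."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)

def longest_diagonal(R):
    """Longest run of recurrences parallel to the main diagonal (excluding it):
    a measure of how long the trajectory keeps retracing an earlier stretch."""
    n = len(R)
    best = 0
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

# Toy one-dimensional trajectory with period 2 (points are tuples so that
# higher-dimensional phase spaces work unchanged).
trajectory = [(0.0,), (1.0,), (0.0,), (1.0,), (0.0,), (1.0,)]
R = recurrence_matrix(trajectory, eps=0.1)
rate = recurrence_rate(R)
longest = longest_diagonal(R)
```

For the periodic toy trajectory, half of all pairs are recurrent and the longest diagonal line has length 4; plotting `R` as an image would give the familiar checkerboard of a period-2 signal.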
Recurrent Attentive and Intensive Model(RAIM) With improvements in medical data capture, vast amounts of continuous patient monitoring data, e.g., electrocardiogram (ECG), real-time vital signs and medications, have become available for clinical decision support at intensive care units (ICUs). However, it becomes increasingly challenging to model such data, due to the high density of the monitoring data, heterogeneous data types and the requirement for interpretable models. Integration of these high-density monitoring data with the discrete clinical events (including diagnosis, medications, labs) is challenging but potentially rewarding since richness and granularity in such multimodal data increase the possibilities for accurate detection of complex problems and predicting outcomes (e.g., length of stay and mortality). We propose the Recurrent Attentive and Intensive Model (RAIM) for jointly analyzing continuous monitoring data and discrete clinical events. RAIM introduces an efficient attention mechanism for continuous monitoring data (e.g., ECG), which is guided by discrete clinical events (e.g., medication usage). We apply RAIM to predicting physiological decompensation and length of stay in critically ill patients in the ICU. With evaluations on the MIMIC-III Waveform Database Matched Subset, we obtain an AUC-ROC score of 90.18% for predicting decompensation and an accuracy of 86.82% for forecasting length of stay with our final model, which outperforms our six baseline models. Recurrent Collective Classification(RCC) We propose a new method for training iterative collective classifiers for labeling nodes in network data. The iterative classification algorithm (ICA) is a canonical method for incorporating relational information into classification. Yet, existing methods for training ICA models rely on the assumption that relational features reflect the true labels of the nodes.
This unrealistic assumption introduces a bias that is inconsistent with the actual prediction algorithm. In this paper, we introduce recurrent collective classification (RCC), a variant of ICA analogous to recurrent neural network prediction. RCC accommodates any differentiable local classifier and relational feature functions. We provide gradient-based strategies for optimizing over model parameters to more directly minimize the loss function. In our experiments, this direct loss minimization translates to improved accuracy and robustness on real network data. We demonstrate the robustness of RCC in settings where local classification is very noisy, settings that are particularly challenging for ICA. Recurrent Control Net(RCN) Central Pattern Generators (CPGs) are biological neural circuits capable of producing coordinated rhythmic outputs in the absence of rhythmic input. As a result, they are responsible for most rhythmic motion in living organisms. This rhythmic control is broadly applicable to fields such as locomotive robotics and medical devices. In this paper, we explore the possibility of creating a self-sustaining CPG network for reinforcement learning that learns rhythmic motion more efficiently and across more general environments than the current multilayer perceptron (MLP) baseline models. Recent work introduces the Structured Control Net (SCN), which maintains linear and nonlinear modules for local and global control, respectively. Here, we show that time-sequence architectures such as Recurrent Neural Networks (RNNs) model CPGs effectively. Combining previous work with RNNs and SCNs, we introduce the Recurrent Control Net (RCN), which adds a linear component to the RNN. RCNs match and exceed the performance of baseline MLPs and SCNs across all environment tasks. Our findings confirm existing intuitions for RNNs on reinforcement learning tasks, and demonstrate the promise of SCN-like structures in reinforcement learning.
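One way to picture the structure described in the Recurrent Control Net entry is a policy whose action is the sum of a linear function of the current state and the output of a recurrent module. The sketch below is only an assumed reading of that description: the class name, the tanh recurrence, the parameter names (`K`, `b`, `W_in`, `W_rec`, `W_out`) and the toy weights are hypothetical and not taken from the paper.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class RecurrentControlNetSketch:
    """Action = linear module (K s + b) + recurrent nonlinear module (W_out h)."""

    def __init__(self, K, b, W_in, W_rec, W_out):
        self.K, self.b = K, b
        self.W_in, self.W_rec, self.W_out = W_in, W_rec, W_out
        self.h = [0.0] * len(W_rec)  # hidden state carried across time steps

    def act(self, state):
        linear = [k + b for k, b in zip(matvec(self.K, state), self.b)]
        pre = [a + r for a, r in zip(matvec(self.W_in, state), matvec(self.W_rec, self.h))]
        self.h = [math.tanh(v) for v in pre]  # simple recurrent update
        nonlinear = matvec(self.W_out, self.h)
        return [l + n for l, n in zip(linear, nonlinear)]

# Hypothetical toy parameters: 2-D state, 2 hidden units, 1-D action.
policy = RecurrentControlNetSketch(
    K=[[0.5, 0.0]], b=[0.1],
    W_in=[[1.0, 0.0], [0.0, 1.0]],
    W_rec=[[0.0, -1.0], [1.0, 0.0]],
    W_out=[[1.0, 1.0]],
)
state = [1.0, 0.0]
a1 = policy.act(state)
a2 = policy.act(state)  # same observation, different action: the hidden state has changed
```

Because the hidden state persists between calls, the same observation can produce different actions on consecutive steps, which is the property that lets a recurrent controller generate time-varying output from a constant input.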
Recurrent Convolutional Network(RCN) Recently, three dimensional (3D) convolutional neural networks (CNNs) have emerged as dominant methods to capture spatiotemporal representations, by adding to pre-existing 2D CNNs a third, temporal dimension. Such 3D CNNs, however, are anti-causal (i.e., they exploit information from both the past and the future to produce feature representations, thus preventing their use in online settings), constrain the temporal reasoning horizon to the size of the temporal convolution kernel, and are not temporal resolution-preserving for video sequence-to-sequence modelling, as, e.g., in spatiotemporal action detection. To address these serious limitations, we present a new architecture for the causal/online spatiotemporal representation of videos. Namely, we propose a recurrent convolutional network (RCN), which relies on recurrence to capture the temporal context across frames at every level of network depth. Our network decomposes 3D convolutions into (1) a 2D spatial convolution component, and (2) an additional hidden state $1\times 1$ convolution applied across time. The hidden state at any time $t$ is assumed to depend on the hidden state at $t-1$ and on the current output of the spatial convolution component. As a result, the proposed network: (i) provides flexible temporal reasoning, (ii) produces causal outputs, and (iii) preserves temporal resolution. Our experiments on the large-scale ‘Kinetics’ dataset show that the proposed method achieves superior performance compared to 3D CNNs, while being causal and using fewer parameters. Recurrent Distribution Regression Network(RDRN) While deep neural networks have achieved groundbreaking prediction results in many tasks, there is a class of data where existing architectures are not optimal: sequences of probability distributions. Performing forward prediction on sequences of distributions has many important applications.
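The decomposition described in the Recurrent Convolutional Network entry above (a 2D spatial convolution per frame plus a 1x1 convolution on the hidden state carried across time) can be sketched for a single channel, where the 1x1 convolution reduces to one scalar weight. The ReLU non-linearity, the zero padding, the zero initial hidden state and the names `conv2d_same` and `recurrent_conv` are assumptions made for this illustration.

```python
def conv2d_same(frame, kernel):
    """Single-channel 2D cross-correlation with zero padding (output size = input size)."""
    H, W = len(frame), len(frame[0])
    k = len(kernel) // 2
    out = [[0.0] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < H and 0 <= cc < W:
                        acc += kernel[dr + k][dc + k] * frame[rr][cc]
            out[r][c] = acc
    return out

def recurrent_conv(frames, kernel, w_hidden):
    """h_t = relu(spatial_conv(x_t) + w_hidden * h_{t-1}), one output per input frame."""
    H, W = len(frames[0]), len(frames[0][0])
    h = [[0.0] * W for _ in range(H)]
    outputs = []
    for frame in frames:  # frames are visited strictly in temporal order
        s = conv2d_same(frame, kernel)
        h = [[max(0.0, s[r][c] + w_hidden * h[r][c]) for c in range(W)] for r in range(H)]
        outputs.append(h)
    return outputs

# Toy clip: 4 frames of size 3x3, hypothetical kernel and hidden weight.
frames = [[[float(t + r + c) for c in range(3)] for r in range(3)] for t in range(4)]
kernel = [[0.0, 0.1, 0.0], [0.1, 0.5, 0.1], [0.0, 0.1, 0.0]]
out = recurrent_conv(frames, kernel, w_hidden=0.5)

# Replace only the last frame: earlier outputs must not change (causality).
altered = frames[:-1] + [[[0.0] * 3 for _ in range(3)]]
out_altered = recurrent_conv(altered, kernel, w_hidden=0.5)
```

The output has one feature map per input frame, and changing a later frame leaves all earlier outputs untouched, which mirrors the temporal-resolution-preserving and causal properties listed in the entry.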
However, there are two main challenges in designing a network model for this task. First, neural networks are unable to encode distributions compactly as each node encodes just a real value. The recent Distribution Regression Network (DRN) solved this problem with a novel network that encodes an entire distribution in a single node, resulting in improved accuracies while using far fewer parameters than neural networks. However, despite its compact distribution representation, DRN does not address the second challenge, which is the need to model time dependencies in a sequence of distributions. In this paper, we propose our Recurrent Distribution Regression Network (RDRN) which adopts a recurrent architecture for DRN. The combination of compact distribution representation and shared weights architecture across time steps makes RDRN suitable for modeling the time dependencies in a distribution sequence. Compared to neural networks and DRN, RDRN achieves the best prediction performance while keeping the network compact. Recurrent Embedding Dialogue Policy(REDP) Machine-learning based dialogue managers are able to learn complex behaviors in order to complete a task, but it is not straightforward to extend their capabilities to new domains. We investigate different policies’ ability to handle uncooperative user behavior, and how well expertise in completing one task (such as restaurant reservations) can be reapplied when learning a new one (e.g. booking a hotel). We introduce the Recurrent Embedding Dialogue Policy (REDP), which embeds system actions and dialogue states in the same vector space. REDP contains a memory component and attention mechanism based on a modified Neural Turing Machine, and significantly outperforms a baseline LSTM classifier on this task. We also show that both our architecture and baseline solve the bAbI dialogue task, achieving 100% test accuracy.
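Embedding system actions and dialogue states in the same vector space means that choosing the next action can be framed as a nearest-neighbor lookup. The sketch below shows only that final selection step under stated assumptions: the embeddings are hand-written toy vectors, cosine similarity is an assumed choice of similarity measure, and the action names and `select_action` helper are hypothetical. In REDP the state embedding would come from the recurrent model with memory and attention, which is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_action(state_embedding, action_embeddings):
    """Return the action whose embedding is most similar to the dialogue state embedding."""
    return max(action_embeddings,
               key=lambda name: cosine(state_embedding, action_embeddings[name]))

# Hypothetical 3-D embeddings for three system actions.
action_embeddings = {
    "ask_cuisine": [1.0, 0.0, 0.0],
    "confirm_booking": [0.0, 1.0, 0.0],
    "say_goodbye": [0.0, 0.0, 1.0],
}
# Hypothetical embedding of the current dialogue state.
state_embedding = [0.2, 0.9, 0.1]

chosen = select_action(state_embedding, action_embeddings)
```

A shared space of this kind is one reason such a policy can be reapplied across tasks: a new action only needs an embedding, not a new output unit in a classifier.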
Recurrent Entity Network(EntNet) We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhbaatar et al., 2015). Like a Neural Turing Machine or Differentiable Neural Computer (Graves et al., 2014; 2016) it maintains a fixed size memory and can learn to perform location and content-based read and write operations. However, unlike those models it has a simple parallel architecture in which several memory locations can be updated simultaneously. The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting. We also demonstrate that it can solve a reasoning task which requires a large number of supporting facts, which other methods are not able to solve, and can generalize past its training horizon. It can also be practically used on large scale datasets such as Children’s Book Test, where it obtains competitive performance, reading the story in a single pass. Recurrent Event Network Recently, there has been a surge of interest in learning representations of graph-structured data that are dynamically evolving. However, current dynamic graph learning methods lack a principled way of modeling temporal, multi-relational, and concurrent interactions between nodes, a limitation that is especially problematic for the task of temporal knowledge graph reasoning, where the goal is to predict unseen entity relationships (i.e., events) over time. Here we present the Recurrent Event Network, an architecture for modeling complex event sequences, which consists of a recurrent event encoder and a neighborhood aggregator.
The event encoder employs an RNN to capture (subject, relation)-specific patterns from historical entity interactions, while the neighborhood aggregator summarizes concurrent interactions within each time stamp. An output layer is designed for predicting forthcoming, multi-relational events. Experiments on temporal link prediction over two knowledge graph datasets demonstrate the effectiveness of our method, especially on multi-step inference over time. Recurrent Gaussian Processes(RGP) We define Recurrent Gaussian Processes (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states, distinct inference methods and be extended with deep structures. In such context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce an RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising results obtained indicate that our RGP model maintains its high flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available. Recurrent Graph Neural Network In this paper, we study the problem of node representation learning with graph neural networks. We present a graph neural network class named recurrent graph neural network (RGNN), that addresses the shortcomings of prior methods.
By using recurrent units to capture the long-term dependency across layers, our methods can successfully identify important information during recursive neighborhood expansion. In our experiments, we show that our model class achieves state-of-the-art results on three benchmarks: the Pubmed, Reddit, and PPI network datasets. Our in-depth analyses also demonstrate that incorporating recurrent units is a simple yet effective method to prevent noisy information in graphs, which enables a deeper graph neural network. Recurrent Iterative Gating Network(RIGNet) In this paper, we present an approach for Recurrent Iterative Gating called RIGNet. The core elements of RIGNet involve recurrent connections that control the flow of information in neural networks in a top-down manner, and different variants on the core structure are considered. The iterative nature of this mechanism allows for gating to spread in both spatial extent and feature space. This is revealed to be a powerful mechanism with broad compatibility with common existing networks. Analysis shows how gating interacts with different network characteristics, and we also show that shallower networks with gating may be made to perform better than much deeper networks that do not include RIGNet modules. Recurrent Kalman Network(RKN) In order to integrate uncertainty estimates into deep time-series modelling, Kalman Filters (KFs) (Kalman et al., 1960) have been integrated with deep learning models. However, such approaches typically rely on approximate inference techniques such as variational inference, which make learning more complex and often less scalable due to approximation errors. We propose a new deep approach to Kalman filtering which can be learned directly in an end-to-end manner using backpropagation without additional approximations.
Our approach uses a high-dimensional factorized latent state representation for which the Kalman updates simplify to scalar operations and thus avoids hard to backpropagate, computationally heavy and potentially unstable matrix inversions. Moreover, we use locally linear dynamic models to efficiently propagate the latent state to the next time step. The resulting network architecture, which we call Recurrent Kalman Network (RKN), can be used for any time-series data, similar to an LSTM (Hochreiter & Schmidhuber, 1997) but uses an explicit representation of uncertainty. As shown by our experiments, the RKN obtains much more accurate uncertainty estimates than an LSTM or Gated Recurrent Units (GRUs) (Cho et al., 2014) while also showing slightly improved prediction performance and outperforming various recent generative models on an image imputation task. Recurrent Knowledge Distillation Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy. Recurrent Ladder Network In this paper we address the problem of electing a committee among a set of $m$ candidates and on the basis of the preferences of a set of $n$ voters. We consider the approval voting method in which each voter can approve as many candidates as she/he likes by expressing a preference profile (boolean $m$-vector). In order to elect a committee, a voting rule must be established to transform the $n$ voters’ profiles into a winning committee.
The problem is widely studied in voting theory; for a variety of voting rules the problem was shown to be computationally difficult and approximation algorithms and heuristic techniques were proposed in the literature. In this paper we follow an Ordered Weighted Averaging approach and study the $k$-sum approval voting (optimization) problem in the general case \$1 \leq k