R Packages

Number of R packages: 6634


A3 Accurate, Adaptable, and Accessible Error Metrics for Predictive Models
Supplies tools for tabulating and analyzing the results of predictive models. The methods employed are applicable to virtually any predictive model and make comparisons between different methodologies straightforward.
abc Tools for Approximate Bayesian Computation (ABC)
Implements several ABC algorithms for performing parameter estimation, model selection, and goodness-of-fit. Cross-validation tools are also available for measuring the accuracy of ABC estimates and for calculating the misclassification probabilities of different models.
abc.data Data Only: Tools for Approximate Bayesian Computation (ABC)
Contains data which are used by functions of the ‘abc’ package.
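A minimal sketch of how rejection ABC might be run with the ‘abc’ package (the toy model, variable names, and tolerance below are illustrative assumptions, not taken from the package documentation):

```r
library(abc)

# Toy setup (assumed for illustration): one parameter, one summary statistic
set.seed(1)
theta   <- matrix(runif(5000), ncol = 1, dimnames = list(NULL, "theta"))
sumstat <- matrix(theta + rnorm(5000, sd = 0.1), ncol = 1,
                  dimnames = list(NULL, "s1"))
obs <- 0.5                                 # observed summary statistic

# Rejection ABC: keep the 5% of simulations closest to the observation
fit <- abc(target = obs, param = theta, sumstat = sumstat,
           tol = 0.05, method = "rejection")
summary(fit)
```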
ABCanalysis Computed ABC Analysis
For a given data set, the package provides a novel method of computing precise limits to acquire subsets which are easily interpreted. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphically representing the cumulative distribution function. Based on an ABC analysis, the algorithm calculates, with the help of the ABC curve, the optimal limits by exploiting the mathematical properties pertaining to the distribution of the analyzed items. The data, containing positive values, is divided into three disjoint subsets A, B and C, with subset A comprising the very profitable values, i.e. the largest data values (“the important few”), subset B comprising values where the profit equals the effort required to obtain it, and subset C comprising the non-profitable values, i.e. the smallest data values (“the trivial many”).
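A short usage sketch, assuming the package's main entry point is ABCanalysis() and that the returned index sets are named Aind, Bind and Cind (names assumed, check the package documentation):

```r
library(ABCanalysis)

# Hypothetical data: positive values, e.g. revenue per item
set.seed(1)
x <- c(rlnorm(100, meanlog = 3), rlnorm(10, meanlog = 6))
res <- ABCanalysis(x)

# res$Aind, res$Bind, res$Cind are assumed to index the "important few"
# (A), the intermediate items (B), and the "trivial many" (C)
length(res$Aind); length(res$Bind); length(res$Cind)
```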
abcrf Approximate Bayesian Computation via Random Forests
Performs Approximate Bayesian Computation (ABC) model choice via random forests.
abctools Tools for ABC Analyses
Tools for approximate Bayesian computation including summary statistic selection and assessing coverage.
See also the accompanying article, ‘An R Package for Tuning Approximate Bayesian Computation Analyses’.
abe Augmented Backward Elimination
Performs augmented backward elimination and checks the stability of the obtained model. Augmented backward elimination combines significance or information based criteria with the change in estimate to either select the optimal model for prediction purposes or to serve as a tool to obtain a practically sound, highly interpretable model. More details can be found in Dunkler et al. (2014) <doi:10.1371/journal.pone.0113677>.
abnormality Measure a Subject’s Abnormality with Respect to a Reference Population
Contains the functions to implement the methodology and considerations laid out by Marks et al. in the manuscript Measuring Abnormality in High Dimensional Spaces: Applications in Biomechanical Gait Analysis. As of 2/27/2018 this paper has been submitted and is under scientific review. Using high-dimensional datasets to measure a subject’s overall level of abnormality as compared to a reference population is often needed in outcomes research. Utilizing applications in instrumented gait analysis, that article demonstrates how using data that is inherently non-independent to measure overall abnormality may bias results. A methodology is introduced to address this bias to accurately measure overall abnormality in high dimensional spaces. While this methodology is in line with previous literature, it differs in two major ways. Advantageously, it can be applied to datasets in which the number of observations is less than the number of features/variables, and it can be abstracted to practically any number of domains or dimensions. After applying the proposed methodology to the original data, the researcher is left with a set of uncorrelated variables (i.e. principal components) with which overall abnormality can be measured without bias. Different considerations are discussed in that article in deciding the appropriate number of principal components to keep and the aggregate distance measure to utilize.
abodOutlier Angle-Based Outlier Detection
Performs angle-based outlier detection on a given data frame. Three methods are available: a full but slow implementation using all the data, which has cubic complexity; a fully randomized one, which is much more efficient; and one using k-nearest neighbours. These algorithms are especially well suited for outlier detection in high-dimensional data.
abstractr An R-Shiny Application for Creating Visual Abstracts
An R-Shiny application to create visual abstracts for original research. A variety of user-defined options and formats are included.
abtest Bayesian A/B Testing
Provides functions for Bayesian A/B testing including prior elicitation options based on Kass and Vaidyanathan (1992) <doi:10.1111/j.2517-6161.1992.tb01868.x>.
Ac3net Inferring Directional Conservative Causal Core Gene Networks
Infers directional conservative causal core (gene) networks. It is an advanced version of the C3NET algorithm in that it provides a directional network. Gokmen Altay (2018) <doi:10.1101/271031>, bioRxiv.
ACA Abrupt Change-Point or Aberration Detection in Point Series
Offers an interactive function for the detection of breakpoints in series.
accelmissing Missing Value Imputation for Accelerometer Data
Imputation for the missing count values in accelerometer data. The methodology includes both parametric and semi-parametric multiple imputations under the zero-inflated Poisson lognormal model. This package also provides multiple functions to pre-process the accelerometer data prior to missing-data imputation. These include detecting wearing and non-wearing time, selecting valid days and subjects, and creating plots.
accSDA Accelerated Sparse Discriminant Analysis
Implementation of sparse linear discriminant analysis, which is a supervised classification method for multiple classes. Various novel optimization approaches to this problem are implemented including alternating direction method of multipliers (ADMM), proximal gradient (PG) and accelerated proximal gradient (APG) (See Atkins et al. <arXiv:1705.07194>). Functions for performing cross validation are also supplied along with basic prediction and plotting functions. Sparse zero variance discriminant analysis (SZVD) is also included in the package (See Ames and Hong, <arXiv:1401.5492>). See the github wiki for a more extended description.
ACDm Tools for Autoregressive Conditional Duration Models
Package for Autoregressive Conditional Duration (ACD; Engle and Russell, 1998) models. Creates trade, price or volume durations from transaction (tick) data, performs diurnal adjustments, and fits and tests various ACD models.
Acinonyx High-Performance interactive graphics system iPlots eXtreme
Acinonyx (genus of cheetah – for its speed) is the codename for the next generation of a high-performance interactive graphics system, iPlots eXtreme. It is a continuation of the iPlots project, allowing visualization and exploratory analysis of large data. Due to its highly flexible design and focus on speed optimization, it can also be used as a general graphics system (e.g. it is the fastest R graphics device if you have a good GPU) and an interactive toolkit. It is a complete re-write of iPlots from scratch, taking the best from iPlots design and focusing on speed and flexibility. The main focus compared to the previous iPlots project is on:
• speed and scalability to support large data (it uses OpenGL, optimized native code and object sharing to allow visualization of millions of datapoints)
• enhanced support for adding statistical models to plots with full interactivity
• seamless integration in GUIs (Windows and Mac OS X)
ACMEeqtl Estimation of Interpretable eQTL Effect Sizes Using a Log of Linear Model
We use a non-linear model, termed ACME, that reflects a parsimonious biological model for allelic contributions of cis-acting eQTLs. Maximum likelihood parameters are estimated with a non-linear least-squares algorithm. The ACME model provides interpretable effect-size estimates and p-values with well-controlled Type-I error. Includes both R and (much faster) C implementations. For more details see Palowitch et al. (2017) <doi:10.1111/biom.12810>.
AcousticNDLCodeR Coding Sound Files for Use with NDL
Makes acoustic cues for use with the R packages ‘ndl’ or ‘ndl2’. The package implements functions used in the PLOS ONE paper: Denis Arnold, Fabian Tomaschek, Konstantin Sering, Florence Lopez, and R. Harald Baayen (accepted). Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PLOS ONE. More details can be found in the paper and the supplement. ‘ndl’ is available on CRAN. ‘ndl2’ is available by request from <>.
acp Autoregressive Conditional Poisson
Time series analysis of count data
AcrossTic A Cost-Minimal Regular Spanning Subgraph with TreeClust
Constructs a minimum-cost regular spanning subgraph as part of a non-parametric two-sample test for equality of distribution.
acrt Autocorrelation Robust Testing
Functions for testing affine hypotheses on the regression coefficient vector in regression models with autocorrelated errors.
AdapEnetClass A Class of Adaptive Elastic Net Methods for Censored Data
Provides new approaches to variable selection for the accelerated failure time (AFT) model.
adapr Implementation of an Accountable Data Analysis Process
Tracks reading and writing within R scripts that are organized into a directed acyclic graph. Contains an interactive shiny application adaprApp(). Uses Git and file hashes to track version histories of input and output.
AdapSamp Adaptive Sampling Algorithms
For distributions whose probability density functions are log-concave, the Adaptive Rejection Sampling (ARS) algorithm of Gilks and Wild (1992) <doi:10.2307/2347565> can be used for sampling. For others, one can use the Modified Adaptive Rejection Sampling (MARS) algorithm of Martino and Míguez (2011) <doi:10.1007/s11222-010-9197-9>, the Concave-Convex Adaptive Rejection Sampling (CCARS) algorithm of Görür and Teh (2011) <doi:10.1198/jcgs.2011.09058>, or the Adaptive Slice Sampling (ASS) algorithm of Radford M. Neal (2003) <doi:10.1214/aos/1056562461>. The package provides four main functions: rARS(), rMARS(), rCCARS(), and rASS(), which implement sampling based on the algorithms above.
adaptalint Check Code Style Painlessly
Infer the code style (which style rules are followed and which ones are not) from one package and use it to check another. This makes it easier to find and correct the most important problems first.
adaptDA Adaptive Mixture Discriminant Analysis
The adaptive mixture discriminant analysis (AMDA) allows one to adapt a model-based classifier to the situation where a class represented in the test set may not have been encountered earlier in the learning phase.
AdaptGauss Gaussian Mixture Models (GMM)
Multimodal distributions can be modelled as a mixture of components. The model is derived using the Pareto Density Estimation (PDE) for an estimation of the pdf. PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using the theorem of Bayes. Verification of the model is possible by QQ plot and Chi-squared test.
adaptiveGPCA Adaptive Generalized PCA
Implements adaptive gPCA, as described in: Fukuyama, J. (2017) <arXiv:1702.00501>. The package also includes functionality for applying the method to ‘phyloseq’ objects so that the method can be easily applied to microbiome data and a ‘shiny’ app for interactive visualization.
AdaptiveSparsity Adaptive Sparsity Models
Implements Figueiredo EM algorithm for adaptive sparsity (Jeffreys prior) (see Figueiredo, M.A.T.; , ‘Adaptive sparseness for supervised learning,’ Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.25, no.9, pp. 1150- 1159, Sept. 2003) and Wong algorithm for adaptively sparse gaussian geometric models (see Wong, Eleanor, Suyash Awate, and P. Thomas Fletcher. ‘Adaptive Sparsity in Gaussian Graphical Models.’ In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 311-319. 2013.)
adaptMT Adaptive P-Value Thresholding for Multiple Hypothesis Testing with Side Information
Implementation of adaptive p-value thresholding (AdaPT), including both a framework that allows the user to specify any algorithm to learn local false discovery rate and a pool of convenient functions that implement specific algorithms. See Lei, Lihua and Fithian, William (2016) <arXiv:1609.06035>.
AdaSampling Adaptive Sampling for Positive Unlabeled and Label Noise Learning
Implements the adaptive sampling procedure, a framework for both positive unlabeled learning and learning with class label noise. Yang, P., Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J. (2018) <doi:10.1109/TCYB.2018.2816984>.
ADCT Adaptive Design in Clinical Trials
Existing adaptive design methods in clinical trials. The package includes power, stopping boundaries (sample size) calculation functions for two-group group sequential designs, adaptive design with coprimary endpoints, biomarker-informed adaptive design, etc.
addhaz Binomial and Multinomial Additive Hazards Models
Functions to fit the binomial and multinomial additive hazards models and to calculate the contribution of diseases/conditions to the disability prevalence, as proposed by Nusselder and Looman (2004) <DOI:10.1353/dem.2004.0017>.
addhazard Fit Additive Hazards Models for Survival Analysis
Contains tools to fit additive hazards models to random sampling, two-phase sampling, and two-phase sampling with auxiliary information. This package provides regression parameter estimates and their model-based and robust standard errors. It also offers tools to make predictions of individual-specific hazards.
additiveDEA Additive Data Envelopment Analysis Models
Provides functions for calculating efficiency with two types of additive Data Envelopment Analysis models: (i) Generalized Efficiency Measures: unweighted additive model (Cooper et al., 2007 <doi:10.1007/978-0-387-45283-8>), Range Adjusted Measure (Cooper et al., 1999, <doi:10.1023/A:1007701304281>), Bounded Adjusted Measure (Cooper et al., 2011 <doi:10.1007/s11123-010-0190-2>), Measure of Inefficiency Proportions (Cooper et al., 1999 <doi:10.1023/A:1007701304281>), and the Lovell-Pastor Measure (Lovell and Pastor, 1995 <doi:10.1016/0167-6377(95)00044-5>); and (ii) the Slacks-Based Measure (Tone, 2001 <doi:10.1016/S0377-2217(99)00407-5>). The functions provide several options: (i) constant and variable returns to scale; (ii) fixed (non-controllable) inputs and/or outputs; (iii) bounding the slacks so that unrealistically large slack values are avoided; and (iv) calculating the efficiency of specific Decision-Making Units (DMUs), rather than of the whole sample. Package additiveDEA also provides a function for reducing computation time when datasets are large.
ADDT A Package for Analysis of Accelerated Destructive Degradation Test Data
Accelerated destructive degradation tests (ADDT) are often used to collect necessary data for assessing the long-term properties of polymeric materials. Based on the collected data, a thermal index (TI) is estimated. The TI can be useful for material rating and comparison. This package performs the least squares (LS) and maximum likelihood (ML) procedures for estimating TI for polymeric materials. The LS approach is a two-step approach that is currently used in industrial standards, while the ML procedure is widely used in the statistical literature. The ML approach allows one to do statistical inference such as quantifying uncertainties in estimation, hypothesis testing, and predictions. Two publicly available datasets are provided to allow users to experiment and practice with the functions.
adeba Adaptive Density Estimation by Bayesian Averaging
Univariate and multivariate non-parametric kernel density estimation with adaptive bandwidth using a Bayesian approach to Abramson’s square root law.
adegraphics An S4 Lattice-Based Package for the Representation of Multivariate Data
Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the ‘ade4’ package.
adepro A Shiny Application for the (Audio-)Visualization of Adverse Event Profiles
The name of this package is an abbreviation for Animation of Adverse Event Profiles and refers to a shiny application which (audio-)visualizes adverse events occurring in clinical trials. As this data is usually considered sensitive, this tool is provided as a stand-alone application that can be launched from any local machine on which the data is stored.
adespatial Multivariate Multiscale Spatial Analysis
Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran’s Eigenvectors Maps, MEM).
adjclust Adjacency-Constrained Clustering of a Block-Diagonal Similarity Matrix
Implements a constrained version of hierarchical agglomerative clustering, in which each observation is associated to a position, and only adjacent clusters can be merged. Typical application fields in bioinformatics include Genome-Wide Association Studies or Hi-C data analysis, where the similarity between items is a decreasing function of their genomic distance. Taking advantage of this feature, the implemented algorithm is time and memory efficient. This algorithm is described in Chapter 4 of Alia Dehman (2015) <https://…/tel-01288568v1>.
adjustedcranlogs Remove Automated and Repeated Downloads from ‘RStudio’ ‘CRAN’ Download Logs
Adjusts output of ‘cranlogs’ package to account for ‘CRAN’-wide daily automated downloads and re-downloads caused by package updates.
admisc Adrian Dusa’s Miscellaneous
Contains functions used across the packages ‘QCA’, ‘DDIwR’, and ‘venn’. Interprets and translates DNF (Disjunctive Normal Form) expressions, for both binary and multi-value crisp sets, and extracts information (set names, set values) from those expressions. Other functions check whether a vector is possibly numeric (even if all numbers reside in a character vector) and coerce it to numeric, or check whether the numbers are whole. It also offers, among many others, a highly flexible recoding function.
ADMM Algorithms using Alternating Direction Method of Multipliers
Provides algorithms to solve popular optimization problems in statistics such as regression or denoising based on Alternating Direction Method of Multipliers (ADMM). See Boyd et al (2010) <doi:10.1561/2200000016> for complete introduction to the method.
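For intuition about the method the package builds on, here is a generic base-R sketch of ADMM applied to the lasso problem, in the style of Boyd et al. (2010); this illustrates the algorithm itself and is not the package's own API:

```r
# min 0.5*||A x - b||^2 + lambda*||z||_1  subject to  x = z
admm_lasso <- function(A, b, lambda, rho = 1, iters = 200) {
  n   <- ncol(A)
  AtA <- crossprod(A)
  Atb <- crossprod(A, b)
  R   <- chol(AtA + rho * diag(n))   # cache the Cholesky factor once
  soft <- function(v, k) sign(v) * pmax(abs(v) - k, 0)
  x <- z <- u <- rep(0, n)
  for (i in seq_len(iters)) {
    x <- backsolve(R, forwardsolve(t(R), Atb + rho * (z - u)))  # x-update
    z <- soft(x + u, lambda / rho)                              # z-update
    u <- u + x - z                                              # dual update
  }
  z
}
```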
ADMMnet Regularized Model with Selecting the Number of Non-Zeros
Fits linear and Cox models regularized with a net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalty, and their adaptive forms, such as the adaptive lasso and net adjusting for signs of linked coefficients. In addition, it treats the number of non-zero coefficients as another tuning parameter and selects it simultaneously with the regularization parameter. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients.
ADMMsigma Penalized Precision Matrix Estimation via ADMM
Estimates a penalized precision matrix via the alternating direction method of multipliers (ADMM) algorithm. It currently supports a general elastic-net penalty that allows for both ridge and lasso-type penalties as special cases. This package is an alternative to the ‘glasso’ package. See Boyd et al (2010) <doi:10.1561/2200000016> for details regarding the estimation method.
adnuts No-U-Turn MCMC Sampling for ‘ADMB’ and ‘TMB’ Models
Bayesian inference using the no-U-turn (NUTS) algorithm by Hoffman and Gelman (2014) <http://…/hoffman14a.html>. Designed for ‘AD Model Builder’ (‘ADMB’) models, or when R functions for log-density and log-density gradient are available, such as ‘Template Model Builder’ (‘TMB’) models and other special cases. Functionality is similar to ‘Stan’, and the ‘rstan’ and ‘shinystan’ packages are used for diagnostics and inference.
adoption Modelling Adoption Process in Marketing
The classical Bass (1969) <doi:10.1287/mnsc.15.5.215> model and the agent based models, such as that by Goldenberg, Libai and Muller (2010) <doi:10.1016/j.ijresmar.2009.06.006> have been two different approaches to model adoption processes in marketing. These two approaches can be unified by explicitly modelling the utility functions. This package provides a GUI that allows, in a unified way, the modelling of these two processes and other processes.
adoptr Adaptive Optimal Two-Stage Designs in R
Optimize one or two-arm, two-stage designs for clinical trials with respect to several pre-implemented objective criteria or implement custom objectives. Optimization under uncertainty and conditional (given stage-one outcome) constraints are supported.
ADPclust Fast Clustering Using Adaptive Density Peak Detection
An implementation of ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work builds and improves upon the idea of Rodriguez and Laio (2014). ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid selection and parameter optimization algorithm, which finds the number of clusters and cluster centroids by comparing average silhouettes on a grid of test clustering results. It also includes a user-interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional ‘density-distance plot’.
ADPF Use Least Squares Polynomial Regression and Statistical Testing to Improve Savitzky-Golay
This function takes a vector or matrix of data and smooths the data with an improved Savitzky Golay transform. The Savitzky-Golay method for data smoothing and differentiation calculates convolution weights using Gram polynomials that exactly reproduce the results of least-squares polynomial regression. Use of the Savitzky-Golay method requires specification of both filter length and polynomial degree to calculate convolution weights. For maximum smoothing of statistical noise in data, polynomials with low degrees are desirable, while a high polynomial degree is necessary for accurate reproduction of peaks in the data. Extension of the least-squares regression formalism with statistical testing of additional terms of polynomial degree to a heuristically chosen minimum for each data window leads to an adaptive-degree polynomial filter (ADPF). Based on noise reduction for data that consist of pure noise and on signal reproduction for data that is purely signal, ADPF performed nearly as well as the optimally chosen fixed-degree Savitzky-Golay filter and outperformed sub-optimally chosen Savitzky-Golay filters. For synthetic data consisting of noise and signal, ADPF outperformed both optimally chosen and sub-optimally chosen fixed-degree Savitzky-Golay filters. See Barak, P. (1995) <doi:10.1021/ac00113a006> for more information.
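The core observation above (Savitzky-Golay convolution weights exactly reproduce least-squares polynomial regression at the window centre) can be sketched in a few lines of base R. This illustrates the fixed-degree filter only, not the adaptive-degree procedure of the package:

```r
# Fixed-degree Savitzky-Golay smoothing via local polynomial regression
sg_smooth <- function(y, width = 7, degree = 2) {
  stopifnot(width %% 2 == 1, degree < width)
  h <- (width - 1) / 2
  X <- outer(-h:h, 0:degree, `^`)          # polynomial design matrix
  w <- solve(crossprod(X), t(X))[1, ]      # centre-point convolution weights
  out <- y
  for (i in (h + 1):(length(y) - h))
    out[i] <- sum(w * y[(i - h):(i + h)])  # fitted value at window centre
  out
}
```

The weight vector `w` is computed once per (width, degree) pair; applying it to each window is equivalent to refitting the polynomial at every position.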
adpss Design and Analysis of Locally or Globally Efficient Adaptive Designs
Provides functions for planning and conducting a clinical trial with adaptive sample-size determination. Maximal statistical efficiency is exploited even when dramatic or multiple adaptations are made. Such a trial consists of adaptive determination of the sample size at an interim analysis and implementation of a frequentist statistical test at the interim and final analyses with a prefixed significance level. The required assumptions for the stage-wise test statistics are independent and stationary increments and normality. Predetermination of the adaptation rule is not required.
advclust Object Oriented Advanced Clustering
S4 object-oriented implementation of advanced fuzzy clustering and fuzzy consensus clustering. Techniques provided by this package are Fuzzy C-Means, Gustafson-Kessel (Babuska version), Gath-Geva, Sum Voting Consensus, Product Voting Consensus, and Borda Voting Consensus. This package also provides visualization via biplot and radar plot.
AEDForecasting Change Point Analysis in ARIMA Forecasting
Package to incorporate change point analysis in ARIMA forecasting.
afc Generalized Discrimination Score
This is an implementation of the Generalized Discrimination Score (also known as Two Alternatives Forced Choice Score, 2AFC) for various representations of forecasts and verifying observations. The Generalized Discrimination Score is a generic forecast verification framework which can be applied to any of the following verification contexts: dichotomous, polychotomous (ordinal and nominal), continuous, probabilistic, and ensemble. A comprehensive description of the Generalized Discrimination Score, including all equations used in this package, is provided by Mason and Weigel (2009) <doi:10.1175/MWR-D-10-05069.1>.
afCEC Active Function Cross-Entropy Clustering
Active function cross-entropy clustering partitions n-dimensional data into clusters by finding the parameters of a mixed generalized multivariate normal distribution that optimally approximates the scattering of the data in the n-dimensional space, with density function of the form: p_1*N(mi_1,^sigma_1,sigma_1,f_1)+…+p_k*N(mi_k,^sigma_k,sigma_k,f_k). The above-mentioned generalization is performed by introducing so-called ‘f-adapted Gaussian densities’ (i.e. ordinary Gaussian densities adapted by the ‘active function’). Additionally, active function cross-entropy clustering performs automatic reduction of unnecessary clusters. For more information please refer to P. Spurek, J. Tabor, K. Byrski, ‘Active function Cross-Entropy Clustering’ (2017) <doi:10.1016/j.eswa.2016.12.011>.
afex Analysis of Factorial Experiments
Convenience functions for analyzing factorial experiments using ANOVA or mixed models. aov_ez(), aov_car(), and aov_4() allow specification of between, within (i.e., repeated-measures), or mixed between-within (i.e., split-plot) ANOVAs for data in long format (i.e., one observation per row), aggregating multiple observations per individual and cell of the design. mixed() fits mixed models using lme4::lmer() and computes p-values for all fixed effects using either Kenward-Roger or Satterthwaite approximation for degrees of freedom (LMM only), parametric bootstrap (LMMs and GLMMs), or likelihood ratio tests (LMMs and GLMMs). afex uses type 3 sums of squares as default (imitating commercial statistical software).
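A usage sketch of aov_ez() for a mixed between-within design (the data frame and column names below are hypothetical):

```r
library(afex)

# Hypothetical long-format data: one row per observation, with a subject
# id, a response 'rt', a between-subjects factor 'group', and a
# within-subjects (repeated-measures) factor 'condition'
fit <- aov_ez(id = "id", dv = "rt", data = dat,
              between = "group", within = "condition")
summary(fit)
```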
affluenceIndex Affluence Indices
Computes the statistical indices of affluence (richness) and constructs bootstrap confidence intervals for these indices. Also computes the Wolfson polarization index.
AFheritability The Attributable Fraction (AF) Described as a Function of Disease Heritability, Prevalence and Intervention Specific Factors
The AFfunction() returns an estimate of the Attributable Fraction (AF) and a plot of the AF as a function of heritability, disease prevalence, size of the target group and intervention effect. Since the AF is a function of several factors, a shiny app is used to better illustrate how the relationship between the AF and heritability depends on several other factors. The app is run by the function runShinyApp(). For more information see Dahlqwist E et al. (2019) <doi:10.1007/s00439-019-02006-8>.
AFM Atomic Force Microscope Image Analysis
Provides Atomic Force Microscope image analysis, such as power spectrum density, roughness against length scale, variogram and variance, and fractal dimension and scale.
after Run Code in the Background
Run an R function in the background, possibly after a delay. The current version uses the Tcl event loop and was ported from the ‘tcltk2’ package.
aftgee Accelerated Failure Time Model with Generalized Estimating Equations
This package features both rank-based and least-squares estimation for the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan's weight and general weights such as the logrank weight. For least-squares estimation, the estimating equation is solved with Generalized Estimating Equations (GEE). Moreover, in multivariate cases, the dependence working correlation structure can be specified in the GEE setting.
aggregation p-Value Aggregation Methods
Contains functionality for performing the following methods of p-value aggregation: Fisher’s method [Fisher, RA (1932, ISBN: 9780028447308)], the Lancaster method (weighted Fisher’s method) [Lancaster, HO (1961, <doi:10.1111/j.1467-842X.1961.tb00058.x>)], and Sidak correction (minimum p-value method with correction) [Sidak, Z (1967, <doi:10.1080/01621459.1967.10482935>)].
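Fisher's method itself is a two-liner in base R; a minimal sketch for reference (the package's own interface may differ):

```r
# Fisher's method: X^2 = -2 * sum(log(p)) is chi-squared with 2k df
# under the null that all k independent p-values are uniform
fisher_combine <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

fisher_combine(c(0.01, 0.20, 0.40))
```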
agRee Various Methods for Measuring Agreement
Bland-Altman plot and scatter plot with identity line for visualization and point and interval estimates for different metrics related to reproducibility/repeatability/agreement including the concordance correlation coefficient, intraclass correlation coefficient, within-subject coefficient of variation, smallest detectable difference, and mean normalized smallest detectable difference.
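For reference, a Bland-Altman plot reduces to plotting paired differences against paired means with 95% limits of agreement; a base-R sketch (the package provides its own richer functions):

```r
bland_altman <- function(x, y) {
  avg  <- (x + y) / 2
  d    <- x - y
  bias <- mean(d)
  loa  <- bias + c(-1.96, 1.96) * sd(d)   # 95% limits of agreement
  plot(avg, d, xlab = "Mean of the two methods", ylab = "Difference")
  abline(h = c(bias, loa), lty = c(1, 2, 2))
  invisible(list(bias = bias, limits = loa))
}
```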
AgreementInterval Agreement Interval of Two Measurement Methods
A tool for calculating the agreement interval of two measurement methods (Jason Liao (2015) <DOI:10.1515/ijb-2014-0030>) and presenting the results in plots with a discordance rate and/or clinically meaningful limit to quantify agreement quality.
agriTutorial Tutorial Analysis of Some Agricultural Experiments
Example software for the analysis of data from designed experiments, especially agricultural crop experiments. The basics of the statistical analysis of designed experiments are discussed using real examples from agricultural field trials. A range of statistical methods are exemplified using a range of R statistical packages. The experimental data is made available as separate data sets for each example and the R analysis code is made available as example code. The example code can be readily extended, as required.
AhoCorasickTrie Fast Searching for Multiple Keywords in Multiple Texts
Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches to each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.
ahp Analytical Hierarchy Process (AHP) with R
An R package to model complex decision-making problems using the Analytic Hierarchy Process (AHP).
AHR Estimation and Testing of Average Hazard Ratios
Methods for estimation of multivariate average hazard ratios as defined by Kalbfleisch and Prentice. The underlying survival functions of the event of interest in each group can be estimated using either the (weighted) Kaplan-Meier estimator or the Aalen-Johansen estimator for the transition probabilities in Markov multi-state models. Right-censored and left-truncated data are supported. Moreover, the difference in restricted mean survival can be estimated.
Ake Associated Kernel Estimations
Continuous and discrete (count or categorical) estimation of density, probability mass function (pmf) and regression functions are performed using associated kernels. The cross-validation technique and the local Bayesian procedure are also implemented for bandwidth selection.
akmedoids Anchored Kmedoids for Longitudinal Data Clustering
Advances a novel adaptation of the longitudinal k-means clustering technique (Genolini et al. (2015) <doi:10.18637/jss.v065.i04>) for grouping trajectories based on the similarities of their long-term trends and determines the optimal solution based on the Calinski-Harabasz criterion (Calinski and Harabasz (1974) <doi:10.1080/03610927408827101>). Includes functions to extract descriptive statistics and generate a visualisation of the resulting groups, drawing methods from the ‘ggplot2’ library (Wickham H. (2016) <doi:10.1007/978-3-319-24277-4>). The package also includes a number of other useful functions for exploring and manipulating longitudinal data prior to the clustering process.
albopictus Age-Structured Population Dynamics Model
Implements discrete time deterministic and stochastic age-structured population dynamics models described in Erguler and others (2016) <doi:10.1371/journal.pone.0149282> and Erguler and others (2017) <doi:10.1371/journal.pone.0174293>.
ALEPlot Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots
Visualizes the main effects of individual predictor variables and their second-order interaction effects in black-box supervised learning models. The package creates Accumulated Local Effects (ALE) plots and/or Partial Dependence (PD) plots, given a fitted supervised learning model.
algorithmia Allows you to Easily Interact with the Algorithmia Platform
The company, Algorithmia, houses the largest marketplace of online algorithms. This package provides REST wrappers that make it easy to call algorithms in the Algorithmia platform and to access files and directories in the Algorithmia data API. To learn more about the services they offer and the algorithms in the platform visit <>. More information for developers can be found at <>.
algstat Algebraic statistics in R
algstat provides functionality for algebraic statistics in R. Current applications include exact inference in log-linear models for contingency table data, analysis of ranked and partially ranked data, and general purpose tools for multivariate polynomials, building on the mpoly package. To aid in the process, algstat has ports to Macaulay2, Bertini, LattE-integrale and 4ti2.
alignfigR Visualizing Multiple Sequence Alignments with ‘ggplot2’
Create extensible figures of multiple sequence alignments, using the ‘ggplot2’ plotting engine. ‘alignfigR’ will create a baseline figure of a multiple sequence alignment which can be fully customized to the user’s liking with standard ‘ggplot2’ features.
AlignStat Comparison of Alternative Multiple Sequence Alignments
Methods for comparing two alternative multiple sequence alignments (MSAs) to determine whether they align homologous residues in the same columns as one another. It then classifies similarities and differences into conserved gaps, conserved sequence, merges, splits or shifts of one MSA relative to the other. Summarising these categories for each MSA column yields information on which sequence regions are agreed upon by both MSAs, and which differ. Several plotting functions enable easy visualisation of the comparison data for analysis.
alineR Alignment of Phonetic Sequence Using the ‘ALINE’ Algorithm
Functions are provided to calculate the ‘ALINE’ distance between a cognate pair. The score is based on phonetic features represented using the Unicode-compliant International Phonetic Alphabet (IPA). Parameterized feature weights are used to determine the optimal alignment, and functions are provided to estimate optimum values. This project was funded by the National Science Foundation Cultural Anthropology Program (Grant number SBS-1030031) and the University of Maryland College of Behavioral and Social Sciences.
allanvar Allan Variance Analysis
A collection of tools for stochastic sensor error characterization using the Allan Variance technique originally developed by D. Allan.
alluvial Alluvial Diagrams
Creating alluvial diagrams (also known as parallel sets plots) for multivariate and time series-like data.
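A minimal sketch of such a diagram using the package's `alluvial()` function on the built-in Titanic data; treat the exact argument names (`freq`, `col`) as assumptions from the package's documented interface.

```r
# Flows between Class, Sex, and Survived, with flow width proportional to
# the number of passengers and survivors highlighted in green.
library(alluvial)

tit <- as.data.frame(Titanic)
alluvial(tit[, c("Class", "Sex", "Survived")],
         freq = tit$Freq,
         col  = ifelse(tit$Survived == "Yes", "darkgreen", "grey"))
```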
alpaca Fit GLM’s with High-Dimensional k-Way Fixed Effects
Provides a routine to concentrate out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (glm). The package is based on the algorithm proposed by Stammann (2018) <arXiv:1707.01815> and is restricted to glms that are estimated by maximum likelihood and are non-linear. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine. The package also includes robust and multi-way clustered standard errors.
alphaOutlier Obtain Alpha-Outlier Regions for Well-Known Probability Distributions
Given the parameters of a distribution, the package uses the concept of alpha-outliers by Davies and Gather (1993) to flag outliers in a data set. See Davies, L. and Gather, U. (1993), ‘The identification of multiple outliers’, JASA, 88(423), 782-792, <doi:10.1080/01621459.1993.10476339> for details.
alphashape3d Implementation of the 3D Alpha-Shape for the Reconstruction of 3D Sets from a Point Cloud
Implementation in R of the alpha-shape of a finite set of points in three-dimensional space. The alpha-shape generalizes the convex hull and makes it possible to recover the shape of non-convex and even non-connected sets in 3D, given a random sample of points taken from the set. Besides the computation of the alpha-shape, this package provides users with functions to compute the volume of the alpha-shape, identify the connected components and facilitate the three-dimensional graphical visualization of the estimated set.
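The compute-then-inspect workflow can be sketched as below; the `ashape3d()` and `volume_ashape3d()` function names are assumptions based on the package's documented interface, and the choice of alpha is illustrative.

```r
# Alpha-shape of a random point cloud in the unit cube, followed by the
# estimated volume of the reconstructed set.
library(alphashape3d)

set.seed(42)
pts  <- matrix(runif(3000), ncol = 3)  # 1000 points in the unit cube
as3d <- ashape3d(pts, alpha = 0.2)     # compute the alpha-shape
volume_ashape3d(as3d)                  # volume of the estimated set
# plot(as3d)                           # 3D visualization of the shape
```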
alphastable Inference for Stable Distribution
Developed to perform the following tasks: (1) computing the probability density function and distribution function of a univariate stable distribution; (2) generating realizations from univariate stable, truncated stable, multivariate elliptically contoured stable, and bivariate strictly stable distributions; (3) estimating the parameters of univariate symmetric stable, skew stable, Cauchy, multivariate elliptically contoured stable, and multivariate strictly stable distributions; (4) estimating the parameters of mixtures of symmetric stable and mixtures of Cauchy distributions.
AlphaVantageClient Wrapper for Alpha Vantage API
Download data from the Alpha Vantage API (<https://…/> ). Alpha Vantage is a RESTful API which provides various financial data, including stock prices and technical indicators. There is documentation for the underlying API available here: <https://…/>. To get access to this API, the user needs to first claim an API key: <https://…/>.
alphavantager Lightweight R Interface to the Alpha Vantage API
Alpha Vantage has free historical financial information. All you need to do is get a free API key at <>. Then you can use the R interface to retrieve free equity information. Refer to the Alpha Vantage website for more information.
alR Arc Lengths of Statistical Functions
Estimation, regression and classification using arc lengths.
altmeta Alternative Meta-Analysis Methods
Provides alternative statistical methods for meta-analysis, including new heterogeneity tests, estimators of between-study variance, and heterogeneity measures that are robust to outliers.
ambient A Generator of Multidimensional Noise
Generation of natural looking noise has many applications within simulation, procedural generation, and art, to name a few. The ‘ambient’ package provides an interface to the ‘FastNoise’ C++ library and allows for efficient generation of perlin, simplex, worley, cubic, value, and white noise with optional perturbation in either 2, 3, or 4 (in the case of simplex and white noise) dimensions.
AMCTestmakeR Generate LaTeX Code for Auto-Multiple-Choice (AMC)
Generate code for use with the Optical Mark Recognition free software Auto Multiple Choice (AMC). More specifically, this package provides functions that use as input the question and answer texts, and output the LaTeX code for AMC.
amelie Anomaly Detection with Normal Probability Functions
Implements anomaly detection as binary classification for cross-sectional data. Uses maximum likelihood estimates and normal probability functions to classify observations as anomalous. The method is presented in the following lecture from the Machine Learning course by Andrew Ng: <https://…/>, and is also described in: Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, Jaideep Srivastava (2003) <doi:10.1137/1.9781611972733.3>.
AMIAS Alternating Minimization Induced Active Set Algorithms
An implementation of alternating minimization induced active set (AMIAS) method for solving the L0 regularized learning problems. It includes a piecewise smooth estimator by minimizing the least squares function with constraints on the number of kink points in the discrete derivatives. It also includes generalized structural sparsity via composite L0 penalty. Both time series and image segmentation can be handled by this package.
ammistability Additive Main Effects and Multiplicative Interaction Model Stability Parameters
Computes various stability parameters from Additive Main Effects and Multiplicative Interaction (AMMI) analysis results such as Modified AMMI Stability Value (MASV), Sums of the Absolute Value of the Interaction Principal Component Scores (SIPC), Sum Across Environments of Genotype-Environment Interaction Modelled by AMMI (AMGE), Sum Across Environments of Absolute Value of Genotype-Environment Interaction Modelled by AMMI (AV_(AMGE)), AMMI Stability Index (ASI), Modified ASI (MASI), AMMI Based Stability Parameter (ASTAB), Annicchiarico’s D Parameter (DA), Zhang’s D Parameter (DZ), Averages of the Squared Eigenvector Values (EV), Stability Measure Based on Fitted AMMI Model (FA), Absolute Value of the Relative Contribution of IPCs to the Interaction (Za). Further calculates the Simultaneous Selection Index for Yield and Stability from the computed stability parameters. See the vignette for complete list of citations for the methods implemented.
aMNLFA Automated Fitting of Moderated Nonlinear Factor Analysis Through the ‘Mplus’ Program
Automated generation, running, and interpretation of moderated nonlinear factor analysis models for obtaining scores from observed variables. This package creates ‘Mplus’ input files which may be run iteratively to test two different types of covariate effects on items: (1) latent variable impact (both mean and variance); and (2) differential item functioning. After sequentially testing for all effects, it also creates a final model by including all significant effects after adjusting for multiple comparisons. Finally, the package creates a scoring model which uses the final values of parameter estimates to generate latent variable scores.
ampd An Algorithm for Automatic Peak Detection in Noisy Periodic and Quasi-Periodic Signals
A method for automatic detection of peaks in noisy periodic and quasi-periodic signals. This method, called automatic multiscale-based peak detection (AMPD), is based on the calculation and analysis of the local maxima scalogram, a matrix comprising the scale-dependent occurrences of local maxima.
analyz Model Layer for Automatic Data Analysis
Class with methods to read and execute R commands described as steps in a CSV file.
anapuce Tools for Microarray Data Analysis
Functions for normalisation, differential analysis of microarray data and local False Discovery Rate.
anfis Adaptive Neuro Fuzzy Inference System in R
The package implements an ANFIS Type 3 Takagi and Sugeno’s fuzzy if-then rule network with the following features: (1) independent number of membership functions (MF) for each input, as well as different extensible MF types; (2) Type 3 Takagi and Sugeno’s fuzzy if-then rules; (3) full rule combinations, e.g. 2 inputs with 2 membership functions each -> 4 fuzzy rules; (4) hybrid learning, i.e. gradient descent for precedents and least squares estimation for consequents; (5) multiple outputs.
aniDom Inferring Dominance Hierarchies and Estimating Uncertainty
Provides: (1) Tools to infer dominance hierarchies based on calculating Elo scores, but with custom functions to improve estimates in animals with relatively stable dominance ranks. (2) Tools to plot the shape of the dominance hierarchy and estimate the uncertainty of a given data set.
anipaths Animate Paths
Animation of observed trajectories using spline-based interpolation (see for example, Buderman, F. E., Hooten, M. B., Ivan, J. S. and Shenk, T. M. (2016), <doi:10.1111/2041-210X.12465> ‘A functional model for characterizing long-distance movement behaviour’. Methods Ecol Evol). Intended to be used for exploratory data analysis, and perhaps for preparation of presentations.
ANLP Build Text Prediction Model
Library to sample and clean text data, build N-gram models, apply the Backoff algorithm, etc.
anMC Compute High Dimensional Orthant Probabilities
Computationally efficient method to estimate orthant probabilities of high-dimensional Gaussian vectors. Further implements a function to compute conservative estimates of excursion sets under Gaussian random field priors.
ANN2 Artificial Neural Networks for Anomaly Detection
Training of general classification and regression neural networks using gradient descent. Special features include a function for training autoencoders as well as an implementation of replicator neural networks, for details see Hawkins et al. (2002) <doi:10.1007/3-540-46145-0_17>. Multiple activation and cost functions (including Huber and pseudo-Huber) are included, as well as L1 and L2 regularization, momentum, early stopping and the possibility to specify a learning rate schedule. The package contains a vectorized gradient descent implementation which facilitates faster training through batch learning.
AnnuityRIR Annuity Random Interest Rates
Annuity Random Interest Rates proposes different techniques for the approximation of the present and final value of a unitary annuity-due or annuity-immediate considering interest rate as a random variable. Cruz Rambaud et al. (2017) <doi:10.1007/978-3-319-54819-7_16>. Cruz Rambaud et al. (2015) <doi:10.23755/rm.v28i1.25>.
anocva A Non-Parametric Statistical Test to Compare Clustering Structures
Provides ANOCVA (ANalysis Of Cluster VAriability), a non-parametric statistical test to compare clustering structures with applications in functional magnetic resonance imaging data (fMRI). The ANOCVA allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering.
ANOM Analysis of Means
Analysis of means (ANOM) as used in technometrical computing. The package takes results from multiple comparisons with the grand mean (obtained with multcomp, SimComp, nparcomp, or MCPAN) or corresponding simultaneous confidence intervals as input and produces ANOM decision charts that illustrate which group means deviate significantly from the grand mean.
anomalous Anomalous time series package for R
It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. A common use-case is to identify servers that are behaving unusually. Methods in this package compute a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then a robust principal component decomposition is used on the features, and various bivariate outlier detection methods are applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and alpha-hulls. For demo purposes, this package contains both synthetic and real data from Yahoo.
anomalous-acm Anomalous time series package for R (ACM)
It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. A common use-case is to identify servers that are behaving unusually. Methods in this package compute a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then a robust principal component decomposition is used on the features, and various bivariate outlier detection methods are applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and alpha-hulls. For demo purposes, this package contains both synthetic and real data from Yahoo.
anomaly Detecting Anomalies in Data
An implementation of CAPA (Collective And Point Anomaly) by Fisch, Eckley and Fearnhead (2018) <arXiv:1806.01947> for the detection of anomalies in time series data. The package also contains Kepler lightcurve data and shows how CAPA can be applied to detect exoplanets.
anomalyDetection Implementation of Augmented Network Log Anomaly Detection Procedures
Implements procedures to aid in detecting network log anomalies. By combining various multivariate analytic approaches relevant to network anomaly detection, it provides cyber analysts efficient means to detect suspected anomalies requiring further evaluation.
AnomalyDetection Anomaly Detection with R
AnomalyDetection is an open-source R package to detect anomalies which is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. The AnomalyDetection package can be used in a wide variety of contexts, for example detecting anomalies in system metrics after a new software release, in user engagement after an A/B test, or for problems in econometrics, financial engineering, and the political and social sciences.
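A minimal sketch of the time-series interface; `AnomalyDetectionTs()` and the bundled `raw_data` example set follow the package's documented usage, though the exact arguments shown should be treated as assumptions.

```r
# Flag seasonal anomalies in a two-column (timestamp, count) data frame;
# max_anoms caps the fraction of points that may be flagged.
library(AnomalyDetection)

data(raw_data)  # example dataset shipped with the package
res <- AnomalyDetectionTs(raw_data, max_anoms = 0.02,
                          direction = "both", plot = TRUE)
res$anoms  # data frame of detected anomalous timestamps and values
```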
anominate Alpha-NOMINATE Ideal Point Estimator
Fits ideal point model described in Carroll, Lewis, Lo, Poole and Rosenthal (2013), ‘The Structure of Utility in Models of Spatial Voting,’ American Journal of Political Science 57(4): 1008–1028, <doi:10.1111/ajps.12029>.
anonymizer Anonymize Data Containing Personally Identifiable Information
Allows users to quickly and easily anonymize data containing Personally Identifiable Information (PII) through convenience functions.
ANOVAShiny Interactive Document for Working with Analysis of Variance
An interactive document on the topic of one-way and two-way analysis of variance using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
antaresViz Antares Visualizations
Visualize results generated by Antares, a powerful software developed by RTE to simulate and study electric power systems (more information about Antares here: <>). This package provides functions that create interactive charts to help Antares users visually explore the results of their simulations. You can see the results of several ANTARES studies here: <http://…/>.
antiword Extract Text from Microsoft Word Documents
Wraps the ‘AntiWord’ utility to extract text from Microsoft Word documents. The utility only supports the old ‘doc’ format, not the new XML-based ‘docx’ format.
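Extraction is a single call; the file name below is hypothetical, and the assumption is that the package exports an `antiword()` function taking a path to a legacy-format file.

```r
# Extract the plain text of an old-style .doc file ("report.doc" is a
# placeholder path) and print it.
library(antiword)

txt <- antiword("report.doc")
cat(txt)
```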
anyLib Install and Load Any Package from CRAN, Bioconductor or Github
Made to make your life simpler with packages, by installing and loading a list of packages, whether they are on CRAN, Bioconductor or GitHub. For GitHub, if you do not have the full path, with the maintainer name in it (e.g. ‘achateigner/topReviGO’), it will be able to load the package but not to install it.
anytime Anything to ‘POSIXct’ Converter
Convert input in character, integer, or numeric form into ‘POSIXct’ objects, using one of a number of predefined formats, and relying on Boost facilities for date and time parsing.
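A few conversions illustrating the heuristic matching against predefined formats; `anytime()` returns ‘POSIXct’ and the companion `anydate()` returns ‘Date’.

```r
# Character, integer, and free-form date inputs all resolve without an
# explicit format string.
library(anytime)

anytime("2016-09-12 14:00:00")  # ISO-style character input -> POSIXct
anytime(20160912L)              # integer yyyymmdd input     -> POSIXct
anydate("12 Sep 2016")          # companion converter        -> Date
```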
Aoptbdtvc A-Optimal Block Designs for Comparing Test Treatments with Controls
A collection of functions to construct A-optimal block designs for comparing test treatments with one or more control(s). Mainly A-optimal balanced treatment incomplete block designs, weighted A-optimal balanced treatment incomplete block designs, A-optimal group divisible treatment designs and A-optimal balanced bipartite block designs can be constructed using the package. The designs are constructed using algorithms based on linear integer programming. To the best of our knowledge, these facilities to construct A-optimal block designs for comparing test treatments with one or more controls are not available in the existing R packages. For more details on designs for tests versus control(s) comparisons, please see Hedayat, A. S. and Majumdar, D. (1984) <doi:10.1080/00401706.1984.10487989> A-Optimal Incomplete Block Designs for Control-Test Treatment Comparisons, Technometrics, 26, 363-370 and Mandal, B. N. , Gupta, V. K., Parsad, Rajender. (2017) <doi:10.1080/03610926.2015.1071394> Balanced treatment incomplete block designs through integer programming. Communications in Statistics – Theory and Methods 46(8), 3728-3737.
apa Format Outputs of Statistical Tests According to APA Guidelines
Formatter functions in the ‘apa’ package take the return value of a statistical test function, e.g. a call to chisq.test() and return a string formatted according to the guidelines of the APA (American Psychological Association).
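A sketch of the test-then-format pattern; the `chisq_apa()` formatter name follows the package's `*_apa()` naming convention but should be treated as an assumption.

```r
# Run a base chi-squared test and hand the result to the APA formatter,
# which returns a ready-to-paste string ("chi^2(df) = ..., p = ...").
library(apa)

tab <- matrix(c(20, 10, 15, 25), nrow = 2)
chisq_apa(chisq.test(tab))
```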
ApacheLogProcessor Process the Apache Web Server Log Files
Provides capabilities to process Apache HTTPD log files. The main functionalities are to extract data from access and error log files to data frames.
apc Age-Period-Cohort Analysis
Functions for age-period-cohort analysis. The data can be organised in matrices indexed by age-cohort, age-period or cohort-period. The data can include dose and response or just doses. The statistical model is a generalized linear model (GLM) allowing for 3,2,1 or 0 of the age-period-cohort factors. The canonical parametrisation of Kuang, Nielsen and Nielsen (2008) is used. Thus, the analysis does not rely on ad hoc identification.
apcf Adapted Pair Correlation Function
The adapted pair correlation function transfers the concept of the pair correlation function from point patterns to patterns of objects of finite size and irregular shape (e.g. lakes within a country). This is a reimplementation of the method suggested by Nuske et al. (2009) <doi:10.1016/j.foreco.2009.09.050> using the libraries ‘GEOS’ and ‘GDAL’ directly instead of through ‘PostGIS’.
apdesign An Implementation of the Additive Polynomial Design Matrix
An implementation of the additive polynomial (AP) design matrix. It constructs and appends an AP design matrix to a data frame for use with longitudinal data subject to seasonality.
APfun Geo-Processing Base Functions
Base tools for facilitating the creation of geo-processing functions in R.
aphid Analysis with Profile Hidden Markov Models
Designed for the development and application of hidden Markov models and profile HMMs for biological sequence analysis. Contains functions for multiple and pairwise sequence alignment, model construction and parameter optimization, file import/export, implementation of the forward, backward and Viterbi algorithms for conditional sequence probabilities, tree-based sequence weighting, and sequence simulation. Features a wide variety of potential applications including database searching, gene-finding and annotation, phylogenetic analysis and sequence classification.
APML0 Augmented and Penalized Minimization Method L0
Fit linear and Cox models regularized with L0, lasso (L1), elastic-net (L1 and L2), or net (L1 and Laplacian) penalty, and their adaptive forms, such as adaptive lasso / elastic-net and net adjusting for signs of linked coefficients. It solves the L0 penalty problem by simultaneously selecting regularization parameters and the number of non-zero coefficients. This augmented and penalized minimization method provides an approximate solution to the L0 penalty problem, but runs as fast as the L1 regularization problem. The package uses a one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of the coefficients. It can deal with very high-dimensional data and has superior selection performance.
apng Convert Png Files into Animated Png
Convert several png files into an animated png file. This package exports only a single function `apng’. Call the apng function with a vector of file names (which should be png files) to convert them to a single animated png file.
apollo Tools for Estimating Discrete Choice Models
The Choice Modelling Centre at the University of Leeds has developed flexible estimation code for choice models in R. Users are able to write their own likelihood functions or use a mix of already available ones. Mixing, in the form of random coefficients and components is allowed for all models. Both classical and Bayesian estimation are available. Multi-threading processing is supported. For more information on discrete choice models see Train, K. (2009) <isbn:978-0-521-74738-7>.
APPEstimation Adjusted Prediction Model Performance Estimation
Calculating predictive model performance measures adjusted for predictor distributions using density ratio method (Sugiyama et al., (2012, ISBN:9781139035613)). L1 and L2 error for continuous outcome and C-statistics for binomial outcome are computed.
approxmatch Approximately Optimal Fine Balance Matching with Multiple Groups
Tools for constructing a matched design with multiple comparison groups. Further specifications of refined covariate balance restriction and exact match on covariate can be imposed. Matches are approximately optimal in the sense that the cost of the solution is at most twice the optimal cost, Crama and Spieksma (1992) <doi:10.1016/0377-2217(92)90078-N>.
apricom Tools for the a Priori Comparison of Regression Modelling Strategies
Tools to compare several model adjustment and validation methods prior to application in a final analysis.
APtools Average Positive Predictive Values (AP) for Binary Outcomes and Censored Event Times
We provide tools to estimate two prediction performance metrics, the average positive predictive values (AP) as well as the well-known AUC (the area under the receiver operating characteristic curve) for risk scores or markers. The outcome of interest is either binary or a censored event time. Note that for censored event times, our functions estimate the AP and the AUC as time-dependent quantities over pre-specified time interval(s). A function that compares the APs of two risk scores/markers is also included. Optional outputs include positive predictive values and true positive fractions at the specified marker cut-off values, and a plot of the time-dependent AP versus time (available for event time data).
AR Another Look at the Acceptance-Rejection Method
In mathematics, ‘rejection sampling’ is a basic technique used to generate observations from a distribution. It is also commonly called ‘the Acceptance-Rejection method’ or ‘Accept-Reject algorithm’ and is a type of Monte Carlo method. The Acceptance-Rejection method is based on the observation that to sample a random variable one can perform a uniformly random sampling of the 2D Cartesian graph and keep the samples in the region under the graph of its density function. Package ‘AR’ is able to generate/simulate random data from a probability density function by the Acceptance-Rejection method. Moreover, this package is a useful teaching resource for graphical presentation of the Acceptance-Rejection method. From the practical point of view, the user needs to calculate a constant in the Acceptance-Rejection method, and package ‘AR’ is able to compute this constant with optimization tools. Several numerical examples are provided to illustrate the graphical presentation of the Acceptance-Rejection method.
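The observation about sampling under the graph of the density can be written directly in base R (independent of the ‘AR’ package); here the target is a Beta(2, 5) density with a uniform proposal, and the envelope constant plays the role of the constant the package computes by optimization.

```r
# Rejection sampling from Beta(2, 5): draw (x, u) uniformly under the
# envelope M and keep the points falling below the target density f(x).
set.seed(1)
f <- function(x) dbeta(x, 2, 5)                      # target density
M <- optimize(f, c(0, 1), maximum = TRUE)$objective  # envelope constant

n <- 10000
x <- runif(n)                  # proposals from Uniform(0, 1)
u <- runif(n)                  # vertical coordinates
accepted <- x[u < f(x) / M]    # samples in the region under the graph

mean(accepted)                 # should be near the Beta(2, 5) mean, 2/7
```

The acceptance rate is 1/M, so a tighter envelope wastes fewer proposals.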
ar.matrix Simulate Auto Regressive Data from Precision Matrices
Using sparse precision matrices and Cholesky factorization, simulates data that is auto-regressive.
arabicStemR Arabic Stemmer for Text Analysis
Allows users to stem Arabic texts for text analysis.
arc Association Rule Classification
Implements the Classification-based on Association Rules (CBA) algorithm for association rule classification (ARC). The package also contains several convenience methods that allow CBA parameters (minimum confidence, minimum support) to be set automatically, and it natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the ‘arules’ package.
ARCensReg Fitting Univariate Censored Linear Regression Model with Autoregressive Errors
It fits a univariate left- or right-censored linear regression model with autoregressive errors under the normal distribution. It provides estimates and standard errors of the parameters, prediction of future observations, and it supports missing values on the dependent variable. It also provides convergence plots when at least one censored observation exists.
ArCo Artificial Counterfactual Package
Set of functions to analyse and estimate Artificial Counterfactual models from Carvalho, Masini and Medeiros (2016) <DOI:10.2139/ssrn.2823687>.
areaplot Stacked Area Plot
Produce a stacked area plot, or add polygons to an existing plot. The data can be a numeric vector, table, matrix, data frame, or a time-series object. Supports formula syntax and data can be plotted as proportions, so stacked areas equal 1.
arena2r Plots, Summary Statistics and Tools for Arena Simulation Users
Reads Arena <https://…/> CSV output files and generates nice tables and plots. The package contains a Shiny App that can be used to interactively visualize Arena’s results.
ArfimaMLM Arfima-MLM Estimation For Repeated Cross-Sectional Data
Functions to facilitate the estimation of Arfima-MLM models for repeated cross-sectional data and pooled cross-sectional time-series data (see Lebo and Weber 2015). The estimation procedure uses double filtering with Arfima methods to account for autocorrelation in repeated cross-sectional data followed by multilevel modeling (MLM) to estimate aggregate as well as individual-level parameters simultaneously.
argon2 Secure Password Hashing
Utilities for secure password hashing via the argon2 algorithm. It is a relatively new hashing algorithm and is believed to be very secure. The ‘argon2’ implementation included in the package is the reference implementation. The package also includes some utilities that should be useful for digest authentication, including a wrapper of ‘blake2b’. For similar R packages, see sodium and ‘bcrypt’. See <https://…/Argon2> or <https://…/430.pdf> for more information.
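A hash-then-verify round trip can be sketched as follows; the `pw_hash()` and `pw_check()` function names are assumptions based on the package's documented interface.

```r
# Hash a password with argon2, then verify a correct and an incorrect
# candidate against the stored hash.
library(argon2)

h <- pw_hash("correct horse battery staple")
pw_check(h, "correct horse battery staple")  # TRUE
pw_check(h, "wrong password")                # FALSE
```

Note that the hash embeds its own salt, so only the hash string needs to be stored.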
ArgumentCheck Improved Communication to Users with Respect to Problems in Function Arguments
The typical process of checking arguments in functions is iterative. In this process, an error may be returned and the user may fix it only to receive another error on a different argument. ‘ArgumentCheck’ facilitates a more helpful way to perform argument checks allowing the programmer to run all of the checks and then return all of the errors and warnings in a single message.
ARHT Adaptable Regularized Hotelling’s T^2 Test for High-Dimensional Data
Perform the Adaptable Regularized Hotelling’s T^2 test (ARHT) proposed by Li et al. (2016) <arXiv:1609.08725>. Both one-sample and two-sample mean tests are available with various probabilistic alternative prior models. It contains a function to consistently estimate higher order moments of the population covariance spectral distribution using the spectrum of the sample covariance matrix (Bai et al. (2010) <doi:10.1111/j.1467-842X.2010.00590.x>). In addition, it contains a function to approximately sample 3-variate chi-squared random vectors with a given correlation matrix when the degrees of freedom are large.
ari Automated R Instructor
Create videos from ‘R Markdown’ documents, or images and audio files. These images can come from image files or HTML slides, and the audio files can be provided by the user or computer voice narration can be created using ‘Amazon Polly’. The purpose of this package is to allow users to create accessible, translatable, and reproducible lecture videos. See <https://…/> for more information.
aricode Efficient Computations of Standard Clustering Comparison Measures
Implements an efficient O(n) algorithm based on bucket-sorting for fast computation of standard clustering comparison measures. Available measures include adjusted Rand index (ARI), normalized information distance (NID), normalized mutual information (NMI), normalized variation information (NVI) and entropy, as described in Vinh et al (2009) <doi:10.1145/1553374.1553511>.
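For instance, comparing two clusterings of the same six points (a sketch assuming ‘aricode’ is installed):

```r
library(aricode)

truth <- c(1, 1, 1, 2, 2, 2)
pred  <- c(1, 1, 2, 2, 2, 2)  # one point assigned to the wrong cluster

ARI(truth, pred)  # adjusted Rand index; 1 means identical partitions
NMI(truth, pred)  # normalized mutual information
```

Both measures are invariant to label permutations, so relabelling the clusters does not change the scores.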
arkdb Archive and Unarchive Databases Using Flat Files
Flat text files provide a more robust, compressible, and portable way to store tables. This package provides convenient functions for exporting tables from relational database connections into compressed text files and streaming those text files back into a database without requiring the whole table to fit in working memory.
arpr Advanced R Pipes
Provides convenience functions for programming with magrittr pipes. Conditional pipes, a string prefixer and a function to pipe the given object into a specific argument given by character name are currently supported. It is named after the dadaist Hans Arp, a friend of Rene Magritte.
arqas Application in R for Queueing Analysis and Simulation
Provides functions for computing the main characteristics of the following queueing models: M/M/1, M/M/s, M/M/1/k, M/M/s/k, M/M/1/Inf/H, M/M/s/Inf/H, M/M/s/Inf/H with Y replacements, M/M/Inf, Open Jackson Networks and Closed Jackson Networks. Moreover, it is also possible to simulate similar queueing models with any type of arrival or service distribution: G/G/1, G/G/s, G/G/1/k, G/G/s/k, G/G/1/Inf/H, G/G/s/Inf/H, G/G/s/Inf/H with Y replacements, Open Networks and Closed Networks. Finally, it contains functions for fitting data to a statistical distribution.
arrangements Fast Generators and Iterators for Permutations, Combinations and Partitions
Fast generators and iterators for permutations, combinations and partitions. The iterators allow users to generate arrangements in a memory efficient manner and the generated arrangements are in lexicographical (dictionary) order. Permutations and combinations can be drawn with/without replacement and support multisets. It has been demonstrated that ‘arrangements’ outperforms most of the existing packages of similar kind. Some benchmarks can be found at <https://…/benchmark.html>.
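A quick sketch of the generator and iterator interfaces (assuming ‘arrangements’ is installed; see its documentation for the full argument list):

```r
library(arrangements)

combinations(5, 2)             # all 10 two-element subsets of 1..5, in lexicographic order
permutations(c("a", "b", "c")) # all 6 orderings of three letters

# Iterator variant: produce arrangements lazily, one at a time,
# without materializing the full set in memory.
it <- icombinations(5, 2)
it$getnext()                   # first combination in lexicographic order
```

The iterator form is what makes very large arrangement spaces tractable.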
arsenal An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries
An Arsenal of ‘R’ functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in ‘R’ and ‘RStudio’ and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types ‘by’ the levels of a categorical variable; modelsum(), which performs simple model fits on the same endpoint for many variables (univariate or adjusted for standard covariates); and freqlist(), a powerful frequency table across many categorical variables.
ART Aligned Rank Transform for Nonparametric Factorial Analysis
An implementation of the Aligned Rank Transform technique for factorial analysis (see references below for details) including models with missing terms (unsaturated factorial models). The function first computes a separate aligned ranked response variable for each effect of the user-specified model, and then runs a classic ANOVA on each of the aligned ranked responses. For further details, see Higgins, J. J. and Tashtoush, S. (1994). An aligned rank transform test for interaction. Nonlinear World 1 (2), pp. 201-211. Wobbrock, J.O., Findlater, L., Gergle, D. and Higgins, J.J. (2011). The Aligned Rank Transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’11). New York: ACM Press, pp. 143-146. <doi:10.1145/1978942.1978963>.
artfima Fit ARTFIMA Model
Fit and simulate ARTFIMA models. Provides the theoretical autocovariance function and spectral density function for stationary ARTFIMA processes.
ARTIVA Time-Varying DBN Inference with the ARTIVA (Auto Regressive TIme VArying) Model
Reversible Jump MCMC (RJ-MCMC) sampling for approximating the posterior distribution of a time varying regulatory network, under the Auto Regressive TIme VArying (ARTIVA) model (for a detailed description of the algorithm, see Lebre et al. BMC Systems Biology, 2010). Starting from time-course gene expression measurements for a gene of interest (referred to as ‘target gene’) and a set of genes (referred to as ‘parent genes’) which may explain the expression of the target gene, the ARTIVA procedure identifies temporal segments for which a set of interactions occur between the ‘parent genes’ and the ‘target gene’. The time points that delimit the different temporal segments are referred to as changepoints (CP).
arules Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.
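A minimal mining sketch using the Groceries transaction data shipped with ‘arules’ (support and confidence thresholds here are illustrative):

```r
library(arules)
data("Groceries")  # example grocery transactions included in 'arules'

# Mine association rules with the Apriori algorithm, keeping only
# rules above minimum support and confidence thresholds.
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5))

# Inspect the rules with the highest lift
inspect(head(sort(rules, by = "lift"), 3))
```

Lowering the support threshold yields more (and rarer) itemsets at the cost of longer mining time.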
arulesCBA Classification Based on Association Rules
Provides a function to build an association rule-based classifier for data frames, and to classify incoming data frames using such a classifier.
aRxiv Interface to the arXiv API
An interface to the API for arXiv, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
as.color Assign Random Colors to Unique Items in a Vector
The as.color function takes an R vector of any class as an input, and outputs a vector of unique hexadecimal color values that correspond to the unique input values. This is most handy when overlaying points and lines for data that correspond to different levels or factors. The function will also print the random seed used to generate the colors. If you like the color palette generated, you can save the seed and reuse those colors.
asciiSetupReader Reads ‘SPSS’ and ‘SAS’ Files from ASCII Data Files (.txt) and Setup Files (.sps or .sas)
Lets you open an ‘SPSS’ or ‘SAS’ data file using a .txt file that has the data and a .sps or .sas file with setup instructions. It will only run on a txt-sps or txt-sas pair in which the setup file contains instructions to open that text file. It will NOT open other text files, .sav, .por, or ‘SAS’ files.
ashr Methods for Adaptive Shrinkage, using Empirical Bayes
The R package ‘ashr’ implements an Empirical Bayes approach for large-scale hypothesis testing and false discovery rate (FDR) estimation based on the methods proposed in M. Stephens, 2016, ‘False discovery rates: a new deal’, <DOI:10.1093/biostatistics/kxw041>. These methods can be applied whenever two sets of summary statistics (estimated effects and standard errors) are available, just as ‘qvalue’ can be applied to previously computed p-values. Two main interfaces are provided: ash(), which is more user-friendly; and ash.workhorse(), which has more options and is geared toward advanced users. Both also provide a flexible modeling interface that can accommodate a variety of likelihoods (e.g., normal, Poisson) and mixture priors (e.g., uniform, normal).
asht Applied Statistical Hypothesis Tests
Some hypothesis test functions with a focus on non-asymptotic methods that have matching confidence intervals.
ASICS Automatic Statistical Identification in Complex Spectra
With a set of pure metabolite spectra, ASICS quantifies metabolites concentration in a complex spectrum. The identification of metabolites is performed by fitting a mixture model to the spectra of the library with a sparse penalty. The method and its statistical properties are described in Tardivel et al. (2017) <doi:10.1007/s11306-017-1244-5>.
AsioHeaders Asio C++ Header Files
Asio is a cross-platform C++ library for network and low-level I/O programming that provides developers with a consistent asynchronous model using a modern C++ approach. ‘Asio’ is also included in Boost but requires linking when used with Boost. Standalone it can be used header-only provided a recent-enough compiler. ‘Asio’ is written and maintained by Christopher M. Kohlhoff. ‘Asio’ is released under the ‘Boost Software License’, Version 1.0.
ASIP Automated Satellite Image Processing
Perform complex satellite image processes automatically and efficiently. The package currently supports satellite images from the widely used Landsat 4, 5, 7 and 8 missions and ASTER L1T data. The primary uses of this package are given below. 1. Conversion of optical bands to top of atmosphere reflectance. 2. Conversion of thermal bands to corresponding temperature images. 3. Derive application oriented products directly from source satellite image bands. 4. Compute user defined equation and produce corresponding image product. 5. Other basic tools for satellite image processing. References. i. Chander and Markham (2003) <doi:10.1109/TGRS.2003.818464>. ii. Roy, (2014) <doi:10.1016/j.rse.2014.02.001>. iii. Abrams (2000) <doi:10.1080/014311600210326>.
askpass Safe Password Entry for R, Git, and SSH
Cross-platform utilities for prompting the user for credentials or a passphrase, for example to authenticate with a server or read a protected key. Includes native programs for macOS and Windows, hence no ‘tcltk’ is required. Password entry can be invoked in two different ways: directly from R via the askpass() function, or indirectly as password-entry back-end for ‘ssh-agent’ or ‘git-credential’ via the SSH_ASKPASS and GIT_ASKPASS environment variables. Thereby the user can be prompted for credentials or a passphrase if needed when R calls out to git or ssh.
aSPC An Adaptive Sum of Powered Correlation Test (aSPC) for Global Association Between Two Random Vectors
The aSPC test is designed to test global association between two groups of variables potentially with moderate to high dimension (e.g. in hundreds). The aSPC is particularly useful when the association signals between two groups of variables are sparse.
aSPU Adaptive Sum of Powered Score Test
R code for the (adaptive) Sum of Powered Score (‘SPU’ and ‘aSPU’) tests, inverse variance weighted Sum of Powered Score (‘SPUw’ and ‘aSPUw’) tests, and gene-based and pathway-based association tests: pathway-based Sum of Powered Score tests (‘SPUpath’) and the adaptive ‘aSPUpath’ test, the Gene-based Association Test that uses an extended Simes procedure (‘GATES’), the Hybrid Set-based Test (‘HYST’), and an extended version of the ‘GATES’ test for pathway-based association testing (‘Gates-Simes’). The tests can be used with genetic and other data sets with covariates. The response variable is binary or quantitative.
asremlPlus Augments the Use of ‘Asreml’ in Fitting Mixed Models
Provides functions that assist in automating the testing of terms in mixed models when ‘asreml’ is used to fit the models. The package ‘asreml’ is marketed by ‘VSNi’ as ‘asreml-R’ and provides a computationally efficient algorithm for fitting mixed models using Residual Maximum Likelihood. The content falls into the following natural groupings: (i) Data, (ii) Object manipulation functions, (iii) Model modification functions, (iv) Model testing functions, (v) Model diagnostics functions, (vi) Prediction production and presentation functions, (vii) Response transformation functions, and (viii) Miscellaneous functions. A history of the fitting of a sequence of models is kept in a data frame. Procedures are available for choosing models that conform to the hierarchy or marginality principle and for displaying predictions for significant terms in tables and graphs.
ASSA Applied Singular Spectrum Analysis (ASSA)
Functions to model and decompose time series into principal components using singular spectrum analysis (de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>; de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>).
AssayCorrector Detection and Correction of Spatial Bias in HTS Screens
(1) Detects plate-specific spatial bias by identifying rows and columns of all plates of the assay affected by this bias (following the results of the Mann-Whitney U test) as well as assay-specific spatial bias by identifying well locations (i.e., well positions scanned across all plates of a given assay) affected by this bias (also following the results of the Mann-Whitney U test); (2) Allows one to correct plate-specific spatial bias using either the additive or multiplicative PMP (Partial Mean Polish) method (the most appropriate spatial bias model can be either specified by the user or determined by the program following the results of the Kolmogorov-Smirnov two-sample test) to correct the assay measurements as well as to correct assay-specific spatial bias by carrying out robust Z-scores within each plate of the assay and then traditional Z-scores across well locations.
assertive.data Assertions to Check Properties of Data
A set of predicates and assertions for checking the properties of (country independent) complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.data.us Assertions to Check Properties of Strings
A set of predicates and assertions for checking the properties of US-specific complex data types. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.files Assertions to Check Properties of Files
A set of predicates and assertions for checking the properties of files and connections. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.numbers Assertions to Check Properties of Numbers
A set of predicates and assertions for checking the properties of numbers. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.properties Assertions to Check Properties of Variables
A set of predicates and assertions for checking the properties of variables, such as length, names and attributes. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.reflection Assertions for Checking the State of R
A set of predicates and assertions for checking the state and capabilities of R, the operating system it is running on, and the IDE being used. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.sets Assertions to Check Properties of Sets
A set of predicates and assertions for checking the properties of sets. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.strings Assertions to Check Properties of Strings
A set of predicates and assertions for checking the properties of strings. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertive.types Assertions to Check Types of Variables
A set of predicates and assertions for checking the types of variables. This is mainly for use by other package developers who want to include run-time testing features in their own packages. End-users will usually want to use assertive directly.
assertr Assertive Programming for R Analysis Pipelines
The assertr package supplies a suite of functions designed to verify assumptions about data early in a dplyr/magrittr analysis pipeline, so that data errors are spotted early and can be addressed quickly.
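A minimal pipeline sketch, assuming ‘assertr’ and ‘magrittr’ are installed; verify() checks a data-frame-level predicate, assert() checks each value of a column:

```r
library(assertr)
library(magrittr)

checked <- mtcars %>%
  verify(mpg > 0) %>%                  # row-wise logical check evaluated in the data
  assert(within_bounds(0, Inf), mpg)   # column check: every mpg value is non-negative

# If any assumption fails, the pipeline stops with an informative error;
# otherwise the data passes through unchanged for downstream steps.
```

Placing these checks at the top of a pipeline turns silent data problems into loud, early failures.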
assist A Suite of R Functions Implementing Spline Smoothing Techniques
A comprehensive package for fitting various non-parametric/semi-parametric linear/nonlinear fixed/mixed smoothing spline models.
ASSOCShiny Interactive Document for Working with Association Rule Mining Analysis
An interactive document on the topic of association rule mining analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
assortnet Calculate the Assortativity Coefficient of Weighted and Binary Networks
Functions to calculate the assortment of vertices in social networks. This can be measured on both weighted and binary networks, with discrete or continuous vertex values.
AST Age-Spatial-Temporal Model
Fits a model that adjusts for additional variation across three dimensions (age group, time, and space) in the residuals of a prediction model, such as a linear regression or mixed model. Details are given in Foreman et al. (2015) <doi:10.1186/1478-7954-10-1>.
asus Adaptive SURE Thresholding Using Side Information
Provides the ASUS procedure for estimating a high dimensional sparse parameter in the presence of auxiliary data that encode side information on sparsity. It is a robust data combination procedure in the sense that even when pooling non-informative auxiliary data ASUS would be at least as efficient as competing soft thresholding based methods that do not use auxiliary data. For more information, please see the website <http://…/ASUS.htm> and the accompanying paper.
asVPC Average Shifted Visual Predictive Checks
Visual predictive checks are a well-known method to validate nonlinear mixed-effects models, especially in the pharmacometrics area. Average shifted visual predictive checks are a newly developed method that combines visual predictive checks with the idea of the average shifted histogram.
asymmetry The Slide-Vector Model for Multidimensional Scaling of Asymmetric Data
The slide-vector model is provided in this package together with functions for the analysis and graphical display of asymmetry. The slide vector model is a scaling model for asymmetric data. A distance model is fitted to the symmetric part of the data whereas the asymmetric part of the data is represented by projections of the coordinates onto the slide-vector. The slide-vector points in the direction of large asymmetries in the data. The distance is modified in such a way that the distance between two points that are parallel to the slide-vector is larger in the direction of this vector. The distance is smaller in the opposite direction. If the line connecting two points is perpendicular to the slide-vector the difference between the two projections is zero. In this case the distance between the two points is symmetric. The algorithm for fitting this model is derived from the majorization approach to multidimensional scaling.
atable Create Tables for Reporting Clinical Trials
Calculates descriptive statistics and hypothesis tests, and arranges the results in a table ready for reporting with LaTeX or Word.
ATE Inference for Average Treatment Effects using Covariate Balancing
Nonparametric estimation and inference for average treatment effects based on covariate balancing.
atlas Stanford ‘ATLAS’ Search Engine API
Stanford ‘ATLAS’ (Advanced Temporal Search Engine) is a powerful tool that allows constructing cohorts of patients extremely quickly and efficiently. This package is designed to interface directly with an instance of ‘ATLAS’ search engine and facilitates API queries and data dumps. Prerequisite is a good knowledge of the temporal language to be able to efficiently construct a query. More information available at <https://…/start>.
ATR Alternative Tree Representation
Plot party trees in left-right orientation instead of the classical top-down layout.
aTSA Alternative Time Series Analysis
Contains tools for testing and analyzing time series data and for fitting popular time series models such as ARIMA, moving average and Holt-Winters. Most functions also provide clear outputs in the style of SAS, e.g. identify, estimate and forecast, which mirror the corresponding statements in PROC ARIMA in SAS.
attachment Deal with Dependencies
Tools to help manage dependencies during package development. This can retrieve all dependencies that are used in R files in the ‘R’ directory, in Rmd files in ‘vignettes’ directory and in ‘roxygen2’ documentation of functions. There is a function to update the Description file of your package and a function to create a file with the R commands to install all dependencies of your package. All functions to retrieve dependencies of R scripts and Rmd files can be used independently of a package development.
attempt Easy Condition Handling
A friendlier condition handler, inspired by ‘purrr’ mappers and based on ‘rlang’. ‘attempt’ extends and facilitates condition handling by providing a consistent grammar, and provides a set of easy to use functions for common tests and conditions. ‘attempt’ only depends on ‘rlang’, and focuses on speed, so it can be easily integrated in other functions and used in data analysis.
attrCUSUM Tools for Attribute VSI CUSUM Control Chart
An implementation of tools for the design of attribute variable sampling interval cumulative sum (CUSUM) charts. It currently provides information for monitoring a mean increase, such as the average number of samples to signal, the average time to signal, a matrix of transient probabilities, and suitable control limits when the data follow a (zero-inflated) Poisson/binomial distribution. The functions can easily be applied to other count processes and might be extended to more complicated cumulative sum control charts; we leave these issues as future work.
auctestr Statistical Testing for AUC Data
Performs statistical testing to compare predictive models based on multiple observations of the A’ statistic (also known as Area Under the Receiver Operating Characteristic Curve, or AUC). Specifically, it implements a testing method based on the equivalence between the A’ statistic and the Wilcoxon statistic. For more information, see Hanley and McNeil (1982) <doi:10.1148/radiology.143.1.7063747>.
auditor Model Audit – Verification, Validation, and Error Analysis
Provides an easy-to-use unified interface for creating validation plots for any model. The ‘auditor’ helps to avoid the repetitive work of writing the code needed to create residual plots. These visualizations allow one to assess and compare the goodness of fit, performance, and similarity of models.
augmentedRCBD Analysis of Augmented Randomised Complete Block Designs
Functions for analysis of data generated from experiments in augmented randomised complete block design according to Federer, W.T. (1961) <doi:10.2307/2527837>. Computes analysis of variance, adjusted means, descriptive statistics, genetic variability statistics etc. Further includes data visualization and report generation functions.
augSIMEX Analysis of Data with Mixed Measurement Error and Misclassification in Covariates
Implementation of the augmented Simulation-Extrapolation (SIMEX) algorithm proposed by Yi et al. (2015) <doi:10.1080/01621459.2014.922777> for analyzing the data with mixed measurement error and misclassification. The main function provides a similar summary output as that of glm() function. Both parametric and empirical SIMEX are considered in the package.
aurelius Generates PFA Documents from R Code and Optionally Runs Them
Provides tools for converting R objects and syntax into the Portable Format for Analytics (PFA). Allows for testing validity and runtime behavior of PFA documents through rPython and Titus, a more complete implementation of PFA for Python. The Portable Format for Analytics is a specification for event-based processors that perform predictive or analytic calculations and is aimed at helping smooth the transition from statistical model development to large-scale and/or online production.
AurieLSHGaussian Creates a Neighbourhood Using Locality Sensitive Hashing for Gaussian Projections
Uses locality sensitive hashing to create a neighbourhood graph for a data set and calculates the adjusted Rand index value for the same. It uses Gaussian random planes to decide the nature of a given point. Datar, Mayur, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni (2004) <doi:10.1145/997817.997857>.
auRoc Various Methods to Estimate the AUC
Estimate the AUC using a variety of methods as follows: (1) frequentist nonparametric methods based on the Mann-Whitney statistic or kernel methods. (2) frequentist parametric methods using the likelihood ratio test based on higher-order asymptotic results, the signed log-likelihood ratio test, the Wald test, or the approximate ‘t’ solution to the Behrens-Fisher problem. (3) Bayesian parametric MCMC methods.
auto.pca Automatic Variable Reduction Using Principal Component Analysis
PCA is done by eigenvalue decomposition of a data correlation matrix. The package automatically determines the number of factors (eigenvalues greater than 1) and returns uncorrelated variables based on the rotated component scores, such that within each principal component the variable with the highest variance is selected. It is useful for non-statisticians in the selection of variables. For more information, see the <http://…/ijcem_032013_06.pdf> web page.
autoBagging Learning to Rank Bagging Workflows with Metalearning
A framework for automated machine learning. Concretely, the focus is on the optimisation of bagging workflows. A bagging workflow is composed of three phases: (i) generation: which and how many predictive models to learn; (ii) pruning: after learning a set of models, the worst ones are cut off from the ensemble; and (iii) integration: how the models are combined for predicting a new observation. autoBagging optimises these processes by combining metalearning and a learning to rank approach to learn from metadata. It automatically ranks 63 bagging workflows by exploiting past performance and dataset characterization. A complete description of the method can be found in: Pinto, F., Cerqueira, V., Soares, C., Mendes-Moreira, J. (2017): ‘autoBagging: Learning to Rank Bagging Workflows with Metalearning’ arXiv preprint arXiv:1706.09367.
automagic Automagically Document and Install Packages Necessary to Run R Code
Parse R code in a given directory for R packages and attempt to install them from CRAN or GitHub. Optionally use a dependencies file for tighter control over which package versions to install.
automl Deep Learning with Metaheuristic
Fits models from simple regression to highly customizable deep neural networks, either with gradient descent or metaheuristics, using automatic hyperparameter tuning and custom cost functions. A mix inspired by common tricks in deep learning and by particle swarm optimization.
AutoModel Automated Hierarchical Multiple Regression with Assumptions Checking
A set of functions that automates the process and produces reasonable output for hierarchical multiple regression models. It allows you to specify predictor blocks, from which it generates all of the linear models, and checks the assumptions of the model, producing the requisite plots and statistics to allow you to judge the suitability of the model.
AutoPipe Automated Transcriptome Classifier Pipeline: Comprehensive Transcriptome Analysis
An unsupervised fully-automated pipeline for transcriptome analysis or a supervised option to identify characteristic genes from predefined subclasses. We rely on the ‘pamr’ <http://…/pamr.html> clustering algorithm to cluster the data and then draw a heatmap of the clusters with the most significant and least significant genes according to the ‘pamr’ algorithm. This yields easy-to-grasp heatmaps that show, for each cluster, its most defining genes.
autoplotly Automatic Generation of Interactive Visualizations for Popular Statistical Results
Functionalities to automatically generate interactive visualizations for popular statistical results supported by ‘ggfortify’, such as time series, PCA, clustering and survival analysis, with ‘plotly.js’ and ‘ggplot2’ style. The generated visualizations can also be easily extended using ‘ggplot2’ and ‘plotly’ syntax while staying interactive.
AutoregressionMDE Minimum Distance Estimation in Autoregressive Model
Considers an autoregressive model of order p where the distribution function of the innovations is unknown, but the innovations are independent and symmetrically distributed. The package contains a function named ARMDE which takes X (a vector of n observations) and p (the order of the model) as input arguments and returns the minimum distance estimator of the parameters in the model.
autoSEM Performs Specification Search in Structural Equation Models
Implements multiple heuristic search algorithms for automatically creating structural equation models.
autoshiny Automatic Transformation of an ‘R’ Function into a ‘shiny’ App
Static code compilation of a ‘shiny’ app given an R function (into ‘ui.R’ and ‘server.R’ files or into a ‘shiny’ app object). See examples at <https://…/autoshiny>.
av Working with Audio and Video
Bindings to the ‘FFmpeg’ <http://…/> AV library for working with audio and video in R. Generate high quality video files by capturing images from the R graphics device combined with a custom audio stream. This package interfaces directly to the C API and does not require any command line utilities.
available Check if the Title of a Package is Available, Appropriate and Interesting
Check whether a given package name is available to use: it checks the name’s validity, checks whether it is already used on ‘GitHub’, ‘CRAN’ and ‘Bioconductor’, and checks for unintended meanings by querying Urban Dictionary, ‘Wiktionary’ and Wikipedia.
aVirtualTwins Adaptation of Virtual Twins Method from Jared Foster
Research of subgroups in randomized clinical trials with a binary outcome and two treatment groups. This is an adaptation of the virtual twins method of Jared Foster.
AWR AWS’ Java ‘SDK’ for R
Installs the compiled Java modules of the Amazon Web Services (‘AWS’) ‘SDK’ to be used in downstream R packages interacting with ‘AWS’. See <https://…/sdk-for-java> for more information on the ‘AWS’ ‘SDK’ for Java.
AWR.Kinesis Amazon ‘Kinesis’ Consumer Application for Stream Processing
Fetching data from Amazon ‘Kinesis’ Streams using the Java-based ‘MultiLangDaemon’ interacting with Amazon Web Services (‘AWS’) for easy stream processing from R. For more information on ‘Kinesis’, see <https://…/kinesis>.
AWR.KMS A Simple Client to the ‘AWS’ Key Management Service
Encrypt plain text and decrypt cipher text using encryption keys hosted at Amazon Web Services (‘AWS’) Key Management Service (‘KMS’); see <https://…/kms> for more information.
aws.alexa Client for the Amazon Alexa Web Information Services API
Use the Amazon Alexa Web Information Services API to find information about domains, including the kind of content that they carry, how popular they are (rank and traffic history), sites linking to them, and more. See <https://…/> for more information.
aws.cloudtrail AWS CloudTrail Client Package
A simple client package for the Amazon Web Services (‘AWS’) ‘CloudTrail’ ‘API’ <https://…/>.
aws.comprehend ‘AWS Comprehend’ Client Package
Client for ‘AWS Comprehend’ <https://…/comprehend>, a cloud natural language processing service that can perform a number of quantitative text analyses, including language detection, sentiment analysis, and feature extraction.
aws.iam AWS IAM Client Package
A simple client for the Amazon Web Services (‘AWS’) Identity and Access Management (‘IAM’) ‘API’ <https://…/>.
aws.kms ‘AWS Key Management Service’ Client Package
Client package for the ‘AWS Key Management Service’ <https://…/>, a cloud service for managing encryption keys.
aws.polly Client for AWS Polly
A client for AWS Polly <http://…/polly>, a speech synthesis service.
aws.s3 AWS S3 Client Package
A simple client package for the Amazon Web Services (AWS) Simple Storage Service (S3) REST API <https://…/>.
aws.ses AWS SES Client Package
A simple client package for the Amazon Web Services (AWS) Simple Email Service (SES) <http://…/> REST API.
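A minimal sketch of the ‘aws.s3’ client listed above (the bucket and object names are placeholders, and real AWS credentials are required):

```r
library(aws.s3)

# Credentials are read from the standard AWS environment variables:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION.
Sys.setenv("AWS_DEFAULT_REGION" = "us-east-1")

# List objects in a bucket, then upload and download a file.
get_bucket("my-example-bucket")
put_object(file = "report.csv", object = "report.csv",
           bucket = "my-example-bucket")
save_object(object = "report.csv", bucket = "my-example-bucket",
            file = "copy.csv")
```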
aws.signature Amazon Web Services Request Signatures
Generates request signatures for Amazon Web Services (AWS) APIs.
aws.sns AWS SNS Client Package
A simple client package for the Amazon Web Services (AWS) Simple Notification Service (SNS) API.
aws.sqs AWS SQS Client Package
A simple client package for the Amazon Web Services (AWS) Simple Queue Service (SQS) API.
aws.transcribe Client for ‘AWS Transcribe’
Client for ‘AWS Transcribe’ <https://…/transcribe>, a cloud transcription service that can convert an audio media file in English and other languages into a text transcript.
aws.translate Client for ‘AWS Translate’
A client for ‘AWS Translate’ <https://…/translate>, a machine translation service that will convert a text input in one language into a text output in another language.
awsjavasdk Boilerplate R Access to the Amazon Web Services (‘AWS’) Java SDK
Provides boilerplate access to all of the classes included in the Amazon Web Services (‘AWS’) Java Software Development Kit (SDK) via package:’rJava’. According to Amazon, the ‘SDK helps take the complexity out of coding by providing Java APIs for many AWS services including Amazon S3, Amazon EC2, DynamoDB, and more’. You can read more about the included Java code on Amazon’s website: <https://…/>.
awspack Amazon Web Services Bundle Package
A bundle of all of ‘cloudyr’ project <http://…/> packages for Amazon Web Services (‘AWS’) <https://…/>. It depends upon all of the ‘cloudyr’ project’s ‘AWS’ packages. It is mainly useful for installing the entire suite of packages; more likely than not you will only want to load individual packages one at a time.
AzureGraph Simple Interface to ‘Microsoft Graph’
A simple interface to the ‘Microsoft Graph’ API <https://…/overview>. ‘Graph’ is a comprehensive framework for accessing data in various online Microsoft services. Currently, this package aims to provide an R interface only to the ‘Azure Active Directory’ part, with a view to supporting interoperability of R and ‘Azure’: users, groups, registered apps and service principals. However it can be easily extended to cover other services.
AzureKeyVault Key and Secret Management in ‘Azure’
Manage keys, certificates, secrets, and storage accounts in Microsoft’s ‘Key Vault’ service: <https://…/key-vault>. Provides facilities to store and retrieve secrets, use keys to encrypt, decrypt, sign and verify data, and manage certificates. Integrates with the ‘AzureAuth’ package to enable authentication with a certificate, and with the ‘openssl’ package for importing and exporting.
AzureKusto Interface to ‘Kusto’/’Azure Data Explorer’
An interface to ‘Azure Data Explorer’, also known as ‘Kusto’, a fast, highly scalable data exploration service from Microsoft: <https://…/>. Includes ‘DBI’ and ‘dplyr’ interfaces, with the latter modelled after the ‘dbplyr’ package, whereby queries are translated from R into the native ‘KQL’ query language and executed lazily. On the admin side, the package extends the object framework provided by ‘AzureRMR’ to support creation and deletion of databases, and management of database principals.
AzureML Discover, Publish and Consume Web Services on Microsoft Azure Machine Learning
Provides an interface to Microsoft Azure to easily publish functions and trained models as web services, and to discover and consume web services.
AzureRMR Interface to ‘Azure Resource Manager’
A lightweight but powerful R interface to the ‘Azure Resource Manager’ REST API. The package exposes classes and methods for ‘OAuth’ authentication and working with subscriptions and resource groups. It also provides functionality for creating and deleting ‘Azure’ resources and deploying templates. While ‘AzureRMR’ can be used to manage any ‘Azure’ service, it can also be extended by other packages to provide extra functionality for specific services.


BACCT Bayesian Augmented Control for Clinical Trials
Implements the Bayesian Augmented Control (BAC, a.k.a. Bayesian historical data borrowing) method under clinical trial setting by calling ‘Just Another Gibbs Sampler’ (‘JAGS’) software. In addition, the ‘BACCT’ package evaluates user-specified decision rules by computing the type-I error/power, or probability of correct go/no-go decision at interim look. The evaluation can be presented numerically or graphically. Users need to have ‘JAGS’ 4.0.0 or newer installed due to a compatibility issue with ‘rjags’ package. Currently, the package implements the BAC method for binary outcome only. Support for continuous and survival endpoints will be added in future releases. We would like to thank AbbVie’s Statistical Innovation group and Clinical Statistics group for their support in developing the ‘BACCT’ package.
bacistool Bayesian Classification and Information Sharing (BaCIS) Tool for the Design of Multi-Group Phase II Clinical Trials
Provides the design of multi-group phase II clinical trials with binary outcomes using the hierarchical Bayesian classification and information sharing (BaCIS) model. Subgroups are classified into two clusters on the basis of their outcomes mimicking the hypothesis testing framework. Subsequently, information sharing takes place within subgroups in the same cluster, rather than across all subgroups. This method can be applied to the design and analysis of multi-group clinical trials with binary outcomes.
backpipe Backward Pipe Operator
Provides a backward-pipe operator for ‘magrittr’ (%<%) or ‘pipeR’ (%<<%) that allows for performing operations from right to left. This is useful where right-to-left ordering is natural, as with nested structures such as trees/directories and markup languages such as HTML and XML.
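A sketch of the backward pipe, assuming the right-hand value is fed into the left-hand function (the mirror image of ‘magrittr’):

```r
library(magrittr)
library(backpipe)

# Forward pipe: 1:5 %>% mean
# Backward pipe reads right to left:
mean %<% 1:5   # equivalent to mean(1:5)
```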
backports Reimplementations of Functions Introduced Since R-3.0.0
Provides implementations of functions which have been introduced in R since version 3.0.0. The backports are conditionally exported, so R resolves the function names to the version shipped with R (if available) and falls back to the implemented backports otherwise. This way package developers can make use of the new functions without worrying about the minimum required R version.
backShift Learning Causal Cyclic Graphs from Unknown Shift Interventions
Code for ‘backShift’, an algorithm to estimate the connectivity matrix of a directed (possibly cyclic) graph with hidden variables. The underlying system is required to be linear, and we assume that observations under different shift interventions are available. For more details, see <http://…/1506.02494>.
bacr Bayesian Adjustment for Confounding
Estimates the average causal effect based on the Bayesian Adjustment for Confounding (BAC) algorithm.
badger Badge for R Package
Query information and generate badges for use in README files and on GitHub Pages.
bain Bayes Factors for Informative Hypotheses
Computes approximated adjusted fractional Bayes factors for equality, inequality, and about equality constrained hypotheses. S3 methods are available for specific types of lm() models, namely ANOVA, ANCOVA, and multiple regression, and for the t_test(). The statistical underpinnings are described in Hoijtink, Mulder, van Lissa, and Gu, (2018) <doi:10.31234/>, Gu, Mulder, and Hoijtink, (2018) <doi:10.1111/bmsp.12110>, Hoijtink, Gu, and Mulder, (2018) <doi:10.1111/bmsp.12145>, and Hoijtink, Gu, Mulder, and Rosseel, (2018) <doi:10.1037/met0000187>.
bairt Bayesian Analysis of Item Response Theory Models
Bayesian estimation of the two- and three-parameter models of item response theory (IRT). An interactive web application is also provided for MCMC estimation and model-fit assessment of the IRT models.
BalanceCheck Balance Check for Multiple Covariates in Matched Observational Studies
Two practical tests are provided for assessing whether multiple covariates in a treatment group and a matched control group are balanced in observational studies.
BALD Robust Loss Development Using MCMC
Bayesian analysis of loss development on insurance triangles or ‘BALD’ is a Bayesian model of developing aggregate loss triangles in property casualty insurance. This actuarial model makes use of a heteroskedastic and skewed t-likelihood with endogenous degrees of freedom, employs model averaging by means of Reversible Jump MCMC, and accommodates a structural break in the path of the consumption of benefits. Further, the model is capable of incorporating expert information in the calendar year effect. In an accompanying vignette, this model is applied to two widely studied General Liability and Auto Bodily Injury Liability loss triangles. For a description of the methodology, see Frank A. Schmid (2010) <doi:10.2139/ssrn.1501706>.
Ball Statistical Inference and Sure Independence Screening via Ball Statistics
Hypothesis tests and sure independence screening (SIS) procedure based on ball statistics, including ball divergence <doi:10.1214/17-AOS1579>, ball covariance, and ball correlation <doi:10.1080/01621459.2018.1462709>, are developed to analyze complex data. The ball divergence and ball covariance based distribution-free tests are implemented to examine equality of multivariate distributions and independence between random vectors of arbitrary dimensions. Furthermore, a generic non-parametric SIS procedure based on ball correlation and all of its variants are implemented to tackle the challenge in the context of ultra high dimensional data.
BAMBI Bivariate Angular Mixture Models
Fit (using Bayesian methods) and simulate mixtures of univariate and bivariate angular distributions.
bamlss Bayesian Additive Models for Location Scale and Shape (and Beyond)
R infrastructure for Bayesian regression models.
bamp Bayesian Age-Period-Cohort Modeling and Prediction
Bayesian Age-Period-Cohort Modeling and Prediction using efficient Markov Chain Monte Carlo Methods. This is the R version of the previous BAMP software as described in Volker Schmid and Leonhard Held (2007) <DOI:10.18637/jss.v021.i08> Bayesian Age-Period-Cohort Modeling and Prediction – BAMP, Journal of Statistical Software 21:8. This package includes checks of convergence using Gelman’s R.
BANFF Bayesian Network Feature Finder
Provides efficient Bayesian nonparametric models for network feature selection.
bang Bayesian Analysis, No Gibbs
Provides functions for the Bayesian analysis of some simple commonly-used models, without using Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling. The ‘rust’ package <https://…/package=rust> is used to simulate a random sample from the required posterior distribution. At the moment three conjugate hierarchical models are available: beta-binomial, gamma-Poisson and a 1-way analysis of variance (ANOVA).
bannerCommenter Make Banner Comments with a Consistent Format
A convenience package for use while drafting code. It facilitates making stand-out comment lines decorated with bands of characters. The input text strings are converted into R comment lines, suitably formatted. These are then displayed in a console window and, if possible, automatically transferred to a clipboard ready for pasting into an R script. Designed to save time when drafting R scripts that will need to be navigated and maintained by other programmers.
BarBorGradient Function Minimum Approximator
Tool to find where a function attains its lowest value (minimum). The function can have any number of dimensions. Recommended use is with eps=10^-10, but it can be run with 10^-20, although this depends on the function. Two more methods are included in this package, a simple gradient method (Gradmod) and the Powell method (Powell). These are not recommended for use; their purpose is purely comparison.
Barnard Barnard’s Unconditional Test
Barnard’s unconditional test for 2×2 contingency tables.
BART Bayesian Additive Regression Trees
Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary and time-to-event outcomes. For more information on BART, see Chipman, George and McCulloch (2010) <doi:10.1214/09-AOAS285> and Sparapani, Logan, McCulloch and Laud (2016) <doi:10.1002/sim.6893>.
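A minimal sketch of fitting a BART model to a toy continuous outcome with the ‘BART’ package:

```r
library(BART)

# Simulated regression problem with two predictors.
set.seed(1)
x <- matrix(runif(200), ncol = 2)
y <- x[, 1]^2 + rnorm(100, sd = 0.1)

# wbart() fits BART for a continuous outcome.
fit <- wbart(x.train = x, y.train = y)
head(fit$yhat.train.mean)   # posterior mean predictions at training points
```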
bartMachine Bayesian Additive Regression Trees
An advanced implementation of Bayesian Additive Regression Trees with expanded features for data analysis and visualization.
bartMachineJARs bartMachine JARs
These are bartMachine’s Java dependency libraries. Note: this package has no functionality of its own and should not be installed as a standalone package without bartMachine.
Barycenter Wasserstein Barycenter
Computation of a Wasserstein Barycenter. The package implements a method described in Cuturi (2014) ‘Fast Computation of Wasserstein Barycenters’. The paper is available at <http://…/cuturi14.pdf>. To speed up the computation time the main iteration step is based on ‘RcppArmadillo’.
BAS Bayesian Model Averaging using Bayesian Adaptive Sampling
Package for Bayesian Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner’s g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy Priors or the Liang et al hyper-g priors (JASA 2008) or mixtures of g-priors in GLMS of Li and Clyde 2015. Other model selection criteria include AIC and BIC. Sampling probabilities may be updated based on the sampled models using Sampling w/out Replacement or an MCMC algorithm samples models using the BAS tree structure as an efficient hash table. Allows uniform or beta-binomial prior distributions on models, and may force variables to always be included.
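A minimal sketch of Bayesian model averaging over linear models with ‘BAS’, using a built-in data set:

```r
library(BAS)

# Zellner-Siow prior on coefficients, uniform prior over models.
fit <- bas.lm(mpg ~ ., data = mtcars,
              prior = "ZS-null", modelprior = uniform())
summary(fit)   # top models and posterior inclusion probabilities
coef(fit)      # model-averaged coefficient estimates
```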
basad Bayesian Variable Selection with Shrinking and Diffusing Priors
Provides a Bayesian variable selection approach using continuous spike and slab prior distributions. The prior choices here are motivated by the shrinking and diffusing priors studied in Narisetty & He (2014) <DOI:10.1214/14-AOS1207>.
base2grob Convert Base Plot to ‘grob’ Object
Convert a base plot function call (using expression or formula) to a ‘grob’ object compatible with the ‘grid’ ecosystem. With this package, one can, for example, use ‘cowplot’ to align base plots with ‘ggplot’ objects and use ‘ggsave’ to export base plots to file.
base64url Fast and URL-Safe Base64 Encoder and Decoder
In contrast to RFC3548, the 62nd character (‘+’) is replaced with ‘-‘, and the 63rd character (‘/’) is replaced with ‘_’. Furthermore, the encoder does not pad the string with trailing ‘=’. The resulting encoded strings comply with the regular expression pattern ‘[A-Za-z0-9_-]’ and thus are safe to use in URLs or for file names.
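A minimal sketch of a URL-safe round trip with ‘base64url’:

```r
library(base64url)

enc <- base64_urlencode("plain text with / and + issues")
enc                    # uses '-' and '_', no trailing '='
base64_urldecode(enc)  # round-trips to the original string
```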
basefun Infrastructure for Computing with Basis Functions
Some very simple infrastructure for basis functions.
basicMCMCplots Trace Plots, Density Plots and Chain Comparisons for MCMC Samples
Provides a function for examining posterior MCMC samples from a single chain using trace plots and density plots, and from multiple chains by comparing posterior medians and credible intervals from each chain. These plotting functions have a variety of options, such as figure sizes, legends, parameters to plot, and saving plots to file. Functions interface with the NIMBLE software package, see de Valpine, Turek, Paciorek, Anderson-Bergman, Temple Lang and Bodik (2017) <doi:10.1080/10618600.2016.1172487>.
basicspace Recovering a Basic Space from Issue Scales
Conducts Aldrich-McKelvey and Blackbox Scaling (Poole et al 2016) <doi:10.18637/jss.v069.i07> to recover latent dimensions of judgment.
basictabler Construct Rich Tables for Output to ‘HTML’/’Excel’
Easily create tables from data frames/matrices. Create/manipulate tables row-by-row, column-by-column or cell-by-cell. Use common formatting/styling to output rich tables as ‘HTML’, ‘HTML widgets’ or to ‘Excel’.
basicTrendline Add Trendline of Basic Regression Models to Plot
Add trendlines of basic linear or nonlinear regression models and show the equation on the plot, as simply as possible.
basket Basket Trial Analysis
Implementation of multisource exchangeability models for Bayesian analyses of prespecified subgroups arising in the context of basket trial design and monitoring. The R ‘basket’ package facilitates implementation of the binary, symmetric multi-source exchangeability model (MEM) with posterior inference arising from both exact computation and Markov chain Monte Carlo sampling. Analysis output includes full posterior samples as well as posterior probabilities, highest posterior density (HPD) interval boundaries, effective sample sizes (ESS), mean and median estimations, posterior exchangeability probability matrices, and maximum a posteriori MEMs. In addition to providing ‘basketwise’ analyses, the package includes similar calculations for ‘clusterwise’ analyses for which subgroups are combined into meta-baskets, or clusters, using graphical clustering algorithms that treat the posterior exchangeability probabilities as edge weights. In addition plotting tools are provided to visualize basket and cluster densities as well as their exchangeability. References include Hyman, D.M., Puzanov, I., Subbiah, V., Faris, J.E., Chau, I., Blay, J.Y., Wolf, J., Raje, N.S., Diamond, E.L., Hollebecque, A. and Gervais, R (2015) <doi:10.1056/NEJMoa1502309>; Hobbs, B.P. and Landin, R. (2018) <doi:10.1002/sim.7893>; Hobbs, B.P., Kane, M.J., Hong, D.S. and Landin, R. (2018) <doi:10.1093/annonc/mdy457>; and Kaizer, A.M., Koopmeiners, J.S. and Hobbs, B.P. (2017) <doi:10.1093/biostatistics/kxx031>.
BASS Bayesian Adaptive Spline Surfaces
Bayesian fitting and sensitivity analysis methods for adaptive spline surfaces. Built to handle continuous and categorical inputs as well as functional or scalar output. An extension of the methodology in Denison, Mallick and Smith (1998) <doi:10.1023/A:1008824606259>.
bastah Big Data Statistical Analysis for High-Dimensional Models
Big data statistical analysis for high-dimensional models, made possible by modifying lasso.proj() in the ‘hdi’ package, replacing its nodewise regression with sparse precision matrix computation using ‘BigQUIC’.
BatchExperiments Statistical Experiments on Batch Computing Clusters
Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.
BatchGetSymbols Downloads and Organizes Financial Data for Multiple Tickers
Makes it easy to download a large amount of trade data for multiple tickers from Yahoo or Google Finance.
BatchJobs Batch Computing with R
Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.
batchscr Batch Script Helpers
Handy frameworks, such as error handling and log generation, for batch scripts. Use case: in scripts running in remote servers, set error handling mechanism for downloading and uploading and record operation log.
batchtools Tools for Computation on Batch Systems
As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high performance computing systems managed by the schedulers ‘IBM Spectrum LSF’ (<http://…/>), ‘OpenLava’ (<http://…/>), ‘Univa Grid Engine’/’Oracle Grid Engine’ (<http://…/>), ‘Slurm’ (<http://…/>), ‘Torque/PBS’ (<http://…/>), or ‘Docker Swarm’ (<https://…/>). A multicore and socket mode allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.
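A minimal local sketch of the ‘batchtools’ workflow (with no cluster configuration, jobs run sequentially on the local machine):

```r
library(batchtools)

# A throwaway registry: file.dir = NA uses a temporary directory.
reg <- makeRegistry(file.dir = NA, seed = 1)

batchMap(fun = sqrt, x = 1:10)   # define 10 jobs: sqrt(1), ..., sqrt(10)
submitJobs()
waitForJobs()
loadResult(9)                    # result of the 9th job
```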
BaTFLED3D Bayesian Tensor Factorization Linked to External Data
BaTFLED is a machine learning algorithm designed to make predictions and determine interactions in data that varies along three independent modes. For example BaTFLED was developed to predict the growth of cell lines when treated with drugs at different doses. The first mode corresponds to cell lines and incorporates predictors such as cell line genomics and growth conditions. The second mode corresponds to drugs and incorporates predictors indicating known targets and structural features. The third mode corresponds to dose and there are no dose-specific predictors (although the algorithm is capable of including predictors for the third mode if present). See ‘BaTFLED3D_vignette.rmd’ for a simulated example.
batteryreduction An R Package for Data Reduction by Battery Reduction
Battery reduction is a method used in data reduction. It uses Gram-Schmidt orthogonal rotations to find a subset of variables best representing the original set of variables.
bayesAB Fast Bayesian Methods for AB Testing
bayesAB provides a suite of functions that allow the user to analyze A/B test data in a Bayesian framework. bayesAB is intended to be a drop-in replacement for common frequentist hypothesis tests such as the t-test and chi-squared test. Bayesian methods provide several benefits over frequentist methods in the context of A/B tests, namely in interpretability. Instead of p-values you get direct probabilities on whether A is better than B (and by how much). Instead of point estimates, your posterior distributions are parametrized random variables which can be summarized any number of ways. Bayesian tests are also immune to ‘peeking’ and are thus valid whenever a test is stopped.
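A minimal sketch of a Bernoulli A/B test with ‘bayesAB’ (the conversion rates below are simulated):

```r
library(bayesAB)

# Simulated conversions for two variants; Beta(1, 1) priors.
set.seed(1)
A <- rbinom(250, 1, 0.25)
B <- rbinom(250, 1, 0.20)

test <- bayesTest(A, B,
                  priors = c("alpha" = 1, "beta" = 1),
                  distribution = "bernoulli")
summary(test)   # reports P(A > B) rather than a p-value
plot(test)      # prior, posterior, and lift distributions
```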
bayesammi Bayesian Estimation of the Additive Main Effects and Multiplicative Interaction Model
Performs Bayesian estimation of the additive main effects and multiplicative interaction (AMMI) model. The method is explained in Crossa, J., Perez-Elizalde, S., Jarquin, D., Cotes, J.M., Viele, K., Liu, G. and Cornelius, P.L. (2011) (<doi:10.2135/cropsci2010.06.0343>).
BayesBinMix Bayesian Estimation of Mixtures of Multivariate Bernoulli Distributions
Fully Bayesian inference for estimating the number of clusters and related parameters for heterogeneous binary data.
bayesboot An Implementation of Rubin’s (1981) Bayesian Bootstrap
Functions for performing the Bayesian bootstrap as introduced by Rubin (1981) <doi:10.1214/aos/1176345338> and for summarizing the result. The implementation can handle both summary statistics that work on a weighted version of the data and summary statistics that work on a resampled data set.
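A minimal sketch of the Bayesian bootstrap with ‘bayesboot’, using a built-in data set:

```r
library(bayesboot)

# Posterior distribution of the mean of mpg via the Bayesian bootstrap.
set.seed(1)
b <- bayesboot(mtcars$mpg, mean)
summary(b)   # posterior mean, sd, and credible interval
plot(b)
```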
BayesBridge Bridge Regression
Bayesian bridge regression.
bayesCL Bayesian Inference on a GPU using OpenCL
Bayesian Inference on a GPU. The package currently supports sampling from PolyaGamma, Multinomial logit and Bayesian lasso.
BayesCombo Bayesian Evidence Combination
Combine diverse evidence across multiple studies to test a high level scientific theory. The methods can also be used as an alternative to a standard meta-analysis.
BayesCTDesign Two Arm Bayesian Clinical Trial Design with and Without Historical Control Data
A set of functions to help clinical trial researchers calculate power and sample size for two-arm Bayesian randomized clinical trials that do or do not incorporate historical control data. At some point during the design process, a clinical trial researcher who is designing a basic two-arm Bayesian randomized clinical trial needs to make decisions about power and sample size within the context of hypothesized treatment effects. Through simulation, the simple_sim() function will estimate power and other user-specified clinical trial characteristics at user-specified sample sizes given user-defined scenarios about treatment effect, control group characteristics, and outcome. If the clinical trial researcher has access to historical control data, then the researcher can design a two-arm Bayesian randomized clinical trial that incorporates the historical data. In such a case, the researcher needs to work through the potential consequences of historical and randomized control differences on trial characteristics, in addition to working through issues regarding power in the context of sample size, treatment effect size, and outcome. If a researcher designs a clinical trial that will incorporate historical control data, the researcher needs the randomized controls to be from the same population as the historical controls. What if this is not the case when the designed trial is implemented? During the design phase, the researcher needs to investigate the negative effects of possible historic/randomized control differences on power, type I error, and other trial characteristics. Using this information, the researcher should design the trial to mitigate these negative effects. Through simulation, the historic_sim() function will estimate power and other user-specified clinical trial characteristics at user-specified sample sizes given user-defined scenarios about historical and randomized control differences as well as treatment effects and outcomes.
The results from historic_sim() and simple_sim() can be printed with print_table() and graphed with plot_table() methods. Outcomes considered are Gaussian, Poisson, Bernoulli, Lognormal, Weibull, and Piecewise Exponential.
bayesdfa Bayesian Dynamic Factor Analysis (DFA) with ‘Stan’
Implements Bayesian dynamic factor analysis with ‘Stan’. Dynamic factor analysis is a dimension reduction tool for multivariate time series. ‘bayesdfa’ extends conventional dynamic factor models in several ways. First, extreme events may be estimated in the latent trend by modeling process error with a student-t distribution. Second, autoregressive and moving average components can be optionally included. Third, the estimated dynamic factors can be analyzed with hidden Markov models to evaluate support for latent regimes.
bayesDP Tools for the Bayesian Discount Prior Function
Functions for augmenting data with historical controls using the Bayesian discount prior function for 1 arm and 2 arm clinical trials.
BayesESS Determining Effective Sample Size
Determines effective sample size of a parametric prior distribution in Bayesian models. To learn more about Bayesian effective sample size, see: Morita, S., Thall, P. F., & Muller, P. (2008) <https://…/25502095>.
BayesFactor Computation of Bayes Factors for Common Designs
A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.
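A minimal sketch of computing a Bayes factor with ‘BayesFactor’, using the classic sleep data:

```r
library(BayesFactor)

# Bayes factor for a paired t-test on the sleep data.
bf <- ttestBF(x = sleep$extra[sleep$group == 1],
              y = sleep$extra[sleep$group == 2],
              paired = TRUE)
bf   # evidence for a difference in means relative to the null
```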
BayesFactorExtras Extra functions for use with the BayesFactor R package
Contains extra features related to the ‘BayesFactor’ package, such as plots and analysis reports.
BayesFM Bayesian Inference for Factor Modeling
Collection of procedures to perform Bayesian analysis on a variety of factor models. Currently, it includes: Bayesian Exploratory Factor Analysis (befa), an approach to dedicated factor analysis with stochastic search on the structure of the factor loading matrix. The number of latent factors, as well as the allocation of the manifest variables to the factors, are not fixed a priori but determined during MCMC sampling. More approaches will be included in future releases of this package.
BayesGOF Bayesian Modeling via Goodness of Fit
Non-parametric method for learning prior distribution starting with parametric (subjective) prior. It performs four interconnected tasks: (i) characterizes the uncertainty of the elicited prior; (ii) exploratory diagnostic for checking prior-data conflict; (iii) computes the final statistical prior density estimate; and (iv) performs macro- and micro-inference. Primary reference is Mukhopadhyay, S. and Fletcher, D. (2017, Technical Report).
BayesH Bayesian Regression Model with Mixture of Two Scaled Inverse Chi Square as Hyperprior
Functions to performs Bayesian regression model with mixture of two scaled inverse chi square as hyperprior distribution for variance of each regression coefficient.
BayesianFROC FROC Analysis by Bayesian Approaches
For details please see the vignettes in this package. This package aims to provide new methods for the so-called free-response receiver operating characteristic (FROC) analysis. The ultimate aim of FROC analysis is to compare observer performances, which means comparing characteristics such as the area under the curve (AUC) or figure of merit (FOM). In this package, we only use the notion of AUC for modality comparison. In the radiological FROC context, by the word modality we mean imaging methods such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Positron Emission Tomography (PET). The question is which imaging method is better at detecting lesions from shadows in radiographs. To address modality comparison, this package provides new methods using hierarchical Bayesian models proposed by the author of this package. Using this package, one can at least conclude which imaging methods are better for finding lesions in radiographs for a given data set. The fit of an FROC statistical model is sometimes poor; this can easily be confirmed by drawing FROC curves and comparing the curves with the points constructed from false positive fractions (FPFs) and true positive fractions (TPFs), which validates the goodness of fit intuitively. Such validation is also implemented via the chi-square goodness-of-fit statistic in the Bayesian context, in which the parameter is not deterministic; by integrating it with the posterior predictive measure, we obtain the desired value. To compare imaging methods, i.e., modalities, we evaluate the AUC for each modality, which gives us a comparison of modalities. FROC was developed by Dev Chakraborty, whose FROC model in his 1989 paper relies on maximum likelihood methodology. In this package, I modified it and provide an alternative Bayesian FROC model. Strictly speaking, his model does not coincide with the models in this package.
I hope that medical researchers use not only frequentist methods but also the alternative Bayesian methods. In medical research, many problems are considered under frequentist methods only, such as the notion of p-values, but p-values are sometimes misunderstood. Bayesian methods provide very simple, direct, intuitive answers to research questions. To learn how to use this package, please execute the following code from the R (RStudio) console: demo(demo_MRMC, package = ‘BayesianFROC’); demo(demo_srsc, package = ‘BayesianFROC’); demo(demo_stan, package = ‘BayesianFROC’); demo(demo_drawcurves_srsc, package = ‘BayesianFROC’); demo_Bayesian_FROC(). References: Dev Chakraborty (1989) <doi:10.1118/1.596358> Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Pre-print: Issei Tsunoda; Bayesian models for free-response receiver operating characteristic analysis. Combining frequentist methods with Bayesian methods, we can obtain more reliable answers to research questions.
BayesianGLasso Bayesian Graphical Lasso
Implements a data-augmented block Gibbs sampler for simulating the posterior distribution of concentration matrices for specifying the topology and parameterization of a Gaussian Graphical Model (GGM). This sampler was originally proposed in Wang (2012) <doi:10.1214/12-BA729>.
BayesianNetwork Bayesian Network Modeling and Analysis
A Shiny web application for creating interactive Bayesian Network models, learning the structure and parameters of Bayesian networks, and utilities for classical network analysis.
BayesianTools General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
General-purpose MCMC and SMC samplers, as well as plot and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. Implemented samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter.
bayesImageS Bayesian Methods for Image Segmentation using a Potts Model
Various algorithms for segmentation of 2D and 3D images, such as computed tomography and satellite remote sensing. This package implements Bayesian image analysis using the hidden Potts model with external field prior. Latent labels are sampled using chequerboard updating or Swendsen-Wang. Algorithms for the smoothing parameter include pseudolikelihood, path sampling, the exchange algorithm, and approximate Bayesian computation (ABC).
BayesLCA Bayesian Latent Class Analysis
Bayesian Latent Class Analysis using several different methods.
bayeslm Efficient Sampling for Gaussian Linear Regression with Arbitrary Priors
Efficient sampling for Gaussian linear regression with arbitrary priors.
bayesloglin Bayesian Analysis of Contingency Table Data
The function MC3() searches for log-linear models with the highest posterior probability. The function gibbsSampler() is a blocked Gibbs sampler for sampling from the posterior distribution of the log-linear parameters. The functions findPostMean() and findPostCov() compute the posterior mean and covariance matrix for decomposable models which, for these models, is available in closed form.
bayeslongitudinal Adjust Longitudinal Regression Models Using Bayesian Methodology
Fits longitudinal regression models using Bayesian methodology for compound symmetry (CS), first-order autoregressive AR(1), and autoregressive moving average ARMA(1,1) covariance structures.
BayesMAMS Designing Bayesian Multi-Arm Multi-Stage Studies
Calculating Bayesian sample sizes for multi-arm trials where several experimental treatments are compared to a common control, perhaps even at multiple stages.
bayesmeta Bayesian Random-Effects Meta-Analysis
A collection of functions allowing to derive the posterior distribution of the two parameters in a random-effects meta-analysis, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, etc.
BayesNetBP Bayesian Network Belief Propagation
Belief propagation methods in Bayesian Networks to propagate evidence through the network. The implementation of these methods is based on the article: Cowell, RG (2005). Local Propagation in Conditional Gaussian Bayesian Networks <http://…/>.
BayesPiecewiseICAR Hierarchical Bayesian Model for a Hazard Function
Fits a piecewise exponential hazard to survival data using a Hierarchical Bayesian model with an Intrinsic Conditional Autoregressive formulation for the spatial dependency in the hazard rates for each piece. This function uses Metropolis- Hastings-Green MCMC to allow the number of split points to vary. This function outputs graphics that display the histogram of the number of split points and the trace plots of the hierarchical parameters. The function outputs a list that contains the posterior samples for the number of split points, the location of the split points, and the log hazard rates corresponding to these splits. Additionally, this outputs the posterior samples of the two hierarchical parameters, Mu and Sigma^2.
bayesplot Plotting for Bayesian Models
Plotting functions for posterior analysis, model checking, and MCMC diagnostics. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with Stan.
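A minimal sketch of typical ‘bayesplot’ usage; the matrix of posterior draws below is simulated stand-in data, not output from a real fitted model:

```r
library(bayesplot)

# Stand-in posterior draws: 500 iterations for two parameters.
draws <- matrix(rnorm(1000), ncol = 2,
                dimnames = list(NULL, c("alpha", "beta")))

mcmc_hist(draws)       # marginal posterior histograms
mcmc_intervals(draws)  # posterior interval plot for both parameters
```

In practice, `draws` would come from a Bayesian modeling package (e.g., a Stan interface) rather than `rnorm()`.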
bayesreg Bayesian Regression Models with Continuous Shrinkage Priors
Fits linear or logistic regression model using Bayesian continuous shrinkage prior distributions. Handles ridge, lasso, horseshoe and horseshoe+ regression with logistic, Gaussian, Laplace or Student-t distributed targets.
Bayesrel Bayesian Reliability Estimation
Provides the most common single-test reliability estimates: Coefficient Alpha, Guttman’s lambda-2/-4/-6, the greatest lower bound, and McDonald’s Omega. The Bayesian estimates are provided with credible intervals. Except for omega, the Bayesian estimates are obtained by sampling from the posterior inverse Wishart distribution for the covariance-matrix-based measures; see Murphy (2007) <https://…/murphy-2007.pdf>. For omega, Gibbs sampling from the joint conditional distributions of a single-factor model is used; see Lee (2007, ISBN:978-0-470-02424-9). Methods for the glb are from Moltner and Revelle (2018) <https://…/glb.algebraic>; lambda-4 is from Benton (2015) <doi:10.1007/978-3-319-07503-7_19>; the principal factor analysis is from Schlegel (2017) <https://…/>; and the analytic alpha interval is from Bonett and Wright (2014) <doi:10.1002/job.1960>.
BayesRS Bayes Factors for Hierarchical Linear Models with Continuous Predictors
Runs hierarchical linear Bayesian models. Samples from the posterior distributions of model parameters in JAGS (Just Another Gibbs Sampler; Plummer, 2003, <http://…/> ). Computes Bayes factors for group parameters of interest with the Savage-Dickey density ratio (Wetzels, Raaijmakers, Jakab, Wagenmakers, 2009, <doi:10.3758/PBR.16.4.752>).
BayesS5 Bayesian Variable Selection Using Simplified Shotgun Stochastic Search with Screening (S5)
In p >> n settings, full posterior sampling using existing Markov chain Monte Carlo (MCMC) algorithms is highly inefficient and often not feasible from a practical perspective. To overcome this problem, we propose a scalable stochastic search algorithm called the Simplified Shotgun Stochastic Search (S5), aimed at rapidly exploring interesting regions of model space and finding the maximum a posteriori (MAP) model. S5 also provides an approximation of the posterior probability of each model (including the marginal inclusion probabilities).
BayesSenMC Different Models of Posterior Distributions of Adjusted Odds Ratio
Generates different posterior distributions of adjusted odds ratio under different priors of sensitivity and specificity, and plots the models for comparison. It also provides estimations for the specifications of the models using diagnostics of exposure status with a non-linear mixed effects model. It implements the methods that are first proposed by Chu et al. (2006) <doi:10.1016/j.annepidem.2006.04.001> and Chu et al. (2010) <doi:10.1177/0272989X09353452>.
BayesSpec Bayesian Spectral Analysis Techniques
An implementation of methods for spectral analysis using the Bayesian framework. It includes functions for modelling spectrum as well as appropriate plotting and output estimates. There is segmentation capability with RJ MCMC (Reversible Jump Markov Chain Monte Carlo). The package takes these methods predominantly from the 2012 paper ‘AdaptSPEC: Adaptive Spectral Estimation for Nonstationary Time Series’ <DOI:10.1080/01621459.2012.716340>.
BayesSummaryStatLM MCMC Sampling of Bayesian Linear Models via Summary Statistics
Methods for generating Markov Chain Monte Carlo (MCMC) posterior samples of Bayesian linear regression model parameters that require only summary statistics of data as input. Summary statistics are useful for systems with very limited amounts of physical memory. The package provides two functions: one function that computes summary statistics of data and one function that carries out the MCMC posterior sampling for Bayesian linear regression models where summary statistics are used as input. The function utilizes the R package ‘ff’ to handle data sets that are too large to fit into a user’s physical memory, by reading in data in chunks.
bayestestR Understand and Describe Bayesian Models and Posterior Distributions
Provides utilities to describe posterior distributions and Bayesian models. It includes point-estimates such as Maximum A Posteriori (MAP), measures of dispersion (Highest Density Interval – HDI; Kruschke, 2014 <doi:10.1016/B978-0-12-405888-0.09999-2>) and indices used for null-hypothesis testing (such as ROPE percentage and pd).
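A hedged sketch of ‘bayestestR’ on a vector of simulated posterior draws (the package also accepts fitted model objects); the simulated distribution is illustrative:

```r
library(bayestestR)

posterior <- rnorm(4000, mean = 0.4, sd = 0.2)  # simulated posterior draws

hdi(posterior, ci = 0.89)              # Highest Density Interval
p_direction(posterior)                 # probability of direction (pd)
rope(posterior, range = c(-0.1, 0.1))  # proportion of draws inside the ROPE
```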
BayesTree Bayesian Additive Regression Trees
Implementation of BART: Bayesian Additive Regression Trees, by Chipman, George and McCulloch (2010).
BayesTreePrior Bayesian Tree Prior Simulation
Provides a way to simulate from the prior distribution of Bayesian trees by Chipman et al. (1998) <DOI:10.2307/2669832>. The prior distribution of Bayesian trees is highly dependent on the design matrix X; therefore, using the hyperparameters suggested by Chipman et al. (1998) <DOI:10.2307/2669832> is not recommended and could lead to an unexpected prior distribution. This work is part of my master’s thesis (in revision, expected 2016) and a journal publication I’m working on.
BayesVarSel Bayes Factors, Model Choice and Variable Selection in Linear Models
Conceived to calculate Bayes factors in linear models and then to provide a formal Bayesian answer to testing and variable selection problems. From a theoretical side, the emphasis in this package is placed on the prior distributions, and it allows a wide range of them: Jeffreys (1961); Zellner and Siow (1980) <doi:10.1007/bf02888369>; Zellner and Siow (1984); Zellner (1986) <doi:10.2307/2233941>; Fernandez et al. (2001) <doi:10.1016/s0304-4076(00)00076-2>; Liang et al. (2008) <doi:10.1198/016214507000001337> and Bayarri et al. (2012) <doi:10.1214/12-aos1013>. The interaction with the package is through a friendly interface that syntactically mimics the well-known lm() command of R. The resulting objects can be easily explored, providing the user with very valuable information about the structure of the true (data-generating) model (such as marginal, joint and conditional inclusion probabilities of potential variables; the highest posterior probability model, HPM; and the median probability model, MPM). Additionally, this package incorporates abilities to handle problems with a large number of potential explanatory variables through parallel and heuristic versions of the main commands, Garcia-Donato and Martinez-Beneito (2013) <doi:10.1080/01621459.2012.742443>.
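A minimal sketch of the lm()-mimicking interface; the dataset and formula are illustrative choices, not taken from the package documentation:

```r
library(BayesVarSel)

# Variable selection over all submodels of a linear model.
# mtcars is a standard built-in R dataset; the formula is illustrative.
fit <- Bvs(formula = mpg ~ wt + hp + drat, data = mtcars)
fit  # posterior probabilities of the competing models
```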
bayfoxr Global Bayesian Foraminifera Core Top Calibration
A Bayesian, global planktic foraminifera core top calibration to modern sea-surface temperatures. Includes four calibration models, considering species-specific calibration parameters and seasonality.
bazar Miscellaneous Basic Functions
A collection of miscellaneous functions for copying objects to the clipboard (‘Copy’); manipulating strings (‘concat’, ‘mgsub’, ‘trim’, ‘verlan’); loading or showing packages (‘library_with_rep’, ‘require_with_rep’, ‘sessionPackages’); creating or testing for named lists (‘nlist’, ‘as.nlist’, ‘is.nlist’), formulas (‘is.formula’), empty objects (‘as.empty’, ‘is.empty’), whole numbers (‘as.wholenumber’, ‘is.wholenumber’); testing for equality (‘almost.equal’, ‘’); getting modified versions of usual functions (‘rle2’, ‘sumNA’); making a pause or a stop (‘pause’, ‘stopif’); and others (‘erase’, ‘%nin%’, ‘unwhich’).
bbw Blocked Weighted Bootstrap
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g., population-proportional sampling, or PPS, as used in Standardized Monitoring and Assessment of Relief and Transitions, or SMART, surveys) or posterior weighting (e.g., as used in rapid assessment method, RAM, and simple spatial sampling method, S3M, surveys) is applied. The method was developed by Accion Contra la Faim, Brixton Health, Concern Worldwide, Global Alliance for Improved Nutrition, UNICEF Sierra Leone, UNICEF Sudan and Valid International. It has been tested by the Centers for Disease Control (CDC) using infant and young child feeding (IYCF) data. See Cameron et al. (2008) <doi:10.1162/rest.90.3.414> for the application of the bootstrap to cluster samples. See Aaron et al. (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al. (2016) <doi:10.1371/journal.pone.0162462> for the application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.
BCEA Bayesian Cost Effectiveness Analysis
Produces an economic evaluation of a Bayesian model in the form of MCMC simulations. Given suitable variables of cost and effectiveness / utility for two or more interventions, BCEA computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis.
BCEE The Bayesian Causal Effect Estimation Algorithm
Implementation of the Bayesian Causal Effect Estimation algorithm, a data-driven method for the estimation of the causal effect of a continuous exposure on a continuous outcome. For more details, see Talbot et al. (2015).
bcf Causal Inference for a Binary Treatment and Continuous Outcome using Bayesian Causal Forests
Causal inference for a binary treatment and continuous outcome using Bayesian Causal Forests. See Hahn, Murray and Carvalho (2017) <arXiv:1706.09523> for additional information. This implementation relies on code originally accompanying Pratola et. al. (2013) <arXiv:1309.1906>.
bcgam Bayesian Constrained Generalised Linear Models
Fits generalised partial linear regression models using a Bayesian approach, where shape and smoothness constraints are imposed on nonparametrically modelled predictors through shape-restricted splines, and no constraints are imposed on optional parametrically modelled covariates. See Meyer et al. (2011) <doi:10.1080/10485252.2011.597852> for more details. IMPORTANT: before installing ‘bcgam’, you need to install ‘Rtools’ (Windows) or ‘Xcode’ (Mac OS X). These are required for the correct installation of ‘nimble’ (<https://…/download>).
bcpa Behavioral change point analysis of animal movement
The Behavioral Change Point Analysis (BCPA) is a method of identifying hidden shifts in the underlying parameters of a time series, developed specifically to be applied to animal movement data which is irregularly sampled. The method is based on: E. Gurarie, R. Andrews and K. Laidre A novel method for identifying behavioural changes in animal movement data (2009) Ecology Letters 12:5 395-408.
bcROCsurface Bias-Corrected Methods for Estimating the ROC Surface of Continuous Diagnostic Tests
Provides bias-corrected estimation methods for the receiver operating characteristic (ROC) surface and the volume under the ROC surface (VUS) under the missing at random (MAR) assumption.
bcrypt ‘Blowfish’ Password Hashing Algorithm
An R interface to the ‘OpenBSD Blowfish’ password hashing algorithm, as described in ‘A Future-Adaptable Password Scheme’ by ‘Niels Provos’. The implementation is derived from the ‘py-bcrypt’ module for Python which is a wrapper for the ‘OpenBSD’ implementation.
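A brief sketch of hashing and verifying a password with ‘bcrypt’; the password string is of course illustrative:

```r
library(bcrypt)

# Hash a password; a random salt is generated internally,
# so the hash differs on every call.
hash <- hashpw("correct horse battery staple")

checkpw("correct horse battery staple", hash)  # verifies: TRUE
checkpw("wrong password", hash)                # fails: FALSE
```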
bcs Bayesian Compressive Sensing Using Laplace Priors
A Bayesian method for solving the compressive sensing problem. In particular, this package implements the algorithm ‘Fast Laplace’ found in the paper ‘Bayesian Compressive Sensing Using Laplace Priors’ by Babacan, Molina, Katsaggelos (2010) <DOI:10.1109/TIP.2009.2032894>.
bdchecks Biodiversity Data Checks
Supplies a Shiny app and a set of functions to perform and manage data checks for biodiversity data.
bdclean A User-Friendly Biodiversity Data Cleaning App for the Inexperienced R User
Provides features to manage the complete workflow for biodiversity data cleaning. Uploading data, gathering input from users (in order to adjust cleaning procedures), cleaning data and finally, generating various reports and several versions of the data. Facilitates user-level data cleaning, designed for the inexperienced R user. T Gueta et al (2018) <doi:10.3897/biss.2.25564>. T Gueta et al (2017) <doi:10.3897/tdwgproceedings.1.20311>.
BDEsize Efficient Determination of Sample Size in Balanced Design of Experiments
Provides the sample size in balanced designs of experiments and three graphs: detectable standardized effect size vs. power, sample size vs. detectable standardized effect size, and sample size vs. power. The sample size is computed in order to detect a certain standardized effect size with power at the significance level. The three graphs show the mutual relationship between sample size, power and detectable standardized effect size; by investigating those graphs, one can check which effects are sensitive to efficient sample size determination. Lenth, R.V. (2006-9) <http://…/Power>; Lim, Yong Bin (1998); Marvin, A., Kastenbaum, A. and Hoel, D.G. (1970) <doi:10.2307/2334851>; Montgomery, Douglas C. (2013, ISBN: 0849323312).
BDgraph Bayesian Structure Learning in Graphical Models using Birth-Death MCMC
Provides statistical tools for Bayesian structure learning in undirected graphical models for continuous, discrete, and mixed data. The package implements recent improvements in the Bayesian graphical models literature, including Mohammadi and Wit (2015) <doi:10.1214/14-BA889> and Mohammadi et al. (2017) <doi:10.1111/rssc.12171>. To speed up the computations, the BDMCMC sampling algorithms are implemented in parallel using OpenMP in C++.
bdlp Transparent and Reproducible Artificial Data Generation
The main function generateDataset() processes a user-supplied .R file that contains metadata parameters in order to generate actual data. The metadata parameters have to be structured in the form of metadata objects, the format of which is outlined in the package vignette. This approach allows to generate artificial data in a transparent and reproducible manner.
bdots Bootstrapped Differences of Time Series
Analyzes differences among time series curves with Oleson et al.’s modified p-value technique.
bdpopt Optimisation of Bayesian Decision Problems
Optimisation of the expected utility in single-stage and multi-stage Bayesian decision problems. The expected utility is estimated by simulation. For single-stage problems, JAGS is used to draw MCMC samples.
bdvis Biodiversity Data Visualizations
Biodiversity data visualizations using R, helpful for understanding the completeness of a biodiversity inventory; the extent of geographical, taxonomic and temporal coverage; and gaps and biases in the data.
BDWreg Bayesian Inference for Discrete Weibull Regression
A Bayesian regression model for discrete response, where the conditional distribution is modelled via a discrete Weibull distribution. This package provides an implementation of Metropolis-Hastings and Reversible-Jumps algorithms to draw samples from the posterior. It covers a wide range of regularizations through any two parameter prior. Examples are Laplace (Lasso), Gaussian (ridge), Uniform, Cauchy and customized priors like a mixture of priors. An extensive visual toolbox is included to check the validity of the results as well as several measures of goodness-of-fit.
BE Bioequivalence Study Data Analysis
Analyzes bioequivalence study data in an industrial-strength way. Sample size can be determined for various crossover designs, such as the 2×2 design, 2×4 design, 4×4 design, Balaam design, two-sequence dual design, and William design. Reference: Chow SC, Liu JP. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. (2009, ISBN:978-1-58488-668-6).
beam Fast Bayesian Inference in Large Gaussian Graphical Models
Fast Bayesian inference of marginal and conditional independence structures between variables from high-dimensional data (Leday and Richardson (2018) <arXiv:1803.08155>).
beast Bayesian Estimation of Change-Points in the Slope of Multivariate Time-Series
Assume that a temporal process is composed of contiguous segments with differing slopes and replicated noise-corrupted time series measurements are observed. The unknown mean of the data generating process is modelled as a piecewise linear function of time with an unknown number of change-points. The package infers the joint posterior distribution of the number and position of change-points as well as the unknown mean parameters per time-series by MCMC sampling. A-priori, the proposed model uses an overfitting number of mean parameters but, conditionally on a set of change-points, only a subset of them influences the likelihood. An exponentially decreasing prior distribution on the number of change-points gives rise to a posterior distribution concentrating on sparse representations of the underlying sequence, but also available is the Poisson distribution. See Papastamoulis et al (2017) <arXiv:1709.06111> for a detailed presentation of the method.
beginr Functions for R Beginners
Useful functions for R beginners, including hints for the arguments of the ‘plot()’ function, self-defined functions for error bars, user-customized pair plots and histogram plots, enhanced linear regression figures, etc. This package can be helpful to R experts as well.
behaviorchange Tools for Behavior Change Researchers and Professionals
Contains specialised analyses and visualisation tools for behavior change science. These facilitate conducting determinant studies (for example, using confidence interval-based estimation of relevance, CIBER, or CIBERlite plots) and systematically developing, reporting, and analysing interventions (for example, using acyclic behavior change diagrams). This package is especially useful for researchers in the field of behavior change or health psychology and to behavior change professionals such as intervention developers and prevention workers.
belg Boltzmann Entropy of a Landscape Gradient
Calculates the Boltzmann entropy of a landscape gradient. It uses the analytical method created by Gao, P., Zhang, H. and Li, Z., 2018 (<doi:10.1111/tgis.12315>).
benchr High Precise Measurement of R Expressions Execution Time
Provides infrastructure to accurately measure and compare the execution time of R expressions.
bentcableAR Bent-Cable Regression for Independent Data or Autoregressive Time Series
Included are two main interfaces for fitting and diagnosing bent-cable regressions for autoregressive time-series data or independent data (time series or otherwise): ‘’ and ‘’. Some components in the package can also be used as stand-alone functions. The bent cable (linear-quadratic-linear) generalizes the broken stick (linear-linear), which is also handled by this package. Version 0.2 corrects a glitch in the computation of confidence intervals for the CTP. References that were updated from Versions 0.2.1 and 0.2.2 appear in Version 0.2.3 and up. Version 0.3.0 improves robustness of the error-message producing mechanism. It is the author’s intention to distribute any future updates via GitHub.
Bergm Bayesian Exponential Random Graph Models
Set of tools to analyse Bayesian exponential random graph models.
BeSS Best Subset Selection for Sparse Generalized Linear Model and Cox Model
An implementation of best subset selection in generalized linear model and Cox proportional hazard model via the primal dual active set algorithm. The algorithm formulates coefficient parameters and residuals as primal and dual variables and utilizes efficient active set selection strategies based on the complementarity of the primal and dual variables.
bestNormalize Normalizing Transformation Functions
Estimate a suite of normalizing transformations, including a new technique based on ranks which can guarantee normally distributed transformed data if there are no ties: Ordered Quantile Normalization. The package is built to estimate the best normalizing transformation for a vector consistently and accurately. It implements the Box-Cox transformation, the Yeo-Johnson transformation, three types of Lambert WxF transformations, and the Ordered Quantile normalization transformation.
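A short sketch of the typical ‘bestNormalize’ workflow on simulated skewed data; the gamma-distributed input is an illustrative stand-in:

```r
library(bestNormalize)

x <- rgamma(200, shape = 2)  # skewed example data

bn <- bestNormalize(x)  # selects the best normalizing transformation
x_norm <- predict(bn)   # transformed (approximately normal) data
x_back <- predict(bn, newdata = x_norm, inverse = TRUE)  # back-transform
```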
betaboost Boosting Beta Regression
Implements boosting beta regression for potentially high-dimensional data (Mayr et al., 2018 <doi:10.1093/ije/dyy093>). The ‘betaboost’ package uses the same parametrization as ‘betareg’ (Cribari-Neto and Zeileis, 2010 <doi:10.18637/jss.v034.i02>) to make results directly comparable. The underlying algorithms are implemented via the R add-on packages ‘mboost’ (Hofner et al., 2014 <doi:10.1007/s00180-012-0382-5>) and ‘gamboostLSS’ (Mayr et al., 2012 <doi:10.1111/j.1467-9876.2011.01033.x>).
betacal Beta Calibration
Fit beta calibration models and obtain calibrated probabilities from them.
betas Standardized Beta Coefficients
Computes standardized beta coefficients and corresponding standard errors for the following models: linear regression models with numerical covariates only; linear regression models with numerical and factorial covariates; weighted linear regression models; and robust linear regression models with numerical covariates only.
beyondWhittle Bayesian Spectral Inference for Stationary Time Series
Implementations of a Bayesian parametric (autoregressive), a Bayesian nonparametric (Whittle likelihood with Bernstein-Dirichlet prior) and a Bayesian semiparametric (autoregressive likelihood with Bernstein-Dirichlet correction) procedure are provided. The work is based on the corrected parametric likelihood by C. Kirch et al (2017) <arXiv:1701.04846>. It was supported by DFG grant KI 1443/3-1.
bfork Basic Unix Process Control
Wrappers for fork()/waitpid() meant to allow R users to quickly and easily fork child processes and wait for them to finish.
bfp Bayesian Fractional Polynomials
Implements the Bayesian paradigm for fractional polynomial models under the assumption of normally distributed error terms.
bgeva Binary Generalized Extreme Value Additive Models
Routine for fitting regression models for binary rare events with linear and nonlinear covariate effects, using the quantile function of the Generalized Extreme Value random variable as the link function.
BGLR Bayesian Generalized Linear Regression
Bayesian Generalized Linear Regression.
bgsmtr Bayesian Group Sparse Multi-Task Regression
Fits a Bayesian group-sparse multi-task regression model using Gibbs sampling. The hierarchical prior encourages shrinkage of the estimated regression coefficients at both the gene and SNP level. The model has been applied successfully to imaging phenotypes of dimension up to 100; it can be used more generally for multivariate (non-imaging) phenotypes.
BH Boost C++ Header Files
Boost provides free peer-reviewed portable C++ source libraries. A large part of Boost is provided as C++ template code which is resolved entirely at compile time without linking. This package aims to provide the most useful subset of Boost libraries for template use among CRAN packages. By placing these libraries in this package, we offer a more efficient distribution system for CRAN, as replication of this code in the sources of other packages is avoided.
BHPMF Uncertainty Quantified Matrix Completion using Bayesian Hierarchical Matrix Factorization
Fills the gaps of a matrix incorporating hierarchical side information while providing uncertainty quantification.
bhrcr Bayesian Hierarchical Regression on Clearance Rates in the Presence of Lag and Tail Phases
An implementation of the Bayesian Clearance Estimator (Fogarty et al. (2015) <doi:10.1111/biom.12307>). It takes serial measurements of a response on an individual (e.g., parasite load after treatment) that is decaying over time and performs Bayesian hierarchical regression of the clearance rates on the given covariates. This package provides tools to calculate WWARN PCE (WorldWide Antimalarial Resistance Network’s Parasite Clearance Estimator) estimates of the clearance rates as well.
bib2df Parse a BibTeX File to a Tibble
Parse a BibTeX file to a tidy tibble (trimmed down version of data.frame) to make it accessible for further analysis and visualization.
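A one-line sketch of ‘bib2df’ usage; the file path is a placeholder:

```r
library(bib2df)

refs <- bib2df("references.bib")  # path is illustrative
# The tibble uses upper-case BibTeX field names as columns, e.g.:
refs$TITLE
refs$YEAR
```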
BiBitR R Wrapper for Java Implementation of BiBit
A simple R wrapper for the Java BiBit algorithm from ‘A biclustering algorithm for extracting bit-patterns from binary datasets’ by Domingo et al. (2011) <DOI:10.1093/bioinformatics/btr464>. An adaptation of the BiBit algorithm which allows noise in the biclusters is also included.
BibPlots Plot Functions for JIF (Journal Impact Factor) and Paper Percentiles
Currently, the package provides two functions for plotting and analyzing bibliometric data (JIF and paper percentile values). Further extension to more plot variants is planned.
biclique Maximal Complete Bipartite Graphs
A tool for enumerating maximal complete bipartite graphs. The input should be an edge list file or a binary matrix file. The output is the set of maximal complete bipartite graphs. The algorithms used can be found in Y. Zhang et al., BMC Bioinformatics 2014, 15:110 <doi:10.1186/1471-2105-15-110>.
bife Binary Choice Models with Fixed Effects
Estimates fixed effects binary choice models (logit and probit) with potentially many individual fixed effects and computes average partial effects. Incidental parameter bias can be reduced with a bias-correction proposed by Hahn and Newey (2004) <doi:10.1111/j.1468-0262.2004.00533.x>.
BIGDAWG Case-Control Analysis of Multi-Allelic Loci
Data sets and functions for chi-squared Hardy-Weinberg and case-control association tests of highly polymorphic genetic data [e.g., human leukocyte antigen (HLA) data]. Performs association tests at multiple levels of polymorphism (haplotype, locus and HLA amino-acids) as described in Pappas DJ, Marin W, Hollenbach JA, Mack SJ (2016) <doi:10.1016/j.humimm.2015.12.006>. Combines rare variants to a common class to account for sparse cells in tables as described by Hollenbach JA, Mack SJ, Thomson G, Gourraud PA (2012) <doi:10.1007/978-1-61779-842-9_14>.
bigdist Store Distance Matrices on Disk
Provides utilities to compute, store and access distance matrices on disk as file-backed matrices provided by the ‘bigstatsr’ package. File-backed distance matrices are stored as a symmetric matrix to facilitate out-of-memory operations on the file-backed matrix, while the in-memory ‘dist’ object stores only the lower-diagonal elements. ‘disto’ provides a unified interface to work with in-memory and disk-based distance matrices.
bigFastlm Fast Linear Models for Objects from the ‘bigmemory’ Package
A reimplementation of the fastLm() functionality of ‘RcppEigen’ for big.matrix objects for fast out-of-memory linear model fitting.
bigIntegerAlgos R Tool for Factoring Big Integers
Features the multiple polynomial quadratic sieve algorithm for factoring large integers and a vectorized factoring function that returns the complete factorization of an integer. Utilizes the C library GMP (GNU Multiple Precision Arithmetic) and classes created by Antoine Lucas et al. found in the ‘gmp’ package.
bigKRLS Optimized Kernel Regularized Least Squares
Functions for Kernel-Regularized Least Squares optimized for speed and memory usage are provided along with visualization tools. For working papers, sample code, and recent presentations visit <https://…/>.
biglasso Big Lasso: Extending Lasso Model Fitting to Big Data in R
Extend lasso and elastic-net model fitting for ultrahigh-dimensional, multi-gigabyte data sets that cannot be loaded into memory. Compared to existing lasso-fitting packages, it preserves equivalently fast computation speed but is much more memory-efficient, thus allowing for very powerful big data analysis even with only a single laptop.
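A minimal sketch of fitting a lasso path with ‘biglasso’; the design matrix here is a small in-memory big.matrix for illustration (in real use it would be file-backed):

```r
library(bigmemory)
library(biglasso)

# Small illustrative design matrix; in practice, use a file-backed big.matrix.
X <- as.big.matrix(matrix(rnorm(100 * 20), 100, 20))
y <- rnorm(100)

fit <- biglasso(X, y, family = "gaussian")  # lasso solution path
cvfit <- cv.biglasso(X, y)                  # cross-validation over lambda
```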
bigmatch Making Optimal Matching Size-Scalable Using Optimal Calipers
Implements optimal matching with near-fine balance in large observational studies with the use of optimal calipers to get a sparse network. The caliper is optimal in the sense that it is as small as possible such that a matching exists. Glover, F. (1967). <DOI:10.1002/nav.3800140304>. Katriel, I. (2008). <DOI:10.1287/ijoc.1070.0232>. Rosenbaum, P.R. (1989). <DOI:10.1080/01621459.1989.10478868>. Yang, D., Small, D. S., Silber, J. H., and Rosenbaum, P. R. (2012). <DOI:10.1111/j.1541-0420.2011.01691.x>.
bigReg Generalized Linear Models (GLM) for Large Data Sets
Allows the user to carry out GLM on very large data sets. Data can be created using the data_frame() function and appended to the object with object$append(data); data_frame and data_matrix objects are available that allow the user to store large data on disk. The data is stored as doubles in binary format, and any character columns are transformed to factors and then stored as numeric (binary) data, while a look-up table is stored in a separate .meta_data file in the same folder. The data is stored in blocks, and the GLM regression algorithm is modified to carry out a MapReduce-like algorithm to fit the model. The functions bglm(), summary() and bglm_predict() are available for creating and post-processing models. The library requires Armadillo to be installed on your system. It probably won’t function on Windows, since multi-core processing is done using mclapply(), which forks R on Unix/Linux-type operating systems.
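A minimal sketch of the workflow described above, using only the functions the description names. The column names, folder path, and argument names here are illustrative assumptions, not the package’s documented signatures:

```r
library(bigReg)

# Create a disk-backed data_frame from an in-memory chunk, then append
# further blocks; data is written in binary format to the given folder.
# (The "directory" argument name is an assumption for illustration.)
df <- data_frame(data = chunk1, directory = "bigreg_data")
df$append(chunk2)

# Fit a logistic GLM block-wise via the MapReduce-like algorithm.
fit <- bglm(y ~ x1 + x2, data = df, family = binomial())
summary(fit)

# Score new observations (argument name assumed).
preds <- bglm_predict(fit, new_data = new_chunk)
```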
bigrquery An Interface to Google’s BigQuery API
Easily talk to Google’s BigQuery database from R.
bigRR Generalized Ridge Regression (with special advantage for p >> n cases)
Fits large-scale (generalized) ridge regression for various response distributions. The shrinkage parameters (lambdas) can be pre-specified or estimated using an internal update routine (fitting a heteroscedastic effects model, or HEM). Any subset of parameters in the model can be shrunk. The package has a special computational advantage when the number of shrinkage parameters exceeds the number of observations, making it very useful for fitting large-scale omics data, such as high-throughput genotype data (genomics), gene expression data (transcriptomics), metabolomics data, etc.
BigSEM Constructing Large Systems of Structural Equations
Construct large systems of structural equations using the two-stage penalized least squares (2SPLS) method proposed by Chen, Zhang and Zhang (2016).
bigsnpr Analysis of Massive SNP Arrays
Easy-to-use, efficient, flexible and scalable tools for the analysis of massive SNP arrays. Preprint: Privé et al. (2017) <doi:10.1101/190926>.
bigstatsr Statistical Tools for Filebacked Big Matrices
Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more. A scientific paper associated with this package is in preparation.
bigstep Stepwise Selection for Large Data Sets
Selects linear models for large data sets using a modified stepwise procedure and modern selection criteria (such as modifications of the Bayesian Information Criterion). Selection can be performed on data which exceed RAM capacity. A special selection strategy is available that is faster than the classical stepwise procedure.
bigtcr Nonparametric Analysis of Bivariate Gap Time with Competing Risks
For studying recurrent disease and death with competing risks, comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events. Alternatively, comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrent disease. This package implements a nonparametric estimator for the conditional cumulative incidence function and a nonparametric conditional bivariate cumulative incidence function for the bivariate gap times proposed in Huang et al. (2016) <doi:10.1111/biom.12494>.
bigtime Sparse Estimation of Large Time Series Models
Estimation of large Vector AutoRegressive (VAR), Vector AutoRegressive with Exogenous Variables X (VARX) and Vector AutoRegressive Moving Average (VARMA) Models with Structured Lasso Penalties, see Nicholson, Bien and Matteson (2017) <arXiv:1412.5250v2> and Wilms, Basu, Bien and Matteson (2017) <arXiv:1707.09208>.
billboarder Create Interactive Chart with the JavaScript ‘Billboard’ Library
Provides an ‘htmlwidgets’ interface to ‘billboard.js’, a re-usable easy interface JavaScript chart library, based on D3 v4+. Chart types include line charts, scatterplots, bar charts, pie/donut charts and gauge charts. All charts are interactive, and a proxy method is implemented to smoothly update a chart without rendering it again in ‘shiny’ apps.
bimixt Estimates Mixture Models for Case-Control Data
Estimates non-Gaussian mixture models of case-control data. The four types of models supported are binormal, two component constrained, two component unconstrained, and four component. The most general model is the four component model, under which both cases and controls are distributed according to a mixture of two unimodal distributions. In the four component model, the two component distributions of the control mixture may be distinct from the two components of the case mixture distribution. In the two component unconstrained model, the components of the control and case mixtures are the same; however the mixture probabilities may differ for cases and controls. In the two component constrained model, all controls are distributed according to one of the two components while cases follow a mixture distribution of the two components. In the binormal model, cases and controls are distributed according to distinct unimodal distributions. These models assume that Box-Cox transformed case and control data with a common lambda parameter are distributed according to Gaussian mixture distributions. Model parameters are estimated using the expectation-maximization (EM) algorithm. Likelihood ratio test comparison of nested models can be performed using the lr.test function. AUC and PAUC values can be computed for the model-based and empirical ROC curves using the auc and pauc functions, respectively. The model-based and empirical ROC curves can be graphed using the roc.plot function. Finally, the model-based density estimates can be visualized by plotting a model object created with the bimixt.model function.
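A minimal sketch of the analysis pipeline described above, using the functions the description names (bimixt.model, lr.test, auc, pauc, roc.plot). The argument names and model-type codes are assumptions for illustration:

```r
library(bimixt)

# Fit the four-component and two-component constrained models to
# case/control measurements (argument names assumed).
fit4 <- bimixt.model(case = case_vals, control = control_vals, type = "4c")
fit2 <- bimixt.model(case = case_vals, control = control_vals, type = "2cc")

# Likelihood ratio test of the nested two-component constrained model
# against the four-component model.
lr.test(fit2, fit4)

# AUC and partial AUC for the model-based ROC curve, and the ROC plot.
auc(fit4)
pauc(fit4, threshold = 0.2)
roc.plot(fit4)
```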
BimodalIndex The Bimodality Index
Defines the functions used to compute the bimodal index as defined by Wang et al. (2009) <https://…/>.
Binarize Binarization of One-Dimensional Data
Provides methods for the binarization of one-dimensional data and some visualization functions.
BinarybalancedCut Threshold Cut Point of Probability for a Binary Classifier Model
Allows viewing the optimal probability cut-off point at which sensitivity and specificity meet; this is a good way to minimize both Type I and Type II error for a binary classifier when determining the probability threshold.
BinaryEMVS Variable Selection for Binary Data Using the EM Algorithm
Implements variable selection for high dimensional datasets with a binary response variable using the EM algorithm. Both probit and logit models are supported. Also included is a useful function to generate high dimensional data with correlated variables.
BinaryEPPM Mean and Variance Modeling of Binary Data
Modeling under- and over-dispersed binary data using extended Poisson process models (EPPM).
binaryGP Fit and Predict a Gaussian Process Model with (Time-Series) Binary Response
Allows estimation and prediction for a binary Gaussian process model. The mean function can be assumed to have a time-series structure. The estimation methods for the unknown parameters are based on penalized quasi-likelihood/penalized quasi-partial likelihood and restricted maximum likelihood. The predicted probability and its confidence interval are computed by a Metropolis-Hastings algorithm. More details can be found in Sung et al. (2017) <arXiv:1705.02511>.
binaryLogic Binary Logic
Converts numbers to binary (base 2), with shift, rotate and summary operations. Based on logical vectors.
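A short sketch of the operations listed above; the function names (as.binary, shiftLeft, rotate) are taken from my understanding of the package and should be treated as assumptions:

```r
library(binaryLogic)

# Represent an integer as a logical vector of bits.
b <- as.binary(12)   # bits 1100

# Bit manipulation on the binary representation
# (function names assumed from the package description).
shiftLeft(b, 1)
rotate(b, 1)
summary(b)
```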
binb ‘binb’ is not ‘Beamer’
A collection of ‘LaTeX’ styles using ‘Beamer’ customization for pdf-based presentation slides in ‘RMarkdown’. At present it contains ‘RMarkdown’ adaptations of the ‘Metropolis’ LaTeX theme (formerly ‘mtheme’) by Matthias Vogelgesang and others (now included in ‘TeXLive’), and the ‘IQSS’ theme by Ista Zahn (which is included here). Additional (free) fonts may be needed: ‘Metropolis’ prefers ‘Fira’, and ‘IQSS’ requires ‘Libertinus’.
bindr Parametrized Active Bindings
Provides a simple interface for creating active bindings where the bound function accepts additional arguments.
bindrcpp An ‘Rcpp’ Interface to Active Bindings
Provides an easy way to fill an environment with active bindings that call a C++ function.
binman A Binary Download Manager
Tools and functions for managing the download of binary files. Binary repositories are defined in ‘YAML’ format. Defining new pre-download, download and post-download templates allows additional repositories to be added.
binnednp Nonparametric Estimation for Interval-Grouped Data
Kernel density and distribution estimation for interval-grouped data (Reyes, Francisco-Fernandez and Cao 2016, 2017) <doi:10.1080/10485252.2016.1163348>, <doi:10.1007/s11749-017-0523-9>, (Gonzalez-Andujar, Francisco-Fernandez, Cao, Reyes, Urbano, Forcella and Bastida 2016) <doi:10.1111/wre.12216> and nonparametric estimation of seedling emergence indices (Cao, Francisco-Fernandez, Anand, Bastida and Gonzalez-Andujar 2011) <doi:10.1017/S002185961100030X>.
binomen ‘Taxonomic’ Specification and Parsing Methods
Includes functions for working with taxonomic data, including functions for combining, separating, and filtering taxonomic groups by any rank or name. Allows standard (SE) and non-standard evaluation (NSE).
BinQuasi Analyzing Replicated ChIP Sequencing Data Using Quasi-Likelihood
Identify peaks in ChIP-seq data with biological replicates using a one-sided quasi-likelihood ratio test in quasi-Poisson or quasi-negative binomial models.
binsmooth Generate PDFs and CDFs from Binned Data
Provides several methods for generating density functions based on binned data. Data are assumed to be nonnegative, but the bin widths need not be uniform, and the top bin may be unbounded. All PDF smoothing methods maintain the areas specified by the binned data. (Equivalently, all CDF smoothing methods interpolate the points specified by the binned data.) An estimate for the mean of the distribution may be supplied as an optional argument, which greatly improves the reliability of statistics computed from the smoothed density functions. Methods include step function, recursive subdivision, and optimized spline.
binsreg Binscatter Estimation and Inference
Provides tools for statistical analysis using the binscatter methods developed by Cattaneo, Crump, Farrell and Feng (2019a) <arXiv:1902.09608> and Cattaneo, Crump, Farrell and Feng (2019b) <arXiv:1902.09615>. Binscatter provides a flexible way of describing the mean relationship between two variables based on partitioning/binning of the independent variable of interest. binsreg() implements binscatter estimation and robust (pointwise and uniform) inference of regression functions and derivatives thereof, with particular focus on constructing binned scatter plots. binsregtest() implements hypothesis testing procedures for parametric functional forms of and nonparametric shape restrictions on the regression function. binsregselect() implements data-driven procedures for selecting the number of bins for binscatter estimation. All the commands allow for covariate adjustment, smoothness restrictions and clustering.
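A minimal sketch of the three commands the description names, on simulated data. The call signatures are assumptions based on the description (a dependent variable followed by the binning variable); options such as the polynomial order for the parametric test are illustrative:

```r
library(binsreg)

set.seed(1)
x <- runif(1000)
y <- sin(2 * pi * x) + rnorm(1000, sd = 0.3)

# Binscatter estimation with robust inference and a binned scatter plot.
est <- binsreg(y, x)

# Data-driven selection of the number of bins.
sel <- binsregselect(y, x)

# Hypothesis test of a parametric functional form
# (e.g. linearity; option name assumed).
tst <- binsregtest(y, x, testmodelpoly = 1)
```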
binst Data Preprocessing, Binning for Classification and Regression
Various supervised and unsupervised binning tools including using entropy, recursive partition methods and clustering.
BiocManager Access the Bioconductor Project Package Repository
A convenient tool to install and update Bioconductor packages.
Biocomb Feature Selection and Classification with the Embedded Validation Procedures for Biomedical Data Analysis
Contains functions for data analysis with an emphasis on biological data, including several algorithms for feature ranking, feature selection, and classification with embedded validation procedures. The functions can deal with numerical as well as nominal features. Also includes functions for calculating feature AUC (Area Under the ROC Curve) and HUM (hypervolume under manifold) values and for constructing 2D and 3D ROC curves. Biocomb provides the calculation of Area Above the RCC (AAC) values and construction of Relative Cost Curves (RCC) to estimate classifier performance under the problem of unequal misclassification costs. Biocomb has a special function to deal with missing values, including different imputing schemes.
biogeo Point Data Quality Assessment and Coordinate Conversion
Functions for error detection and correction in point data quality datasets that are used in species distribution modelling. Includes functions for parsing and converting coordinates into decimal degrees from various formats.
Bioi Biological Image Analysis
Single linkage clustering and connected component analyses are often performed on biological images. ‘Bioi’ provides a set of functions for performing these tasks. This functionality is implemented in several key functions that extend from one to many dimensions. The single linkage clustering method implemented here can be used on n-dimensional data sets, while connected component analyses are limited to three or fewer dimensions.
bioplots Visualization of Overlapping Results with Heatmap
Visualization of complex biological datasets is essential to understand complementary aspects of biology in the big data era. In addition, analyzing multiple datasets makes it possible to understand biological processes deeply and accurately. Multiple datasets produce multiple analysis results, and these overlaps are usually visualized in a Venn diagram. bioplots is a tiny R package that generates a heatmap to visualize overlaps instead of using a Venn diagram.
biorxivr Search and Download Papers from the bioRxiv Preprint Server
The bioRxiv preprint server is a website where scientists can post preprints of scholarly texts in biology. Users can search and download PDFs in bulk from the preprint server. The text of abstracts is stored as raw text within R, and PDFs can easily be saved and imported for text mining with packages such as ‘tm’.
Bios2cor From Biological Sequences and Simulations to Correlation Analysis
The package is dedicated to computation and analysis of correlation/co-variation in multiple sequence alignments and in side chain motions during molecular dynamics simulations. Features include the ability to compute correlation/co-variation using a variety of scoring functions between either sequence positions in alignments or side chain dihedral angles in molecular dynamics simulations and to analyze the correlation/co-variation matrix through a variety of tools including network representation and principal components analysis. In addition, several utility functions are based on the R graphical environment to provide friendly tools for help in data interpretation. Examples of sequence co-variation analysis and utility tools are provided in: Pele J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. (2014) <doi:10.1002/prot.24570>. This work was supported by the French National Research Agency (Grant number: ANR-11-BSV2-026).
bioset Convert a Matrix of Raw Values into Nice and Tidy Data
Functions to help deal with raw data from measurements, like reading and transforming raw values organised in matrices, calculating and converting concentrations, and calculating the precision of duplicates / triplicates / … . It is compatible with and builds on top of ‘tidyverse’ packages.
biospear Biomarker Selection in Penalized Regression Models
Provides tools for developing and validating prediction models, estimating the expected survival of patients, and visualizing them graphically. Most of the implemented methods are based on penalized regressions such as: the lasso (Tibshirani R (1996)), the elastic net (Zou H et al. (2005) <doi:10.1111/j.1467-9868.2005.00503.x>), the adaptive lasso (Zou H (2006) <doi:10.1198/016214506000000735>), the stability selection (Meinshausen N et al. (2010) <doi:10.1111/j.1467-9868.2010.00740.x>), some extensions of the lasso (Ternes et al. (2016) <doi:10.1002/sim.6927>), some methods for the interaction setting (Ternes N et al. (2016) <doi:10.1002/bimj.201500234>), or others. A function generating simulated survival data sets is also provided.
biostat3 Utility Functions, Datasets and Extended Examples for Survival Analysis
Utility functions, datasets and extended examples for survival analysis. This includes a range of other packages, some simple wrappers for time-to-event analyses, datasets, and extensive examples in HTML with R scripts. The package also supports the course Biostatistics III entitled ‘Survival analysis for epidemiologists in R’.
bipartite Visualising bipartite networks and calculating some (ecological) indices
bipartite provides functions to visualise webs and calculate a series of indices commonly used to describe patterns in ecological webs. It focuses on webs consisting of only two trophic levels, e.g. pollination webs or predator-prey webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web’s topology.
bipartiteD3 Interactive Bipartite Graphs
Generates interactive bipartite graphs using the ‘D3’ library. Designed for use with the ‘bipartite’ analysis package. Sources the open-source ‘vis-js’ library (<http://…/>). Adapted from examples at <https://…/NPashaP> (released under GPL-3).
BiplotGUI Interactive Biplots in R
Provides a GUI with which users can construct and interact with biplots.
birdnik Connector for the Wordnik API
A connector to the API for ‘Wordnik’, a dictionary service that also provides bigram generation, word frequency data, and a whole host of other functionality.
biscale Tools and Palettes for Bivariate Thematic Mapping
Provides a ‘ggplot2’ centric approach to bivariate mapping. This is a technique that maps two quantities simultaneously rather than the single value that most thematic maps display. The package provides a suite of tools for calculating breaks using multiple different approaches, a selection of palettes appropriate for bivariate mapping and a scale function for ‘ggplot2’ calls that adds those palettes to maps. A tool for creating bivariate legends is also included.
bisque Approximate Bayesian Inference via Sparse Grid Quadrature Evaluation (BISQuE) for Hierarchical Models
Implementation of the ‘bisque’ strategy for approximate Bayesian posterior inference. See Hewitt and Hoeting (2019) <arXiv:1904.07270> for complete details. ‘bisque’ combines conditioning with sparse grid quadrature rules to approximate marginal posterior quantities of hierarchical Bayesian models. The resulting approximations are computationally efficient for many hierarchical Bayesian models. The ‘bisque’ package allows approximate posterior inference for custom models; users only need to specify the conditional densities required for the approximation.
bitops Bitwise Operations
Functions for bitwise operations on integer vectors.
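The package’s bitwise operators work element-wise on integer vectors; a few calls illustrate the basic operations (bitAnd, bitOr, bitXor and bitShiftL are the package’s documented functions):

```r
library(bitops)

# 12 = 1100b, 10 = 1010b
bitAnd(12L, 10L)    # 1000b -> 8
bitOr(12L, 10L)     # 1110b -> 14
bitXor(12L, 10L)    # 0110b -> 6
bitShiftL(1L, 4L)   # 1 << 4 -> 16
```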
BiTrinA Binarization and Trinarization of One-Dimensional Data
Provides methods for the binarization and trinarization of one-dimensional data and some visualization functions.
bitsqueezr Quantize Floating-Point Numbers for Improved Compressibility
Provides an implementation of floating-point quantization algorithms for use in precision-preserving compression, similar to the approach taken in the ‘netCDF operators’ (NCO) software package and described in Zender (2016) <doi:10.5194/gmd-2016-63>.
biva Business Intelligence
An interactive ‘shiny’ application for working with different kinds of data. Visualizations and runtime examples are provided.
bivariate Bivariate Probability Distributions
Provides alternatives to persp() for plotting bivariate functions, including both step and continuous functions. Also, provides convenience functions for constructing and plotting bivariate probability distributions. Currently, only normal distributions are supported but other probability distributions are likely to be added in the near future.
Bivariate.Pareto Bivariate Pareto Models
Perform competing risks analysis under bivariate Pareto models. See Shih et al. (2018, to appear).
BivRegBLS Tolerance Intervals and Errors-in-Variables Regressions in Method Comparison Studies
Assess the agreement in method comparison studies by tolerance intervals and errors-in-variables regressions. The Ordinary Least Square regressions (OLSv and OLSh), the Deming Regression (DR), and the (Correlated)-Bivariate Least Square regressions (BLS and CBLS) can be used with unreplicated or replicated data. The BLS and CBLS are the two main functions to estimate a regression line, while XY.plot and MD.plot are the two main graphical functions to display, respectively, an (X,Y) plot or (M,D) plot with the BLS or CBLS results. Assuming no proportional bias, the (M,D) plot (Bland-Altman plot) may be simplified by calculating horizontal line intervals with tolerance intervals (beta-expectation (type I) or beta-gamma content (type II)).
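A minimal sketch of the functions the description names (BLS, XY.plot, MD.plot); the data object and the column-selection argument names are assumptions for illustration:

```r
library(BivRegBLS)

# Fit a Bivariate Least Square regression to unreplicated
# method-comparison data (argument names assumed).
res <- BLS(data = methods_df, xcol = "method1", ycol = "method2")

# Display the fitted line on an (X, Y) plot, and the agreement
# on an (M, D) (Bland-Altman) plot.
XY.plot(res)
MD.plot(res)
```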
bivrp Bivariate Residual Plots with Simulation Polygons
Generates bivariate residual plots with simulation polygons for any diagnostics and bivariate model from which functions to extract the desired diagnostics, simulate new data and refit the models are available.
biwavelet Conduct Univariate and Bivariate Wavelet Analyses
This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gilbert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) analyses.
bkmr Bayesian Kernel Machine Regression
Implementation of a statistical approach for estimating the joint health effects of multiple concurrent exposures.
BKPC Bayesian Kernel Projection Classifier
Bayesian kernel projection classifier is a nonlinear multicategory classifier which performs the classification of the projections of the data to the principal axes of the feature space. A Gibbs sampler is implemented to find the posterior distributions of the parameters.
blackbox Black Box Optimization and Exploration of Parameter Space
Performs prediction of a response function from simulated response values, allowing black-box optimization of functions estimated with some error. blackbox includes a simple user interface for such applications, as well as more specialized functions designed to be called by the Migraine software (see URL). The latter functions are used for prediction of likelihood surfaces and implied likelihood ratio confidence intervals, and for exploration of predictor space of the surface. Prediction of the response is based on ordinary kriging (with residual error) of the input. Estimation of smoothing parameters is performed by generalized cross validation.
blaise Read and Write FWF Files in the Blaise Format
Can be used to read and write a fixed-width file (fwf) with an accompanying Blaise datamodel. When supplying a datamodel for writing, the data frame will be automatically converted to that format and checked for compatibility. Supports data frames, tibbles and LaF objects.
BlandAltmanLeh Plots (slightly extended) Bland-Altman plots
Bland-Altman Plots using base graphics as well as ggplot2, slightly extended by confidence intervals, with detailed return values and a sunflowerplot option for data with ties.
blandr Bland-Altman Method Comparison
Carries out Bland Altman analyses (also known as a Tukey mean-difference plot) as described by JM Bland and DG Altman in 1986 <doi:10.1016/S0140-6736(86)90837-8>. This package was created in 2015 as existing Bland-Altman analysis functions did not calculate confidence intervals. This package was created to rectify this, and create reproducible plots.
blastula Easily Send HTML Email Messages
Compose and send out responsive HTML email messages that render perfectly across a range of email clients and device sizes. Messages are composed using ‘Markdown’ and a text interpolation system that allows for the injection of evaluated R code within the message body, footer, and subject line. Helper functions let the user insert embedded images, web link buttons, and ‘ggplot2’ plot objects into the message body. Messages can be sent through an ‘SMTP’ server or through the ‘Mailgun’ API service <http://…/>.
blatr Send Emails Using ‘Blat’ for Windows
A wrapper around the ‘Blat’ command line SMTP mailer for Windows. ‘Blat’ is public domain software, but be sure to read the license before use. It can be found at the Blat website.
blavaan Bayesian Latent Variable Analysis
Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models.
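A short sketch of a confirmatory factor analysis fit, assuming ‘blavaan’ follows ‘lavaan’ conventions with a bcfa() function and lavaan-style model syntax (the dataset below ships with ‘lavaan’):

```r
library(blavaan)

# lavaan-style measurement model: two latent factors, three
# indicators each (HolzingerSwineford1939 is a lavaan dataset).
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6 '

# Bayesian CFA; bcfa() is assumed to mirror lavaan's cfa().
fit <- bcfa(model, data = HolzingerSwineford1939)
summary(fit)
```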
BLCOP Black-Litterman and Copula Opinion Pooling Frameworks
An implementation of the Black-Litterman Model and Atilio Meucci’s copula opinion pooling framework.
blendedLink A New Link Function that Blends Two Specified Link Functions
A new link function that equals one specified link function up to a cutover, then a linear rescaling of another specified link function beyond it. For use in glm() or glm2(). The intended use is in binary regression, in which case the first link should be set to ‘log’ and the second to ‘logit’. This ensures that fitted probabilities are between 0 and 1 and that exponentiated coefficients can be interpreted as relative risks for probabilities up to the cutover.
Blendstat Joint Analysis of Experiments with Mixtures and Random Effects
Package to perform a joint analysis of experiments with mixtures and random effects, assuming a process variable represented by a covariate; see Kalirajan K P (1990) <doi:10.1080/757582835>.
blink Record Linkage for Empirically Motivated Priors
An implementation of the model in Steorts (2015) <DOI:10.1214/15-BA965SI>, which performs Bayesian entity resolution for categorical and text data, for any distance function defined by the user. In addition, the precision and recall are in the package to allow one to compare to any other comparable method such as logistic regression, Bayesian additive regression trees (BART), or random forests. The experiments are reproducible and illustrated using a simple vignette.
blkbox Data Exploration with Multiple Machine Learning Algorithms
Allows data to be processed by multiple machine learning algorithms at the same time, and enables feature selection by a single algorithm or combinations of multiple algorithms. An easy-to-use tool for k-fold cross-validation and nested cross-validation.
BLModel Black-Litterman Posterior Distribution
Posterior distribution in the Black-Litterman model is computed from a prior distribution given in the form of a time series of asset returns and a continuous distribution of views provided by the user as an external function.
blob A Simple S3 Class for Representing Vectors of Binary Data (‘BLOBS’)
R’s raw vector is useful for storing a single binary object. What if you want to put a vector of them in a data frame? The blob package provides the blob object, a list of raw vectors, suitable for use as a column in a data frame.
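A short sketch of the usage described above, assuming blob() accepts one or more raw vectors as arguments:

```r
library(blob)

# A blob is a list of raw vectors that can live in a data-frame column.
x <- blob(as.raw(c(0x01, 0x02)), as.raw(0xff))

df <- data.frame(id = 1:2)
df$payload <- x   # one binary object per row
df
```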
BlockFeST Bayesian Calculation of Region-Specific Fixation Index to Detect Local Adaptation
An R implementation of an extension of the ‘BayeScan’ software (Foll, 2008) <DOI:10.1534/genetics.108.092221> for codominant markers, adding the option to group individual SNPs into pre-defined blocks. A typical application of this new approach is the identification of genomic regions, genes, or gene sets containing one or more SNPs that evolved under directional selection.
blockRAR Block Design for Response-Adaptive Randomization
Computes power for response-adaptive randomization with a block design that captures both the time and treatment effect. T. Chandereng, R. Chappell (2019) <arXiv:1904.07758>.
blockseg Two Dimensional Change-Points Detection
Segments a matrix in blocks with constant values.
blorr Tools for Developing Binary Logistic Regression Models
Tools designed to make it easier for beginner and intermediate users to build and validate binary logistic regression models. Includes bivariate analysis, comprehensive regression output, model fit statistics, variable selection procedures, model validation techniques and a ‘shiny’ app for interactive model building.
Blossom Functions for making statistical comparisons with distance-function based permutation tests
Blossom is an R package with functions for making statistical comparisons with distance-function based permutation tests developed by P.W. Mielke, Jr. and colleagues at Colorado State University and for testing parameters estimated in linear models with permutation procedures developed by B. S. Cade and colleagues at the Fort Collins Science Center, U.S. Geological Survey. This implementation in R has allowed for numerous improvements not supported by the Cade and Richards Fortran implementation, including use of categorical predictor variables in most routines.
BLPestimatoR Performs a BLP Demand Estimation
Provides the estimation algorithm to perform the demand estimation described in Berry, Levinsohn and Pakes (1995) <DOI:10.2307/2171802>. The routine uses analytic gradients and offers a large number of implemented integration methods and optimization routines.
blsAPI Request Data From The U.S. Bureau of Labor Statistics API
Allows users to request data for one or multiple series from the U.S. Bureau of Labor Statistics API. Users provide parameters as specified in http://…/api_signature.htm and the function returns a JSON string.
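A minimal sketch of a request for a single series; the payload fields follow the BLS API signature the description links to, and the series id here is a hypothetical example (the call requires internet access):

```r
library(blsAPI)
library(rjson)

# Build the request payload per the BLS API signature
# (series id and year range are illustrative).
payload <- list(seriesid  = c("LAUCN040010000000005"),
                startyear = "2010",
                endyear   = "2012")

# blsAPI() returns the raw JSON string, which can then be parsed.
response <- blsAPI(payload)
parsed <- fromJSON(response)
```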
BLSM Bayesian Latent Space Model
Provides a Bayesian latent space model for complex networks, either weighted or unweighted. Given an observed input graph, the estimates for the latent coordinates of the nodes are obtained through a Bayesian MCMC algorithm. The overall likelihood of the graph depends on a fundamental probability equation, which is defined so that ties are more likely to exist between nodes whose latent space coordinates are close. The package is mainly based on the model by Hoff, Raftery and Handcock (2002) <doi:10.1198/016214502388618906> and contains some extra features (e.g., removal of the Procrustean step, weights implemented as coefficients of the latent distances, 3D plots). The original code related to the above model was retrieved from <https://…/>. Users can inspect the MCMC simulation, create and customize insightful graphical representations or apply clustering techniques.
BMA Bayesian Model Averaging
Package for Bayesian model averaging for linear models, generalized linear models and survival models (Cox regression).
BMAmevt Multivariate Extremes: Bayesian Estimation of the Spectral Measure
Toolkit for Bayesian estimation of the dependence structure in Multivariate Extreme Value parametric models.
BMisc Miscellaneous Functions for Panel Data, Quantiles, and Printing Results
These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making panel data balanced (that is, dropping individuals that have missing observations in any time period), converting id numbers to row numbers, and treating repeated cross sections as panel data under the assumption of rank invariance. For quantiles, there are functions to make ecdf functions from a set of data points (this is particularly useful when a distribution function is created in several steps) and to combine distribution functions based on some external weights; these distribution functions can easily be inverted to obtain quantiles. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; for generating summary statistics and their differences for two groups; and for dropping covariates from formulas.
bmixture Bayesian Estimation for Finite Mixture of Distributions
Provides statistical tools for Bayesian estimation for finite mixture of distributions, mainly mixture of Gamma, Normal and t-distributions.
bmlm Bayesian Multilevel Mediation
Easy estimation of Bayesian multilevel mediation models with Stan.
bmotif Counting Motifs in Bipartite Networks
Counts occurrences of motifs in bipartite networks, as well as the number of times each node appears in each unique position within motifs. Intended for use in ecology, but its methods are general and can be applied to any bipartite network.
bnclassify Learning Bayesian Network Classifiers from Data
Implementation of different algorithms for learning discrete Bayesian network classifiers from data, including wrapper algorithms and those based on Chow-Liu’s algorithm.
BNDataGenerator Data Generator based on Bayesian Network Model
Data generator based on a Bayesian network model.
BNN Bayesian Neural Network for High-Dimensional Nonlinear Variable Selection
Performs Bayesian variable selection for high-dimensional nonlinear systems and can also be used to test nonlinearity for a general regression problem. The computation can be accelerated using multiple CPUs. You can refer to Liang, F., Li, Q. and Zhou, L. (2017) at <https://…/SAMSI_DPDA-Liang.pdf> for details. The publication ‘Bayesian Neural Networks for Selection of Drug Sensitive Genes’ will appear in the Journal of the American Statistical Association.
bnnSurvival Bagged k-Nearest Neighbors Survival Prediction
Implements a bootstrap aggregated (bagged) version of the k-nearest neighbors survival probability prediction method (Lowsky et al. 2013). In addition to the bootstrapping of training samples, the features can be subsampled in each base learner to break the correlation between them. The Rcpp package is used to speed up the computation.
bnormnlr Bayesian Estimation for Normal Heteroscedastic Nonlinear Regression Models
Implementation of Bayesian estimation in normal heteroscedastic nonlinear regression models following Cepeda-Cuervo (2001).
bnpa Bayesian Networks & Path Analysis
Proposes a hybrid approach that uses the computational and statistical resources of Bayesian networks to learn a network structure from a data set with four different algorithms, and the robustness of the statistical methods of structural equation modeling to check the model’s goodness of fit to the data. An intermediate algorithm joins the features of the ‘bnlearn’ and ‘lavaan’ R packages. The Bayesian network structure learning algorithms used are ‘Hill-Climbing’, ‘Max-Min Hill-Climbing’, ‘Restricted Maximization’ and ‘Tabu Search’.
BNPmix Algorithms for Pitman-Yor Process Mixtures
Contains different algorithms to fit both univariate and multivariate Pitman-Yor process mixture models, as well as Griffiths-Milne dependent Dirichlet process mixture models. Pitman-Yor process mixture models are flexible Bayesian nonparametric models for density estimation. Estimation can be done via an importance conditional sampler, via a slice sampler, as done by Walker (2007) <doi:10.1080/03610910601096262>, or using a marginal sampler, as in Escobar and West (1995) <doi:10.2307/2291069> and extensions. The package also contains procedures to estimate a GM-dependent Dirichlet process mixture model via the importance conditional sampler.
BNPMIXcluster Bayesian Nonparametric Model for Clustering with Mixed Scale Variables
Bayesian nonparametric approach for clustering that is capable of combining different types of variables (continuous, ordinal and nominal) and also accommodates different sampling probabilities in a complex survey design. The model is based on a location mixture model with a Poisson-Dirichlet process prior on the location parameters of the associated latent variables. The package performs the clustering model described in Carmona, C., Nieto-Barajas, L. E., Canale, A. (2016) <http://…/1612.00083>.
BNPTSclust A Bayesian Nonparametric Algorithm for Time Series Clustering
Performs the algorithm for time series clustering described in Nieto-Barajas and Contreras-Cristan (2014).
BNSL Bayesian Network Structure Learning
From a given dataframe, this package learns its Bayesian network structure based on a selected score.
bnspatial Spatial Implementation of Bayesian Networks and Mapping
Package for the spatial implementation of Bayesian networks and mapping in geographical space. It makes maps of the expected value (or most likely state) given known and unknown conditions, maps of uncertainty measured as either the coefficient of variation or the Shannon index (entropy), and maps of the probability associated with any state of any node of the network. Some additional features are provided as well, such as parallel processing options, data discretization routines and function wrappers designed for users with minimal knowledge of the R programming language.
bnstruct Bayesian Network Structure Learning from Data with Missing Values
Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Hill-Climbing heuristic search, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, and BIC. The package also implements methods for generating and using bootstrap samples, imputed data, and inference.
bnviewer Interactive Visualization of Bayesian Networks
Interactive visualization of Bayesian Networks. The ‘bnviewer’ package reads various structure learning algorithms provided by the ‘bnlearn’ package and allows you to view them interactively.
boclust A Clustering Method Based on Boosting on Single Attributes
An overlap clustering algorithm for categorical ultra-high-dimensional data.
BoltzMM Boltzmann Machines with MM Algorithms
Provides probability computation, data generation, and model estimation for fully-visible Boltzmann machines. It follows the methods described in Nguyen and Wood (2016a) <doi:10.1162/NECO_a_00813> and Nguyen and Wood (2016b) <doi:10.1109/TNNLS.2015.2425898>.
BonEV An Improved Multiple Testing Procedure for Controlling False Discovery Rates
An improved multiple testing procedure for controlling false discovery rates, developed from the Bonferroni procedure with integrated estimates from the Benjamini-Hochberg procedure and Storey’s q-value procedure. It controls false discovery rates through controlling the expected number of false discoveries.
bookdown Authoring Books with R Markdown
Output formats and utilities for authoring books with R Markdown.
bookdownplus Generate Varied Books and Documents with R ‘bookdown’ Package
A collection and selector of R ‘bookdown’ templates. ‘bookdownplus’ helps you write academic journal articles, guitar books, chemical equations, mails, calendars, and diaries. ‘bookdownplus’ extends the features of ‘bookdown’ and simplifies the procedure. Users only have to choose a template, provide the book title and author name, and then focus on writing the text. No need to struggle with YAML and LaTeX.
BoolFilter Optimal Estimation of Partially Observed Boolean Dynamical Systems
Tools for optimal and approximate state estimation as well as network inference of Partially-Observed Boolean Dynamical Systems.
boostmtree Boosted Multivariate Trees for Longitudinal Data
Implements Friedman’s gradient descent boosting algorithm for longitudinal data using multivariate tree base learners. A time-covariate interaction effect is modeled using penalized B-splines (P-splines) with estimated adaptive smoothing parameter.
bootcluster Bootstrapping Estimates of Clustering Stability
Implementation of the bootstrapping approach for the estimation of clustering stability on observation and cluster level, as well as its application in estimating the number of clusters.
bootnet Bootstrap Methods for Various Network Estimation Routines
Bootstrap standard errors on various network estimation routines, such as EBICglasso from the qgraph package and IsingFit from the IsingFit package.
bootsPLS Bootstrap Subsamplings of Sparse Partial Least Squares – Discriminant Analysis for Classification and Signature Identification
Bootstrap Subsamplings of sparse Partial Least Squares – Discriminant Analysis (sPLS-DA) for Classification and Signature Identification. The method is applicable to any classification problem with more than 2 classes. It relies on bootstrap subsamplings of sPLS-DA and provides tools to select the most stable variables (defined as the ones consistently selected over the bootstrap subsamplings) and to predict the class of test samples.
bootstrapFP Bootstrap Algorithms for Finite Population Inference
Finite population bootstrap algorithms to estimate the variance of the Horvitz-Thompson estimator for single-stage sampling. For a survey of bootstrap methods for finite populations, see Mashreghi et al. (2016) <doi:10.1214/16-SS113>.
bootTimeInference Robust Performance Hypothesis Testing with the Sharpe Ratio
Applied researchers often test for a difference between the Sharpe ratios of two investment strategies. A very popular tool to this end is the test of Jobson and Korkie, as corrected by Memmel. Unfortunately, this test is not valid when returns have tails heavier than the normal distribution or are of a time series nature. Instead, we propose the use of robust inference methods. In particular, we suggest constructing a studentized time series bootstrap confidence interval for the difference of the Sharpe ratios and declaring the two ratios different if zero is not contained in the obtained interval. This approach has the advantage that one can simply resample from the observed data, as opposed to some null-restricted data.
boottol Bootstrap Tolerance Levels for Credit Scoring Validation Statistics
Used to create bootstrap tolerance levels for the Kolmogorov-Smirnov (KS) statistic, the area under the receiver operating characteristic curve (AUROC) statistic, and the Gini coefficient for each score cutoff.
BootWPTOS Test Stationarity using Bootstrap Wavelet Packet Tests
Provides significance tests for second-order stationarity for time series using bootstrap wavelet packet tests.
bor Transforming Behavioral Observation Records into Data Matrices
Transforms focal observations’ data, where different types of social interactions can be recorded by multiple observers, into asymmetric data matrices. Each cell in these matrices provides counts on the number of times a specific type of social interaction was initiated by the row subject and directed to the column subject.
Boruta Wrapper Algorithm for All Relevant Feature Selection
An all relevant feature selection wrapper algorithm. It finds relevant features by comparing original attributes’ importance with importance achievable at random, estimated using their permuted copies.
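As a quick hedged sketch (assuming ‘Boruta’ is installed from CRAN), the wrapper is typically driven through a single formula-interface call; here the built-in iris data is used purely for illustration:

```r
# Hedged usage sketch: all-relevant feature selection on iris.
# Boruta compares each attribute's importance against shuffled "shadow" copies.
library(Boruta)

set.seed(1)
fit <- Boruta(Species ~ ., data = iris, maxRuns = 20)
print(fit)            # Confirmed / Tentative / Rejected attributes
print(attStats(fit))  # per-attribute importance statistics
```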
BoSSA A Bunch of Structure and Sequence Analysis
Reads and plots phylogenetic placements obtained using the ‘epa’, ‘pplacer’ and ‘guppy’ software.
boxcoxmix Response Transformations for Random Effect and Variance Component Models
Response transformations for overdispersed generalized linear models and variance component models using nonparametric profile maximum likelihood estimation. The main function is optim.boxcox().
bpa Basic Pattern Analysis
Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats.
bpnreg Bayesian Projected Normal Regression Models for Circular Data
Fitting Bayesian multiple and mixed-effect regression models for circular data based on the projected normal distribution. Both continuous and categorical predictors can be included. Sampling from the posterior is performed via an MCMC algorithm. Posterior descriptives of all parameters, model fit statistics and Bayes factors for hypothesis tests for inequality constrained hypotheses are provided. See Cremers, Mulder & Klugkist (2018) <doi:10.1111/bmsp.12108> and Nuñez-Antonio & Gutiérrez-Peña (2014) <doi:10.1016/j.csda.2012.07.025>.
bpp Computations Around Bayesian Predictive Power
Implements functions to update Bayesian Predictive Power Computations after not stopping a clinical trial at an interim analysis. Such an interim analysis can either be blinded or unblinded. Code is provided for Normally distributed endpoints with known variance, with a prominent example being the hazard ratio.
BradleyTerryScalable Fits the Bradley-Terry Model to Potentially Large and Sparse Networks of Comparison Data
Facilities are provided for fitting the simple, unstructured Bradley-Terry model to networks of binary comparisons. The implemented methods are designed to scale well to large, potentially sparse, networks. A fairly high degree of scalability is achieved through the use of EM and MM algorithms, which are relatively undemanding in terms of memory usage (relative to some other commonly used methods such as iterative weighted least squares, for example). Both maximum likelihood and Bayesian MAP estimation methods are implemented. The package provides various standard methods for a newly defined ‘btfit’ model class, such as the extraction and summarisation of model parameters and the simulation of new datasets from a fitted model. Tools are also provided for reshaping data into the newly defined ‘btdata’ class, and for analysing the comparison network, prior to fitting the Bradley-Terry model. This package complements, rather than replaces, the existing ‘BradleyTerry2’ package. (BradleyTerry2 has rather different aims, which are mainly the specification and fitting of ‘structured’ Bradley-Terry models in which the strength parameters depend on covariates.)
braidReports Visualize Combined Action Response Surfaces and Report BRAID Analyses
Provides functions to generate, format, and style surface plots for visualizing combined action data. Also provides functions for reporting on a BRAID analysis, including plotting curve-shifts, calculating IAE values, and producing full BRAID analysis reports.
braidrm Fitting Dose Response with the BRAID Combined Action Model
Contains functions for evaluating, analyzing, and fitting combined action dose response surfaces with the Bivariate Response to Additive Interacting Dose (BRAID) model of combined action.
brainKCCA Region-Level Connectivity Network Construction via Kernel Canonical Correlation Analysis
Designed to calculate connections between (or among) brain regions and plot connection lines. A summary function is also included to summarize the group-level connectivity network. Kang, Jian (2016) <doi:10.1016/j.neuroimage.2016.06.042>.
brant Test for Parallel Regression Assumption
Tests the parallel regression assumption for ordinal logit models generated with the function polr() from the package MASS.
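A minimal hedged sketch (assuming the ‘brant’ package is installed; the ‘housing’ data and model formula come from the MASS polr() documentation):

```r
# Hedged sketch: fit an ordinal logit model with MASS::polr(), then run the
# Brant test of the parallel regression (proportional odds) assumption.
library(MASS)
library(brant)

fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
            data = housing, Hess = TRUE)
brant(fit)  # omnibus test plus one test per coefficient
```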
braQCA Bootstrapped Robustness Assessment for Qualitative Comparative Analysis
Test the robustness of a user’s Qualitative Comparative Analysis solutions to randomness, using the bootstrapped assessment: baQCA(). This package also includes a function that provides recommendations for improving solutions to reach typical significance levels: brQCA(). After applying recommendations from brQCA(), QCAdiff() shows which cases are excluded from the final result.
brea Bayesian Recurrent Event Analysis
A function to produce MCMC samples for posterior inference in semiparametric Bayesian discrete time competing risks recurrent events models.
breakDown Break Down Plots
Break Down Plots are inspired by the waterfall plots created by the ‘xgboostExplainer’ package (see <https://…/xgboostExplainer> ). The idea behind Break Down Plots is to decompose the model prediction for a single observation. Break Down Plots show the contribution of every variable present in the model. Such plots work for binary classifiers and general regression models.
breakfast Multiple Change-Point Detection and Segmentation
Performs multiple change-point detection in data sequences, or data sequence segmentation, using computationally efficient multiscale methods. This version only implements the ‘Tail-Greedy Unbalanced Haar’ change-point detection methodology; more methods will be added in future versions. To start with, see the function segment.mean.
BreakoutDetection Breakout Detection via Robust E-Statistics
BreakoutDetection is an open-source R package that makes breakout detection simple and fast. The BreakoutDetection package can be used in a wide variety of contexts: for example, detecting a breakout in user engagement after an A/B test, detecting behavioral change, or problems in econometrics, financial engineering, and the political and social sciences.
brglm2 Bias Reduction in Generalized Linear Models
Estimation and inference from generalized linear models based on various methods for bias reduction. The brglmFit fitting method can achieve reduction of estimation bias either through the adjusted score equations approach in Firth (1993) <https://…/80.1.27> and Kosmidis and Firth (2009) <https://…/asp055>, or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as in Cordeiro and McCullagh (1991) <http://…/2345592>. In the special case of generalized linear models for binomial and multinomial responses, the adjusted score equations approach returns estimates with improved frequentist properties that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation). Estimation in all cases takes place via a quasi Fisher scoring algorithm, and S3 methods for the construction of confidence intervals for the reduced-bias estimates are provided.
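As an illustrative sketch (assuming ‘brglm2’ is installed from CRAN; the ‘endometrial’ data set ships with the package and is a standard quasi-separation example), bias reduction is requested simply by passing the brglmFit method to glm():

```r
# Hedged sketch: logistic regression on data with quasi-complete separation.
# Ordinary ML gives an infinite coefficient for NV; brglmFit returns finite
# reduced-bias estimates.
library(brglm2)

data("endometrial", package = "brglm2")
fit <- glm(HG ~ NV + PI + EL, family = binomial("logit"),
           data = endometrial, method = "brglmFit")
coef(fit)  # all coefficients finite, unlike the ML fit
```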
bridgedist An Implementation of the Bridge Distribution with Logit-Link as in Wang and Louis (2003)
An implementation of the bridge distribution with logit-link in R. In Wang and Louis (2003) <doi:10.1093/biomet/90.4.765>, such a univariate bridge distribution was derived as the distribution of the random intercept that ‘bridged’ a marginal logistic regression and a conditional logistic regression. The conditional and marginal regression coefficients are a scalar multiple of each other. Such would not be the case if the random intercept distribution were Gaussian.
briqr Interface to the ‘Briq’ API
An interface to the ‘Briq’ API <>. ‘Briq’ is a tool that aims to promote employee engagement by helping employees recognize and reward each other. Employees can praise and thank one another (for achieving a company goal, for example) by giving virtual credits (known as ‘briqs’ or ‘bqs’) that can be redeemed for various rewards. The ‘Briq’ API lets you create, read, update and delete users, user groups, transactions and messages. This package provides functions that simplify getting the users, user groups and transactions of your organization into R.
BRISC Fast Inference for Large Spatial Datasets using BRISC
Fits univariate spatial regression models for large datasets using Bootstrap for Rapid Inference on Spatial Covariances (BRISC), based on Nearest Neighbor Gaussian Processes, as detailed in Saha and Datta (2018) <doi:10.1002/sta4.184>.
briskaR Biological Risk Assessment
A spatio-temporal exposure-hazard model for assessing biological risk and impact. The model is based on stochastic geometry for describing the landscape and the exposed individuals, a dispersal kernel for the dissemination of contaminants and an ecotoxicological equation.
brlrmr Bias Reduction with Missing Binary Response
Provides two main functions, il() and fil(). The il() function implements the EM algorithm developed by Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068> to estimate the parameters of a logistic regression model with a missing response when the missing data mechanism is nonignorable. The fil() function implements the algorithm proposed by Maity et al. (2017+) <https://…/brlrmr> to reduce the bias produced by the method of Ibrahim and Lipsitz (1996) <DOI:10.2307/2533068>.
brm Binary Regression Model
Fits novel models for the conditional relative risk, risk difference and odds ratio.
brms Bayesian Regression Models using Stan
Write and fit Bayesian generalized linear mixed models using Stan for full Bayesian inference.
broom Convert Statistical Analysis Objects into Tidy Data Frames
Convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like dplyr, tidyr and ggplot2. The package provides three S3 generics: tidy, which summarizes a model’s statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.
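The three generics can be sketched on an ordinary linear model (assuming ‘broom’ is installed from CRAN):

```r
# Minimal sketch of broom's three generics on a base-R lm() fit.
library(broom)

fit <- lm(mpg ~ wt + hp, data = mtcars)
tidy(fit)           # one row per coefficient: estimate, std.error, statistic, p.value
glance(fit)         # one-row model summary: r.squared, AIC, BIC, ...
head(augment(fit))  # original data plus .fitted, .resid, ...
```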
broom.mixed Tidying Methods for Mixed Models
Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the ‘broom’ package. The package provides three S3 generics for each model: tidy(), which summarizes a model’s statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.
broomExtra Grouped Statistical Analyses in a Tidy Way
Collection of functions to assist ‘broom’ and ‘broom.mixed’ package-related data analysis workflows. In particular, the generic functions tidy(), glance(), and augment() choose appropriate S3 methods from these two packages depending on which package exports the needed method. Additionally, ‘grouped_’ variants of the generics provide a convenient way to execute functions across a combination of grouping variable(s) in a dataframe.
brotli A Compression Format Optimized for the Web
A lossless compressed data format that compresses data using a combination of the LZ77 algorithm and Huffman coding, with efficiency comparable to the best currently available general-purpose compression methods. Brotli is similar in speed to deflate but offers more dense compression.
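A hedged round-trip sketch with the package's compress/decompress pair (assuming ‘brotli’ is installed from CRAN; both functions operate on raw vectors):

```r
# Hedged sketch: compress a raw vector with brotli and verify the round trip.
library(brotli)

msg <- charToRaw(paste(rep("compress me", 50), collapse = " "))
z   <- brotli_compress(msg)    # raw vector, much shorter for repetitive input
out <- brotli_decompress(z)
stopifnot(identical(out, msg)) # lossless round trip
length(z) / length(msg)        # compression ratio, well below 1 here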
Brq Bayesian Analysis of Quantile Regression Models
Bayesian estimation and variable selection for quantile regression models.
brr Bayesian Inference on the Ratio of Two Poisson Rates
Implementation of the Bayesian inference for the two independent Poisson samples model, using the semi-conjugate family of prior distributions.
brt Biological Relevance Testing
Analyses of large-scale -omics datasets commonly use p-values as the indicators of statistical significance. However, considering p-values alone neglects the importance of effect size (i.e., the mean difference between groups) in determining the biological relevance of a significant difference. Here, we present a novel algorithm for computing a new statistic, the biological relevance testing (BRT) index, in the frequentist hypothesis testing framework to address this problem.
Brundle Normalisation Tools for Inter-Condition Variability of ChIP-Seq Data
Inter-sample condition variability is a key challenge of normalising ChIP-seq data. This implementation uses either spike-in or a second factor as a control for normalisation. Input can either be from ‘DiffBind’ or a matrix formatted for ‘DESeq2’. The output is either a ‘DiffBind’ object or the default ‘DESeq2’ output. Either can then be processed as normal. Supporting manuscript Guertin, Markowetz and Holding (2017) <doi:10.1101/182261>.
bsearchtools Binary Search Tools
Exposes the binary search functions of the C++ standard library (std::lower_bound, std::upper_bound) plus other convenience functions, allowing faster lookups on sorted vectors.
BSGS Bayesian Sparse Group Selection
The integration of Bayesian variable and sparse group variable selection approaches for regression models.
BSGW Bayesian Survival Model using Generalized Weibull Regression
Bayesian survival model using Weibull regression on both scale and shape parameters.
bshazard Nonparametric Smoothing of the Hazard Function
The function estimates the hazard function nonparametrically from a survival object (possibly adjusted for covariates). The smoothed estimate is based on B-splines from the perspective of generalized linear mixed models. Left-truncated and right-censored data are allowed.
BSL Bayesian Synthetic Likelihood with Graphical Lasso
Bayesian synthetic likelihood (BSL, Price et al. (2018) <doi:10.1080/10618600.2017.1302882>) is an alternative to standard, non-parametric approximate Bayesian computation (ABC). BSL assumes a multivariate normal distribution for the summary statistic likelihood and it is suitable when the distribution of the model summary statistics is sufficiently regular. This package provides a Metropolis Hastings Markov chain Monte Carlo implementation of BSL and BSL with graphical lasso (BSLasso, An et al. (2018) <https://…/> ), which is computationally more efficient when the dimension of the summary statistic is high. Extensions to this package are planned.
BSPADATA Bayesian Proposal to Fit Spatial Econometric Models
The purpose of this package is to fit the three spatial econometric models proposed in Anselin (1988, ISBN:9024737354) in both the homoscedastic and the heteroscedastic case. The fit is made through MCMC algorithms and the observational working variables approach.
bsplinePsd Bayesian Nonparametric Spectral Density Estimation Using B-Spline Priors
Implementation of a Metropolis-within-Gibbs MCMC algorithm to flexibly estimate the spectral density of a stationary time series. The algorithm updates a nonparametric B-spline prior using the Whittle likelihood to produce pseudo-posterior samples and is based on the work presented by Edwards, Meyer, and Christensen (2017) <arXiv:1707.04878>.
bssm Bayesian Inference of State Space Models
Efficient methods for Bayesian inference of state space models via particle Markov chain Monte Carlo and importance sampling type corrected Markov chain Monte Carlo. Gaussian, Poisson, binomial, or negative binomial observation densities and Gaussian state dynamics, as well as general non-linear Gaussian models are supported.
btb Beyond the Border
Kernel density estimation dedicated to urban geography.
BTdecayLasso Bradley-Terry Model with Exponential Time Decayed Log-Likelihood and Adaptive Lasso
Applies the Bradley-Terry model to estimate teams’ abilities from paired comparison data. An exponentially decayed log-likelihood function is used for dynamic approximation of current rankings, and a Lasso penalty is applied for variance reduction and grouping. The main algorithm applies the Augmented Lagrangian Method described by Masarotto and Varin (2012) <doi:10.1214/12-AOAS581>.
btergm Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood
Temporal Exponential Random Graph Models (TERGM) estimated by maximum pseudolikelihood with bootstrapped confidence intervals or Markov Chain Monte Carlo maximum likelihood. Goodness of fit assessment for ERGMs, TERGMs, and SAOMs. Micro-level interpretation of ERGMs and TERGMs.
BTR Training and Analysing Asynchronous Boolean Models
Tools for inferring asynchronous Boolean models from single-cell expression data.
bucky Bucky’s Archive for Data Analysis in the Social Sciences
Provides functions for various statistical techniques commonly used in the social sciences, including functions to compute clustered robust standard errors, combine results across multiply-imputed data sets, and simplify the addition of robust and clustered robust standard errors. The package was originally developed, in part, to assist porting of replication code from ‘Stata’ and attempts to replicate default options from ‘Stata’ where possible.
BUCSS Bias and Uncertainty Corrected Sample Size
Implements a method of correcting for publication bias and uncertainty when planning sample sizes in a future study from an original study.
buildmer Stepwise Elimination and Term Reordering for Mixed-Effects Regression
Finds the largest possible regression model that will still converge for various types of regression analyses (including mixed models and generalized additive models) and then optionally performs stepwise elimination similar to the forward and backward effect selection methods in SAS, based on the change in log-likelihood, Akaike’s Information Criterion, or the Bayesian Information Criterion.
bulletcp Automatic Groove Identification via Bayesian Changepoint Detection
Provides functionality to automatically detect groove locations via a Bayesian changepoint detection method to be used in the data preprocessing step of forensic bullet matching algorithms. The methods in this package are based on those in Stephens (1994) <doi:10.2307/2986119>. Bayesian changepoint detection will simply be an option in the function from the package ‘bulletxtrctr’ which identifies the groove locations.
BullsEyeR Topic Modelling
Helps with initial processing such as converting text to lower case; removing punctuation, numbers and stop words; stemming; sparsity control; and term frequency-inverse document frequency processing. Helps in recognizing domain- or corpus-specific stop words. Makes use of ‘ldatuning’ output to pick the optimal number of topics for topic modelling. Helps in extracting dominant words or key words that represent the context/topics of the content in each document.
bullwhipgame Bullwhip Effect Demo in Shiny
The bullwhipgame is an educational game whose purpose is the illustration and exploration of the bullwhip effect, i.e., the increase in demand variability along the supply chain. Marchena Marlene (2010) <arXiv:1009.3977>.
bupaR Business Process Analytics in R
Functionalities for process analysis in R. This package implements an S3 class for event log objects and related handler functions. Imports related packages for subsetting event data, computing descriptive statistics, handling Petri net objects and visualizing process maps.
bustt Bus and Transit Time Calculations
Calculate and work with times and schedules for buses, trains, etc. using transit data. Answers questions like: What is the time between train arrivals at 59-Street Columbus Circle on Saturdays? What is the time between trains for stops along the A train on weekdays?
bvarsv Bayesian Analysis of a Vector Autoregressive Model with Stochastic Volatility and Time-Varying Parameters
R/C++ implementation of the model proposed by Primiceri (‘Time Varying Structural Vector Autoregressions and Monetary Policy’, Review of Economic Studies, 2005), with a focus on generating posterior predictive distributions.
BVSNLP Bayesian Variable Selection in High Dimensional Settings using Non-Local Prior
Variable/feature selection in high- or ultra-high-dimensional settings has gained a lot of attention recently, especially in cancer genomics studies. This package provides a Bayesian approach to tackle this problem, exploiting a mixture of point masses at zero and nonlocal priors to improve the performance of variable selection and coefficient estimation. It performs variable selection for binary response and survival time response datasets, which are widely used in the biostatistics and bioinformatics communities. Benefiting from parallel computing ability, it reports the necessary outcomes of Bayesian variable selection, such as the Highest Posterior Probability Model (HPPM), the Median Probability Model (MPM) and the posterior inclusion probability for each of the covariates in the model. The option to use Bayesian Model Averaging (BMA) is also part of this package and can be exploited for predictive power measurements in real datasets.
bwd Backward Procedure for Change-Point Detection
Implements a backward procedure for single and multiple change point detection proposed by Shin et al. <arXiv:1812.10107>. The backward approach is particularly useful for detecting short and sparse signals, which are common in copy number variation (CNV) detection.
BWStest Baumgartner Weiss Schindler Test of Equal Distributions
Performs the ‘Baumgartner-Weiss-Schindler’ two-sample test of equal probability distributions.
bytescircle Statistics About Bytes Contained in a File as a Circle Plot
Shows statistics about the bytes contained in a file as a circle graph of deviations from the mean, in sigma increments. The function is useful for statistically analyzing the content of files at a glance: text files appear as a green centered crown, compressed and encrypted files appear as equally distributed variations with a very low CV (sigma/mean), and other file types fall between these two categories depending on their text vs. binary content. This can help quickly determine how information is stored inside them (databases, multimedia files, etc.).
bzinb Bivariate Zero-Inflated Negative Binomial Model Estimator
Provides maximum likelihood estimation of the Bivariate Zero-Inflated Negative Binomial (BZINB) model or nested model parameters. Also estimates the underlying correlation of a pair of count data. See Cho, H., Preisser, J., Liu, C., and Wu, D. (in preparation) for details.


c060 Extended Inference for Lasso and Elastic-Net Regularized Cox and Generalized Linear Models
Provides additional functions to perform stability selection, model validation, and parameter tuning for ‘glmnet’ models.
c2c Compare Two Classifications or Clustering Solutions of Varying Structure
Compare two classifications or clustering solutions that may or may not have the same number of classes, and that might have hard or soft (fuzzy, probabilistic) membership. Calculate various metrics to assess how the clusters compare to each other. The calculations are simple, but provide a handy tool for users unfamiliar with matrix multiplication. This package is not geared towards traditional accuracy assessment for classification/mapping applications – the motivating use case is for comparing a probabilistic clustering solution to a set of reference or existing class labels that could have any number of classes (that is, without having to degrade the probabilistic clustering to hard classes).
c3 ‘C3.js’ Chart Library
Create interactive charts with the ‘C3.js’ <http://…/> charting library. All plot types in ‘C3.js’ are available and include line, bar, scatter, and mixed geometry plots. Plot annotations, labels and axis are highly adjustable. Interactive web based charts can be embedded in R Markdown documents or Shiny web applications.
C443 See a Forest for the Trees
Provides insight into a forest of classification trees by calculating similarities between the trees and subsequently clustering them. Each cluster is represented by its most central cluster member. Sies, A. & Van Mechelen, I. (paper submitted for publication).
CA3variants Three-Way Correspondence Analysis Variants
Provides three variants of three-way correspondence analysis (CA): three-way symmetrical CA, three-way non-symmetrical CA, and three-way ordered symmetrical CA.
CADStat Provides a GUI to Several Statistical Methods
Using JGR, provides a GUI to several statistical methods: scatterplot, boxplot, linear regression, generalized linear regression, quantile regression, conditional probability calculations, and regression trees.
caesar Encrypts and Decrypts Strings
Encrypts and decrypts strings using either the Caesar cipher or a pseudorandom number generation (using set.seed()) method.
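As an illustration of the Caesar cipher itself (a base-R sketch, not this package’s own API; the function name below is hypothetical), each letter is shifted by a fixed offset, and decryption is a shift in the opposite direction:

```r
# shift alphabetic characters by k positions, wrapping around the alphabet
caesar_shift <- function(x, k = 3) {
  codes <- utf8ToInt(x)
  up <- codes %in% utf8ToInt("A"):utf8ToInt("Z")
  lo <- codes %in% utf8ToInt("a"):utf8ToInt("z")
  codes[up] <- (codes[up] - utf8ToInt("A") + k) %% 26 + utf8ToInt("A")
  codes[lo] <- (codes[lo] - utf8ToInt("a") + k) %% 26 + utf8ToInt("a")
  intToUtf8(codes)
}
caesar_shift("Attack", 3)                     # "Dwwdfn"
caesar_shift(caesar_shift("Attack", 3), -3)   # "Attack"
```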
CAinterprTools Graphical Aid in Correspondence Analysis Interpretation and Significance Testing
Plots a number of diagnostics that aid the interpretation of Correspondence Analysis results. It provides facilities to plot the contribution of row and column categories to the principal dimensions, the quality of the display of points on selected dimensions, the correlation of row and column categories with selected dimensions, etc. It also helps assess which dimension(s) are important for interpreting the data structure by means of different statistics and tests. The package also offers facilities to plot the permuted distribution of the table’s total inertia, as well as of the inertia accounted for by pairs of selected dimensions. Different facilities are also provided that aim to produce interpretation-oriented scatterplots. Reference: <doi:10.1016/j.softx.2015.07.001>.
CAISEr Comparison of Algorithms with Iterative Sample Size Estimation
Functions for performing experimental comparisons of algorithms using adequate sample sizes for power and accuracy.
calACS Count All Common Subsequences
Counts all common subsequences between two string sequences, with items separated by the same delimiter. The first string input is a length-one vector; the second string input can be a vector or list containing multiple strings. Algorithm from Wang, H., All common subsequences (2007), IJCAI International Joint Conference on Artificial Intelligence, pp. 635-640.
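As a sketch of the underlying idea (not this package’s code), the number of common subsequences of two strings, including the empty one, can be counted with a standard dynamic-programming recurrence:

```r
# count common subsequences (including the empty one) of two strings
acs_count <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]; b <- strsplit(s2, "")[[1]]
  n <- length(a); m <- length(b)
  N <- matrix(1, n + 1, m + 1)   # row/column 0: only the empty subsequence
  for (i in 1:n) for (j in 1:m) {
    N[i + 1, j + 1] <- if (a[i] == b[j]) 2 * N[i, j]
                       else N[i, j + 1] + N[i + 1, j] - N[i, j]
  }
  N[n + 1, m + 1]
}
acs_count("ab", "ab")   # 4: "", "a", "b", "ab"
```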
Calculator.LR.FNs Calculator for LR Fuzzy Numbers
The arithmetic operations on LR fuzzy numbers (scalar multiplication, addition, subtraction, multiplication, and division), which are based on Zadeh’s extension principle, have a complicated form for use in fuzzy statistics, fuzzy mathematics, machine learning, fuzzy data analysis, etc. The Calculator for LR Fuzzy Numbers package, i.e. the Calculator.LR.FNs package, helps applied users obtain a simple, closed form for some complicated operators based on LR fuzzy numbers; the user can also easily draw the membership function of the obtained result with this package.
calcWOI Calculates the Wavelet-Based Organization Index
Calculates the original wavelet-based organization index, the modified wavelet-based organization index and the local wavelet-based organization index of an arbitrary 2D array using Wavelet Transform of Eckley et al (2010) (<doi:10.1111/j.1467-9876.2009.00721.x>) and Eckley and Nason (2011) (<doi:10.18637/jss.v043.i03>).
calendar Create, Read, Write, and Work with ‘iCalendar’ Files, Calendars and Scheduling Data
Provides functions to create, read, write, and work with ‘iCalendar’ files (which typically have ‘.ics’ or ‘.ical’ extensions), and the scheduling data, calendars and timelines of people, organisations and other entities that they represent. ‘iCalendar’ is an open standard for exchanging calendar and scheduling information between users and computers, described at <https://…/>.
CALF Coarse Approximation Linear Function
Contains a greedy algorithm for coarse approximation linear function.
CalibrateSSB Weighting and Estimation for Panel Data with Non-Response
Function to calculate weights and estimates for panel data with non-response.
CalibratR Mapping ML Scores to Calibrated Predictions
Transforms your uncalibrated Machine Learning scores to well-calibrated prediction estimates that can be interpreted as probability estimates. The implemented BBQ (Bayes Binning in Quantiles) model is taken from Naeini (2015, ISBN:0-262-51129-0).
CaliCo Code Calibration in a Bayesian Framework
Calibrates any computational code within a Bayesian framework that drives the estimation. Given a new data set, prediction produces a forecast set that takes the newly calibrated parameters into account. The choice between several models is also available. The methods are described in the paper Carmassi et al. (2018) <arXiv:1801.01810>.
callr Call R from R
It is sometimes useful to perform a computation in a separate R process, without affecting the current R process at all. This package does exactly that.
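A minimal usage sketch, assuming ‘callr’ is installed: callr::r() evaluates a function in a fresh R session and returns the result to the current one.

```r
library(callr)
# run a function in a separate R process; the result comes back to this session,
# and nothing in the current session's global environment is touched
r(function(x, y) x + y, args = list(x = 1, y = 2))   # 3
```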
calpassapi R Interface to Access CalPASS API
Implements methods for querying data from CalPASS using its API. CalPASS Plus. MMAP API V1. <https://…/index.html>.
CAM Causal Additive Model (CAM)
The code takes an n x p data matrix and fits a Causal Additive Model (CAM) for estimating the causal structure of the underlying process. The output is a p x p adjacency matrix (a one in entry (i,j) indicates an edge from i to j). Details of the algorithm can be found in: P. Bühlmann, J. Peters, J. Ernest: “CAM: Causal Additive Models, high-dimensional order search and penalized regression”, Annals of Statistics 42:2526-2556, 2014.
canvasXpress Visualization Package for CanvasXpress in R
Enables creation of visualizations using the CanvasXpress framework in R. CanvasXpress is a standalone JavaScript library for reproducible research with complete tracking of data and end-user modifications stored in a single PNG image that can be played back. See <> for more information.
cap Covariate Assisted Principal (CAP) Regression for Covariance Matrix Outcomes
Performs Covariate Assisted Principal (CAP) Regression for covariance matrix outcomes. The method identifies the optimal projection direction which maximizes the log-likelihood function of the log-linear heteroscedastic regression model in the projection space. See Zhao et al. (2018), Covariate Assisted Principal Regression for Covariance Matrix Outcomes, <doi:10.1101/425033> for details.
capitalR Capital Budgeting Analysis, Annuity Loan Calculations and Amortization Schedules
Provides Capital Budgeting Analysis functionality and the essential Annuity loan functions. Also computes Loan Amortization Schedules including schedules with irregular payments.
capn Capital Asset Pricing for Nature
Implements approximation methods for natural capital asset prices suggested by Fenichel and Abbott (2014) <doi:10.1086/676034> in Journal of the Associations of Environmental and Resource Economists (JAERE), Fenichel et al. (2016) <doi:10.1073/pnas.1513779113> in Proceedings of the National Academy of Sciences (PNAS), and Yun et al. (2017) in PNAS (accepted), and their extensions: creating Chebyshev polynomial nodes and grids, calculating basis of Chebyshev polynomials, approximation and their simulations for: V-approximation (single and multiple stocks, PNAS), P-approximation (single stock, PNAS), and Pdot-approximation (single stock, JAERE). Development of this package was generously supported by the Knobloch Family Foundation.
caRamel Automatic Calibration by Evolutionary Multi Objective Algorithm
Multi-objective optimizer initially developed for the calibration of hydrological models. The algorithm is a hybrid of the MEAS algorithm (Efstratiadis and Koutsoyiannis (2005) <doi:10.13140/RG.2.2.32963.81446>), using the directional search method based on the simplexes of the objective space, and the epsilon-NSGA-II algorithm, with archiving management of the parameter vectors by epsilon-dominance (Reed and Devireddy <doi:10.1142/9789812567796_0004>).
carData Companion to Applied Regression Data Sets
Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage (forthcoming).
careless Procedures for Computing Indices of Careless Responding
When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as careless or insufficient effort responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing, such as spurious correlations. The ‘R’ package ‘careless’ provides solutions designed to detect such careless / insufficient effort responses by allowing easy calculation of indices proposed in the literature. It currently supports the calculation of longstring, even-odd consistency, psychometric synonyms/antonyms, Mahalanobis distance, and intra-individual response variability (also termed inter-item standard deviation). For a review of these methods, see Curran (2016) <doi:10.1016/j.jesp.2015.07.006>.
caret Classification and Regression Training
Misc functions for training and plotting classification and regression models.
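A brief sketch of the typical workflow, assuming ‘caret’ is installed: train() resamples a model over a tuning grid and predict() applies the winner.

```r
library(caret)
set.seed(1)
# 5-fold cross-validated CART on the built-in iris data
fit <- train(Species ~ ., data = iris, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))
fit$results            # resampled accuracy per tuning value
head(predict(fit, iris))
```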
caretEnsemble Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList, caretEnsemble, and caretStack. caretList is a convenience function for fitting multiple caret::train models to the same dataset. caretEnsemble will make a linear combination of these models using greedy forward selection, and caretStack will make linear or non-linear combinations of these models, using a caret::train model as a meta-model.
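A hedged sketch of the three functions named above, assuming ‘caret’ and ‘caretEnsemble’ are installed:

```r
library(caret)
library(caretEnsemble)
set.seed(1)
ctrl <- trainControl(method = "cv", number = 5, savePredictions = "final")
# fit several caret models to the same data ...
models <- caretList(Sepal.Length ~ ., data = iris,
                    methodList = c("lm", "rpart"), trControl = ctrl)
# ... then blend them with greedy forward selection
ens <- caretEnsemble(models)
summary(ens)
```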
carfima Continuous-Time Fractionally Integrated ARMA Process for Irregularly Spaced Long-Memory Time Series Data
We provide a toolbox to fit a continuous-time fractionally integrated ARMA process (CARFIMA) on univariate and irregularly spaced time series data via frequentist or Bayesian machinery. A general-order CARFIMA(p, H, q) model for p>q is specified in Tsai and Chan (2005)<doi:10.1111/j.1467-9868.2005.00522.x> and it involves p+q+2 unknown model parameters, i.e., p AR parameters, q MA parameters, Hurst parameter H, and process uncertainty (standard deviation) sigma. The package produces their maximum likelihood estimates and asymptotic uncertainties using a global optimizer called the differential evolution algorithm. It also produces their posterior distributions via Metropolis within a Gibbs sampler equipped with adaptive Markov chain Monte Carlo for posterior sampling. These fitting procedures, however, may produce numerical errors if p>2. The toolbox also contains a function to simulate discrete time series data from CARFIMA(p, H, q) process given the model parameters and observation times.
carpenter Build Common Tables of Summary Statistics for Reports
Mainly used to build tables that are commonly presented for bio-medical/health research, such as basic characteristic tables or descriptive statistics.
carrier Isolate Functions for Remote Execution
Sending functions to remote processes can be wasteful of resources because they carry their environments with them. With the carrier package, it is easy to create functions that are isolated from their environment. These isolated functions, also called crates, print at the console with their total size and can be easily tested locally before being sent to a remote.
CARRoT Predicting Categorical and Continuous Outcomes Using Rule of Ten
Predicts categorical or continuous outcomes while concentrating on four key points. These are Cross-validation, Accuracy, Regression and Rule of Ten (CARRoT). It performs the cross-validation a specified number of times by partitioning the input into training and test sets and fitting linear/multinomial/binary regression models to the training set. All regression models satisfying a rule of ten events per variable are fitted, and the ones with the best predictive power are given as an output. Best predictive power is understood as highest accuracy in the case of binary/multinomial outcomes, and smallest absolute and relative errors in the case of continuous outcomes. For the binary case there is also an option of finding a regression model which gives the highest AUROC (Area Under the Receiver Operating Characteristic curve) value. The option of a parallel toolbox is also available. Methods are described in Peduzzi et al. (1996) <doi:10.1016/S0895-4356(96)00236-3> and Rhemtulla et al. (2012) <doi:10.1037/a0029315>.
CARS Covariate Assisted Ranking and Screening for Large-Scale Two-Sample Inference
Implements the CARS procedure, a two-sample multiple testing procedure that utilizes an additional auxiliary variable to capture sparsity information, thereby improving power. The CARS procedure is shown to be asymptotically valid and optimal for FDR control. For more information, please see the website <http://…/CARS.html> and the accompanying paper.
carSurv Correlation-Adjusted Regression Survival (CARS) Scores
Contains functions to estimate the Correlation-Adjusted Regression Survival (CARS) Scores. The method is described in Welchowski, T. and Zuber, V. and Schmid, M., (2018), Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection, <arXiv:1802.08178>.
cartograflow Filtering Matrix for Flow Mapping
Functions to prepare and filter an origin-destination matrix for thematic flow mapping purposes. This comes after Bahoken, Francoise (2016), Mapping flow matrix a contribution, PhD in Geography – Territorial sciences. See Bahoken (2017) <doi:10.4000/netcom.2565>.
cartogram Create Cartograms with R
Construct a continuous area cartogram by a rubber sheet distortion algorithm.
Cartographer Interactive Maps for Data Exploration
Cartographer provides interactive maps in R Markdown documents or at the R console. These maps are suitable for data exploration. This package is an R wrapper around Elijah Meeks’s d3-carto-map and d3.js, using htmlwidgets for R.
cartography Thematic Cartography
Create and integrate maps in your R workflow. This package allows various cartographic representations: proportional symbols, choropleth, typology, flows, discontinuities… It also proposes some additional useful features: cartographic palettes, layout (scale, north arrow, title…), labels, legends, access to cartographic APIs…
cartools Tools for Understanding Highway Performance
Analytical tools are designed to help people understand the complex relationships associated with freeway performance and traffic breakdown. Emphasis is placed on: (1) Traffic noise or volatility; (2) Driver behavior and safety; and (3) Stochastic modeling, models that explain breakdown and performance.
carx Censored Autoregressive Model with Exogenous Covariates
A censored time series class is designed. An estimation procedure is implemented to estimate the Censored AutoRegressive time series with eXogenous covariates (CARX), assuming normality of the innovations. Some other functions that might be useful are also included.
casebase Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression
Implements the case-base sampling approach of Hanley and Miettinen (2009) <DOI:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <DOI:10.1111/sjos.12125>, and Saarela (2015) <DOI:10.1007/s10985-015-9352-x>, for fitting flexible hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. From the fitted hazard function, cumulative incidence, risk functions of time, treatment and profile can be derived. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots.
CAST ‘caret’ Applications for Spatial-Temporal Models
Supporting functionality to run ‘caret’ with spatial or spatial-temporal data. ‘caret’ is a frequently used package for model training and prediction using machine learning. This package includes functions to improve spatial-temporal modelling tasks using ‘caret’. It prepares data for Leave-Location-Out and Leave-Time-Out cross-validation, which are target-oriented validation strategies for spatial-temporal models. To decrease overfitting and improve model performance, the package implements a forward feature selection that selects suitable predictor variables in view of their contribution to the target-oriented performance.
catch Covariate-Adjusted Tensor Classification in High-Dimensions
Performs classification and variable selection on high-dimensional tensors (multi-dimensional arrays) after adjusting for additional covariates (scalar or vectors) as CATCH model in Pan, Mai and Zhang (2018) <arXiv:1805.04421>. The low-dimensional covariates and the high-dimensional tensors are jointly modeled to predict a categorical outcome in a multi-class discriminant analysis setting. The Covariate-Adjusted Tensor Classification in High-dimensions (CATCH) model is fitted in two steps: (1) adjust for the covariates within each class; and (2) penalized estimation with the adjusted tensor using a cyclic block coordinate descent algorithm. The package can provide a solution path for tuning parameter in the penalized estimation step. Special case of the CATCH model includes linear discriminant analysis model and matrix (or tensor) discriminant analysis without covariates.
catcont Test for and Identify Categorical or Continuous Values
Methods and utilities for classifying vectors as categorical or continuous.
catdap Categorical Data Analysis Program Package
Categorical data analysis program package.
cate High Dimensional Factor Analysis and Confounder Adjusted Testing and Estimation
Provides several methods for factor analysis in high dimension (both n,p >> 1) and methods to adjust for possible confounders in multiple hypothesis testing.
CatEncoders Encoders for Categorical Variables
Contains some commonly used categorical variable encoders, such as ‘LabelEncoder’ and ‘OneHotEncoder’. Inspired by the encoders implemented in python ‘sklearn.preprocessing’ package (see <http://…/preprocessing.html> ).
CATkit Chronomics Analysis Toolkit (CAT): Analyze Periodicity
Performs analysis of sinusoidal rhythms in time series data: actogram, smoothing, autocorrelation, crosscorrelation, several flavors of cosinor.
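As an illustration of the cosinor idea only (base R, not CATkit’s own API), a sinusoidal rhythm with a known period can be fit by ordinary linear regression on cosine and sine terms:

```r
# cosinor via linear regression: y ~ cos + sin at an assumed 24-hour period
set.seed(1)
t   <- 0:71                                   # hourly samples over three days
tau <- 24
y   <- 10 + 3 * cos(2 * pi * (t - 8) / tau) + rnorm(length(t), sd = 0.5)
fit <- lm(y ~ cos(2 * pi * t / tau) + sin(2 * pi * t / tau))
b   <- coef(fit)
mesor <- b[1]                   # rhythm-adjusted mean, close to 10
amp   <- sqrt(b[2]^2 + b[3]^2)  # recovered amplitude, close to 3
```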
CatPredi Optimal Categorisation of Continuous Variables in Prediction Models
Allows the user to categorise a continuous predictor variable in a logistic or a Cox proportional hazards regression setting, by maximising the discriminative ability of the model. I Barrio, I Arostegui, MX Rodriguez-Alvarez, JM Quintana (2015) <doi:10.1177/0962280215601873>. I Barrio, MX Rodriguez-Alvarez, L Meira-Machado, C Esteban, I Arostegui (2017) <https://…/41.1.3.barrio-etal.pdf>.
catSurv Computerized Adaptive Testing for Survey Research
Provides methods of computerized adaptive testing for survey researchers. Includes functionality for data fit with the classic item response methods including the latent trait model, Birnbaum’s three-parameter model, the graded response, and the generalized partial credit model. Additionally, includes several ability parameter estimation and item selection routines. During item selection, all calculations are done in compiled C++ code.
CATT The Cochran-Armitage Trend Test
The Cochran-Armitage trend test can be applied to a two-by-k contingency table. The test statistic (Z) and p-value are reported. A linear trend in the frequencies is assessed using the default weights (0, 1, 2).
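For comparison, base R already ships a Cochran-Armitage-style test of trend in proportions, stats::prop.trend.test() (this is the base-R function, not this package’s interface):

```r
# trend in event proportions 0.2, 0.3, 0.5 across three ordered groups
events <- c(10, 15, 25)   # events per group
trials <- c(50, 50, 50)   # group sizes
tt <- prop.trend.test(events, trials, score = c(0, 1, 2))
tt$statistic              # chi-squared trend statistic
tt$p.value                # small here: a clear linear trend
```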
CATTexact Computation of the p-Value for the Exact Conditional Cochran-Armitage Trend Test
Provides functions for computing the one-sided p-values of the Cochran-Armitage trend test statistic for the asymptotic and the exact conditional test. The computation of the p-value for the exact test is performed using an algorithm following an idea by Mehta, et al. (1992) <doi:10.2307/1390598>.
CausalFX Methods for Estimating Causal Effects from Observational Data
Estimate causal effects of one variable on another, currently for binary data only. Methods include instrumental variable bounds, adjustment by a given covariate set, adjustment by an induced covariate set using a variation of the PC algorithm, and an effect bounding method (the Witness Protection Program) based on covariate adjustment with observable independence constraints.
CausalImpact An R package for causal inference in time series
This R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred. As with all approaches to causal inference on non-experimental data, valid conclusions require strong assumptions. The CausalImpact package, in particular, assumes that the outcome time series can be explained in terms of a set of control time series that were themselves not affected by the intervention. Furthermore, the relation between treated series and control series is assumed to be stable during the post-intervention period. Understanding and checking these assumptions for any given application is critical for obtaining valid conclusions.
causalMGM Causal Learning of Mixed Graphical Models
Allows users to learn undirected and directed (causal) graphs over mixed data types (i.e., continuous and discrete variables). To learn a directed graph over mixed data, it first calculates the undirected graph (Sedgewick et al, 2016) and then it uses local search strategies to prune-and-orient this graph (Sedgewick et al, 2017). AJ Sedgewick, I Shi, RM Donovan, PV Benos (2016) <doi:10.1186/s12859-016-1039-0>. AJ Sedgewick, JD Ramsey, P Spirtes, C Glymour, PV Benos (2017) <arXiv:1704.02621>.
causalpie An R Package for easily creating and visualizing sufficient-component cause models
causalpie is an R package for creating tidy sufficient-component causal models. Create and analyze sufficient causes and plot them easily in ggplot2.
causalweight Causal Inference Based on Weighting Estimators
Various estimation methods for causal inference based on inverse probability weighting. Specifically, the package includes methods for estimating average treatment effects as well as direct and indirect effects in causal mediation analysis. The models refer to the studies of Frölich (2007) <doi:10.1016/j.jeconom.2006.06.004>, Huber (2012) <doi:10.3102/1076998611411917>, Huber (2014) <doi:10.1080/07474938.2013.806197>, Huber (2014) <doi:10.1002/jae.2341>, and Frölich and Huber (2017) <doi:10.1111/rssb.12232>.
cbanalysis Coffee Break Descriptive Analysis
Contains a function which subsets the input data frame based on the variable types and returns a list of data frames.
cbar Contextual Bayesian Anomaly Detection in R
Detect contextual anomalies in time-series data with Bayesian data analysis. It focuses on determining a normal range of target value, and provides simple-to-use functions to abstract the outcome.
CBCgrps Compare Baseline Characteristics Between Groups
Compare baseline characteristics between two groups. The variables being compared can be factor and numeric variables. The function will automatically judge the type and distribution of the variables, and produce descriptive statistics and bivariate analyses.
CBDA Compressive Big Data Analytics
Classification performed on Big Data. It uses concepts from compressive sensing, and implements ensemble predictor (i.e., ‘SuperLearner’) and knockoff filtering as the main machine learning and feature mining engines.
cbinom Continuous Analog of a Binomial Distribution
Implementation of the d/p/q/r family of functions for a continuous analog to the standard discrete binomial with continuous size parameter and continuous support with x in [0, size + 1], following Ilienko (2013) <arXiv:1303.5990>.
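For orientation, the discrete counterpart of this d/p/q/r family is base R’s binomial distribution (the continuous analog in ‘cbinom’ follows the same naming convention):

```r
dbinom(3, size = 10, prob = 0.5)    # pmf at x = 3: 0.1171875
pbinom(3, size = 10, prob = 0.5)    # cdf at x = 3: 0.171875
qbinom(0.5, size = 10, prob = 0.5)  # median: 5
rbinom(2, size = 10, prob = 0.5)    # two random draws
```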
cbird Clustering of Multivariate Binary Data with Dimension Reduction via L1-Regularized Likelihood Maximization
Clustering of multivariate binary data while reducing the dimensionality (CLUSBIRD), as proposed by Yamamoto and Hayashi (2015) <doi:10.1016/j.patcog.2015.05.026>.
CBPS Covariate Balancing Propensity Score
Implements the covariate balancing propensity score (CBPS) proposed by Imai and Ratkovic (2014) <DOI:10.1111/rssb.12027>. The propensity score is estimated such that it maximizes the resulting covariate balance as well as the prediction of treatment assignment. The method, therefore, avoids an iteration between model fitting and balance checking. The package also implements several extensions of the CBPS beyond the cross-sectional, binary treatment setting. The current version implements the CBPS for longitudinal settings so that it can be used in conjunction with marginal structural models from Imai and Ratkovic (2015) <DOI:10.1080/01621459.2014.956872>, treatments with three- and four- valued treatment variables, continuous-valued treatments from Fong, Hazlett, and Imai (2015) <http://…/CBGPS.pdf>, and the situation with multiple distinct binary treatments administered simultaneously. In the future it will be extended to other settings including the generalization of experimental and instrumental variable estimates. Recently we have added the optimal CBPS which chooses the optimal balancing function and results in doubly robust and efficient estimator for the treatment effect as well as high dimensional CBPS when a large number of covariates exist.
cbsem Simulation, Estimation and Segmentation of Composite Based Structural Equation Models
In composite-based structural equation models, the composites are linear combinations of their indicators. Structural models consisting of two blocks are considered. The indicators of the exogenous composites are named X, those of the endogenous composites Y. Reflective relations are given by arrows pointing from a composite to its indicators; their values are called loadings. In a reflective-reflective scenario all indicators have loadings. In the formative-reflective scenario, arrows point to indicators only from the endogenous composites. There are no loadings at all in the formative-formative scenario. The covariance matrices are computed for these three scenarios and can be used to simulate these models. These models can also be estimated, and a segmentation procedure is included as well.
cccp Cone Constrained Convex Problems
Routines for solving convex optimization problems with cone constraints by means of interior-point methods. The implemented algorithms are partially ported from CVXOPT, a Python module for convex optimization (see for more information).
ccdrAlgorithm CCDr Algorithm for Learning Sparse Gaussian Bayesian Networks
Implementation of the CCDr (Concave penalized Coordinate Descent with reparametrization) structure learning algorithm as described in Aragam and Zhou (2015) <http://…/aragam15a.html>. This is a fast, score-based method for learning Bayesian networks that uses sparse regularization and block-cyclic coordinate descent.
ccfa Continuous Counterfactual Analysis
Contains methods for computing counterfactuals with a continuous treatment variable as in Callaway and Huang (2017) <>. In particular, the package can be used to calculate the expected value, the variance, the interquantile range, the fraction of observations below or above a particular cutoff, or other user-supplied functions of an outcome of interest conditional on a continuous treatment. The package can also be used for computing these same functionals after adjusting for differences in covariates at different values of the treatment. Further, one can use the package to conduct uniform inference for each parameter of interest across all values of the treatment, uniformly test whether adjusting for covariates makes a difference at any value of the treatment, and test whether a parameter of interest is different from its average value at any value of the treatment.
CCMnet Simulate Congruence Class Model for Networks
Tools to simulate networks based on Congruence Class models.
ccrs Correct and Cluster Response Style Biased Data
Functions for correcting and clustering response-style-biased preference data (CCRS). The main functions are correct.RS() for correcting response styles, and ccrs() for simultaneous correction and content-based clustering. The procedure begins by constructing rank-ordered boundary data from the given preference matrix using a function called create.ccrsdata(). Then, in correct.RS(), the response style is corrected as follows: the rank-ordered boundary data are smoothed by I-spline functions, and the given preference data are transformed by the smoothed functions. The resulting data matrix, which is considered bias-corrected data, can be used with any data analysis method. If one wants to cluster respondents based on their indicated preferences (content-based clustering), ccrs() can be applied to the given (response-style-biased) preference data; it simultaneously corrects for response styles and clusters respondents based on the contents. The correction result can also be checked with a provided function.
cdata Wrappers for ‘tidyr::gather()’ and ‘tidyr::spread()’
Supplies deliberately verbose wrappers for ‘tidyr::gather()’ and ‘tidyr::spread()’, and an explanatory vignette. Useful for training and for enforcing preconditions.
cdcsis Conditional Distance Correlation and Its Related Feature Screening Method
Provides the conditional distance correlation and performs the conditional distance correlation sure independence screening procedure for ultrahigh-dimensional data. The conditional distance correlation is a novel measure of the conditional dependence between two random variables given a third variable. The conditional distance correlation sure independence screening procedure is used for screening variables in the ultrahigh-dimensional setting.
cdfquantreg Quantile Regression for Random Variables on the Unit Interval
Employs a two-parameter family of distributions for modelling random variables on the (0, 1) interval by applying the cumulative distribution function (cdf) of one parent distribution to the quantile function of another.
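As a hedged illustration of this construction (not cdfquantreg’s own interface; the function name below is invented), one member of such a family can be built in base R by applying the standard normal cdf of one parent distribution to a shifted and scaled logistic quantile function of another:

```r
# Illustrative cdf-quantile distribution on (0, 1):
# G(y) = F(mu + sigma * Q(y)), with F the standard normal cdf (pnorm)
# and Q the logistic quantile function (qlogis).
cdfquant_cdf <- function(y, mu = 0, sigma = 1) {
  pnorm(mu + sigma * qlogis(y))
}

# mu shifts probability mass along (0, 1); sigma controls the spread.
p_mid <- cdfquant_cdf(0.5)  # qlogis(0.5) = 0, so this is pnorm(0) = 0.5
p_lo  <- cdfquant_cdf(0.1)
p_hi  <- cdfquant_cdf(0.9)
```

Because both pnorm() and qlogis() are monotone, the composition is a valid cdf on the unit interval; regression then proceeds by letting mu and sigma depend on covariates.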
cdparcoord Top Frequency-Based Parallel Coordinates
Parallel coordinate plotting, with special handling for large data sets and missing values.
CDVineCopulaConditional Sampling from Conditional C- and D-Vine Copulas
Provides tools for sampling from a conditional copula density decomposed via Pair-Copula Constructions as a C- or D-vine. Here, the vines which can be used for such sampling are those which sample first the conditioning variables (when following the sampling algorithms shown in Aas et al. (2009) <DOI:10.1016/j.insmatheco.2007.02.001>). The sampling algorithm used is presented and discussed in Bevacqua et al. (2017) <DOI:10.5194/hess-2016-652>, and it is a modified version of that from Aas et al. (2009) <DOI:10.1016/j.insmatheco.2007.02.001>. A function is available to select the best vine (based on information criteria) among those which allow for such conditional sampling. The package includes a function to compare scatterplot matrices and pair-dependencies of two multivariate datasets.
CEAmarkov Cost-Effectiveness Analysis using Markov Models
Provides an accurate, fast and easy way to perform cost-effectiveness analyses. This package can be used to validate results generated using different methods and can help create a standard for cost-effectiveness analyses that will help compare results from different studies.
CEC Cross-Entropy Clustering
Cross-Entropy Clustering (CEC) divides the data into Gaussian-type clusters. It performs automatic reduction of unnecessary clusters while simultaneously allowing the use of various types of Gaussian mixture models.
ceg Chain Event Graph
Create and learn Chain Event Graph (CEG) models using a Bayesian framework. Provides a hierarchical agglomerative algorithm to search the CEG model space. The package also includes several facilities for visualisation of the objects associated with a CEG. The CEG class can represent a range of relational data types, and supports arbitrary vertex, edge and graph attributes. A Chain Event Graph is a tree-based graphical model that provides a powerful graphical interface through which domain experts can easily translate a process into sequences of observed events using plain language. CEGs have been a useful class of graphical model, especially for capturing context-specific conditional independences. References: Collazo R, Gorgen C, Smith J. Chain Event Graphs. CRC Press, ISBN 9781498729604, 2018 (forthcoming); and Barclay LM, Collazo RA, Smith JQ, Thwaites PA, Nicholson AE. The Dynamic Chain Event Graph. Electronic Journal of Statistics, 9 (2) 2130-2169 <doi:10.1214/15-EJS1068>.
cellWise Analyzing Data with Cellwise Outliers
Tools for detecting cellwise outliers and robust methods to analyze data which may contain them.
cems Conditional Expectation Manifolds
Conditional expectation manifolds are an approach to compute principal curves and surfaces.
cenGAM Censored Regression with Smooth Terms
Implementation of Tobit type I and type II families for censored regression using the ‘mgcv’ package, based on methods detailed in Wood (2016) <doi:10.1080/01621459.2016.1180986>.
censorcopula Estimate Parameter of Bivariate Copula
Implements an interval-censoring method to break ties when fitting a bivariate copula to data with ties.
CensSpatial Censored Spatial Models
Fits linear regression models for censored spatial data. Provides different estimation methods, such as the SAEM (Stochastic Approximation of Expectation Maximization) algorithm and a seminaive method that uses kriging prediction, to estimate the response at censored locations and predict new values at unknown locations. It also offers graphical tools for assessing the fitted model.
centiserve Find Graph Centrality Indices
Calculates centrality indices additional to the ‘igraph’ package centrality functions.
centralplot Show the Strength of Relationships Between Centre and Peripheral Items
The degree of correlation between centre and peripheral items is shown by the length of the line between them. You can define the length yourself via the ‘distance’ parameter. For example, you can supply (1 – Pearson’s correlation coefficient) as ‘distance’, so that the stronger the correlation between a centre and a peripheral item, the nearer they will be in the plot. Alternatively, if you perform a hypothesis test whose null hypothesis is that the centre and peripheral items are the same, you can supply -log(P) as the distance. In short, the stronger the correlation between centre and peripheral items, the smaller the ‘distance’ parameter should be. Thanks to this flexibility, the plot can be applied in many different circumstances.
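The suggested correlation-based distance is easy to compute in base R. The sketch below uses simulated data with invented variable names, purely to show that a stronger correlation yields a smaller ‘distance’ (and hence a shorter line in the plot):

```r
set.seed(1)
centre    <- rnorm(50)
related   <- centre + rnorm(50, sd = 0.3)  # strongly correlated with the centre
unrelated <- rnorm(50)                     # essentially uncorrelated

# distance = 1 - Pearson correlation, as suggested in the description
d_related   <- 1 - cor(centre, related)
d_unrelated <- 1 - cor(centre, unrelated)

# The strongly correlated item gets the shorter line to the centre.
d_related < d_unrelated
```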
cents Censored Time Series
Fits censored time series models.
CEoptim Cross-Entropy R Package for Optimization
Optimization solver based on the Cross-Entropy method.
CepLDA Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability
Performs cepstral-based discriminant analysis of groups of time series when there exists variability in the power spectra of time series within the same group, as described in R.T. Krafty (2016) ‘Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability’, Journal of Time Series Analysis.
cetcolor CET Perceptually Uniform Colour Maps
Collection of perceptually uniform colour maps made by Peter Kovesi (2015) ‘Good Colour Maps: How to Design Them’ <arXiv:1509.03700> at the Centre for Exploration Targeting (CET).
ceterisParibus Ceteris Paribus Plots (What-If Plots) for a Single Observation
Ceteris Paribus Plots (What-If Plots) are designed to present model responses around a single point in feature space, for example around a single prediction for an interesting observation. The plots are model-agnostic: they work for any predictive machine learning model and allow for model comparisons. Ceteris Paribus Plots supplement the Break Down Plots from the ‘breakDown’ package.
cfa Configural Frequency Analysis (CFA)
Analysis of configuration frequencies for simple and repeated measures, multiple-samples CFA, hierarchical CFA, bootstrap CFA, functional CFA, Kieser-Victor CFA, and Lindner’s test using a conventional and an accelerated algorithm.
CFC Cause-Specific Framework for Competing-Risk Analysis
Functions for combining survival curves of competing risks to produce cumulative incidence and event-free probability functions, and for summarizing and plotting the results. Survival curves can be either time-denominated or probability-denominated. Point estimates as well as Bayesian, sample-based representations of survival can utilize this framework.
cfma Causal Functional Mediation Analysis
Performs causal functional mediation analysis (CFMA) for functional treatment, functional mediator, and functional outcome. This package includes two functional mediation model types: (1) a concurrent mediation model and (2) a historical influence mediation model. See Zhao et al. (2018), Functional Mediation Analysis with an Application to Functional Magnetic Resonance Imaging Data, <arXiv:1805.06923> for details.
CGE Computing General Equilibrium
Develops general equilibrium models, computes general equilibria and simulates economic dynamics with structural dynamic models, as in Li (2019, ISBN: 9787521804225) ‘General Equilibrium and Structural Dynamics: Perspectives of New Structural Economics. Beijing: Economic Science Press’.
cghRA Array CGH Data Analysis and Visualization
Provides functions to import data from Agilent CGH arrays and process them according to the cghRA workflow. Implements several algorithms such as WACA, STEPS and cnvScore and an interactive graphical interface.
cglasso L1-Penalized Censored Gaussian Graphical Models
The l1-penalized censored Gaussian graphical model (cglasso) is an extension of the graphical lasso estimator developed to handle datasets with censored observations. An EM-like algorithm is implemented to estimate the parameters of the censored Gaussian graphical models.
CGP Composite Gaussian Process Models
Fit composite Gaussian process (CGP) models as described in Ba and Joseph (2012) ‘Composite Gaussian Process Models for Emulating Expensive Functions’, Annals of Applied Statistics. The CGP model is capable of approximating complex surfaces that are not second-order stationary. Important functions in this package are CGP, print.CGP, summary.CGP, predict.CGP and plotCGP.
CGPfunctions Powell Miscellaneous Functions for Teaching and Learning Statistics
Miscellaneous functions useful for teaching statistics, as well as actually practicing the art. They typically are not “new” methods but rather wrappers around either base R or other packages. Currently contains: ‘Plot2WayANOVA’, which as the name implies conducts a 2-way ANOVA and plots the results using ‘ggplot2’; ‘neweta’, a helper function that appends the results of a Type II eta-squared calculation onto a classic ANOVA table; ‘Mode’, which finds the modal value in a vector of data; and ‘SeeDist’, which wraps around ‘ggplot2’ to provide visualizations of univariate data.
cgwtools Miscellaneous Tools
A set of tools the author has found useful for performing quick observations or evaluations of data, including a variety of ways to list objects by size, class, etc. Several other tools mimic Unix shell commands, including ‘head’, ‘tail’, ‘pushd’, and ‘popd’. The functions ‘seqle’ and ‘reverse.seqle’ mimic the base ‘rle’ but can search for linear sequences. The function ‘splatnd’ allows the user to generate zero-argument commands without the need for ‘makeActiveBinding’.
chandwich Chandler-Bate Sandwich Loglikelihood Adjustment
Performs adjustments of a user-supplied independence loglikelihood function using a robust sandwich estimator of the parameter covariance matrix, based on the methodology in Chandler and Bate (2007) <doi:10.1093/biomet/asm015>. This can be used for cluster correlated data when interest lies in the parameters of the marginal distributions or for performing inferences that are robust to certain types of model misspecification. Functions for profiling the adjusted loglikelihoods are also provided, as are functions for calculating and plotting confidence intervals, for single model parameters, and confidence regions, for pairs of model parameters.
changepoint An R Package for Changepoint Analysis
Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean(), cpt.var() and cpt.meanvar() functions should be your first point of call.
changepoint.np Methods for Nonparametric Changepoint Detection
Implements the multiple changepoint algorithm PELT with a nonparametric cost function based on the empirical distribution of the data. The cpt.np() function should be your first point of call. This package is an extension of the ‘changepoint’ package, which uses parametric changepoint methods. For further information on the methods see the documentation of ‘changepoint’.
changepointsHD Change-Point Estimation for Expensive and High-Dimensional Models
Implements the methods developed in Bybee and Atchade (2017) <arXiv:1707.04306>. Contains a series of methods for estimating change-points given user-specified black-box models. The methods include binary segmentation for multiple change-point estimation. For estimating each individual change-point, the package includes simulated annealing, brute force, and, for Gaussian graphical models, an application-specific rank-one update implementation. Additionally, code for estimating Gaussian graphical models is included. The goal of this package is to allow for the efficient estimation of change-points in complicated models with high-dimensional data.
changepointsVar Change-Points Detections for Changes in Variance
Detection of change-points for variance of heteroscedastic Gaussian variables with piecewise constant variance function. Adelfio, G. (2012), Change-point detection for variance piecewise constant models, Communications in Statistics, Simulation and Computation, 41:4, 437-448, <doi:10.1080/03610918.2011.592248>.
ChangepointTesting Change Point Estimation for Clustered Signals
A multiple testing procedure for clustered alternative hypotheses. It is assumed that the p-values under the null hypotheses follow U(0,1) and that the distributions of p-values from the alternative hypotheses are stochastically smaller than U(0,1). By aggregating information, this method is more sensitive to detecting signals of low magnitude than standard methods. Additionally, sporadic small p-values appearing within a sequence of null hypotheses are avoided by averaging over the neighboring p-values.
changer Change R Package Name
Changing the name of an existing R package is an annoying but common task, especially in the early stages of package development. This package (mostly) automates the task.
ChannelAttributionApp Shiny Web Application for the Multichannel Attribution Problem
Shiny Web Application for the Multichannel Attribution Problem. It is basically a user-friendly graphical interface for running and comparing all the attribution models in package ‘ChannelAttribution’. For customizations or interest in other statistical methodologies for web data analysis please contact <>.
Chaos01 0-1 Test for Chaos
Computes and plots the results of the 0-1 test for chaos proposed by Gottwald and Melbourne (2004) <DOI:10.1137/080718851>. The algorithm is available in parallel for independent values of the parameter c.
CharFun Numerical Computation of the Cumulative Distribution Function and Probability Density Function from the Characteristic Function
The Characteristic Functions Toolbox (CharFun) consists of a set of algorithms for evaluating selected characteristic functions and algorithms for numerical inversion of the (combined and/or compound) characteristic functions, used to evaluate the probability density function (PDF) and the cumulative distribution function (CDF).
charlatan Make Fake Data
Make fake data, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers (‘DOIs’), jobs, phone numbers, ‘DNA’ sequences, doubles and integers from distributions and within a range.
chartql Simplified Language for Plots and Charts
Provides a very simple syntax for the user to generate custom plot(s) without having to remember complicated ‘ggplot2’ syntax. The ‘chartql’ package uses ‘ggplot2’ and manages all the syntax complexities internally. As an example, to generate a bar chart of company sales faceted by product category further faceted by season of the year, we simply write: ‘CHART bar X category, season Y sales’.
checkarg Check the Basic Validity of a (Function) Argument
Utility functions that allow checking the basic validity of a function argument or any other value, including generating an error and assigning a default in a single line of code. The main purpose of the package is to provide simple and easily readable argument checking to improve code robustness.
checkLuhn Checks if a Number is Valid Using the Luhn Algorithm
Confirms if the number is Luhn compliant. Can check if credit card, IMEI number or any other Luhn based number is correct. For more info see: <https://…/Luhn_algorithm>.
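The Luhn check itself is simple to express in base R. This is a sketch of the algorithm only, with an invented function name, not checkLuhn’s actual interface:

```r
# Luhn check: from the right, double every second digit, subtract 9 from
# any doubled digit greater than 9, and require the total to be 0 mod 10.
luhn_valid <- function(number) {
  digits  <- as.integer(rev(strsplit(gsub("\\D", "", number), "")[[1]]))
  doubled <- digits * ifelse(seq_along(digits) %% 2 == 0, 2, 1)
  doubled <- ifelse(doubled > 9, doubled - 9, doubled)
  sum(doubled) %% 10 == 0
}

luhn_valid("4532015112830366")  # a commonly used valid test number: TRUE
luhn_valid("4532015112830367")  # last digit off by one: FALSE
```

Stripping non-digits first (gsub("\\D", "")) lets the check accept numbers formatted with spaces or dashes, as card numbers often are.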
checkpoint Install Packages from Snapshots on the Checkpoint Server for Reproducibility
The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint allows you to install packages as they existed on CRAN on a specific snapshot date, as if you had a CRAN time machine. To achieve reproducibility, the checkpoint() function installs the packages required or called by your project and scripts to a local library exactly as they existed at the specified point in time. Only those packages are available to your project, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint’s checkpoint() can ensure the reproducibility of their scripts or projects at any time. To create the snapshot archives, once a day (at midnight UTC) we refresh the Austria CRAN mirror on the “Managed R Archived Network” server. Immediately after completion of the rsync mirror process, we take a snapshot, thus creating the archive. Snapshot archives exist starting from 2014-09-17.
checkr Check Object Classes, Values, Names and Dimensions
Checks the classes, values, names and dimensions of scalars, vectors, lists and data frames. Issues an informative error (or warning) if checks fail; otherwise it returns the original object, allowing it to be used in pipes.
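The check-then-return pattern this describes can be sketched in a few lines of base R (the function name below is invented for illustration and is not checkr’s API):

```r
# Minimal check-then-return: error informatively on failure, otherwise
# hand back the input unchanged so the check can sit inside a pipeline.
check_numeric_vector <- function(x, name = deparse(substitute(x))) {
  if (!is.numeric(x)) stop(name, " must be numeric", call. = FALSE)
  if (anyNA(x))       stop(name, " must not contain missing values", call. = FALSE)
  x
}

y <- check_numeric_vector(c(1, 2, 3))  # passes, returns the vector unchanged
```

Returning the original object is what makes such checks composable: a pipeline like `data %>% check(...) %>% transform(...)` fails fast at the check without otherwise altering the flow.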
cheese Tools for Intuitive and Flexible Statistical Analysis Workflows
Contains flexible and intuitive functions to assist in carrying out tasks in a statistical analysis and to get from the raw data to presentation-ready results. A user-friendly interface is used in specialized functions that are aimed at common tasks such as building a univariate descriptive table for variables in a dataset. These high-level functions are built on a collection of low(er)-level functions that may be useful for aspects of a custom statistical analysis workflow or for general programming use.
CHFF Closest History Flow Field Forecasting for Bivariate Time Series
The software matches the current history to the closest history in a time series to build a forecast.
chi2x3way Chi-Squared and Tau Index Partitions for Three-Way Contingency Tables
Provides two index partitions for three-way contingency tables: partition of the association measure chi-squared and of the predictability index tau under several representative hypotheses about the expected frequencies (hypothesized probabilities).
chicane Capture Hi-C Analysis Engine
Toolkit for processing and calling interactions in capture Hi-C data. Converts BAM files into counts of reads linking restriction fragments, and identifies pairs of fragments that interact more than expected by chance. Significant interactions are identified by comparing the observed read count to the expected background rate from a count regression model.
chinese.misc Miscellaneous Tools for Chinese Text Mining and More
Efforts are made to make Chinese text mining easier, faster, and more robust to errors. A document-term matrix can be generated with only one line of code; detecting encoding, segmenting and removing stop words are done automatically. Some convenient tools are also supplied.
ChIPtest Nonparametric Methods for Identifying Differential Enrichment Regions with ChIP-Seq Data
Nonparametric tests to identify differential enrichment regions for two-condition or time-course ChIP-seq data. Includes: a data preprocessing function, estimation of a small constant used in hypothesis testing, a kernel-based two-sample nonparametric test, and two assumption-free two-sample nonparametric tests.
CHMM Coupled Hidden Markov Models
An exact and a variational inference for coupled Hidden Markov Models applied to the joint detection of copy number variations.
chngpt Estimation and Hypothesis Testing for Threshold Regression
Threshold regression models are also called two-phase regression, broken-stick regression, split-point regression, structural change models, and regression kink models. Methods for both continuous and discontinuous threshold models are included, but the support for the former is much greater. This package is described in Fong, Huang, Gilbert and Permar (2017) chngpt: threshold regression model estimation and inference, BMC Bioinformatics, in press, <DOI:10.1186/s12859-017-1863-x>.
cholera Amend, Augment and Aid Analysis of John Snow’s Cholera Data
Amends errors, augments data and aids analysis of John Snow’s map of the 1854 London cholera outbreak. The original data come from Rusty Dodson and Waldo Tobler’s 1992 digitization of Snow’s map. Those data, <http://…/snow.html>, are no longer available. However, they are preserved in the ‘HistData’ package, <https://…/package=HistData>.
chopthin The Chopthin Resampler
Resampling is a standard step in particle filtering and in sequential Monte Carlo. This package implements the chopthin resampler, which keeps a bound on the ratio between the largest and the smallest weights after resampling.
ChoR Chordalysis R Package
Learning the structure of graphical models from datasets with thousands of variables. More information about the research papers detailing the theory behind Chordalysis is available at <http://…/Research> (KDD 2016, SDM 2015, ICDM 2014, ICDM 2013). The R package development site is <https://…/Monash-ChoR>.
choroplethr Simplify the Creation of Choropleth Maps in R
Choropleths are thematic maps where geographic regions, such as states, are colored according to some metric, such as the number of people who live in that state. This package simplifies this process by 1. Providing ready-made functions for creating choropleths of common maps. 2. Providing data and API connections to interesting data sources for making choropleths. 3. Providing a framework for creating choropleths from arbitrary shapefiles. Please see the vignettes for more details.
chunked Chunkwise Text-File Processing for ‘dplyr’
Text data can be processed chunkwise using ‘dplyr’ commands. These are recorded and executed per data chunk, so large files can be processed with limited memory using the ‘LaF’ package.
chunkR Read Tables in Chunks
Read external data tables in chunks using a C++ backend.
CIEE Estimating and Testing Direct Effects in Directed Acyclic Graphs using Estimating Equations
In many studies across different disciplines, detailed measures of the variables of interest are available. If assumptions can be made regarding the direction of effects between the assessed variables, this has to be considered in the analysis. The functions in this package implement the novel approach CIEE (causal inference using estimating equations; Konigorski et al., 2017, Genetic Epidemiology, in press) for estimating and testing the direct effect of an exposure variable on a primary outcome, while adjusting for indirect effects of the exposure on the primary outcome through a secondary intermediate outcome and potential factors influencing the secondary outcome. The underlying directed acyclic graph (DAG) of this considered model is described in the vignette. CIEE can be applied to studies in many different fields, and it is implemented here for the analysis of a continuous primary outcome and a time-to-event primary outcome subject to censoring. CIEE uses estimating equations to obtain estimates of the direct effect and robust sandwich standard error estimates. Then, a large-sample Wald-type test statistic is computed for testing the absence of the direct effect. Additionally, standard multiple regression, regression of residuals, and the structural equation modeling approach are implemented for comparison.
cinterpolate Interpolation From C
Simple interpolation methods designed to be used from C code. Supports constant, linear and spline interpolation. An R wrapper is included but this package is primarily designed to be used from C code using ‘LinkingTo’. The spline calculations are classical cubic interpolation, e.g., Forsythe, Malcolm and Moler (1977) <ISBN: 9780131653320>.
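The three interpolation modes correspond to facilities already available in base R (the package’s contribution being that its versions are callable from C). A rough base-R sketch of the same three modes, with invented variable names:

```r
x <- seq(0, 2 * pi, length.out = 10)
y <- sin(x)

f_const  <- approxfun(x, y, method = "constant")  # piecewise-constant
f_linear <- approxfun(x, y, method = "linear")    # piecewise-linear
f_spline <- splinefun(x, y, method = "fmm")       # classical cubic spline
                                                  # (Forsythe, Malcolm and Moler)

# The cubic spline tracks sin() much more closely than linear interpolation.
err_linear <- abs(f_linear(1) - sin(1))
err_spline <- abs(f_spline(1) - sin(1))
```

The "fmm" method matches the classical cubic interpolation the description cites; from C, cinterpolate exposes equivalents of these lookups without round-tripping through R.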
CIplot Functions to Plot Confidence Intervals
Plots confidence intervals from the objects of statistical tests such as t.test(), var.test(), cor.test(), prop.test() and fisher.test() (‘htest’ class), the Tukey test [TukeyHSD()], the Dunnett test [glht() in the ‘multcomp’ package], logistic regression [glm()], and the Tukey or Games-Howell test [posthocTGH() in the ‘userfriendlyscience’ package]. Users are able to set the styles of lines and points. This package also contains a function to calculate odds ratios and their confidence intervals from the result of a logistic regression.
circglmbayes Bayesian Analysis of a Circular GLM
Perform a Bayesian analysis of a circular outcome General Linear Model (GLM), which allows regressing a circular outcome on linear and categorical predictors. Posterior samples are obtained by means of an MCMC algorithm written in ‘C++’ through ‘Rcpp’. Estimation and credible intervals are provided, as well as hypothesis testing through Bayes Factors. See Mulder and Klugkist (2017) <doi:10.1016/>.
CircOutlier Detecting of Outliers in Circular Regression
Detection of outliers in circular-circular regression models, and estimation of model parameters.
circumplex Analysis and Visualization of Circular Data
Tools for analyzing and visualizing circular data, including a generalization of the bootstrapped structural summary method from Zimmermann & Wright (2017) <doi:10.1177/1073191115621795> and functions for creating publication-ready tables and figures from the results. Future versions will include tools for circular fit and reliability analyses, as well as greatly enhanced visualization methods.
cIRT Choice Item Response Theory
Jointly model the accuracy of cognitive responses and item choices within a Bayesian hierarchical framework as described by Culpepper and Balamuta (2015) <doi:10.1007/s11336-015-9484-7>. In addition, the package contains the datasets used within the analysis of the paper.
Cite An RStudio Addin to Insert BibTeX Citations in Rmarkdown Documents
Contains an RStudio addin to insert BibTeX citations in Rmarkdown documents with a minimal user interface.
ciTools Confidence or Prediction Intervals, Quantiles, and Probabilities for Statistical Models
Functions to append confidence intervals, prediction intervals, and other quantities of interest to data frames. All appended quantities are for the response variable, after conditioning on the model and covariates. This package has a data frame first syntax that allows for easy piping. Currently supported models include (log-) linear, (log-) linear mixed, and generalized linear models.
citr RStudio Add-in to Insert Markdown Citations
Functions and an RStudio add-in to search a BibTeX-file to create and insert formatted Markdown citations into the current document.
ciuupi Confidence Intervals Utilizing Uncertain Prior Information
Computes a confidence interval for a specified linear combination of the regression parameters in a linear regression model with iid normal errors with known variance when there is uncertain prior information that a distinct specified linear combination of the regression parameters takes a given value. This confidence interval, found by numerical constrained optimization, has the required minimum coverage and utilizes this uncertain prior information through desirable expected length properties. This confidence interval has the following three practical applications. Firstly, if the error variance has been accurately estimated from previous data then it may be treated as being effectively known. Secondly, for sufficiently large (dimension of the response vector) minus (dimension of regression parameter vector), greater than or equal to 30 (say), if we replace the assumed known value of the error variance by its usual estimator in the formula for the confidence interval then the resulting interval has, to a very good approximation, the same coverage probability and expected length properties as when the error variance is known. Thirdly, some more complicated models can be approximated by the linear regression model with error variance known when certain unknown parameters are replaced by estimates. This confidence interval is described in Kabaila, P. and Mainzer, R. (2017) <arXiv:1708.09543>, and is a member of the family of confidence intervals proposed by Kabaila, P. and Giri, K. (2009) <doi:10.1016/j.jspi.2009.03.018>.
CKLRT Composite Kernel Machine Regression Based on Likelihood Ratio Test
Composite Kernel Machine Regression based on Likelihood Ratio Test (CKLRT): in this package, we develop a kernel machine regression framework to model the overall genetic effect of a SNP-set, considering the possible gene-environment (GE) interaction. Specifically, we use a composite kernel to specify the overall genetic effect via a nonparametric function, and we model additional covariates parametrically within the regression framework. The composite kernel is constructed as a weighted average of two kernels, one corresponding to the genetic main effect and one corresponding to the GE interaction effect. We propose a likelihood ratio test (LRT) and a restricted likelihood ratio test (RLRT) for statistical significance. We derive a Monte Carlo approach for the finite-sample distributions of the LRT and RLRT statistics. (N. Zhao, H. Zhang, J. Clark, A. Maity, M. Wu. Composite Kernel Machine Regression based on Likelihood Ratio Test with Application for Combined Genetic and Gene-environment Interaction Effect (Submitted).)
CLA Critical Line Algorithm in Pure R
Implements Markowitz’s Critical Line Algorithm (‘CLA’) for classical mean-variance portfolio optimization. Care has been taken for correctness in light of previous buggy implementations.
clam Classical Age-Depth Modelling of Cores from Deposits
Performs ‘classical’ age-depth modelling of dated sediment deposits – prior to applying more sophisticated techniques such as Bayesian age-depth modelling. Any radiocarbon dated depths are calibrated. Age-depth models are constructed by sampling repeatedly from the dated levels, each time drawing age-depth curves. Model types include linear interpolation, linear or polynomial regression, and a range of splines. See Blaauw (2010) <doi:10.1016/j.quageo.2010.01.002>.
clampSeg Idealisation of Patch Clamp Recordings
Allows for idealisation of patch clamp recordings by implementing the non-parametric JUmp Local dEconvolution Segmentation filter JULES.
clarifai Access to Clarifai API
Get descriptions of images from the Clarifai API. Clarifai uses a large deep-learning cloud to come up with descriptive labels of the things in an image, and reports how confident it is about each label.
classifierplots Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots
Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier!
classiFunc Classification of Functional Data
Efficient implementation of a k-nearest-neighbor estimator and a kernel estimator for functional data classification.
classyfireR R Interface to the ClassyFire RESTful API
Access to the ClassyFire RESTful API <>. Retrieve existing entity classifications and submit new entities for classification.
cld2 Google’s Compact Language Detector 2
Bindings to Google’s C++ library Compact Language Detector 2 (see <https://…/cld2#readme> for more information). Probabilistically detects over 80 languages in UTF-8 text (plain text or HTML). For mixed-language input it returns the top three languages and their approximate proportion of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes).
cld3 Google’s Compact Language Detector 3
Google’s Compact Language Detector 3 is a neural network model for language identification and the successor of ‘cld2’ (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from ‘cld2’. See <https://…/cld3#readme> for more information.
cleanEHR The Critical Care Clinical Data Processing Tools
A toolset to deal with the Critical Care Health Informatics Collaborative dataset. It is created to address various data reliability and accessibility problems of electronic healthcare records (EHR). It provides a unique platform which enables data manipulation, transformation, reduction, anonymisation, cleaning and validation.
cleanerR How to Handle your Missing Data
Based on the concept of almost functional dependencies, a method is proposed to fill in missing data, as well as to show which data are missing. The user specifies a measure of error and the number of combinations to test the dependencies against: the closer that number is to the length of the dataset, the more precise the result, but the longer the process takes. If the program cannot predict the data with the accuracy the user demands, it does not fill them in; the user can then increase the allowed error or handle the data another way.
cleanNLP A Tidy Data Model for Natural Language Processing
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford’s CoreNLP library. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. Several datasets containing token unigram, part of speech tag, and dependency type frequencies are also included to assist with analyses. Currently supports parsing text in English, French, German, and Spanish.
cleanr Helps You to Code Cleaner
Check your R code for some of the most common layout flaws. Many have tried to teach us how to write less dreadful code, be it implicitly, as B. W. Kernighan and D. M. Ritchie (1988) <ISBN:0-13-110362-8> did in ‘The C Programming Language’, or explicitly, as R. C. Martin (2008) <ISBN:0-13-235088-2> did in ‘Clean Code: A Handbook of Agile Software Craftsmanship’. So we should check our code for files that are too long or wide, functions with too many lines, lines that are too wide, and too many arguments or levels of nesting. Note: this is not a static code analyzer like pylint or the like; check out https://…/lintr instead.
clespr Composite Likelihood Estimation for Spatial Data
A composite likelihood approach is implemented for estimating statistical models for spatial ordinal and proportional data, based on Feng et al. (2014) <doi:10.1002/env.2306>. Parameter estimates are obtained by maximizing composite log-likelihood functions using the limited-memory BFGS optimization algorithm with bounding constraints, while standard errors are obtained by estimating the Godambe information matrix.
clhs Conditioned Latin Hypercube Sampling
Conditioned Latin hypercube sampling, as published by Minasny and McBratney (2006) <DOI:10.1016/j.cageo.2005.12.009>. This method stratifies sampling in the presence of ancillary data. An extension of the method, which associates a cost with each individual and takes it into account during the optimisation process, is also provided (Roudier et al., 2012, <DOI:10.1201/b12728>).
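A minimal sketch of drawing a conditioned Latin hypercube sample, assuming 'clhs' is installed; by default clhs() returns the indices of the sampled rows, and the ancillary variables here are made up for illustration:

```r
library(clhs)

# Made-up ancillary data over 500 candidate locations
set.seed(1)
anc <- data.frame(elev  = runif(500, 100, 900),
                  slope = rexp(500))

# Select 20 rows whose marginal distributions mimic the full data
idx <- clhs(anc, size = 20, iter = 2000, progress = FALSE)
sampled <- anc[idx, ]
```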
cli Helpers for Developing Command Line Interfaces
A suite of tools designed to build attractive command line interfaces (‘CLIs’). Includes tools for drawing rules, boxes, trees, and ‘Unicode’ symbols with ‘ASCII’ alternatives.
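A short sketch of the kinds of interface elements 'cli' provides (function names as documented in the package):

```r
library(cli)

# A horizontal rule with a title; Unicode with automatic ASCII fallback
cli_rule(left = "Build results")

# Themed alerts and a bullet list
cli_alert_success("All 12 tests passed")
cli_alert_danger("2 warnings found")
cli_ul(c("first item", "second item"))
```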
cliapp Create Rich Command Line Applications
Create rich command line applications, with colors, headings, lists, alerts, progress bars, etc. It uses CSS for custom themes.
clickR Fix Data and Create Report Tables from Different Objects
Fixes data errors in numerical, factor and date variables and produces report tables from models and summaries.
clikcorr Censoring Data and Likelihood-Based Correlation Estimation
A profile likelihood based method of estimation and inference on the correlation coefficient of bivariate data with different types of censoring and missingness.
climbeR Calculate Average Minimal Depth of a Maximal Subtree for ‘ranger’ Package Forests
Calculates first- and second-order average minimal depth of a maximal subtree for a forest object produced by the R ‘ranger’ package. This variable importance metric is implemented as described in Ishwaran et al. (‘High-Dimensional Variable Selection for Survival Data’, March 2010, <doi:10.1198/jasa.2009.tm08622>).
clinDR Simulation and Analysis Tools for Clinical Dose Response Modeling
Bayesian and ML Emax model fitting, graphics and simulation for clinical dose response. The summary data from the dose response meta-analyses in Thomas, Sweeney, and Somayaji (2014) <doi:10.1080/19466315.2014.924876> and Thomas and Roy (2016) <doi:10.1080/19466315.2016.1256229> are included in the package. The prior distributions for the Bayesian analyses default to the posterior predictive distributions derived from these references.
ClinicalTrialSummary Summary Measures for Clinical Trials with Survival Outcomes
Provides estimates of several summary measures for clinical trials, including the average hazard ratio, the weighted average hazard ratio, the restricted superiority probability ratio, the restricted mean survival difference and the ratio of restricted mean times lost, based on the short-term and long-term hazard ratio model (Yang, 2005 <doi:10.1093/biomet/92.1.1>) which accommodates various non-proportional hazards scenarios. The inference procedures and the asymptotic results for the summary measures are discussed in Yang (2017, pre-print).
ClinReport Statistical Reporting in Clinical Trials
Enables the easy creation of formatted statistical tables in ‘Microsoft Word’ documents according to ‘clinical standards’. It can also be used outside the scope of clinical trials, for any statistical reporting in ‘Word’. Descriptive tables for quantitative statistics (mean, median, max, etc.) and/or qualitative statistics (frequencies and percentages) are available, and formatted tables of least-squares means from linear models, linear mixed models and generalized linear mixed models produced by the emmeans() function are also available. The package works with the ‘officer’ and ‘flextable’ packages to export the outputs to ‘Microsoft Word’ documents.
clipr Read and Write from the System Clipboard
Simple utility functions to read from and write to the system clipboards of Windows, OS X, and Linux.
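A minimal sketch, assuming 'clipr' is installed; clipr_available() guards against headless systems where no clipboard exists:

```r
library(clipr)

# Only touch the clipboard when one is actually available
if (clipr_available()) {
  write_clip("Hello from R")
  txt <- read_clip()  # character vector of clipboard contents
}
```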
clisymbols Unicode Symbols at the R Prompt
A small subset of Unicode symbols that are useful when building command line applications. They fall back to alternatives on terminals that do not support Unicode. Many symbols were taken from the ‘figures’ ‘npm’ package (see https://…/figures).
CLME Constrained Inference for Linear Mixed Effects Models
Constrained inference for linear mixed effects models using residual bootstrap methodology.
clogitboost Boosting Conditional Logit Model
A set of functions to fit a boosting conditional logit model.
clogitL1 Fitting Exact Conditional Logistic Regression with Lasso and Elastic Net Penalties
Tools for the fitting and cross validation of exact conditional logistic regression models with lasso and elastic net penalties. Uses cyclic coordinate descent and warm starts to compute the entire path efficiently.
clogitLasso Lasso Estimation of Conditional Logistic Regression Models
Fit a sequence of conditional logistic regression models with lasso, for small to large sized samples.
clordr Composite Likelihood Inference for Spatial Ordinal Data with Replications
Composite likelihood parameter estimates and their asymptotic covariance matrix are calculated for spatial ordinal data with replications, where the spatial ordinal response is modeled with covariates, a spatial exponential covariance structure within subjects, and independent and identically distributed measurement error. Parametric bootstrapping is used to estimate the asymptotic standard errors and covariance matrix.
cloudml Interface to the Google Cloud Machine Learning Platform
Interface to the Google Cloud Machine Learning Platform <https://…/ml-engine>, which provides cloud tools for training machine learning models.
clr Curve Linear Regression via Dimension Reduction
A new methodology for linear regression with both curve response and curve regressors, which is described in Cho, Goude, Brossat and Yao (2013) <doi:10.1080/01621459.2012.722900> and (2015) <doi:10.1007/978-3-319-18732-7_3>. The key idea behind this methodology is dimension reduction based on a singular value decomposition in a Hilbert space, which reduces the curve regression problem to several scalar linear regression problems.
clubSandwich Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections
Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models. Several adjustments are incorporated to improve small-sample performance. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling’s T-squared distribution. Methods are provided for a variety of fitted models, including lm(), plm() (from package ‘plm’), gls() and lme() (from ‘nlme’), robu() (from ‘robumeta’), and rma.uni() and rma.mv() (from ‘metafor’).
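A sketch with simulated clustered data, assuming 'clubSandwich' is installed; vcovCR() and coef_test() are its two core documented functions, and the data below are made up for illustration:

```r
library(clubSandwich)

# Simulated data: 10 clusters of 5 observations with a cluster random effect
set.seed(1)
d <- data.frame(g = rep(1:10, each = 5), x = rnorm(50))
d$y <- 0.5 * d$x + rnorm(10)[d$g] + rnorm(50)

fit <- lm(y ~ x, data = d)

# Bias-reduced (CR2) cluster-robust variance-covariance matrix
V <- vcovCR(fit, cluster = d$g, type = "CR2")

# Coefficient tests with Satterthwaite degrees of freedom
coef_test(fit, vcov = "CR2", cluster = d$g)
```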
ClueR CLUster Evaluation (CLUE)
CLUE is an R package for identifying the optimal number of clusters in a given time-course dataset clustered by the cmeans or kmeans algorithms.
CluMix Clustering and Visualization of Mixed-Type Data
Provides utilities for clustering subjects and variables of mixed data types. Similarities between subjects are measured by Gower’s general similarity coefficient with an extension of Podani for ordinal variables. Similarities between variables are assessed by combination of appropriate measures of association for different pairs of data types. Alternatively, variables can also be clustered by the ‘ClustOfVar’ approach. The main feature of the package is the generation of a mixed-data heatmap. For visualizing similarities between either subjects or variables, a heatmap of the corresponding distance matrix can be drawn. Associations between variables can be explored by a ‘confounderPlot’, which allows visual detection of possible confounding, collinear, or surrogate factors for some variables of primary interest. Distance matrices and dendrograms for subjects and variables can be derived and used for further visualizations and applications.
clusrank Wilcoxon Rank Sum Test for Clustered Data
Non-parametric tests (Wilcoxon rank sum test and Wilcoxon signed rank test) for clustered data.
clust.bin.pair Statistical Methods for Analyzing Clustered Matched Pair Data
Tests, utilities, and case studies for analyzing significance in clustered binary matched-pair data. The central function clust.bin.pair uses one of several tests to calculate a chi-square statistic. Implemented are the tests of Eliasziw, Obuchowski, Durkalski, and Yang, with McNemar included for comparison. Utility functions convert data between various useful formats. Thyroids and psychiatry are the canonical datasets, from Obuchowski and Petryshen respectively.
clustDRM Clustering Dose-Response Curves and Fitting Appropriate Models to Them
Functions to identify the pattern of a dose-response curve and then fit a set of appropriate models according to the identified pattern, followed by model averaging to estimate the effective dose.
clustEff Clusters of Effect Curves in Quantile Regression Models
Clustering method to cluster both curves effects, through quantile regression coefficient modeling, and curves in functional data analysis. Sottile G. and Adelfio G. (2017) <https://…/IWSM_2017_V2.pdf>.
cluster Cluster Analysis Extended Rousseeuw et al
Cluster analysis methods. Much extended the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990).
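A brief sketch of two of the package's workhorses: pam() for partitioning around medoids and daisy() for Gower dissimilarities on mixed-type data:

```r
library(cluster)

# Partition the iris measurements around k = 3 medoids (PAM)
fit <- pam(iris[, 1:4], k = 3)
table(fit$clustering, iris$Species)

# Gower dissimilarities handle mixed numeric/factor columns, and feed
# directly into agglomerative hierarchical clustering with agnes()
d   <- daisy(iris, metric = "gower")
agn <- agnes(d)
```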
Cluster.OBeu Cluster Analysis ‘OpenBudgets’
Estimate and return the parameters needed for visualisations designed for ‘OpenBudgets’ <http://…/> data. Calculate cluster analysis measures in budget data of municipalities across Europe, according to the ‘OpenBudgets’ data model. It involves a set of techniques and algorithms used to find and divide the data into groups of similar observations. It can also be used more generally to extract visualisation parameters, convert them to ‘JSON’ format and use them as input in a different graphical interface.
ClusterBootstrap Analyze Clustered Data with Generalized Linear Models using the Cluster Bootstrap
Provides functionality for the analysis of clustered data using the cluster bootstrap.
clusterCrit Clustering Indices
Compute clustering validation indices.
clusteredinterference Causal Effects from Observational Studies with Clustered Interference
Estimating causal effects from observational studies assuming clustered (or partial) interference. These inverse probability-weighted estimators target new estimands arising from population-level treatment policies. The estimands and estimators are introduced in Barkley et al. (2017) <arXiv:1711.04834>.
clustering.sc.dp Optimal Distance-Based Clustering for Multidimensional Data with Sequential Constraint
A dynamic programming algorithm for optimally clustering multidimensional data with a sequential constraint. The algorithm minimizes the sum of squares of within-cluster distances. The sequential constraint allows only subsequent items of the input data to form a cluster, as is typically required when clustering data streams or items with time stamps, such as video frames, GPS signals of a vehicle, movement data of a person, or e-pen data. The algorithm extends Ckmeans.1d.dp to multidimensional spaces and, as in the one-dimensional case, guarantees optimality and repeatability of the clustering. The method finds the optimal clustering when the number of clusters is known; additional methods are provided for when it is not.
clustermq Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM)
Provides the Q() function to send arbitrary function calls to workers on HPC schedulers without relying on network-mounted storage. Allows using remote schedulers via SSH.
ClusterR Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans and K-Medoids Clustering
Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of ‘RcppArmadillo’ to speed up the computationally intensive parts of the functions.
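A short sketch, assuming 'ClusterR' is installed; KMeans_rcpp() expects a numeric matrix and, as I understand its documented return value, gives cluster assignments in the `clusters` component:

```r
library(ClusterR)

dat <- as.matrix(iris[, 1:4])

# Full k-means with multiple random initialisations; the heavy lifting
# is done in compiled 'RcppArmadillo' code
km <- KMeans_rcpp(dat, clusters = 3, num_init = 5)

# Cluster assignments for the training data
head(km$clusters)
```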
ClusterRankTest Rank Tests for Clustered Data
Nonparametric rank based tests (rank-sum tests and signed-rank tests) for clustered data, especially useful for clusters having informative cluster size and intra-cluster group size.
ClusterStability Assessment of Stability of Individual Object or Clusters in Partitioning Solutions
Allows one to assess the stability of individual objects, clusters and whole clustering solutions based on repeated runs of the K-means and K-medoids partitioning algorithms.
clustertend Check the Clustering Tendency
Calculates statistics that help assess the clustering tendency of a given dataset. In the first version, Hopkins’ statistic is implemented.
clustMixType k-Prototypes Clustering for Mixed Variable-Type Data
Functions to perform k-prototypes partitioning clustering for mixed variable-type data according to Z.Huang (1998): Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Variables, Data Mining and Knowledge Discovery 2, 283-304, <DOI:10.1023/A:1009769707641>.
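A minimal sketch with made-up mixed-type data, assuming 'clustMixType' is installed; kproto() is the package's main documented function:

```r
library(clustMixType)

# Made-up data: one numeric and one categorical variable, two groups
set.seed(42)
x <- data.frame(
  num = c(rnorm(50, 0), rnorm(50, 5)),
  cat = factor(rep(c("a", "b"), each = 50))
)

# k-prototypes clustering; lambda (estimated if omitted) weights the
# categorical distance against the numeric distance
kp <- kproto(x, k = 2)
table(kp$cluster, x$cat)
```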
ClustMMDD Variable Selection in Clustering by Mixture Models for Discrete Data
An implementation of a variable selection procedure in clustering by mixtures of multinomial models for discrete data. Genotype data are an example of such data, with two unordered observations (alleles) at each locus for diploid individuals. The two-fold problem is cast as a model selection problem in which competing models are characterized by the number of clusters K and the subset S of clustering variables. Competing models are compared by penalized maximum likelihood criteria. We considered asymptotic criteria such as the Akaike and Bayesian information criteria, and a family of penalized criteria whose penalty function is calibrated in a data-driven way.
clustRcompaR Easy Interface for Clustering a Set of Documents and Exploring Group- Based Patterns
Provides an interface for performing cluster analysis on a corpus of text. Interfaces with ‘quanteda’ to assemble text corpora easily. Deviationalizes text vectors prior to clustering using the technique described by Sherin (Sherin, B. [2013]. A computational study of commonsense science: An exploration in the automated analysis of clinical interview data. Journal of the Learning Sciences, 22(4), 600-638. http://…/10508406.2013.836654). Uses cosine similarity as the distance metric for a two-stage clustering process involving Ward’s hierarchical agglomerative clustering and k-means clustering. Selects the optimal number of clusters to maximize ‘variance explained’ by clusters, adjusted for the number of clusters. Provides plotted as well as printed output of clustering results. Assesses ‘model fit’ of the clustering solution to a set of preexisting groups in the dataset.
clustree Visualise Clusterings at Different Resolutions
Deciding what resolution to use can be a difficult question when approaching a clustering analysis. One way to approach this problem is to look at how samples move as the number of clusters increases. This package allows you to produce clustering trees, a visualisation for interrogating clusterings as resolution increases.
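A sketch of the expected input format, assuming 'clustree' is installed: cluster the same data at several resolutions, store the assignments in columns sharing a common prefix, and pass that prefix to clustree():

```r
library(clustree)

set.seed(1)
dat <- scale(iris[, 1:4])

# Cluster assignments at increasing resolution, in columns K1..K3
# (the "K" prefix is what clustree() looks for)
assignments <- data.frame(
  K1 = kmeans(dat, centers = 1)$cluster,
  K2 = kmeans(dat, centers = 2)$cluster,
  K3 = kmeans(dat, centers = 3)$cluster
)

# A clustering tree showing how samples move as resolution increases
tree <- clustree(assignments, prefix = "K")
```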
clustringr Cluster Strings by Edit-Distance
Returns an edit-distance based clustering of an input vector of strings. Each cluster contains a set of strings with small mutual edit-distance (e.g., Levenshtein, optimal string alignment, Damerau-Levenshtein), as computed by stringdist::stringdist(). The set of all mutual edit-distances is then used by graph algorithms (from package ‘igraph’) to single out subsets of high connectivity.
CLUSTShiny Interactive Document for Working with Cluster Analysis
An interactive document on the topic of cluster analysis using ‘rmarkdown’ and ‘shiny’ packages. Runtime examples are provided in the package function as well as at <https://…/>.
ClustVarLV Clustering of Variables Around Latent Variables
The clustering of variables is a strategy for deciphering the underlying structure of a data set. Adopting an exploratory data analysis point of view, the Clustering of Variables around Latent Variables (CLV) approach was proposed by Vigneau and Qannari (2003). Based on a family of optimization criteria, the CLV approach is adaptable to many situations. In particular, constraints may be introduced in order to take account of additional information about the observations and/or the variables. The package provides the set of functions developed within this framework; the underlying CLV criteria cover a range of situations, and the various functions are illustrated using real case studies.
cmaesr Covariance Matrix Adaptation Evolution Strategy
Pure R implementation of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) with optional restarts (IPOP-CMA-ES).
CMatching Matching Algorithms for Causal Inference with Clustered Data
Provides functions to perform matching algorithms for causal inference with clustered data, as described in B. Arpino and M. Cannas (2016) <doi:10.1002/sim.6880>. Pure within-cluster and preferential-within cluster matching are implemented. Both algorithms provide causal estimates with cluster-adjusted estimates of standard errors.
cmce Computer Model Calibration for Deterministic and Stochastic Simulators
Implements the Bayesian calibration model described in Pratola and Chkrebtii (2018) <DOI:10.5705/ss.202016.0403> for stochastic and deterministic simulators. Additive and multiplicative discrepancy models are currently supported. See <http://…/software> for more information and examples.
cmenet Bi-Level Selection of Conditional Main Effects
Provides functions for implementing cmenet – a bi-level variable selection method for conditional main effects (see Mak and Wu (2018) <doi:10.1080/01621459.2018.1448828>). CMEs are reparametrized interaction effects which capture the conditional impact of a factor at a fixed level of another factor. Compared to traditional two-factor interactions, CMEs quantify more interpretable interaction effects in many problems of interest (e.g., genomics, molecular engineering, personalized medicine). The current implementation performs variable selection on only binary CMEs, but we are working on an extension for the continuous setting. This work was supported by USARO grant W911NF-14-1-0024.
cmfilter Coordinate-Wise Mediation Filter
Functions to discover, plot, and select multiple mediators from an x -> M -> y linear system. This exploratory mediation analysis is performed using the Coordinate-wise Mediation Filter as introduced by Van Kesteren and Oberski (2019) <arXiv:1810.06334>.
CMLS Constrained Multivariate Least Squares
Solves multivariate least squares (MLS) problems subject to constraints on the coefficients, e.g., non-negativity, orthogonality, equality, inequality, monotonicity, unimodality, smoothness, etc. Includes flexible functions for solving MLS problems subject to user-specified equality and/or inequality constraints, as well as a wrapper function that implements 24 common constraint options. Also does k-fold or generalized cross-validation to tune constraint options for MLS problems. See ten Berge (1993, ISBN:9789066950832) for an overview of MLS problems, and see Goldfarb and Idnani (1983) <doi:10.1007/BF02591962> for a discussion of the underlying quadratic programming algorithm.
CMplot Circle Manhattan Plot
Manhattan plots are the standard way to visualize the results of genome-wide association studies, but drawing an elaborate one can take much time. This package provides a function named ‘CMplot’ that solves the problem: by inputting GWAS results and adjusting a few parameters, users obtain the desired Manhattan plot. A circle Manhattan plot, which demonstrates multiple traits in one circular plot, is also put forward; such a condensed figure can spare the length of a paper.
cmprskQR Analysis of Competing Risks Using Quantile Regressions
Estimation, testing and regression modeling of subdistribution functions in competing risks using quantile regressions, as described in Peng and Fine (2009) <DOI:10.1198/jasa.2009.tm08228>.
cna A Package for Coincidence Analysis (CNA)
Provides functions for performing Coincidence Analysis (CNA).
cnbdistr Conditional Negative Binomial Distribution
Provides R functions for working with the conditional negative binomial distribution.
CNLTreg Complex-Valued Wavelet Lifting for Signal Denoising
Implementations of recent complex-valued wavelet shrinkage procedures for smoothing irregularly sampled signals.
CNVScope A Versatile Toolkit for Copy Number Variation Relationship Data Analysis and Visualization
Provides the ability to create interaction maps, discover CNV map domains (edges), gene annotate interactions, and create interactive visualizations of these CNV interaction maps.
coalitions Coalition Probabilities in Multi-Party Democracies
An implementation of an MCMC method to calculate probabilities for a coalition majority based on survey results; see Bender and Bauer (2018) <doi:10.21105/joss.00606>.
cobalt Covariate Balance Tables and Plots
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with ‘MatchIt’, ‘twang’, ‘Matching’, and ‘CBPS’ for assessing balance on the output of their preprocessing functions. Users can also supply data not generated by the above packages.
cobiclust Biclustering via Latent Block Model Adapted to Overdispersed Count Data
Implementation of a probabilistic method for biclustering adapted to overdispersed count data. It is a Gamma-Poisson Latent Block Model. It also implements two selection criteria in order to select the number of biclusters.
cocor Comparing Correlations
Statistical tests for the comparison between two correlations based on either independent or dependent groups. Dependent correlations can either be overlapping or nonoverlapping. A web interface is available on the package website. A plugin for the R GUI and IDE RKWard is included; please install RKWard to use this feature. The respective R package ‘rkward’ cannot be installed directly from a repository, as it is a part of RKWard.
cocoreg Extracts Shared Variation in Collections of Datasets Using Regression Models
The cocoreg algorithm extracts shared variation from a collection of datasets using regression models.
coda.base A Basic Set of Functions for Compositional Data Analysis
A minimum set of functions to perform compositional data analysis using the log-ratio approach introduced by John Aitchison in 1982. The main functions have been implemented in C++ for better performance.
cOde Automated C Code Generation for Use with the ‘deSolve’ and ‘bvpSolve’ Packages
Generates all necessary C functions allowing the user to work with the compiled-code interface of ode() and bvptwp(). The implementation supports “forcings” and “events”. The package also provides functions to symbolically compute Jacobians, sensitivity equations and adjoint sensitivities being the basis for sensitivity analysis.
codebook Automatic Codebooks from Survey Metadata Encoded in Attributes
Easily automate the following tasks to describe data frames: computing reliabilities (internal consistencies, retest, multilevel) for psychological scales, summarise the distributions of scales and items graphically and using descriptive statistics, combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on ‘rmarkdown’ partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.).
CodeDepends Analysis of R Code for Reproducible Research and Code Comprehension
Tools for analyzing R expressions or blocks of code and determining the dependencies between them. It focuses on R scripts but can be used on the bodies of functions. Facilities include the ability to summarize or get a high-level view of code, determine dependencies between variables, and suggest code improvements.
codemetar Generate ‘CodeMeta’ Metadata for R Packages
The ‘Codemeta’ Project defines a ‘JSON-LD’ format for describing software metadata, as detailed at <>. This package provides utilities to generate, parse, and modify ‘codemeta.json’ files automatically for R packages, as well as tools and examples for working with ‘codemeta.json’ ‘JSON-LD’ more generally.
codified Produce Standard/Formalized Demographics Tables
Augment clinical data with metadata to create output used in conventional publications and reports.
CoDiNA Co-Expression Differential Network Analysis
Categorizes links from multiple networks into three categories: common links (alpha), specific links (gamma), and different links (beta). Also categorizes the links into sub-categories and groups. The package includes a visualization tool for the networks. More information about the methodology can be found in Gysi et al., 2018 <arXiv:1802.00828>.
codingMatrices Alternative Factor Coding Matrices for Linear Model Formulae
A collection of coding functions as alternatives to the standard functions in the stats package, which have names starting with ‘contr.’. Their main advantage is that they provide a consistent method for defining marginal effects in multi-way factorial models. In a simple one-way ANOVA model the intercept term is always the simple average of the class means.
codyn Community Dynamics Metrics
A toolbox of ecological community dynamics metrics that are explicitly temporal. Functions fall into two categories: temporal diversity indices and community stability metrics. The diversity indices are temporal analogs to traditional diversity indices such as richness and rank-abundance curves. Specifically, functions are provided to calculate species turnover, mean rank shifts, and lags in community similarity between time points. The community stability metrics calculate overall stability and patterns of species covariance and synchrony over time.
cofeatureR Generate Cofeature Matrices
Generate cofeature (feature by sample) matrices. The package utilizes ggplot2::geom_tile to generate the matrix, allowing for easy additions to the base matrix.
CoFRA Complete Functional Regulation Analysis
Calculates complete functional regulation analysis and visualize the results in a single heatmap. The provided example data is for biological data but the methodology can be used for large data sets to compare quantitative entities that can be grouped. For example, a store might divide entities into cloth, food, car products etc and want to see how sales changes in the groups after some event. The theoretical background for the calculations are provided in New insights into functional regulation in MS-based drug profiling, Ana Sofia Carvalho, Henrik Molina & Rune Matthiesen, Scientific Reports, <doi:10.1038/srep18826>.
coga Convolution of Gamma Distributions
Convolution of gamma distributions in R. The convolution of gamma distributions is the distribution of a sum of independent gamma random variables, each of which may have different parameters. This package can calculate the density and distribution function, and perform simulation.
cogmapr Cognitive Mapping Tools Based on Coding of Textual Sources
Functions for building cognitive maps based on qualitative data. Inputs are textual sources (articles, transcriptions of qualitative interviews of agents, …). These sources have been coded using relations and are linked to (i) a table describing the variables (or concepts) used for the coding and (ii) a table describing the sources (typology of agents, …). Main outputs are Individual Cognitive Maps (ICM), Social Cognitive Maps (all sources or groups of sources) and a list of quotes linked to relations. This package is linked to the work done during the PhD of Frederic M. Vanwindekens (CRA-W / UCL), defended on 13 May 2014 at the University of Louvain in collaboration with the Walloon Agricultural Research Centre (project MIMOSA, MOERMAN fund).
coindeskr Access ‘CoinDesk’ Bitcoin Price Index API
Extract real-time Bitcoin price details by accessing ‘CoinDesk’ Bitcoin price Index API <https://…/>.
cointmonitoR Consistent Monitoring of Stationarity and Cointegrating Relationships
We propose a consistent monitoring procedure to detect a structural change from a cointegrating relationship to a spurious relationship. The procedure is based on residuals from modified least squares estimation, using either Fully Modified, Dynamic or Integrated Modified OLS. It is inspired by Chu et al. (1996) <DOI:10.2307/2171955> in that it is based on parameter estimation on a pre-break ‘calibration’ period only, rather than being based on sequential estimation over the full sample. See the discussion paper <DOI:10.2139/ssrn.2624657> for further information. This package provides the monitoring procedures for both the cointegration and the stationarity case (while the latter is just a special case of the former one) as well as printing and plotting methods for a clear presentation of the results.
cointReg Parameter Estimation and Inference in a Cointegrating Regression
Cointegration methods are widely used in empirical macroeconomics and empirical finance. It is well known that in a cointegrating regression the ordinary least squares (OLS) estimator of the parameters is super-consistent, i.e. converges at rate equal to the sample size T. When the regressors are endogenous, the limiting distribution of the OLS estimator is contaminated by so-called second order bias terms, see e.g. Phillips and Hansen (1990) <DOI:10.2307/2297545>. The presence of these bias terms renders inference difficult. Consequently, several modifications to OLS that lead to zero mean Gaussian mixture limiting distributions have been proposed, which in turn make standard asymptotic inference feasible. These methods include the fully modified OLS (FM-OLS) approach of Phillips and Hansen (1990) <DOI:10.2307/2297545>, the dynamic OLS (D-OLS) approach of Phillips and Loretan (1991) <DOI:10.2307/2298004>, Saikkonen (1991) <DOI:10.1017/S0266466600004217> and Stock and Watson (1993) <DOI:10.2307/2951763> and the new estimation approach called integrated modified OLS (IM-OLS) of Vogelsang and Wagner (2014) <DOI:10.1016/j.jeconom.2013.10.015>. The latter is based on an augmented partial sum (integration) transformation of the regression model. IM-OLS is similar in spirit to the FM- and D-OLS approaches, with the key difference that it does not require estimation of long run variance matrices and avoids the need to choose tuning parameters (kernels, bandwidths, lags). However, inference does require that a long run variance be scaled out. This package provides functions for the parameter estimation and inference with all three modified OLS approaches. That includes the automatic bandwidth selection approaches of Andrews (1991) <DOI:10.2307/2938229> and of Newey and West (1994) <DOI:10.2307/2297912> as well as the calculation of the long run variance.
colf Constrained Optimization on Linear Function
Performs least squares constrained optimization on a linear objective function. It contains a number of algorithms to choose from and offers a formula syntax similar to lm().
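A minimal sketch of the lm()-like formula syntax mentioned above; the `colf_nls()` function name and the `lower` bound argument are assumptions based on the package description, not verified documentation:

```r
# Hypothetical use of colf's lm()-like interface; colf_nls()
# and the `lower` argument are assumed names.
library(colf)

# Constrain the two slope coefficients to be non-negative,
# leaving the intercept unbounded.
fit <- colf_nls(mpg ~ hp + wt, data = mtcars,
                lower = c(-Inf, 0, 0))
coef(fit)
```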
CollapsABEL Generalized CDH (GCDH) Analysis
Implements a generalized version of the CDH test <DOI:10.1371/journal.pone.0028145> for detecting compound heterozygosity on a genome-wide level. Owing to its use of generalized linear models, it allows flexible analysis of binary and continuous traits with covariates.
CollapseLevels Collapses Levels, Computes Information Value and WoE
Provides functions to collapse levels of an attribute based on response rates. It also provides functions to compute and display information value, and weight of evidence (WoE) for the attributes, and to convert numeric variables to categorical ones by binning. These functions only work for binary classification problems.
collapsibleTree Interactive Collapsible Tree Diagrams using ‘D3.js’
Interactive Reingold-Tilford tree diagrams created using ‘D3.js’, where every node can be expanded and collapsed by clicking on it. Tooltips and color gradients can be mapped to nodes using a numeric column in the source data frame. See ‘collapsibleTree’ website for more information and examples.
collectArgs Quickly and Neatly Collect Arguments from One Environment to Pass to Another
We often want to take all (or most) of the objects in one environment (such as the parameter values of a function) and pass them to another. This might mean calling a second function, or iterating over a list while calling the same function. These functions wrap this often-repeated code. Current stable version (committed on October 14, 2017).
collections High Performance Container Data Types
Provides high performance container data types such as Queue, Stack, Deque, Dict and OrderedDict. Benchmarks <https://…/benchmark.html> have shown that these containers are asymptotically more efficient than those offered by other packages.
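The container types listed above expose method-style interfaces; a brief sketch using the package's documented push/pop and set/get conventions:

```r
library(collections)

q <- queue()          # FIFO queue
q$push(1)
q$push(2)
first <- q$pop()      # first in, first out

d <- dict()           # hash-backed dictionary
d$set("alpha", 10)
d$get("alpha")
```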
collector Quantified Risk Assessment Data Collection
An open source process for collecting quantified data inputs from subject matter experts. Intended for feeding into an OpenFAIR analysis <https://…/C13K> using a tool such as ‘evaluator’ <>.
collidr Check for Namespace Collisions with Other Packages and Functions on CRAN
Check for namespace collisions between a string input (your function or package name) and a quarter of a million packages and functions on CRAN.
collpcm Collapsed Latent Position Cluster Model for Social Networks
Markov chain Monte Carlo based inference routines for collapsed latent position cluster models for social networks, which include searches over the model space (number of clusters in the latent position cluster model). The label switching algorithm used is that of Nobile and Fearnside (2007) <doi:10.1007/s11222-006-9014-7>, which relies on the algorithm of Carpaneto and Toth (1980) <doi:10.1145/355873.355883>.
collUtils Auxiliary Package for Package ‘CollapsABEL’
Provides some low level functions for processing PLINK input and output files.
coloredICA Implementation of Colored Independent Component Analysis and Spatial Colored Independent Component Analysis
Implements colored Independent Component Analysis (Lee et al., 2011) and spatial colored Independent Component Analysis (Shen et al., 2014), two algorithms for performing ICA when the sources are assumed to be temporal or spatial stochastic processes, respectively.
colorednoise Simulate Temporally Autocorrelated Population Time Series
Temporally autocorrelated populations are correlated in their vital rates (growth, death, etc.) from year to year. It is very common for populations, whether they be bacteria, plants, or humans, to be temporally autocorrelated. This poses a challenge for stochastic population modeling, because a temporally correlated population will behave differently from an uncorrelated one. This package provides tools for simulating populations with white noise (no temporal autocorrelation), red noise (positive temporal autocorrelation), and blue noise (negative temporal autocorrelation). The algebraic formulation for autocorrelated noise comes from Ruokolainen et al. (2009) <doi:10.1016/j.tree.2009.04.009>. The simulations are based on an assumption of an asexually reproducing population, but it can also be used to simulate females of a sexually reproducing species.
colorfindr Extract Colors from Windows BMP, JPEG, PNG, TIFF, and SVG Format Images
Extracts colors from various image types, returns customized reports and plots treemaps of image compositions. Selected colors and color ranges can be excluded from the analysis.
ColorPalette Color Palettes Generator
Different methods to generate a color palette based on a specified base color and a number of colors that should be created.
colorpatch Optimized Rendering of Fold Changes and Confidence Values
Shows color patches for encoding fold changes (e.g. log ratios) together with confidence values within a single diagram. This is especially useful for rendering gene expression data as well as other types of differential experiments. In addition to different rendering methods (ggplot extensions), functionality for perceptually optimizing color palettes is provided. Furthermore, the package provides extension methods for the colorspace color class in order to simplify working with palettes (among others, length, as.list, and append are supported).
colorplaner A ggplot2 Extension to Visualize Two Variables per Color Aesthetic Through Color Space Projections
A ggplot2 extension to visualize two variables through one color aesthetic via mapping to a color space projection. With this technique for 2-D color mapping, one can create a dichotomous choropleth in R as well as other visualizations with bivariate color scales. Includes two new scales and a new guide for ggplot2.
colorscience Color Science Methods and Data
Methods and data for color science – color conversions by observer, illuminant and gamma. Color matching functions and chromaticity diagrams. Color indices, color differences and spectral data conversion/analysis.
colorspace Color Space Manipulation
Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB. Qualitative, sequential, and diverging color palettes based on HCL colors are provided.
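For instance, colors can be mapped between the spaces listed above and HCL-based palettes generated directly; a short sketch using the package's coercion and palette functions:

```r
library(colorspace)

# Map two hex colors into HCL (polar CIELUV) coordinates.
x <- hex2RGB(c("#E16A86", "#00AD9A"))
coords(as(x, "polarLUV"))

# A sequential HCL-based palette of 5 colors, as hex strings.
sequential_hcl(5)
```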
colorSpec Color Calculations with Emphasis on Spectral Data
Calculate with spectral properties of light sources, materials, cameras, eyes, and scanners. Build complex systems from simpler parts using a spectral product algebra. For light sources, compute CCT and CRI. For object colors, compute optimal colors and Logvinenko coordinates. Work with the standard CIE illuminants and color matching functions, and read spectra from text files, including CGATS files. Sample text files, and 4 vignettes are included.
colourpicker A Colour Picker Widget for Shiny Apps, RStudio, R-markdown, and ‘htmlwidgets’
A colour picker that can be used as an input in Shiny apps or R-markdown documents. A colour picker RStudio addin is provided to let you select colours for use in your R code. The colour picker is also available as an ‘htmlwidgets’ widget.
colr Functions to Select and Rename Data
Powerful functions to select and rename columns in dataframes, lists and numeric types by ‘Perl’ regular expression. Regular expressions (‘regex’) are a very powerful grammar for matching strings, such as column names.
Combine Game-Theoretic Probability Combination
Suite of R functions for combination of probabilities using a game-theoretic method.
combiter Combinatorics Iterators
Provides iterators for combinations, permutations, and subsets, which allow one to go through all elements without creating a huge set of all possible values.
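A sketch of iterating over combinations without materializing them all; the `icomb()` function name and its use with `iterators::nextElem()` are assumptions based on the package description:

```r
library(combiter)
library(iterators)

it <- icomb(4, 2)   # iterate over 2-element combinations of 1..4
nextElem(it)        # first combination
nextElem(it)        # next combination, computed on demand
```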
cometExactTest Exact Test from the Combinations of Mutually Exclusive Alterations (CoMEt) Algorithm
An algorithm for identifying combinations of mutually exclusive alterations in cancer genomes. CoMEt represents the mutations in a set M of k genes with a 2^k dimensional contingency table, and then computes the tail probability of observing T(M) exclusive alterations using an exact statistical test.
commonmark Bindings to the ‘CommonMark’ Reference Implementation
The ‘CommonMark’ spec is a rationalized version of Markdown syntax. This package converts markdown text to various formats including a parse tree in XML format.
commonsMath JAR Files of the Apache Commons Mathematics Library
Java JAR files for the Apache Commons Mathematics Library for use by users and other packages.
COMMUNAL Robust Selection of Cluster Number K
Facilitates optimal clustering of a data set. Provides a framework to run a wide range of clustering algorithms to determine the optimal number (k) of clusters in the data. Then analyzes the cluster assignments from each clustering algorithm to identify samples that repeatedly classify to the same group. We call these ‘core clusters’, providing a basis for later class discovery.
comorbidity Computing Comorbidity Scores
Computing comorbidity scores such as the weighted Charlson score (Charlson, 1987 <doi:10.1016/0021-9681(87)90171-8>) and the Elixhauser comorbidity score (Elixhauser, 1998 <doi:10.1097/00005650-199801000-00004>) using ICD-10 codes (Quan, 2005 <doi:10.1097/01.mlr.0000182534.19832.83>).
CompareCausalNetworks Interface to Diverse Estimation Methods of Causal Networks
Unified interface for the estimation of causal networks, including the methods ‘backShift’ (from package ‘backShift’), ‘bivariateANM’ (bivariate additive noise model), ‘bivariateCAM’ (bivariate causal additive model), ‘CAM’ (causal additive model) (from package ‘CAM’), ‘hiddenICP’ (invariant causal prediction with hidden variables), ‘ICP’ (invariant causal prediction) (from package ‘InvariantCausalPrediction’), ‘GES’ (greedy equivalence search), ‘GIES’ (greedy interventional equivalence search), ‘LINGAM’, ‘PC’ (PC Algorithm), ‘RFCI’ (really fast causal inference) (all from package ‘pcalg’) and regression.
compareDF Do a Git Style Diff of the Rows Between Two Dataframes with Similar Structure
Compares two dataframes which have the same column structure to show the rows that have changed. Also gives a git style diff format to quickly see what has changed, in addition to summary statistics.
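A small sketch of the diff workflow; the `compare_df()` function name and `group_col` argument follow the package's documented interface:

```r
library(compareDF)

old <- data.frame(id = 1:3, score = c(10, 20, 30))
new <- data.frame(id = 1:3, score = c(10, 25, 30))

# Rows are matched on the grouping column; only changed rows
# appear in the git-style comparison table.
ctable <- compare_df(new, old, group_col = "id")
ctable$comparison_df
```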
compareGroups Descriptive Analysis by Groups
Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality-plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, analysis of variance, Kruskal-Wallis, Fisher, log-rank, …) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic data (Single Nucleotide Polymorphisms), displaying allele frequencies and performing Hardy-Weinberg equilibrium tests among other typical statistics and tests for this kind of data.
comparer Compare Output and Run Time
Makes comparisons quickly for different functions or code blocks performing the same task with the function mbc(). Can be used to compare model fits to the same data or see which function runs faster.
compboost C++ Implementation of Component-Wise Boosting
Implementation of component-wise boosting written in C++ to obtain high runtime performance and full memory control. The main idea is to provide a modular class system which can be extended without editing the source code. Therefore, it is possible to use R functions as well as C++ functions for custom base-learners, losses, logging mechanisms or stopping criteria.
CompDist Multisection Composite Distributions
Computes density function, cumulative distribution function, quantile function and random numbers for a multisection composite distribution specified by the user. Also fits the user-specified distribution to a given data set. More details can be found in the paper submitted to the R Journal: Wiegand, M. and Nadarajah, S. (2017) ‘CompDist: Multisection composite distributions’.
comperes Manage Competition Results
Tools for storing and managing competition results. A competition is understood as a set of games in which players gain some abstract scores. There are two ways of storing results: in long (one row per game-player) and wide (one row per game with a fixed number of players) formats. This package provides functions for creation and conversion between them. There are also functions for computing their summary and Head-to-Head values for players. They leverage the grammar of data manipulation from ‘dplyr’.
compete Analyzing Social Hierarchies
Tools for organizing and analyzing social dominance hierarchy data.
CompetingRisk The Semi-Parametric Cumulative Incidence Function
Computing the point estimator and pointwise confidence interval of the cumulative incidence function from the cause-specific hazards model.
Compind Composite indicators functions
The Compind package contains several functions to enhance approaches to Composite Indicators methods (http://…/detail.asp?ID=6278), focusing, in particular, on the normalisation and weighting-aggregation steps.
compLasso Implements the Component Lasso Method Functions
Implements the Component lasso method for linear regression using the sample covariance matrix connected-components structure, described in ‘A Component Lasso’ by Hussami and Tibshirani (2013).
complexity Calculate the Proportion of Permutations in Line with an Informative Hypothesis
Allows for the easy computation of complexity: the proportion of the parameter space in line with the hypothesis by chance.
Compositional Compositional Data Analysis
A collection of R functions for compositional data analysis.
compositions Compositional Data Analysis
The package provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by Aitchison and Pawlowsky-Glahn.
CompR Paired Comparison Data Analysis
Different tools for describing and analysing paired comparison data are presented. The main methods are estimation of product scores according to the Bradley-Terry-Luce model. A segmentation of the individuals can be conducted on the basis of a mixture distribution approach. The number of classes can be tested by the use of Monte Carlo simulations. This package also deals with multi-criteria paired comparison data.
Conake Continuous Associated Kernel Estimation
Continuous smoothing of probability density function on a compact or semi-infinite support is performed using four continuous associated kernels: extended beta, gamma, lognormal and reciprocal inverse Gaussian. The cross-validation technique is also implemented for bandwidth selection.
concatenate Human-Friendly Text from Unknown Strings
Simple functions for joining strings. Construct human-friendly messages whose elements aren’t known in advance, like in stop, warning, or message, from clean code.
concaveman A Very Fast 2D Concave Hull Algorithm
The concaveman function ports the ‘concaveman’ (<https://…/concaveman> ) library from ‘mapbox’. It computes the concave polygon(s) for one or several sets of points.
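A brief sketch of computing a concave hull from a plain matrix of points, per the description above:

```r
library(concaveman)

set.seed(1)
pts <- cbind(x = runif(100), y = runif(100))
hull <- concaveman(pts)   # two-column matrix of polygon vertices
head(hull)
```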
conclust Pairwise Constraints Clustering
There are 3 main functions in this package: ckmeans(), lcvqe() and mpckm(). They take an unlabeled dataset and two lists of must-link and cannot-link constraints as input and produce a clustering as output.
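A sketch of constrained k-means with the `ckmeans()` function named above; the argument order (data, k, mustLink, cantLink) is an assumption:

```r
library(conclust)

set.seed(1)
data <- matrix(rnorm(40), ncol = 2)
mustLink <- matrix(c(1, 2),  ncol = 2)   # rows 1 and 2 in the same cluster
cantLink <- matrix(c(1, 20), ncol = 2)   # rows 1 and 20 in different clusters
labels <- ckmeans(data, 2, mustLink, cantLink)
```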
concordance Product Concordance
A set of utilities for matching products in different classification codes used in international trade research. It supports concordance between HS (Combined), ISIC Rev. 2,3, and SITC1,2,3,4 product classification codes, as well as BEC, NAICS, and SIC classifications. It also provides code nomenclature / descriptions look-up, Rauch classification look-up (via concordance to SITC2) and trade elasticity look-up (via concordance to SITC2/3 or
condformat Conditional Formatting in Data Frames
Apply and visualize conditional formatting to data frames in R. It presents a data frame as an HTML table with cells CSS-formatted according to criteria defined by rules, using a syntax similar to ‘ggplot2’. The table is printed either by opening a web browser or within the ‘RStudio’ viewer if available. The conditional formatting rules allow highlighting cells that match a condition or adding a gradient background to a given column based on its values.
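A minimal sketch of the rule-based syntax, using the package's `rule_fill_discrete()` and `rule_fill_gradient()` rules:

```r
library(condformat)
library(magrittr)   # for %>%

# Discrete fill on Species, gradient fill on Sepal.Length,
# mirroring the ggplot2-like rule syntax described above.
condformat(head(iris)) %>%
  rule_fill_discrete(Species) %>%
  rule_fill_gradient(Sepal.Length)
```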
CondIndTests Nonlinear Conditional Independence Tests
Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <arXiv:1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <arXiv:1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <arXiv:1706.08576>).
condir Computation of P Values and Bayes Factors for Conditioning Data
Set of functions for the easy analyses of conditioning data.
conditions Standardized Conditions for R
Implements specialized conditions, i.e., typed errors, warnings and messages. Offers a set of standardized conditions (value error, deprecated warning, io message, …) in the fashion of Python’s built-in exceptions.
conditionz Control How Many Times Conditions are Thrown
Provides ability to control how many times in function calls conditions are thrown (shown to the user). Includes control of warnings and messages.
condSURV Estimation of the Conditional Survival Function for Ordered Multivariate Failure Time Data
Implements some newly developed methods for the estimation of the conditional survival function.
condusco Query-Driven Pipeline Execution and Query Templates
Runs a function iteratively over each row of either a dataframe or the results of a query. Use the ‘BigQuery’ and ‘DBI’ wrappers to iteratively pass each row of query results to a function. If a field contains a ‘JSON’ string, it will be converted to an object. This is helpful for queries that return ‘JSON’ strings that represent objects. These fields can then be treated as objects by the pipeline.
condvis Conditional Visualization for Statistical Models
Exploring fitted model structures by interactively taking 2-D and 3-D sections in data space.
conf Plotting Two-Dimensional Confidence Regions
Plots the two-dimensional confidence region for probability distribution (Weibull or inverse Gaussian) parameters corresponding to a user given dataset and level of significance. The crplot() algorithm plots more points in areas of greater curvature to ensure a smooth appearance throughout the confidence region boundary. An alternative heuristic plots a specified number of points at roughly uniform intervals along its boundary. Both heuristics build upon the radial profile log-likelihood ratio technique for plotting two-dimensional confidence regions given by Jaeger (2016) <doi:10.1080/00031305.2016.1182946>.
ConfigParser Package to Parse an INI File, Including Variable Interpolation
Enhances the ‘ini’ package by adding the ability to interpolate variables. The INI configuration file is read into an R6 ConfigParser object (loosely inspired by Python’s ConfigParser module) and the keys can be read, where ‘%(….)s’ instances are interpolated by other included options or outside variables.
configr An Implementation of Parsing and Writing Configuration File (JSON/INI/YAML)
Implements YAML, JSON and INI parsers for reading and writing configuration files in R. The functionality of this package is similar to that of package ‘config’.
configural Multivariate Profile Analysis
R functions for criterion profile analysis, Davison and Davenport (2002) <doi:10.1037/1082-989X.7.4.468> and meta-analytic criterion profile analysis, Wiernik, Wilmot, Davison, and Ones (2019). Sensitivity analyses to aid in interpreting criterion profile analysis results are also included.
confinterpret Descriptive Interpretations of Confidence Intervals
Produces descriptive interpretations of confidence intervals. Includes (extensible) support for various test types, specified as sets of interpretations dependent on where the lower and upper confidence limits sit.
ConfIntVariance Confidence Interval for the Univariate Population Variance without Normality Assumption
Surrounds the usual sample variance of a univariate numeric sample with a confidence interval for the population variance. This has been done so far only under the assumption that the underlying distribution is normal. Under the hood, this package implements the unique least-variance unbiased estimator of the variance of the sample variance, in a formula that is equivalent to estimating kurtosis and square of the population variance in an unbiased way and combining them according to the classical formula into an estimator of the variance of the sample variance. Both the sample variance and the estimator of its variance are U-statistics. By the theory of U-statistic, the resulting estimator is unique. See Fuchs, Krautenbacher (2016) <doi:10.1080/15598608.2016.1158675> and the references therein for an overview of unbiased estimation of variances of U-statistics.
conformal Conformal Prediction for Regression and Classification
Implementation of conformal prediction using caret models for classification and regression.
ConfoundedMeta Sensitivity Analyses for Unmeasured Confounding in Meta-Analyses
Conducts sensitivity analyses for unmeasured confounding in random-effects meta-analysis per Mathur & VanderWeele (in preparation). Given output from a random-effects meta-analysis with a relative risk outcome, computes point estimates and inference for: (1) the proportion of studies with true causal effect sizes more extreme than a specified threshold of scientific significance; and (2) the minimum bias factor and confounding strength required to reduce to less than a specified threshold the proportion of studies with true effect sizes of scientifically significant size. Creates plots and tables for visualizing these metrics across a range of bias values.
confSAM Estimates and Bounds for the False Discovery Proportion, by Permutation
For multiple testing. Computes estimates and confidence bounds for the False Discovery Proportion (FDP), the fraction of false positives among all rejected hypotheses. The methods in the package use permutations of the data. Doing so, they take into account the dependence structure in the data.
Conigrave Flexible Tools for Multiple Imputation
Provides a set of tools that can be used across ‘data.frame’ and ‘imputationList’ objects.
connect3 A Tool for Reproducible Research by Converting ‘LaTeX’ Files Generated by R Sweave to Rich Text Format Files
Converts ‘LaTeX’ files (with extension ‘.tex’) generated by R Sweave using package ‘knitr’ to Rich Text Format files (with extension ‘.rtf’). Rich Text Format files can be read and written by most word processors.
conover.test Conover-Iman Test of Multiple Comparisons Using Rank Sums
Computes the Conover-Iman test (1979) for stochastic dominance and reports the results among multiple pairwise comparisons after a Kruskal-Wallis test for stochastic dominance among k groups (Kruskal and Wallis, 1952). The interpretation of stochastic dominance requires an assumption that the CDF of one group does not cross the CDF of the other. conover.test makes k(k-1)/2 multiple pairwise comparisons based on Conover-Iman t-test-statistic of the rank differences. The null hypothesis for each pairwise comparison is that the probability of observing a randomly selected value from the first group that is larger than a randomly selected value from the second group equals one half; this null hypothesis corresponds to that of the Wilcoxon-Mann-Whitney rank-sum test. Like the rank-sum test, if the data can be assumed to be continuous, and the distributions are assumed identical except for a difference in location, Conover-Iman test may be understood as a test for median difference. conover.test accounts for tied ranks. The Conover-Iman test is strictly valid if and only if the corresponding Kruskal-Wallis null hypothesis is rejected.
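A brief sketch of the workflow described above, running the Kruskal-Wallis test followed by the k(k-1)/2 Conover-Iman pairwise comparisons with Bonferroni adjustment:

```r
library(conover.test)

# Pairwise comparisons of monthly ozone readings after a
# Kruskal-Wallis test, with Bonferroni-adjusted p-values.
res <- conover.test(airquality$Ozone, airquality$Month,
                    method = "bonferroni")
```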
ConSpline Partial Linear Least-Squares Regression using Constrained Splines
Given response y, continuous predictor x, and covariate matrix, the relationship between E(y) and x is estimated with a shape-constrained regression spline. Function outputs fits and various types of inference.
ConsRank Compute the Median Ranking(s) According to the Kemeny’s Axiomatic Approach
Compute the median ranking according to Kemeny’s axiomatic approach. Rankings may or may not contain ties, and can be either complete or incomplete.
constants Reference on Constants, Units and Uncertainty
CODATA internationally recommended values of the fundamental physical constants, provided as symbols for direct use within the R language. Optionally, the values with errors and/or the values with units are also provided if the ‘errors’ and/or the ‘units’ packages are installed. The Committee on Data for Science and Technology (CODATA) is an interdisciplinary committee of the International Council for Science which periodically provides the internationally accepted set of values of the fundamental physical constants. This package contains the ‘2014 CODATA’ version, published on 25 June 2015: Mohr, P. J., Newell, D. B. and Taylor, B. N. (2016) <DOI:10.1103/RevModPhys.88.035009>, <DOI:10.1063/1.4954402>.
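A sketch of direct symbol access; exposing the values through a `syms` list is an assumption based on the package description:

```r
# Hypothetical access pattern; the `syms` list name is assumed.
library(constants)

syms$c0   # speed of light in vacuum, m/s
syms$h    # Planck constant, J s
```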
constellation Identify Event Sequences Using Time Series Joins
Examine any number of time series data frames to identify instances in which various criteria are met within specified time frames. In clinical medicine, these types of events are often called ‘constellations of signs and symptoms’, because a single condition depends on a series of events occurring within a certain amount of time of each other. This package was written to work with any number of time series data frames and is optimized for speed to work well with data frames with millions of rows.
ContaminatedMixt Model-Based Clustering and Classification with the Multivariate Contaminated Normal Distribution
Fits mixtures of multivariate contaminated normal distributions (with eigen-decomposed scale matrices) via the expectation conditional- maximization algorithm under a clustering or classification paradigm.
ContourFunctions Create Contour Plots from Data or a Function
Provides functions for making contour plots. The contour plot can be created from grid data, a function, or a data set. If non-grid data is given, then a Gaussian process is fit to the data and used to create the contour plot.
controlTest Median Comparison for Two-Sample Right-Censored Survival Data
Nonparametric two-sample procedure for comparing the median survival time.
ConvergenceClubs Finding Convergence Clubs
Functions for clustering regions that form convergence clubs, according to the definition of Phillips and Sul (2009) <doi:10.1002/jae.1080>.
convertGraph Convert Graphical Files Format
Converts graphical file formats (SVG, PNG, JPEG, BMP, GIF, PDF, etc) to one another. The exceptions are the SVG file format, which can only be converted to other formats, and in contrast the PDF format, which can only be created from other graphical formats. The main purpose of the package was to provide a solution for converting the SVG file format to PNG, which is often needed for exporting graphical files produced by R widgets.
convertr Convert Between Units
Provides conversion functionality between a broad range of scientific, historical, and industrial unit types.
convexjlr Disciplined Convex Programming in R using Convex.jl
Package convexjlr provides a simple high-level wrapper for the Julia package ‘Convex.jl’ (see <https://…/Convex.jl> for more information), which makes it easy to describe and solve convex optimization problems in R. The problems that can be handled include: linear programs, second-order cone programs, semidefinite programs, exponential cone programs.
convey Income Concentration Analysis with Complex Survey Samples
Variance estimation on indicators of income concentration and poverty using linearized or replication-based survey designs. Wrapper around the survey package.
convoSPAT Convolution-Based Nonstationary Spatial Modeling
Fits convolution-based nonstationary Gaussian process models to point-referenced spatial data. The nonstationary covariance function allows the user to specify the underlying correlation structure and which spatial dependence parameters should be allowed to vary over space: the anisotropy, nugget variance, and process variance. The parameters are estimated via maximum likelihood, using a local likelihood approach. Also provided are functions to fit stationary spatial models for comparison, calculate the kriging predictor and standard errors, and create various plots to visualize nonstationarity.
coop Co-Operation: Fast Covariance, Correlation, and Cosine Similarity Operations
Fast implementations of the co-operations: covariance, correlation, and cosine similarity. The implementations are fast and memory-efficient and their use is resolved automatically based on the input data, handled by R’s S3 methods. Full descriptions of the algorithms and benchmarks are available in the package vignettes.
CoopGame Important Concepts of Cooperative Game Theory
The theory of cooperative games with transferable utility offers useful insights into the way parties can share gains from cooperation and secure sustainable agreements, see e.g. one of the books by Chakravarty, Mitra and Sarkar (2015, ISBN:978-1107058798) or by Driessen (1988, ISBN:978-9027727299) for more details. A comprehensive set of tools for cooperative game theory with transferable utility is provided. Users can create special families of cooperative games, like e.g. bankruptcy games, cost sharing games and weighted voting games. There are functions to check various game properties and to compute five different set-valued solution concepts for cooperative games. A large number of point-valued solution concepts is available reflecting the diverse application areas of cooperative game theory. Some of these point-valued solution concepts can be used to analyze weighted voting games and measure the influence of individual voters within a voting body. There are routines for visualizing both set-valued and point-valued solutions in the case of three or four players.
coopProductGame Cooperative Aspects of Linear Production Programming Problems
Computes cooperative game and allocation rules associated with linear production programming problems.
copCAR Fitting the copCAR Regression Model for Discrete Areal Data
Provides tools for fitting the copCAR regression model for discrete areal data. Three types of estimation are supported: continuous extension, composite marginal likelihood, and distributional transform.
coprimary Sample Size Calculation for Two Primary Time-to-Event Endpoints in Clinical Trials
Computes the required number of patients for two time-to-event end-points as primary endpoint in phase III clinical trial.
coRanking Co-Ranking Matrix
Calculates the co-ranking matrix to assess the quality of a dimensionality reduction.
Corbi Collection of Rudimentary Bioinformatics Tools
Provides a bundle of basic and fundamental bioinformatics tools, such as network querying and alignment.
cord Community Estimation in G-Models via CORD
Partitions data points (variables) into communities/clusters, similar to clustering algorithms such as k-means and hierarchical clustering. This package implements a clustering algorithm based on a new metric, CORD, defined for high-dimensional parametric or semi-parametric distributions. Read http://…/1508.01939 for more details.
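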
cordillera Calculation of the OPTICS Cordillera
Functions for calculating the OPTICS Cordillera. The OPTICS Cordillera measures the amount of ‘clusteredness’ in a numeric data matrix within a distance-density based framework for a given minimum number of points comprising a cluster, as described in Rusch, Hornik, Mair (2017) <doi:10.1080/10618600.2017.1349664>. There is an R native version and a version that uses ‘ELKI’, with methods for printing, summarizing, and plotting the result. There also is an interface to the reference implementation of OPTICS in ‘ELKI’.
CORE Cores of Recurrent Events
Given a collection of intervals with integer start and end positions, finds recurrently targeted regions and estimates the significance of the findings. Randomization is implemented by parallel methods, either using local host machines or submitting grid engine jobs.
corehunter Fast and Flexible Core Subset Selection
Interface to the Core Hunter software for core subset selection. Cores can be constructed based on genetic marker data, phenotypic traits, a precomputed distance matrix, or any combination of these. Various measures are included such as Modified Rogers’ distance and Shannon’s diversity index (for genotypes) and Gower’s distance (for phenotypes). Core Hunter can also optimize a weighted combination of multiple measures, to bring the different perspectives closer together.
CORElearn Classification, Regression and Feature Evaluation
This is a suite of machine learning algorithms written in C++ with an R interface. It contains several machine learning techniques for classification and regression, for example classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. It is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used, for example, to discretize numeric attributes. An additional strength is the OrdEval algorithm and its visualization, used for evaluating data sets with ordinal features and class, enabling analysis according to the Kano model. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
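A brief sketch of the model-fitting and feature-evaluation workflow (assuming ‘CORElearn’ is installed; the built-in iris data is used only for illustration):

```r
# Fit a random forest and evaluate features with a Relief variant
library(CORElearn)

fit  <- CoreModel(Species ~ ., iris, model = "rf")
pred <- predict(fit, iris, type = "class")

# Relief-style attribute evaluation; higher scores = more informative
attrEval(Species ~ ., iris, estimator = "ReliefFequalK")
```

Other model strings (e.g. "tree", "knn", "bayes") and estimator names are listed in the package documentation.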
coreSim Core Functionality for Simulating Quantities of Interest from Generalised Linear Models
Core functions for simulating quantities of interest from generalised linear models (GLM). This package will form the backbone of a series of other packages that improve the interpretation of GLM estimates.
corkscrew Preprocessor for Data Modeling
Includes binning categorical variables into a smaller number of categories based on t-tests, converting categorical variables into continuous features using the mean of the response variable for the respective categories, and understanding the relationship between the response variable and predictor variables using data transformations.
corlink Record Linkage, Incorporating Imputation for Missing Agreement Patterns, and Modeling Correlation Patterns Between Fields
A matrix of agreement patterns and counts for record pairs is the input for the procedure. An EM algorithm is used to impute plausible values for missing record pairs. A second EM algorithm, incorporating possible correlations between per-field agreement, is used to estimate posterior probabilities that each pair is a true match – i.e. constitutes the same individual.
CornerstoneR Collection for ‘CornerstoneR’ Interface
Collection of scripts for the interface between ‘Cornerstone’ and ‘R’. ‘Cornerstone’ (<https://…/> ), a software package for engineering analytics, supports an interface to ‘R’. The scripts are designed to make this interface easy to use.
cornet Elastic Net with Dichotomised Outcomes
Implements lasso and ridge regression for dichotomised outcomes (Rauschenberger et al. 2019). Such outcomes are not naturally but artificially binary. They indicate whether an underlying measurement is greater than a threshold.
CorporaCoCo Corpora Co-Occurrence Comparison
A set of functions used to compare co-occurrence between two corpora.
CoRpower Power Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials
Calculates power for assessment of intermediate biomarker responses as correlates of risk in the active treatment group in clinical efficacy trials, as described in Gilbert, Janes, and Huang, Power/Sample Size Calculations for Assessing Correlates of Risk in Clinical Efficacy Trials (2016, Statistics in Medicine). The methods differ from past approaches by accounting for the level of clinical treatment efficacy overall and in biomarker response subgroups, which enables the correlates of risk results to be interpreted in terms of potential correlates of efficacy/protection. The methods also account for inter-individual variability of the observed biomarker response that is not biologically relevant (e.g., due to technical measurement error of the laboratory assay used to measure the biomarker response), which is important because power to detect a specified correlate of risk effect size is heavily affected by the biomarker’s measurement error. The methods can be used for a general binary clinical endpoint model with a univariate dichotomous, trichotomous, or continuous biomarker response measured in active treatment recipients at a fixed timepoint after randomization, with either case-cohort Bernoulli sampling or case-control without-replacement sampling of the biomarker (a baseline biomarker is handled as a trivial special case). In a specified two-group trial design, the computeN() function can initially be used for calculating additional requisite design parameters pertaining to the target population of active treatment recipients observed to be at risk at the biomarker sampling timepoint. Subsequently, the power calculation employs an inverse probability weighted logistic regression model fitted by the tps() function in the ‘osDesign’ package. Power results as well as the relationship between the correlate of risk effect size and treatment efficacy can be visualized using various plotting functions.
corpus Text Corpus Analysis
Text corpus data analysis, with full support for UTF8-encoded Unicode text. The package provides the ability to seamlessly read and process text from large JSON files without holding all of the data in memory simultaneously.
corpustools Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
corr2D Implementation of 2D Correlation Analysis
Implementation of two-dimensional (2D) correlation analysis based on the Fourier-transformation approach described by Isao Noda (I. Noda (1993) <DOI:10.1366/0003702934067694>). Additionally there are two plot functions for the resulting correlation matrix: The first one creates coloured 2D plots, while the second one generates 3D plots.
correctedAUC Correcting AUC for Measurement Error
Corrects the area under the ROC curve (AUC) for measurement error based on a probit-shift model.
CorrectedFDR Correcting False Discovery Rates
There are many estimators of the false discovery rate. This package computes the Nonlocal False Discovery Rate (NFDR) and three estimators of the local false discovery rate: the Corrected False Discovery Rate (CFDR), the Re-ranked False Discovery Rate (RFDR), and the blended estimator. Bickel, D. R. (2016) <http://…/34277>.
corregp Functions and Methods for Correspondence Regression
A collection of tools for correspondence regression, i.e. the correspondence analysis of the crosstabulation of a categorical variable Y as a function of another one, X, where X can in turn be made up of the combination of various categorical variables. Consequently, correspondence regression can be used to analyze the effects for a polytomous or multinomial outcome variable.
corrr Correlations in R
A tool for exploring correlations. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualising the matrix in terms of the strength of the correlations.
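The routine tasks mentioned above map onto a small set of verbs; a hedged sketch (assuming ‘corrr’ is installed, using the built-in mtcars data):

```r
# Exploring a correlation matrix as a data frame with 'corrr'
library(corrr)

x <- correlate(mtcars)  # correlations as a tidy data frame
x <- shave(x)           # blank out the redundant upper triangle
fashion(x)              # print cleanly, ignoring the diagonal

# Focus on the correlations of all variables against mpg
focus(correlate(mtcars), mpg)
```

Because the result is a data frame rather than a matrix, the output composes naturally with standard data-manipulation pipelines.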
CorrToolBox Modeling Correlational Magnitude Transformations in Discretization Contexts
Modeling the correlation transitions under specified distributional assumptions within the realm of discretization in the context of the latency and threshold concepts.
corset Arbitrary Bounding of Series and Time Series Objects
Set of methods to constrain numerical series and time series within arbitrary boundaries.
CorShrink Adaptive Shrinkage of Correlation Vectors and Matrices
Performs adaptive shrinkage of correlation and covariance matrices using a mixture model prior over the Fisher z-transformation of the correlations, Stephens (2016) <doi:10.1093/biostatistics/kxw041> with the method flexible in choosing a separate shrinkage intensity for each cell of the correlation or covariance matrices: it is particularly efficient in handling missing data in the data matrix.
cosa Constrained Optimal Sample Allocation
Implements generalized constrained optimal sample allocation framework for multilevel regression discontinuity studies and multilevel randomized trials with continuous outcomes. Bulus, M. (2017). Design Considerations in Three-level Regression Discontinuity Studies (Doctoral dissertation). University of Missouri, Columbia, MO.
cosinor2 Extended Tools for Cosinor Analysis of Rhythms
Statistical procedures for calculating population-mean cosinor, non-stationary cosinor, estimation of best-fitting period, tests of population rhythm differences and more. See Cornélissen, G. (2014). <doi:10.1186/1742-4682-11-16>.
CoSMoS Complete Stochastic Modelling Solution
A single framework, unifying, extending, and improving a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific ‘parent’ Gaussian process Papalexiou (2018) <doi:10.1016/j.advwatres.2018.02.013>.
costsensitive Cost-Sensitive Multi-Class Classification
Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for classifying it into one class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, A., Langford, J., & Zadrozny, B., 2008, <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B., 2005, <https://…/citation.cfm?id=1102358> ) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don’t accept observation weights.
CosW The CosW Distribution
Density, distribution function, quantile function, random generation and survival function for the Cosine Weibull Distribution as defined by SOUZA, L. New Trigonometric Class of Probabilistic Distributions. 219 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2015 (available at <http://…obabilistic-distributions-602633.html> ) and BRITO, C. C. R. Method Distributions generator and Probability Distributions Classes. 241 p. Thesis (Doctorate in Biometry and Applied Statistics) – Department of Statistics and Information, Federal Rural University of Pernambuco, Recife, Pernambuco, 2014 (available upon request).
Counterfactual Estimation and Inference Methods for Counterfactual Analysis
Implements the estimation and inference methods for counterfactual analysis described in Chernozhukov, Fernandez-Val and Melly (2013) <DOI:10.3982/ECTA10582> ‘Inference on Counterfactual Distributions,’ Econometrica, 81(6). The counterfactual distributions considered are the result of changing either the marginal distribution of covariates related to the outcome variable of interest, or the conditional distribution of the outcome given the covariates. They can be applied to estimate quantile treatment effects and wage decompositions.
countHMM Penalized Estimation of Flexible Hidden Markov Models for Time Series of Counts
Provides tools for penalized estimation of flexible hidden Markov models for time series of counts without the need to specify a (parametric) family of distributions. These include functions for model fitting, model checking, and state decoding. For details, see Adam, T., Langrock, R., and Wei, C.H. (2019): Penalized Estimation of Flexible Hidden Markov Models for Time Series of Counts. <arXiv:1901.03275>.
Countr Flexible Univariate and Bivariate Count Process Probability
Flexible univariate and bivariate count models based on the Weibull distribution. The models may include covariates and can be specified with familiar formula syntax.
COUSCOus A Residue-Residue Contact Detecting Method
Contact prediction using shrinked covariance (COUSCOus). COUSCOus is a residue-residue contact detecting method approaching the contact inference using the glassofast implementation of Matyas and Sustik (2012, The University of Texas at Austin UTCS Technical Report 2012:1-3. TR-12-29.) that solves the L_1 regularised Gaussian maximum likelihood estimation of the inverse of a covariance matrix. Prior to the inverse covariance matrix estimation we utilise a covariance matrix shrinkage approach, the empirical Bayes covariance estimator, which has been shown by Haff (1980) <DOI:10.1214/aos/1176345010> to be the best estimator in a Bayesian framework, especially dominating estimators of the form aS, such as the smoothed covariance estimator applied in a related contact inference technique PSICOV.
covafillr Local Polynomial Regression of State Dependent Covariates in State-Space Models
Facilitates local polynomial regression for state dependent covariates in state-space models. The functionality can also be used from ‘C++’ based model builder tools such as ‘Rcpp’/’inline’, ‘TMB’, or ‘JAGS’.
covatest Tests on Properties of Space-Time Covariance Functions
Tests on properties of space-time covariance functions. Tests of symmetry, separability, and different forms of non-separability are available. Moreover, tests on some classes of covariance functions, such as product-sum models, Gneiting models, and integrated product models, are provided.
covequal Test for Equality of Covariance Matrices
Computes p-values using the largest root test using an approximation to the null distribution by Johnstone (2008) <DOI:10.1214/08-AOS605>.
COveR Clustering with Overlaps
Provides functions for overlapping clustering, fuzzy clustering and interval-valued data manipulation. The package implements the following algorithms: OKM (Overlapping Kmeans) from Cleuziou, G. (2007) <doi:10.1109/icpr.2008.4761079> ; NEOKM (Non-exhaustive overlapping Kmeans) from Whang, J. J., Dhillon, I. S., and Gleich, D. F. (2015) <doi:10.1137/1.9781611974010.105> ; Fuzzy Cmeans from Bezdek, J. C. (1981) <doi:10.1007/978-1-4757-0450-1> ; Fuzzy I-Cmeans from de A.T. De Carvalho, F. (2005) <doi:10.1016/j.patrec.2006.08.014>.
covmat Covariance Matrix Estimation
We implement a collection of techniques for estimating covariance matrices. Covariance matrices can be built using missing data. Stambaugh Estimation and FMMC methods can be used to construct such matrices. Covariance matrices can be built by denoising or shrinking the eigenvalues of a sample covariance matrix. Such techniques work by exploiting the tools in Random Matrix Theory to analyse the distribution of eigenvalues. Covariance matrices can also be built assuming that data has many underlying regimes. Each regime is allowed to follow a Dynamic Conditional Correlation model. Robust covariance matrices can be constructed by multivariate cleaning and smoothing of noisy data.
covr Test Coverage for Packages
Track and report code coverage for your package and (optionally) upload the results to a coverage service like Codecov or Coveralls. Code coverage is a measure of the amount of code being exercised by the tests. It is an indirect measure of test quality. This package is compatible with any testing methodology or framework and tracks coverage of both R code and compiled C/C++/Fortran code.
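A hedged sketch of the typical workflow ("mypkg/" is a placeholder path to a local package source tree):

```r
# Measure and inspect test coverage for a package with 'covr'
library(covr)

cov <- package_coverage("mypkg/")  # runs the package's tests under tracing
percent_coverage(cov)              # overall coverage percentage
report(cov)                        # interactive per-file, per-line report
# codecov(coverage = cov)          # optionally upload to Codecov
```

The same functions work regardless of whether the package uses testthat or another testing framework.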
CovSelHigh Model-Free Covariate Selection in High Dimensions
Model-free selection of covariates in high dimensions under unconfoundedness for situations where the parameter of interest is an average causal effect. This package is based on model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011) <DOI:10.1093/biomet/asr041> and VanderWeele and Shpitser (2011) <DOI:10.1111/j.1541-0420.2011.01619.x>. Confounder selection can be performed via either Markov/Bayesian networks, random forests or LASSO.
covTestR Covariance Matrix Tests
Testing functions for Covariance Matrices. These tests include high-dimension homogeneity of covariance matrix testing described by Schott (2007) <doi:10.1016/j.csda.2007.03.004> and high-dimensional one-sample tests of covariance matrix structure described by Fisher, et al. (2010) <doi:10.1016/j.jmva.2010.07.004>. Covariance matrix tests use C++ to speed performance and allow larger data sets.
CovTools Statistical Tools for Covariance Analysis
Covariance is of universal prevalence across various disciplines within statistics. We provide a rich collection of geometric and inferential tools for convenient analysis of covariance structures, topics including distance measures, mean covariance estimator, covariance hypothesis test for one-sample and two-sample cases, and covariance estimation. For an introduction to covariance in multivariate statistical analysis, see Schervish (1987) <doi:10.1214/ss/1177013111>.
cowbell Performs Segmented Linear Regression on Two Independent Variables
Implements a specific form of segmented linear regression with two independent variables. The visualization of that function looks like a quarter segment of a cowbell, giving the package its name. The package has been specifically constructed for the case where the minimum and maximum values of the dependent and the two independent variables are known a priori, which is usually the case when those values are derived from Likert scales.
cowplot Streamlined Plot Theme and Plot Annotations for ‘ggplot2’
Some helpful extensions and modifications to the ‘ggplot2’ library. In particular, this package makes it easy to combine multiple ‘ggplot2’ plots into one and label them with letters, e.g. A, B, C, etc., as is often required for scientific publications. The package also provides a streamlined and clean theme that is used in the Wilke lab, hence the package name, which stands for Claus O. Wilke’s plot library.
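The plot-combining workflow described above reduces to a single call; a minimal sketch (assuming ‘cowplot’ and ‘ggplot2’ are installed):

```r
# Combine two ggplot2 plots with publication-style letter labels
library(ggplot2)
library(cowplot)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# Arrange side by side, labelled A and B
plot_grid(p1, p2, labels = c("A", "B"))
```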
coxed Duration-Based Quantities of Interest for the Cox Proportional Hazards Model
Functions for generating, simulating, and visualizing expected durations and marginal changes in duration from the Cox proportional hazards model.
Coxnet Regularized Cox Model
Cox model regularized with net (L1 and Laplacian), elastic-net (L1 and L2) or lasso (L1) penalty. In addition, it efficiently solves an approximate L0 variable selection based on truncated likelihood function. Moreover, it can also handle the adaptive version of these regularization forms, such as adaptive lasso and net adjusting for signs of linked coefficients. The package uses one-step coordinate descent algorithm and runs extremely fast by taking into account the sparsity structure of coefficients.
coxphMIC Sparse Estimation Method for Cox Proportional Hazards Models
Implements the sparse estimation method for Cox proportional hazards models via approximated information criterion (Su et al., 2016, Biometrics). The methodology is named MIC, which stands for ‘Minimizing approximated Information Criteria’. A reparameterization step is introduced to enforce sparsity while at the same time keeping the objective function smooth. As a result, MIC is computationally fast with superior performance in sparse estimation.
CoxPlus Cox Regression (Proportional Hazards Model) with Multiple Causes and Mixed Effects
A high-performance package for estimating the proportional hazards model when an event can have more than one cause, including support for random and fixed effects, tied events, and time-varying variables.
coxrt Cox Proportional Hazards Regression for Right Truncated Data
Fits Cox regression based on retrospectively ascertained times-to-event. The method uses Inverse-Probability-Weighting estimating equations.
CP Conditional Power Calculations
Functions for calculating the conditional power for different models in survival time analysis within randomized clinical trials with two different treatments to be compared and survival as an endpoint.
cplots Plots for Circular Data
Provides functions to produce some circular plots for circular data, in a height- or area-proportional manner. They include barplots, smooth density plots, stacked dot plots, histograms, multi-class stacked smooth density plots, and multi-class stacked histograms. The new methodology for general area-proportional circular visualization is described in an article submitted (after revision) to Journal of Computational and Graphical Statistics.
cpm Sequential and Batch Change Detection Using Parametric and Nonparametric Methods
Sequential and batch change detection for univariate data streams, using the change point model framework. Functions are provided to allow the parametric monitoring of sequences of Gaussian, Bernoulli and Exponential random variables, along with functions implementing more general nonparametric methods for monitoring sequences which have an unspecified or unknown distribution.
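A hedged sketch of batch detection of a single change point (assuming ‘cpm’ is installed; the simulated mean shift and the ARL0 value are arbitrary):

```r
# Nonparametric change detection with the change point model framework
library(cpm)

set.seed(42)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 2))  # mean shift at t = 100

# Mann-Whitney monitoring; ARL0 controls the in-control false alarm rate
res <- detectChangePoint(x, cpmType = "Mann-Whitney", ARL0 = 500)
res$changeDetected  # was a change flagged?
res$changePoint     # estimated change location
```

For streaming data, processStream() applies the same monitoring sequentially and can flag multiple change points.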
CPP Composition of Probabilistic Preferences (CPP)
CPP is a multiple-criteria decision method for evaluating alternatives in complex decision-making problems by a probabilistic approach. The CPP was created and expanded by Sant’Anna, Annibal P. (2015) <doi:10.1007/978-3-319-11277-0>.
cpr Control Polygon Reduction
Implementation of the Control Polygon Reduction and Control Net Reduction methods for finding parsimonious B-spline regression models.
CPsurv Nonparametric Change Point Estimation for Survival Data
Nonparametric change point estimation for survival data based on p-values of exact binomial tests.
cpt Classification Permutation Test
Non-parametric test for equality of multivariate distributions. Trains a classifier to classify (multivariate) observations as coming from one of two distributions. If the classifier is able to classify the observations better than would be expected by chance (using permutation inference), then the null hypothesis that the two distributions are equal is rejected.
cptcity ‘cpt-city’ Colour Gradients
Incorporates colour gradients from the ‘cpt-city’ web archive available at <http://…/>.
cpumemlog Monitor CPU and RAM Usage of a Process (and its Children)
cpumemlog is a Bash shell script that monitors CPU and RAM usage of a given process and its children. The main aim for writing this script was to gain insight into the behaviour of a process and to spot bottlenecks without GUI tools; e.g., it is very useful for spotting that a computationally intensive process on a remote server died due to hitting the RAM limit or something of that sort. The statistics about CPU, RAM, and so on are gathered from the system utility ps. While the utility top can be used for this interactively, it is tedious to stare at its dynamic output, and it is quite hard to spot consumption at the peak and follow the trends. Yet another similar utility is time, which however only gives consumption of resources at the peak. cpumemlogplot.R is a companion R script used to summarize and plot the gathered data.
cqrReg Quantile, Composite Quantile Regression and Regularized Versions
Estimates quantile regression (QR) and composite quantile regression (CQR), with adaptive lasso penalty, using interior point (IP), majorize-and-minimize (MM), coordinate descent (CD), and alternating direction method of multipliers (ADMM) algorithms.
cquad Conditional Maximum Likelihood for Quadratic Exponential Models for Binary Panel Data
Estimation, based on conditional maximum likelihood, of the quadratic exponential model proposed by Bartolucci, F. & Nigro, V. (2010, Econometrica), and of a simplified and a modified version of this model. The quadratic exponential model is suitable for the analysis of binary longitudinal data when state dependence (beyond the effect of the covariates and a time-fixed individual intercept) has to be taken into account. It is therefore an alternative to the dynamic logit model, with the advantage of easily allowing conditional inference to eliminate the individual intercepts and thus obtain consistent estimates of the parameters of main interest (for the covariates and the lagged response). The simplified version of this model does not distinguish, as the original model does, between the last time occasion and the previous occasions. The modified version formulates the interaction terms differently and may be used to easily test state dependence, as shown in Bartolucci, F., Nigro, V. & Pigini, C. (2013, Econometric Reviews). The package also includes estimation of the dynamic logit model by a pseudo conditional estimator based on the quadratic exponential model, as proposed by Bartolucci, F. & Nigro, V. (2012, Journal of Econometrics).
cr17 Testing Differences Between Competing Risks Models and Their Visualisations
Tool for analyzing competing risks models. The main point of interest is testing differences between groups (as described in R.J Gray (1988) <doi:10.1214/aos/1176350951> and J.P. Fine, R.J Gray (1999) <doi:10.2307/2670170>) and visualizations of survival and cumulative incidence curves.
CramTest Univariate Cramer Test on Two Samples of Data
Performs the univariate two-sample Cramer test to identify differences between two groups. This package provides a faster method for calculating the p-value. For further information, refer to ‘Properties, Advantages and a Faster p-value Calculation of the Cramer test’ by Telford et al. (submitted for review).
crandatapkgs Find Data-Only Packages on CRAN
Provides a data.frame listing of known data-only and data-heavy packages available on CRAN.
crandb Access to the CRAN Database API
The CRAN database provides an API for programmatically accessing all meta-data of CRAN R packages. This API can be used for various purposes; here are three examples I am working on right now:
• Writing a package manager for R. The package manager can use the CRAN DB API to query dependencies, or other meta data.
• Building a search engine for CRAN packages. The DB itself does not provide a search API, but it can be (easily) mirrored in a search engine.
• Creating an RSS feed for the new, updated or archived packages on CRAN.
cranlike Tools for ‘CRAN’-Like Repositories
A set of functions to manage ‘CRAN’-like repositories efficiently.
cranlogs Download Logs from the RStudio CRAN Mirror
API to the database of CRAN package downloads from the RStudio CRAN mirror; see https://…/ for the raw API.
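A hedged sketch of the main query function (assuming ‘cranlogs’ is installed; requires network access to the logs API):

```r
# Query CRAN download counts from the RStudio mirror logs
library(cranlogs)

# Daily downloads of ggplot2 over the last week
cran_downloads(packages = "ggplot2", when = "last-week")

# Several packages over an explicit date range
cran_downloads(packages = c("dplyr", "data.table"),
               from = "2019-01-01", to = "2019-01-31")
```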
cranly Package Directives and Collaboration Networks in CRAN
Provides core visualisations and summaries for the CRAN package database. The package provides comprehensive methods for cleaning up and organising the information in the CRAN package database, for building package directives networks (depends, imports, suggests, enhances) and collaboration networks, and for computing summaries and producing interactive visualisations from the resulting networks. Network visualisation is through the ‘visNetwork’ <https://…/package=visNetwork> R package. The package also provides functions to coerce the networks to ‘igraph’ <https://…/package=igraph> objects for further analyses and modelling.
CRANsearcher RStudio Addin for Searching Packages in CRAN Database Based on Keywords
One of the strengths of R is its vast package ecosystem. Indeed, R packages extend from visualization to Bayesian inference and from spatial analyses to pharmacokinetics (<https://…/> ). There is probably not an area of quantitative research that isn’t represented by at least one R package. At the time of this writing, there are more than 10,000 active CRAN packages. Because of this massive ecosystem, it is important to have tools to search and learn about packages related to your personal R needs. For this reason, we developed an RStudio addin capable of searching available CRAN packages directly within RStudio.
crblocks Categorical Randomized Block Data Analysis
Implements a statistical test for comparing bar plots or histograms of categorical data derived from a randomized block repeated measures layout.
credentials Tools for Managing SSH and Git Credentials
Setup and retrieve HTTPS and SSH credentials for use with ‘git’ and other services. For HTTPS remotes the package interfaces the ‘git-credential’ utility which ‘git’ uses to store HTTP usernames and passwords. For SSH remotes we provide convenient functions to find or generate appropriate SSH keys. The package both helps the user to setup a local git installation, and also provides a back-end for git/ssh client libraries to authenticate with existing user credentials.
creditmodel Build Binary Classification Models in One Integrated Offering
Provides a toolkit for building predictive models in one integrated offering. Contains infrastructure functionalities such as data exploration and preparation, missing values treatment, outliers treatment, variable derivation, variable selection, dimensionality reduction, grid search for hyperparameters, data mining and visualization, model evaluation, strategy analysis etc. ‘creditmodel’ is designed to make the development of binary classification models (machine learning based models as well as credit scorecard) simpler and faster.
CreditRisk Evaluation of Credit Risk with Structural and Reduced Form Models
Evaluation of default probability of sovereign and corporate entities based on structural or intensity based models and calibration on market Credit Default Swap quotes. Damiano Brigo, Massimo Morini, Andrea Pallavicini (2013): ‘Counterparty Credit Risk, Collateral and Funding. With Pricing Cases for All Asset Classes’.
credsubs Credible Subsets
Functions for constructing simultaneous credible bands and identifying subsets via the ‘credible subsets’ (also called ‘credible subgroups’) method.
crfsuite Conditional Random Fields for Labelling Sequential Data in Natural Language Processing
Wraps the ‘CRFsuite’ library <https://…/crfsuite>, allowing users to fit a Conditional Random Field model and to apply it to existing data. The focus of the implementation is in the area of Natural Language Processing, where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition, or classification of any category you have in mind. In addition to training, a small web application is included in the package to allow you to easily construct training data.
crisp Fits a Model that Partitions the Covariate Space into Blocks in a Data-Adaptive Way
Implements convex regression with interpretable sharp partitions (CRISP), which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://…/15-344.pdf>.
crminer Fetch ‘Scholarly’ Full Text from ‘Crossref’
Text mining client for ‘Crossref’. Includes functions for getting links to full text of articles, fetching full text articles from those links or Digital Object Identifiers (‘DOIs’), and text extraction from ‘PDFs’.
crmPack Object-Oriented Implementation of CRM Designs
Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to setup a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules.
crochet Implementation Helper for [ and [<- of Custom Matrix-Like Types
Functions to help implement the extraction / subsetting / indexing function [ and replacement function [<- of custom matrix-like types (based on S3, S4, etc.), modeled as closely to the base matrix class as possible (with tests to prove it).
cromwellDashboard A Dashboard to Visualize Scientific Workflows in ‘Cromwell’
A dashboard that supports the usage of ‘Cromwell’, a scientific workflow engine for command line users. This package utilizes the ‘Cromwell’ REST APIs and provides these convenient functions: timing diagrams for running workflows, ‘Cromwell’ engine status, and a tabular workflow list.
cronR Schedule R Scripts and Processes with the ‘cron’ Job Scheduler
Create, edit, and remove ‘cron’ jobs on your unix-alike system. The package provides a set of easy-to-use wrappers to ‘crontab’. It also provides an RStudio add-in to easily launch and schedule your scripts.
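A minimal sketch of typical cronR usage; the script path is a placeholder, and the call names (cron_rscript(), cron_add(), cron_ls(), cron_rm()) are the package's documented wrappers:

```r
library(cronR)

# Build the shell command that cron will run (path is a placeholder)
cmd <- cron_rscript("/home/user/scripts/daily_report.R")

# Schedule it to run every day at 7 AM
cron_add(cmd, frequency = "daily", at = "7AM", id = "daily_report",
         description = "Render the daily report")

# Inspect and remove scheduled jobs
cron_ls()
cron_rm(id = "daily_report")
```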
crop Graphics Cropping Tool
A device closing function which is able to crop graphics (e.g., PDF, PNG files) on Unix-like operating systems with the required underlying command-line tools installed.
CrossClustering A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters and Identification of Outliers
Implements a partial clustering algorithm that combines Ward’s minimum variance and Complete Linkage algorithms, providing automatic estimation of a suitable number of clusters and identification of outlier elements.
crossdes Construction of Crossover Designs
Contains functions for the construction of carryover balanced crossover designs. In addition, it contains functions to check given designs for balance.
Crossover Analysis and Search of Crossover Designs
Provides different crossover designs, from combinatorial or search algorithms as well as from the literature, and a GUI to access them.
crossrun Joint Distribution of Number of Crossings and Longest Run
Joint distribution of the number of crossings and the longest run in a series of independent Bernoulli trials. The computation uses an iterative procedure in which results for a series are built from results for shorter series. The procedure conditions on the start value and partitions by further conditioning on the position of the first crossing (or none).
crosstalk Inter-Widget Interactivity for HTML Widgets
Provides building blocks for allowing HTML widgets to communicate with each other, with Shiny or without (i.e. static .html files). Currently supports linked brushing and filtering.
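A minimal sketch of the linked-filtering idea: wrap the data in a SharedData object, then bind several widgets to it (the table widget here assumes the ‘DT’ package, which supports crosstalk):

```r
library(crosstalk)

# Wrap the data so multiple widgets can share filter/selection state
shared_mtcars <- SharedData$new(mtcars)

# A slider filtering every widget bound to shared_mtcars,
# laid out side by side with a reactive table
bscols(
  filter_slider("mpg", "Miles per gallon", shared_mtcars, ~mpg),
  DT::datatable(shared_mtcars)  # the table updates as the slider moves
)
```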
CrossValidate Classes and Methods for Cross Validation of ‘Class Prediction’ Algorithms
Defines classes and methods to cross-validate various binary classification algorithms used for ‘class prediction’ problems.
crosswalkr Rename and Encode Data Frames Using External Crosswalk Files
A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in ‘Stata’.
crossword.r Generating Crosswords from Word Lists
Generate crosswords from a list of words.
crov Constrained Regression Model for an Ordinal Response and Ordinal Predictors
Fits a constrained regression model for an ordinal response with ordinal predictors and possibly others, Espinosa and Hennig (2018) <arXiv:1804.08715>. The parameter estimates associated with an ordinal predictor are constrained to be monotonic. If a monotonicity direction (isotonic or antitonic) is not specified for an ordinal predictor by the user, then the monotonicity direction classification procedure establishes it. A monotonicity test is also available to test the null hypothesis of monotonicity over a set of parameters associated with an ordinal predictor.
crqanlp Cross-Recurrence Quantification Analysis for Dynamic Natural Language Processing
Cross-recurrence quantification analysis for word series, from text, known as categorical recurrence analysis. Uses the ‘crqa’ R package by Coco and Dale (2014) <doi:10.3389/fpsyg.2014.00510>. Functions are wrappers to facilitate exploration of the sequential properties of text.
crrp Penalized Variable Selection in Competing Risks Regression
In competing risks regression, the proportional subdistribution hazards(PSH) model is popular for its direct assessment of covariate effects on the cumulative incidence function. This package allows for penalized variable selection for the PSH model. Penalties include LASSO, SCAD, MCP, and their group versions.
crseEventStudy A Robust and Powerful Test of Abnormal Stock Returns in Long-Horizon Event Studies
Based on Dutta et al. (2018) <doi:10.1016/j.jempfin.2018.02.004>, this package provides their standardized test for abnormal returns in long-horizon event studies. The methods used improve on the major weaknesses of size, power, and robustness of long-run statistical tests described in Kothari/Warner (2007) <doi:10.1016/B978-0-444-53265-7.50015-9>. Abnormal returns are weighted by their statistical precision (i.e., standard deviation), resulting in abnormal standardized returns. This procedure efficiently addresses the heteroskedasticity problem. Clustering techniques following Cameron et al. (2011) <doi:10.1198/jbes.2010.07136> are adopted for computing cross-sectional correlation robust standard errors. The statistical tests in this package therefore account for potential biases arising from returns’ cross-sectional correlation, autocorrelation, and volatility clustering without power loss.
crskdiag Diagnostics for Fine and Gray Model
Provides the implementation of analytical and graphical approaches for checking the assumptions of the Fine and Gray model.
crsnls Nonlinear Regression Parameters Estimation by ‘CRS4HC’ and ‘CRS4HCe’
Functions for nonlinear regression parameters estimation by algorithms based on Controlled Random Search algorithm. Both functions (crs4hc(), crs4hce()) adapt current search strategy by four heuristics competition. In addition, crs4hce() improves adaptability by adaptive stopping condition.
crtests Classification and Regression Tests
Provides wrapper functions for running classification and regression tests using different machine learning techniques, such as Random Forests and decision trees. The package provides standardized methods for preparing data to suit the algorithm’s needs, training a model, making predictions, and evaluating results. Also, some functions are provided to run multiple instances of a test.
CRTgeeDR Doubly Robust Inverse Probability Weighted Augmented GEE Estimator
Implements a semi-parametric GEE estimator accounting for missing data with Inverse-probability weighting (IPW) and for imbalance in covariates with augmentation (AUG). The estimator IPW-AUG-GEE is Doubly robust (DR).
crul HTTP Client
A simple HTTP client, with tools for making HTTP requests, and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby’s ‘faraday’ gem (<https://…/faraday> ). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package ‘curl’, an interface to ‘libcurl’ (<https://…/libcurl> ).
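A short sketch of the R6-based client; the target URL (httpbin.org) is chosen only for illustration:

```r
library(crul)

# Create a client bound to a base URL
cli <- HttpClient$new(url = "https://httpbin.org")

# Make a GET request and inspect the response object
res <- cli$get(path = "get", query = list(foo = "bar"))
res$status_code          # HTTP status, e.g. 200
txt <- res$parse("UTF-8")  # response body as text
```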
crunch Data Tools
The ‘Crunch’ service provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
crunchy Shiny Apps on Crunch
To facilitate building custom dashboards on the Crunch data platform <http://…/>, the ‘crunchy’ package provides tools for working with ‘shiny’. These tools include utilities to manage authentication and authorization automatically and custom stylesheets to help match the look and feel of the Crunch web application.
CRWRM Changing the Reference Group without Re-Running the Model
Re-calculates the coefficients and the standard deviations when changing the reference group.
csabounds Bounds on Distributional Treatment Effect Parameters
The joint distribution of potential outcomes is not typically identified under standard identifying assumptions such as selection on observables or even when individuals are randomly assigned to treatment. This package contains methods for obtaining tight bounds on distributional treatment effect parameters when panel data is available and under a Copula Stability Assumption as in Callaway (2017).
CsChange Testing for Change in C-Statistic
Calculates the confidence interval and p value for a change in C-statistic. The adjusted C-statistic is calculated as Somers’ Dxy rank correlation/2 + 0.5. The confidence interval is calculated using the bootstrap method, and the p value using a Z test. Please refer to the article by Peter Ganz et al. (2016) <doi:10.1001/jama.2016.5951>.
CSeqpat Frequent Contiguous Sequential Pattern Mining of Text
Mines contiguous sequential patterns in text.
CSFA Connectivity Scores with Factor Analysis
Applies factor analysis methodology to microarray data in order to derive connectivity scores between compounds. The package also contains an implementation of the connectivity score algorithm by Zhang and Gant (2008) <doi:10.1186/1471-2105-9-258>.
csn Closed Skew-Normal Distribution
Provides functions for computing the density and the log-likelihood function of closed-skew normal variates, and for generating random vectors sampled from this distribution. See Gonzalez-Farias, G., Dominguez-Molina, J., and Gupta, A. (2004). The closed skew normal distribution, Skew-elliptical distributions and their applications: a journey beyond normality, Chapman and Hall/CRC, Boca Raton, FL, pp. 25-42.
csp Correlates of State Policy Data Set in R
Provides the Correlates of State Policy data set for easy use in R.
csrplus Methods to Test Hypotheses on the Distribution of Spatial Point Processes
Includes two functions to evaluate the hypothesis of complete spatial randomness (csr) in point processes. The function ‘mwin’ calculates quadrat counts to estimate the intensity of a spatial point process through the moving window approach proposed by Bailey and Gatrell (1995). Event counts are computed within a window of a set size over a fine lattice of points within the region of observation. The function ‘pielou’ uses the nearest neighbor test statistic and asymptotic distribution proposed by Pielou (1959) to compare the observed point process to one generated under csr. The value can be compared to that given by the more widely used test proposed by Clark and Evans (1954).
cssTools Cognitive Social Structure Tools
A collection of tools for estimating a network from a random sample of cognitive social structure (CSS) slices. Also contains functions for evaluating a CSS in terms of various error types observed in each slice.
cstab Selection of Number of Clusters via Normalized Clustering Instability
Selection of the number of clusters in cluster analysis using stability methods.
CSTools Assessing Skill of Climate Forecasts on Seasonal-to-Decadal Timescales
Exploits dynamical seasonal forecasts in order to provide information relevant to stakeholders at the seasonal timescale. The package contains process-based methods for forecast calibration, bias correction, statistical and stochastic downscaling, optimal forecast combination and multivariate verification, as well as basic and advanced tools to obtain tailored products. Doblas-Reyes et al. (2005) <doi:10.1111/j.1600-0870.2005.00104.x>. Mishra et al. (2018) <doi:10.1007/s00382-018-4404-z>. Terzago et al. (2018) <doi:10.5194/nhess-18-2825-2018>. Torralba et al. (2017) <doi:10.1175/JAMC-D-16-0204.1>. D’Onofrio et al. (2014) <doi:10.1175/JHM-D-13-096.1>.
csv Read and Write CSV Files with Selected Conventions
Reads and writes CSV with selected conventions. Uses the same generic function for reading and writing to promote consistent formats.
CTAShiny Interactive Application for Working with Contingency Tables
An interactive application for working with contingency tables. The application has a template for solving contingency table problems such as the chi-square test of independence and association plots between two categorical variables. Runtime examples are provided in the package function as well as at <https://…/>.
cthreshER Continuous Threshold Expectile Regression
Estimation and inference methods for the continuous threshold expectile regression. It can fit the continuous threshold expectile regression and test the existence of change point, for the paper, ‘Feipeng Zhang and Qunhua Li (2016). A continuous threshold expectile regression, submitted.’
CTM A Text Mining Toolkit for Chinese Document
The CTM package is designed to solve text mining problems, specifically for Chinese documents.
ctmcd Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data
Functions for estimating Markov generator matrices from discrete-time observations. The implemented approaches comprise diagonal adjustment, weighted adjustment and quasi-optimization of matrix logarithm based candidate solutions, an expectation-maximization algorithm as well as a Gibbs sampler.
ctmle Collaborative Targeted Maximum Likelihood Estimation
Implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiation, like the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).
ctqr Censored and Truncated Quantile Regression
Estimation of quantile regression models for survival data.
ctsem Continuous Time Structural Equation Modelling
An easily accessible continuous (and discrete) time dynamic modelling package for panel and time series data, reliant upon the ‘OpenMx’ package for computation. Most dynamic modelling approaches to longitudinal data rely on the assumption that time intervals between observations are consistent. When this assumption is adhered to, the data gathering process is necessarily limited to a specific schedule, and when broken, the resulting parameter estimates may be biased and reduced in power. Continuous time models are conceptually similar to vector autoregressive models (and thus also to the latent change models popularised in a structural equation modelling context); however, by explicitly including the length of time between observations, continuous time models are freed from the assumption that measurement intervals are consistent. This allows: data to be gathered irregularly; the elimination of noise and bias due to varying measurement intervals; parsimonious structures for complex dynamics. The application of such a model in this SEM framework allows full-information maximum-likelihood estimates for both N = 1 and N > 1 cases, multiple measured indicators per latent process, and the flexibility to incorporate additional elements, including individual heterogeneity in the latent process and manifest intercepts, and time dependent and independent exogenous covariates. Furthermore, due to the SEM implementation we are able to estimate a random effects model where the impact of time dependent and time independent predictors can be assessed simultaneously, but without the classic problems of random effects models assuming no covariance between unit level effects and predictors.
ctsmr Continuous Time Stochastic Modelling for R
CTSM is a tool for estimating embedded parameters in a continuous time stochastic state space model. CTSM has been developed at DTU Compute (former DTU Informatics) over several years. CTSM-R provides a new scripting interface through the statistical language R. Mixing CTSM with R provides easy access to data handling and plotting tools required in any kind of modelling.
CTTinShiny Shiny Interface for the CTT Package
A Shiny interface developed in close coordination with the CTT package, providing a GUI that guides the user through CTT analyses.
CTTShiny Classical Test Theory via Shiny
Interactive shiny application for running classical test theory (item analysis).
CUB A Class of Mixture Models for Ordinal Data
Estimating and testing models for ordinal data within the family of CUB models and their extensions (where CUB stands for Combination of a discrete Uniform and a shifted Binomial distributions).
Cubist Rule- and Instance-Based Regression Modeling
Regression modeling using rules with added instance-based corrections.
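A minimal sketch of fitting a Cubist model; the use of mtcars is only illustrative, and the instance-based correction is controlled via the neighbors argument at prediction time:

```r
library(Cubist)

# Fit a rule-based regression model for mpg from the other variables
mod <- cubist(x = mtcars[, -1], y = mtcars$mpg, committees = 5)
summary(mod)  # prints the fitted rules

# Instance-based correction: blend rule predictions with
# predictions from the nearest training cases
predict(mod, mtcars[, -1], neighbors = 5)
```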
CuCubes MultiDimensional Feature Selection (MDFS)
Functions for MultiDimensional Feature Selection (MDFS): * calculating multidimensional information gains, * finding interesting tuples for chosen variables, * scoring variables, * finding important variables, * plotting selection results. CuCubes is also known as CUDA Cubes and it is a library that allows fast CUDA-accelerated computation of information gains in binary classification problems. This package wraps CuCubes and provides an alternative CPU version as well as helper functions for building MultiDimensional Feature Selectors.
CUFF Charles’s Utility Function using Formula
Utility functions that provide wrappers for descriptive base functions like correlation, mean and table. It makes use of the formula interface to pass variables to functions. It also provides operators to concatenate (%+%), to repeat, and to manage character vectors for nice display.
cultevo Tools, Measures and Statistical Tests for Cultural Evolution
Provides tools for measuring the compositionality of signalling systems (in particular the information-theoretic measure due to Spike (2016) <http://…/25930> and the Mantel test for distance matrix correlation (after Dietz 1983) <doi:10.1093/sysbio/32.1.21>), functions for computing string and meaning distance matrices as well as an implementation of the Page test for monotonicity of ranks (Page 1963) <doi:10.1080/01621459.1963.10500843> with exact p-values up to k = 22.
curl A Modern and Flexible Web Client for R
The curl() and curl_download() functions provide highly configurable drop-in replacements for base url() and download.file() with better performance, support for encryption (https://, ftps://), ‘gzip’ compression, authentication, and other ‘libcurl’ goodies. The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of ‘libcurl’ is recommended; for a more-user-friendly web client see the ‘httr’ package which builds on this package with HTTP specific tools and logic.
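For example, a download and an in-memory request (the httpbin.org URLs are placeholders for any HTTP endpoint):

```r
library(curl)

# Drop-in replacement for download.file(), with libcurl features
tmp <- tempfile(fileext = ".html")
curl_download("https://httpbin.org/html", tmp)

# Fully in-memory request via the lower-level fetch interface
res <- curl_fetch_memory("https://httpbin.org/get")
res$status_code          # HTTP status code
rawToChar(res$content)   # response body as text
```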
The curl package: a modern R interface to libcurl
curlconverter Tools to Transform ‘cURL’ Command-Line Calls to ‘httr’ Requests
Deciphering web/’REST’ ‘API’ and ‘XHR’ calls can be tricky, which is one reason why internet browsers provide ‘Copy as cURL’ functionality within their ‘Developer Tools’ pane(s). These ‘cURL’ command-lines can be difficult to wrangle into an ‘httr’ ‘GET’ or ‘POST’ request, but you can now ‘straighten’ these ‘cURLs’ either from data copied to the system clipboard or by passing in a vector of ‘cURL’ command-lines and getting back a list of parameter elements which can be used to form ‘httr’ requests. You can also make a complete/working/callable ‘httr::VERB’ function right from the tools provided.
curry Partial Function Application with %<%, %-<%
Partial application is the process of reducing the arity of a function by fixing one or more arguments, thus creating a new function lacking the fixed arguments. The curry package provides three different ways of performing partial function application by fixing arguments from either end of the argument list (currying and tail currying) or by fixing multiple named arguments (partial application). This package provides this functionality through the %<%, %-<%, and %><% operators, which allow for a programming style comparable to modern functional languages. Compared to other implementations such as purrr::partial(), the operators in curry compose functions with named arguments, aiding in autocomplete etc.
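A sketch of the three operators as described above (exact operator semantics should be checked against the package documentation):

```r
library(curry)

divide <- function(numerator, denominator) numerator / denominator

# Currying: fix arguments from the start of the argument list
divide_10_by <- divide %<% 10      # fixes numerator = 10
divide_10_by(5)                    # 10 / 5

# Tail currying: fix arguments from the end of the argument list
divide_by_10 <- divide %-<% 10     # fixes denominator = 10
divide_by_10(5)                    # 5 / 10

# Partial application: fix named arguments in one go
halve <- divide %><% list(denominator = 2)
halve(8)                           # 8 / 2
```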
curstatCI Confidence Intervals for the Current Status Model
Computes the maximum likelihood estimator, the smoothed maximum likelihood estimator and pointwise bootstrap confidence intervals for the distribution function under current status data. Groeneboom and Hendrickx (2017) <arXiv:1701.07359>.
curvecomp Multiple Curve Comparisons Using Parametric Bootstrap
Performs multiple comparison procedures on curve observations among different treatment groups. The methods are applicable in a variety of situations (such as independent groups with equal or unequal sample sizes, or repeated measures) by using parametric bootstrap. References to these procedures can be found at Konietschke, Gel, and Brunner (2014) <doi:10.1090/conm/622/12431> and Westfall (2011) <doi:10.1080/10543406.2011.607751>.
CustomerScoringMetrics Evaluation Metrics for Customer Scoring Models Depending on Binary Classifiers
Functions for evaluating and visualizing predictive model performance (specifically: binary classifiers) in the field of customer scoring. These metrics include lift, lift index, gain percentage, top-decile lift, F1-score, expected misclassification cost and absolute misclassification cost. See Berry & Linoff (2004, ISBN:0-471-47064-3), Witten and Frank (2005, 0-12-088407-0) and Blattberg, Kim & Neslin (2008, ISBN:978-0-387-72578-9) for details. Visualization functions are included for lift charts and gain percentage charts. All metrics that require class predictions offer the possibility to dynamically determine cutoff values for transforming real-valued probability predictions into class predictions.
customizedTraining Customized Training for Lasso and Elastic-Net Regularized Generalized Linear Models
Customized training is a simple technique for transductive learning, when the test covariates are known at the time of training. The method identifies a subset of the training set to serve as the training set for each of a few identified subsets in the test set. This package implements customized training for the glmnet() and cv.glmnet() functions.
customLayout Extended Version of Layout Functionality for ‘Base’ and ‘Grid’ Graphics Systems
Create complicated drawing areas for multiple plots by combining much simpler layouts. It is an extended version of the layout() function from the ‘graphics’ package, but it also works with ‘grid’ graphics.
customsteps Customizable Higher-Order Recipe Step Functions
Customizable higher-order recipe step functions for the ‘recipes’ package. These step functions take ‘prep’ and ‘bake’ helper functions as inputs and create specifications of customized recipe steps as output.
cusum CUSUM Charts for Monitoring of Hospital Performance
Provides functions for constructing and evaluating CUSUM charts and RA-CUSUM charts with focus on false signal probability.
CUSUMdesign Compute Decision Interval and Average Run Length for CUSUM Charts
Computation of decision intervals (H) and average run lengths (ARL) for CUSUM charts.
cutpointr Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks
Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation and various plotting functions are included.
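A short sketch using the ‘suicide’ example data set shipped with the package; method and metric names follow the package's documented interface:

```r
library(cutpointr)

# Find the cutpoint of the 'dsi' score that maximizes
# sensitivity + specificity for the binary 'suicide' outcome,
# with bootstrap validation
cp <- cutpointr(suicide, dsi, suicide,
                method = maximize_metric, metric = sum_sens_spec,
                boot_runs = 50)

summary(cp)  # optimal cutpoint plus in-sample and bootstrap metrics
plot(cp)     # distribution, ROC curve, and bootstrap variability
```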
CutpointsOEHR Optimal Equal-HR Method to Find Two Cutpoints for U-Shaped Relationships in Cox Model
Use optimal equal-HR method to determine two optimal cutpoints of a continuous predictor that has a U-shaped relationship with survival outcomes based on Cox regression model. The optimal equal-HR method estimates two optimal cut-points that have approximately the same log hazard value based on Cox regression model and divides individuals into different groups according to their HR values.
cvar Compute Expected Shortfall and Value at Risk for Continuous Distributions
Compute expected shortfall (ES) and Value at Risk (VaR) from a quantile function, distribution function, random number generator or probability density function. ES is also known as Conditional Value at Risk (CVaR). Virtually any continuous distribution can be specified. The functions are vectorized over the arguments. The computations are done directly from the definitions, see e.g. Acerbi and Tasche (2002) <doi:10.1111/1468-0300.00091>.
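A hedged sketch of computing VaR and ES for a normal loss distribution from its quantile function; the argument name p_loss is quoted from memory of the package interface:

```r
library(cvar)

# 5% value at risk and expected shortfall of a standard normal
# distribution, specified via its quantile function qnorm
VaR(qnorm, p_loss = 0.05)
ES(qnorm, p_loss = 0.05)

# Distribution parameters are passed through to qnorm
ES(qnorm, p_loss = 0.05, mean = 0.1, sd = 2)
```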
cvcrand Efficient Design and Analysis of Cluster Randomized Trials
Constrained randomization by Raab and Butcher (2001) <doi:10.1002/1097-0258(20010215)20:3%3C351::AID-SIM797%3E3.0.CO;2-C> is suitable for cluster randomized trials (CRTs) with a small number of clusters (e.g., 20 or fewer). The procedure of constrained randomization is based on the baseline values of some cluster-level covariates specified. The intervention effect on the individual outcome can then be analyzed through clustered permutation test introduced by Gail, et al. (1996) <doi:10.1002/(SICI)1097-0258(19960615)15:11%3C1069::AID-SIM220%3E3.0.CO;2-Q>. Motivated from Li, et al. (2016) <doi:10.1002/sim.7410>, the package performs constrained randomization on the baseline values of cluster-level covariates and cluster permutation test on the individual-level outcome for cluster randomized trials.
cvequality Tests for the Equality of Coefficients of Variation from Multiple Groups
Contains functions for testing for significant differences between multiple coefficients of variation. Includes Feltz and Miller’s (1996) <DOI:10.1002/(SICI)1097-0258(19960330)15:6%3C647::AID-SIM184%3E3.0.CO;2-P> asymptotic test and Krishnamoorthy and Lee’s (2014) <DOI:10.1007/s00180-013-0445-2> modified signed-likelihood ratio test. See the vignette for more, including full details of citations.
cvmgof Cramer-von Mises Goodness-of-Fit Tests
Devoted to Cramer-von Mises goodness-of-fit tests: implements three statistical methods based on Cramer-von Mises statistics to estimate and test a regression model.
CVR Canonical Variate Regression
Perform canonical variate regression (CVR) for two sets of covariates and a univariate response, with regularization and weight parameters tuned by cross validation.
cvxbiclustr Convex Biclustering Algorithm
An iterative algorithm for solving a convex formulation of the biclustering problem.
CVXR Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution.
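For instance, a nonnegative least-squares problem stated in DCP form (the simulated data are only illustrative):

```r
library(CVXR)

# Simulated regression data
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- X %*% c(1, 0, 2, 0, 1) + rnorm(100)

# Declare the optimization variable and the convex objective
beta <- Variable(5)
objective <- Minimize(sum_squares(y - X %*% beta))

# Add a nonnegativity constraint and hand off to a solver
problem <- Problem(objective, constraints = list(beta >= 0))
result <- solve(problem)

result$getValue(beta)  # estimated nonnegative coefficients
```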
cxhull Convex Hull
Computes the convex hull in arbitrary dimension, based on the Qhull library. The package provides a complete description of the convex hull: edges, ridges, facets, adjacencies. Triangulation is optional.
cyclocomp Cyclomatic Complexity of R Code
Cyclomatic complexity is a software metric (measurement), used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program’s source code. It was developed by Thomas J. McCabe, Sr. in 1976.
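For example, the complexity of a small branching function can be measured directly:

```r
library(cyclocomp)

# One base path plus one for the if/else branch
f <- function(x) {
  if (x > 0) {
    sqrt(x)
  } else {
    -sqrt(-x)
  }
}

cyclocomp(f)  # cyclomatic complexity of the function body
```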
Cyclops Cyclic Coordinate Descent for Logistic, Poisson and Survival Analysis
This model fitting tool incorporates cyclic coordinate descent and majorization-minimization approaches to fit a variety of regression models found in large-scale observational healthcare data. Implementations focus on computational optimization and fine-scale parallelization to yield efficient inference in massive datasets.
cyphr High Level Encryption Wrappers
Encryption wrappers, using low-level support from ‘sodium’ and ‘openssl’. ‘cyphr’ tries to smooth over some pain points when using encryption within applications and data analysis by wrapping around differences in function names and arguments in different encryption providing packages. It also provides high-level wrappers for input/output functions for seamlessly adding encryption to existing analyses.
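A minimal sketch of symmetric encryption with a sodium key; the file name is a placeholder:

```r
library(cyphr)

# Symmetric key backed by 'sodium'
key <- key_sodium(sodium::keygen())

# String round trip
secret <- encrypt_string("a sensitive value", key)
decrypt_string(secret, key)  # recovers the original string

# The same wrappers apply to file I/O: wrap the write and read calls
encrypt(saveRDS(mtcars, "data.rds"), key)
df <- decrypt(readRDS("data.rds"), key)
```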
cytofan Plot Fan Plots for Cytometry Data using ‘ggplot2’
An implementation of fan plots for cytometry data in ‘ggplot2’. For reference see Britton, E.; Fisher, P. & J. Whitley (1998) The Inflation Report Projections: Understanding the Fan Chart <https://…ojections-understanding-the-fan-chart>.


d3heatmap A D3.js-based heatmap htmlwidget for R
This is an R package that implements a heatmap htmlwidget. It has the following features:
• Highlight rows/columns by clicking axis labels
• Click and drag over colormap to zoom in (click on colormap to zoom out)
• Optional clustering and dendrograms, courtesy of base::heatmap
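A one-line example of the widget (column scaling and palette are illustrative choices):

```r
library(d3heatmap)

# Interactive heatmap of mtcars, scaled within columns;
# dendrograms come from the default hierarchical clustering
d3heatmap(mtcars, scale = "column", colors = "Blues")
```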
Interactive heat maps
D3M Two Sample Test with Wasserstein Metric
Two sample test based on Wasserstein metric. This is motivated from detection of differential DNA-methylation sites based on underlying distributions.
D3partitionR Plotting D3 Hierarchical Plots in R and Shiny
Plotting hierarchical plots in R such as Sunburst, Treemap, Circle Treemap and Partition Chart.
d3plus Seamless ‘D3Plus’ Integration
Provides functions that offer seamless ‘D3Plus’ integration. The examples provided here are taken from the official ‘D3Plus’ website <>.
d3r ‘d3.js’ Utilities for R
Helper functions for using ‘d3.js’ in R.
d3Tree Create Interactive Collapsible Trees with the JavaScript ‘D3’ Library
Create and customize interactive collapsible ‘D3’ trees using the ‘D3’ JavaScript library and the ‘htmlwidgets’ package. These trees can be used directly from the R console, from ‘RStudio’, in Shiny apps and R Markdown documents. When in Shiny the tree layout is observed by the server and can be used as a reactive filter of structured data.
DA.MRFA Dimensionality Assessment using Minimum Rank Factor Analysis
Performs Parallel Analysis for assessing the dimensionality of a set of variables using Minimum Rank Factor Analysis (see Timmerman & Lorenzo-Seva (2011) <DOI:10.1037/a0023353> and ten Berge & Kiers (1991) <DOI:10.1007/BF02294464> for more information). The package also includes the option to compute Minimum Rank Factor Analysis by itself, as well as the Greatest Lower Bound calculation.
daarem Damped Anderson Acceleration with Epsilon Monotonicity for Accelerating EM-Like Monotone Algorithms
Implements the DAAREM method for accelerating the convergence of slow, monotone sequences from smooth, fixed-point iterations such as the EM algorithm. For further details about the DAAREM method, see Henderson, N.C. and Varadhan, R. (2018) <arXiv:1803.06673>.
DAC Calculating Data Agreement Criterion Scores to Rank Experts Based on Their Beliefs
Calculates Data Agreement Criterion (DAC) scores. This can be done to determine prior-data conflict or to evaluate and compare multiple priors, which can be experts’ predictions. See Bousquet (2008) <>.
dad Three-Way Data Analysis Through Densities
The three-way data consists of a set of variables measured on several groups of individuals. To each group is associated an estimated probability density function. The package provides functional methods (principal component analysis, multidimensional scaling, discriminant analysis…) for such probability densities.
daff Diff, Patch and Merge for Data.frames
Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. Changes to data can be made visible by using render_diff. Daff uses the V8 package to wrap the ‘daff.js’ JavaScript library, which is included in the package. Daff exposes a subset of ‘daff.js’ functionality, tailored for usage within R.
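The diff/patch workflow described above can be sketched as follows, assuming the ‘daff’ package is installed:

```r
library(daff)

old <- data.frame(id = 1:3, value = c(10, 20, 30))
new <- data.frame(id = 1:3, value = c(10, 25, 30))  # one cell changed

d <- diff_data(old, new)       # compute the tabular diff
# render_diff(d)               # view the highlighted changes (opens a viewer)

patched <- patch_data(old, d)  # applying the patch reproduces 'new'
```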
dagitty Graphical Analysis of Structural Causal Models
A port of the web-based software “DAGitty” for analyzing structural causal models (also known as directed acyclic graphs or DAGs). The package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
DALEX Descriptive mAchine Learning EXplanations
Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance, but such black-box models usually lack interpretability. The ‘DALEX’ package contains various explainers that help to understand the link between input variables and model output. The single_variable() explainer extracts the conditional response of a model as a function of a single selected variable. It is a wrapper over the ‘pdp’ and ‘ALEPlot’ packages. The single_prediction() explainer attributes parts of a model prediction to particular variables used in the model. It is a wrapper over the ‘breakDown’ package. The variable_dropout() explainer assesses variable importance based on consecutive permutations. All these explainers can be plotted with the generic plot() function and compared across different models.
dalmatian Automating the Fitting of Double Linear Mixed Models in ‘JAGS’
Automates fitting of double GLM in ‘JAGS’. Includes automatic generation of ‘JAGS’ scripts, running ‘JAGS’ via ‘rjags’, and summarizing the resulting output.
dang ‘Dang’ Associated New Goodies
A collection of utility functions.
DAP Discriminant Analysis via Projections
An implementation of Discriminant Analysis via Projections (DAP) method for high-dimensional binary classification in the case of unequal covariance matrices. See Irina Gaynanova and Tianying Wang (2018) <arXiv:1711.04817v2>.
dapr ‘purrr’-Like Apply Functions Over Input Elements
An easy-to-use, dependency-free set of functions for iterating over elements of various input objects. Functions are wrappers around base apply()/lapply()/vapply() functions but designed to have similar functionality to the mapping functions in the ‘purrr’ package <https://…/>. Specifically, function names more explicitly communicate the expected class of the output and functions also allow for the convenient shortcut of ‘~ .x’ instead of the more verbose ‘function(.x) .x’.
DarkDiv Estimating Probabilistic Dark Diversity
Estimation of dark diversity using species co-occurrences. It includes implementations of probabilistic dark diversity based on the Hypergeometric distribution, as well as estimations based on the Beals index, which can be transformed to binary predictions using different thresholds, or transformed into a favorability index. All methods include the possibility of using a calibration dataset that is used to estimate the indication matrix between pairs of species, or to estimate dark diversity directly on a single dataset. See De Caceres and Legendre (2008) <doi:10.1007/s00442-008-1017-y>, Lewis et al. (2016) <doi:10.1111/2041-210X.12443>, Partel et al. (2011) <doi:10.1016/j.tree.2010.12.004>, Real et al. (2017) <doi:10.1093/sysbio/syw072> for further information.
dashboard Interactive Data Visualization with D3.js
The dashboard package allows users to create web pages which display interactive data visualizations working in a standard modern browser. It displays them locally using the Rook server. Neither knowledge of web technologies nor an Internet connection is required. D3.js is a JavaScript library for manipulating documents based on data. D3 helps the dashboard package bring data to life using HTML, SVG and CSS.
dat Tools for Data Manipulation
An implementation of common higher-order functions with syntactic sugar for anonymous functions. Also provides a link to ‘dplyr’ for common transformations on data frames, to work around non-standard evaluation by default.
data.table Extension of data.frame
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread). Offers a natural and flexible syntax for faster development.
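The grouped aggregation and by-reference column update described above look like this in ‘data.table’ syntax (a minimal sketch, assuming the package is installed):

```r
library(data.table)

DT <- data.table(grp = c("a", "a", "b"), x = c(1, 2, 3))

# grouped aggregation: the j expression is evaluated within each group
agg <- DT[, .(mean_x = mean(x)), by = grp]

# add a column by reference - no copy of DT is made
DT[, y := x * 2]
```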
data.tree Hierarchical Data Structures
Create tree structures from hierarchical data, and use the utility methods to traverse the tree in various orders. Aggregate, print, convert to and from data.frame, and apply functions to your tree data. Useful for decision trees, machine learning, finance, and many other applications.
Main Package for Working with ‘’ Data Sets
High-level tools for working with data sets. is a community where you can find interesting data, store and showcase your own data and data projects, and find and collaborate with other members. In addition to exploring, querying and charting data on the site, you can access data via ‘API’ endpoints and integrations. Use this package to access, query and explore data sets, and to integrate data into R projects. Visit <> for additional information.
Data2LD Functional Data Analysis with Linear Differential Equations
Package ‘Data2LD’ was developed to support functional data analysis using the functions in package ‘fda’. The functions in this package are designed for the use of differential equations as modelling objects, as described in J. Ramsay and G. Hooker (2017, ISBN 978-1-4939-7188-6) Dynamic Data Analysis, New York: Springer. The package includes data sets and script files for analyzing many of the examples in this book. ‘Matlab’ versions of the code and sample analyses are available by ftp from <http://…/>. There you will find a set of .zip files containing the functions and sample analyses, as well as two .txt files giving instructions for installation and some additional information.
DatabaseConnector Connecting to Various Database Platforms
An R ‘DataBase Interface’ (‘DBI’) compatible interface to various database platforms (‘PostgreSQL’, ‘Oracle’, ‘Microsoft SQL Server’, ‘Amazon Redshift’, ‘Microsoft Parallel Database Warehouse’, ‘IBM Netezza’, ‘Apache Impala’, and ‘Google BigQuery’). Also includes support for fetching data as ‘ffdf’ objects. Uses ‘Java Database Connectivity’ (‘JDBC’) to connect to databases.
DatabaseConnectorJars JAR Dependencies for the ‘DatabaseConnector’ Package
Provides external JAR dependencies for the ‘DatabaseConnector’ package.
DatabionicSwarm Swarm Intelligence for Self-Organized Clustering
Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data such as natural clusters characterized by distance and/or density based structures in the data space. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique, which generates projected points on a topographic map with hypsometric colors based on the generalized U-matrix. The third module is the clustering method itself with non-critical parameters. The clustering can be verified by the visualization and vice versa. The term DBS refers to the method as a whole. DBS enables even a non-professional in the field of data mining to apply its algorithms for visualization and/or clustering to data sets with completely different structures drawn from diverse research fields.
datacheckr Data Frame Column Name, Class and Value Checking
The primary function check_data() checks a data frame for column presence, column class and column values. If the user-defined conditions are met, the function returns an invisible copy of the original data frame; otherwise the function throws an informative error.
DataClean Data Cleaning
Includes functions that researchers or practitioners may use to clean raw data, converting html, xlsx, and txt data files into other formats. It can also be used to manipulate text variables, extract numeric variables from text variables, and perform other variable-cleaning processes. It originated from the author’s project focusing on creative performance in an online education environment. The resulting paper of that study will be published soon.
dataCompareR Compare Two Data Frames and Summarise the Difference
Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn’t intended to replace all.equal() as a way to test for equality.
datadr Divide and Recombine for Large, Complex Data
Methods for dividing data into subsets, applying analytical methods to the subsets, and recombining the results. Comes with a generic MapReduce interface as well. Works with key-value pairs stored in memory, on local disk, or on HDFS, in the latter case using the R and Hadoop Integrated Programming Environment (RHIPE).
DataEntry Make it Easier to Enter Questionnaire Data
This is a GUI application for defining attributes and setting valid values of variables, and then, entering questionnaire data in a data.frame.
DataExplorer Data Explorer
Automates the data exploration process for data analysis and model building, so that users can focus on understanding data and extracting insights. The package automatically scans through each variable and does data profiling. Typical graphical techniques will be performed for both discrete and continuous features.
datafsm Estimating Finite State Machine Models from Data
Our method automatically generates models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data, and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
DataLoader Import Multiple File Types
Functions to import multiple files of multiple data file types (‘.xlsx’, ‘.xls’, ‘.csv’, ‘.txt’) from a given directory into R data frames.
dataMaid A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Cleaning Process
Data cleaning is an important first step of any statistical analysis. dataMaid provides an extendable suite of tests for common potential errors in a dataset. It produces a document with a thorough summary of the checks and the results that a human can use to identify possible errors.
dataMeta Create and Append a Data Dictionary for an R Dataset
Designed to create a basic data dictionary and append to the original dataset’s attributes list. The package makes use of a tidy dataset and creates a data frame that will serve as a linker that will aid in building the dictionary. The dictionary is then appended to the list of the original dataset’s attributes. The user will have the option of entering variable and item descriptions by writing code or use alternate functions that will prompt the user to add these.
datapack A Flexible Container to Transport and Manipulate Data and Associated Resources
Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI-ORE standard is described at <https://…/ore>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://…/draft-kunze-bagit-08>.
datapackage.r Data Package ‘Frictionless Data’
Work with ‘Frictionless Data Packages’ (<https://…/>). Load and validate any descriptor for a data package profile, create and modify descriptors, and use the exposed methods for reading and streaming data in the package. When a descriptor is a ‘Tabular Data Package’, it uses the ‘Table Schema’ package (<https://…/package=tableschema.r>) and exposes its functionality for each resource object in the resources field.
DataPackageR Construct Reproducible Analytic Data Sets as R Packages
A framework to help construct R data packages in a reproducible manner. Potentially time consuming processing of raw data sets into analysis ready data sets is done in a reproducible manner and decoupled from the usual R CMD build process so that data sets can be processed into R objects in the data package and the data package can then be shared, built, and installed by others without the need to repeat computationally costly data processing. The package maintains data provenance by turning the data processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. Data packages can be version controlled in github, and used to share data for manuscripts, collaboration and general reproducibility.
datarobot DataRobot Predictive Modeling API
For working with the DataRobot predictive modeling platform’s API.
datasauRus Datasets from the Datasaurus Dozen
The Datasaurus Dozen is a set of datasets with the same summary statistics. They retain the same summary statistics despite having radically different distributions. The datasets represent a larger and quirkier object lesson that is typically taught via Anscombe’s Quartet (available in the ‘datasets’ package). Anscombe’s Quartet contains four very different distributions with the same summary statistics and as such highlights the value of visualisation in understanding data, over and above summary statistics. As well as being an engaging variant on the Quartet, the data is generated in a novel way. The simulated annealing process used to derive datasets from the original Datasaurus is detailed in ‘Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing’ <http://…/3025453.3025912>.
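The matching summary statistics can be checked directly (a minimal sketch, assuming the ‘datasauRus’ package is installed; the data come as a long data frame with a dataset column):

```r
library(datasauRus)

# per-dataset means of x and y - they agree to two decimal places
stats <- aggregate(cbind(x, y) ~ dataset, data = datasaurus_dozen, FUN = mean)
head(stats)

# yet the shapes differ radically - plot one to see it:
# plot(y ~ x, data = subset(datasaurus_dozen, dataset == "dino"))
```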
datasets.load Interface for Loading Datasets
Visual interface for loading datasets in RStudio from all installed (unloaded) packages.
Datasmith Tools to Complete Euclidean Distance Matrices
Implements several algorithms for Euclidean distance matrix completion, Sensor Network Localization, and sparse Euclidean distance matrix completion using the minimum spanning tree.
datastepr An Implementation of a SAS-Style Data Step
Based on a SAS data step. This allows for row-wise dynamic building of data, iteratively importing slices of existing dataframes, conducting analyses, and exporting to a results frame. This is particularly useful for differential or time-series analyses, which are often not well suited to vector-based operations.
datastructures Implementation of Core Data Structures
Implementation of advanced data structures such as hashmaps, heaps, or queues. Advanced data structures are essential in many computer science and statistics problems, for example graph algorithms or string analysis. The package uses ‘Boost’ and ‘STL’ data types and extends these to R with ‘Rcpp’ modules.
DataVisualizations Visualizations of High-Dimensional Data
Various visualizations of high-dimensional data such as heat map and silhouette plot for grouped data, visualizations of the distribution of distances, the scatter-density plot for two variables, the Shepard density plot and many more are presented here. Additionally, ‘DataVisualizations’ makes it possible to inspect the distribution of each feature of a dataset visually through the combination of four methods. More detailed explanations can be found in the book of Thrun, M.C.:’Projection-Based Clustering through Self-Organization and Swarm Intelligence’ (2018) <DOI:10.1007/978-3-658-20540-9>.
DataViz Data Visualisation Using an HTML Page and ‘D3.js’
Gives access to data visualisation methods that are relevant from the statistician’s point of view, using ‘D3’s existing data visualisation tools to empower the R language and environment. The throw chart method is a line chart used to illustrate paired data sets (such as before-after, male-female).
datetimeutils Utilities for Dates and Times
Utilities for handling dates and times, such as selecting particular days of the week or month, formatting timestamps as required by RSS feeds, or converting timestamp representations of other software (such as ‘MATLAB’ and ‘Excel’) to R. The package is lightweight (no dependencies, pure R implementations) and relies only on R’s standard classes to represent dates and times (‘Date’ and ‘POSIXt’); it aims to provide efficient implementations, through vectorisation and the use of R’s native numeric representations of timestamps where possible.
datr ‘Dat’ Protocol Interface
Interface with the ‘Dat’ p2p network protocol <>. Clone archives from the network, share your own files, and install packages from the network.
dawai Discriminant Analysis with Additional Information
In applications it is usual that some additional information is available. This package, dawai (an acronym for Discriminant Analysis With Additional Information), performs linear and quadratic discriminant analysis with additional information expressed as inequality restrictions among the population means. It also computes several estimations of the true error rate.
dbarts Discrete Bayesian Additive Regression Trees Sampler
Fits Bayesian additive regression trees (BART) while allowing the updating of predictors or response so that BART can be incorporated as a conditional model in a Gibbs/MH sampler. Also serves as a drop-in replacement for package ‘BayesTree’.
dbfaker A Tool to Ensure the Validity of Database Writes
A tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support.
DBfit A Double Bootstrap Method for Analyzing Linear Models with Autoregressive Errors
Computes the double bootstrap as discussed in McKnight, McKean, and Huitema (2000) <doi:10.1037/1082-989X.5.1.87>. The double bootstrap method provides a better fit for a linear model with autoregressive errors than ARIMA when the sample size is small.
DBHC Sequence Clustering with Discrete-Output HMMs
Provides an implementation of a mixture of hidden Markov models (HMMs) for discrete sequence data in the Discrete Bayesian HMM Clustering (DBHC) algorithm. The DBHC algorithm is an HMM Clustering algorithm that finds a mixture of discrete-output HMMs while using heuristics based on Bayesian Information Criterion (BIC) to search for the optimal number of HMM states and the optimal number of clusters.
dbparser ‘DrugBank’ Database XML Parser
This tool is for parsing the ‘DrugBank’ XML database <http://…/>. The parsed data are then returned in a proper ‘R’ dataframe with the ability to save them in a given database.
dbplyr A ‘dplyr’ Back End for Databases
A ‘dplyr’ back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a ‘DBI’ back end; more advanced features require ‘SQL’ translation to be provided by the package author.
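The lazy-translation workflow can be sketched with an in-memory SQLite database (assumes ‘DBI’, ‘RSQLite’, ‘dplyr’ and ‘dbplyr’ are installed):

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars")

remote <- tbl(con, "mtcars") %>%   # a lazy reference to the remote table
  filter(cyl == 4) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

show_query(remote)          # the dplyr pipeline is translated to SQL...
result <- collect(remote)   # ...and only executed when results are collected

DBI::dbDisconnect(con)
```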
dbscan Density Based Clustering of Applications with Noise (DBSCAN)
A fast reimplementation of the DBSCAN clustering algorithm using the kd-tree data structure for speedup.
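A minimal sketch of the dbscan() interface on two simulated Gaussian blobs (assumes the ‘dbscan’ package is installed; eps and minPts values are illustrative):

```r
library(dbscan)

set.seed(1)
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),  # one dense cluster
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)   # a second cluster
)

cl <- dbscan(pts, eps = 0.5, minPts = 5)
table(cl$cluster)  # cluster 0 is noise; others are density-reachable groups
```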
dbx A Fast, Easy-to-Use Database Interface
Provides select, insert, update, upsert, and delete database operations. Supports ‘PostgreSQL’, ‘MySQL’, ‘SQLite’, and more, and plays nicely with the ‘DBI’ package.
dc3net Inferring Condition-Specific Networks via Differential Network Inference
Performs differential network analysis to infer disease specific gene networks.
DCA Dynamic Correlation Analysis for High Dimensional Data
Finding dominant latent signals that regulate dynamic correlation between many pairs of variables.
DCEM Clustering for Multivariate and Univariate Data Using Expectation Maximization Algorithm
Implements the Expectation Maximisation (EM) algorithm for clustering finite Gaussian mixture models for both multivariate and univariate datasets. The initialization is done by randomly selecting samples from the dataset as the mean of the Gaussian(s). Future versions will improve the parameter initialization and execution on big datasets. The algorithm returns a set of Gaussian parameters: posterior probabilities, mean, covariance matrices (multivariate data)/standard deviation (univariate datasets) and priors. Reference: Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016) <doi:10.1007/s41060-017-0062-1>. This work is partially supported by NCI Grant 1R01CA213466-01.
DChaos Chaotic Time Series Analysis
Provides several algorithms for the purpose of detecting chaotic signals inside univariate time series. We focus on methods derived from chaos theory which estimate the complexity of a dataset through exploring the structure of the attractor. We have taken into account the Lyapunov exponents as an ergodic measure. We have implemented the Jacobian method by a fit through neural networks in order to estimate both the largest and the spectrum of Lyapunov exponents. We have considered the full sample and three different methods of subsampling by blocks (non-overlapping, equally spaced and bootstrap) to estimate them. In addition, it is possible to make inference about them and know if the estimated Lyapunov exponents values are or not statistically significant. This library can be used with time series whose time-lapse is fixed or variable. That is, it considers time series whose observations are sampled at fixed or variable time intervals. For a review see David Ruelle and Floris Takens (1971) <doi:10.1007/BF01646553>, Ramazan Gencay and W. Davis Dechert (1992) <doi:10.1016/0167-2789(92)90210-E>, Jean-Pierre Eckmann and David Ruelle (1995) <doi:10.1103/RevModPhys.57.617>, Mototsugu Shintani and Oliver Linton (2004) <doi:10.1016/S0304-4076(03)00205-7>, Jeremy P. Huke and David S. Broomhead (2007) <doi:10.1088/0951-7715/20/9/011>.
DClusterm Model-Based Detection of Disease Clusters
Model-based methods for the detection of disease clusters using GLMs, GLMMs and zero-inflated models.
DCM Data Converter Module
Data Converter Module (DCM) converts a dataset between the split and stack formats, in both directions.
dcminfo Information Matrix for Diagnostic Classification Models
A set of asymptotic methods that can be used to directly estimate the expected (Fisher) information matrix by Liu, Tian, and Xin (2016) <doi:10.3102/1076998615621293> in diagnostic classification models or cognitive diagnostic models is provided when marginal maximum likelihood estimation is used. For these methods, both the item and structural model parameters are considered simultaneously. Specifically, the observed information matrix, the empirical cross-product information matrix and the sandwich-type covariance matrix that can be used to estimate the asymptotic covariance matrix (or the model parameter standard errors) within the context of diagnostic classification models are provided.
dcmodify Modify Data Using Externally Defined Modification Rules
Data cleaning scripts typically contain a lot of ‘if this, change that’ type of statements. Such statements are typically condensed expert knowledge. With this package, such ‘data modifying rules’ are taken out of the code and instead become parameters to the workflow. This allows one to maintain, document, and reason about data modification rules separately from the workflow.
dCovTS Distance Covariance and Correlation for Time Series Analysis
Computing and plotting the distance covariance and correlation function of a univariate or a multivariate time series. Test statistics for testing pairwise independence are also implemented. Some data sets are also included.
dcurver Utility Functions for Davidian Curves
A Davidian curve defines a seminonparametric density, whose flexibility can be tuned by a parameter. Since a special case of a Davidian curve is the standard normal density, Davidian curves can be used for relaxing normality assumption in statistical applications (Zhang & Davidian, 2001) <doi:10.1111/j.0006-341X.2001.00795.x>. This package provides the density function, the gradient of the loglikelihood and a random generator for Davidian curves.
DDM Death Registration Coverage Estimation
A set of three two-census methods to estimate the degree of death registration coverage for a population. Implemented methods include the Generalized Growth Balance method (GGB), the Synthetic Extinct Generation method (SEG), and a hybrid of the two, GGB-SEG. Each method offers automatic estimation, but users may also specify exact parameters or use a graphical interface to guess parameters in the traditional way if desired.
DDoutlier Distance & Density-Based Outlier Detection
Outlier detection in multidimensional domains. Implementation of notable distance and density-based outlier algorithms. Allows users to identify local outliers by comparing observations to their nearest neighbors, reverse nearest neighbors, shared neighbors or natural neighbors. For distance-based approaches, see Knorr, M., & Ng, R. T. (1997) <doi:10.1145/782010.782021>, Angiulli, F., & Pizzuti, C. (2002) <doi:10.1007/3-540-45681-3_2>, Hautamaki, V., & Ismo, K. (2004) <doi:10.1109/ICPR.2004.1334558> and Zhang, K., Hutter, M. & Jin, H. (2009) <doi:10.1007/978-3-642-01307-2_84>. For density-based approaches, see Tang, J., Chen, Z., Fu, A. W. C., & Cheung, D. W. (2002) <doi:10.1007/3-540-47887-6_53>, Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006) <doi:10.1007/11731139_68>, Schubert, E., Zimek, A. & Kriegel, H-P. (2014) <doi:10.1137/1.9781611973440.63>, Latecki, L., Lazarevic, A. & Prokrajac, D. (2007) <doi:10.1007/978-3-540-73499-4_6>, Papadimitriou, S., Gibbons, P. B., & Faloutsos, C. (2003) <doi:10.1109/ICDE.2003.1260802>, Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000) <doi:10.1145/342009.335388>, Kriegel, H.-P., Kröger, P., Schubert, E., & Zimek, A. (2009) <doi:10.1145/1645953.1646195>, Zhu, Q., Feng, Ji. & Huang, J. (2016) <doi:10.1016/j.patrec.2016.05.007>, Huang, J., Zhu, Q., Yang, L. & Feng, J. (2015) <doi:10.1016/j.knosys.2015.10.014>, Tang, B. & Haibo, He. (2017) <doi:10.1016/j.neucom.2017.02.039> and Gao, J., Hu, W., Zhang, X. & Wu, Ou. (2011) <doi:10.1007/978-3-642-20847-8_23>.
ddpcr Analysis and Visualization of Droplet Digital PCR in R and on the Web
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing duplex ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
DDPGPSurv DDP-GP Survival Analysis
A nonparametric Bayesian approach to survival analysis. The functions perform inference via MCMC simulations from the posterior distributions for a Dependent Dirichlet Process-Gaussian Process prior. To maximize computational efficiency, some of the computations are performed in ‘Rcpp’.
ddR Distributed Data Structures in R
Provides distributed data structures and simplifies distributed computing in R.
DDRTree Learning Principal Graphs with DDRTree
Project data into a reduced dimensional space and construct a principal graph from the reduced dimension.
ddsPLS Multi-Data-Driven Sparse PLS Robust to Missing Samples
Allows building Multi-Data-Driven Sparse PLS models. Multi-block data in high-dimensional settings are particularly suited to this approach.
deadband Statistical Deadband Algorithms Comparison
Statistical deadband algorithms are based on the Send-On-Delta concept as in Miskowicz (2006, <doi:10.3390/s6010049>). A collection of functions compares the effectiveness and fidelity of sampled signals using statistical deadband algorithms.
deal Learning Bayesian Networks with Mixed Variables
Bayesian networks with continuous and/or discrete variables can be learned and compared from data. The method is described in Boettcher and Dethlefsen (2003), <doi:10.18637/jss.v008.i20>.
debugme Debug R Packages
Specify debug messages as special string constants, and control debugging of packages via environment variables.
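A sketch of the intended usage, per the ‘debugme’ documentation: debug messages are ordinary string constants beginning with "!DEBUG", which are no-ops until the package is instrumented at load time.

```r
f <- function(x) {
  "!DEBUG about to square `x`"  # inert string constant unless debugging is on
  x^2
}

# In a package, call debugme::debugme() from .onLoad(); messages then appear
# only when the DEBUGME environment variable names your package, e.g.:
#   Sys.setenv(DEBUGME = "mypkg")
f(3)
```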
debugr Debug Tool to Watch Objects/Expressions While Running an R Script
Tool to print out the value of R objects/expressions while running an R script. Outputs can be made dependent on user-defined conditions/criteria. Debug messages only appear when a global option for debugging is set. This way, ‘debugr’ code can even remain in the debugged code for later use without any negative effects during normal runtime.
decido Bindings for ‘Mapbox’ Ear Cutting Triangulation Library
Provides constrained triangulation of polygons. Ear cutting (or ear clipping) applies constrained triangulation by successively ‘cutting’ triangles from a polygon defined by path/s. Holes are supported by introducing a bridge segment between polygon paths. This package wraps the ‘header-only’ library ‘earcut.hpp’ <https://…/earcut.hpp.git> which includes a reference to the method used by Held, M. (2001) <doi:10.1007/s00453-001-0028-4>.
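A minimal sketch of the ear-cutting interface, assuming ‘earcut()’ accepts a two-column coordinate matrix and returns a flat vector of vertex indices in triples (one triple per triangle):

```r
library(decido)
# A simple square, vertices given counter-clockwise
x <- c(0, 1, 1, 0)
y <- c(0, 0, 1, 1)
# earcut() returns a flat integer vector; each consecutive
# triple indexes the vertices of one triangle
idx <- earcut(cbind(x, y))
matrix(idx, ncol = 3, byrow = TRUE)  # two triangles covering the square
```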
decision Statistical Decision Analysis
Contains a function called dmur() which accepts four parameters: possible values, probabilities of the values, selling cost and preparation cost. The dmur() function generates various numeric decision parameters such as MEMV (maximum (optimum) expected monetary value), the best choice, EPPI (expected profit with perfect information), EVPI (expected value of perfect information) and EOL (expected opportunity loss), which facilitate effective decision-making.
DecisionCurve Calculate and Plot Decision Curves
Decision curves are a useful tool to evaluate the population impact of adopting a risk prediction instrument into clinical practice. Given one or more instruments (risk models) that predict the probability of a binary outcome, this package calculates and plots decision curves, which display estimates of the standardized net benefit by the probability threshold used to categorize observations as ‘high risk.’ Curves can be estimated using data from an observational cohort, or from case-control studies when an estimate of the population outcome prevalence is available. Confidence intervals calculated using the bootstrap can be displayed and a wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.
decisionSupport Quantitative Support of Decision Making under Uncertainty
Supports the quantitative analysis of binary welfare-based decision-making processes using Monte Carlo simulations. Decision support is given on two levels: (i) The actual decision level is to choose between two alternatives under probabilistic uncertainty. This package calculates the optimal decision based on maximizing expected welfare. (ii) The meta decision level is to allocate resources to reduce the uncertainty in the underlying decision problem, i.e. to increase the current information to improve the actual decision-making process. This problem is dealt with using Value of Information Analysis. The Expected Value of Information for arbitrary prospective estimates can be calculated, as well as the Individual and Clustered Expected Value of Perfect Information. The probabilistic calculations are done via Monte Carlo simulations. This Monte Carlo functionality can be used on its own.
DeclareDesign Declare and Diagnose Research Designs
Researchers can characterize and learn about the properties of research designs before implementation using `DeclareDesign`. Ex ante declaration and diagnosis of designs can help researchers clarify the strengths and limitations of their designs and improve their properties, and can help readers evaluate a research strategy prior to implementation and without access to results. It can also make it easier for designs to be shared, replicated, and critiqued.
decoder Decode Coded Variables to Plain Text (and Vice Versa)
Main function ‘decode’ is used to decode coded key values to plain text. Function ‘code’ can be used to code plain text back to key values if there is a 1:1 relation between the two. The concept relies on ‘keyvalue’ objects used for translation. Several ‘keyvalue’ objects are included, covering geographical regional codes, administrative health care unit codes, diagnosis codes et cetera, but it is also easy to extend the use with arbitrary code sets.
decomposedPSF Time Series Prediction with PSF and Decomposition Methods (EMD and EEMD)
Predicts future values with hybrid methods combining Pattern Sequence based Forecasting (PSF), Autoregressive Integrated Moving Average (ARIMA), Empirical Mode Decomposition (EMD) and Ensemble Empirical Mode Decomposition (EEMD).
deconvolveR Empirical Bayes Estimation Strategies
Empirical Bayes methods for learning prior distributions from data. An unknown prior distribution (g) has yielded (unobservable) parameters, each of which produces a data point from a parametric exponential family (f). The goal is to estimate the unknown prior (‘g-modeling’) by deconvolution and Empirical Bayes methods.
DecorateR Fit and Deploy DECORATE Trees
DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) builds an ensemble of J48 trees by recursively adding artificial samples of the training data (‘Melville, P., & Mooney, R. J. (2005). Creating diversity in ensembles using artificial data. Information Fusion, 6(1), 99-111. <doi:10.1016/j.inffus.2004.04.001>’).
deductive Data Correction and Imputation Using Deductive Methods
Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.
deepboost Deep Boosting Ensemble Modeling
Provides training, evaluation, prediction and hyperparameter optimisation (via grid search and cross-validation) for deep boosting models. Based on Google’s Deep Boosting algorithm and Google’s C++ implementation. Cortes, C., Mohri, M., & Syed, U. (2014) <URL: http://…/icml2014c2_cortesb14>.
deeplearning An Implementation of Deep Neural Network for Regression and Classification
An implementation of deep neural networks with rectified linear units, trained with the stochastic gradient descent method and batch normalization. A combination of these methods has achieved state-of-the-art performance in ImageNet classification by overcoming the gradient saturation problem experienced by many deep neural network architectures in the past. In addition, batch normalization and dropout are implemented as a means of regularization. The deeplearning package is inspired by the darch package and uses its class DArch.
deeplr Interface to the ‘DeepL’ Translation API
A wrapper for the ‘DeepL’ API (see <https://…/translator> ), a web service that translates texts between different languages. Access to the API is subject to a monthly fee.
deepnet Deep Learning Toolkit in R
Implements some deep learning architectures and neural network algorithms, including BP, RBM, DBN, deep autoencoders and so on.
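A minimal sketch of training a small feed-forward network with ‘deepnet’, assuming the backpropagation interface ‘nn.train()’/‘nn.predict()’; the toy data and layer sizes are illustrative:

```r
library(deepnet)
set.seed(1)
# Toy binary classification problem
x <- matrix(rnorm(200), ncol = 2)
y <- as.integer(x[, 1] + x[, 2] > 0)
# Train a single-hidden-layer network with backpropagation
nn <- nn.train(x, y, hidden = c(5), numepochs = 50)
pred <- nn.predict(nn, x)  # predicted scores in [0, 1]
```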
deepNN Deep Learning
Implementation of some Deep Learning methods. Includes multilayer perceptron, different activation functions, regularisation strategies, stochastic gradient descent and dropout. Thanks go to the following references for helping to inspire and develop the package: Ian Goodfellow, Yoshua Bengio, Aaron Courville, Francis Bach (2016, ISBN:978-0262035613) Deep Learning. Terrence J. Sejnowski (2018, ISBN:978-0262038034) The Deep Learning Revolution. Grant Sanderson (3blue1brown) <https://…st=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi> Neural Networks YouTube playlist. Michael A. Nielsen <http://…/> Neural Networks and Deep Learning.
default Change the Default Arguments in R Functions
A simple syntax to change the default values for function arguments, whether they are in packages or defined locally.
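The syntax can be sketched as below, assuming the package's replacement-function form ‘default(fn) <- list(…)’ and the ‘reset_default()’ restorer; the choice of ‘mean.default’ is illustrative:

```r
library(default)
# Make mean() drop missing values by default
default(mean.default) <- list(na.rm = TRUE)
mean(c(1, 2, NA))  # now computed over the non-missing values
# Restore the original behaviour
mean.default <- reset_default(mean.default)
```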
define Create FDA-Style Data and Program Definitions
Creates a directory of archived files with a descriptive ‘PDF’ document at the root level (i.e. ‘define.pdf’) containing tables of definitions of data items and relative-path hyperlinks to the documented files. Converts file extensions to ‘txt’ per FDA expectations and converts ‘CSV’ files to ‘SAS’ Transport format. Relies on data item descriptors stored as per R package ‘spec’. See ‘package?define’. See also ‘?define’. Requires a compatible installation of ‘pdflatex’, e.g. <https://…/>.
deformula Integration of One-Dimensional Functions with Double Exponential Formulas
Numerical quadrature of functions of one variable over a finite or infinite interval with double exponential formulas.
dejaVu Multiple Imputation for Recurrent Events
Performs reference based multiple imputation of recurrent event data based on a negative binomial regression model, as described by Keene et al (2014) <doi:10.1002/pst.1624>.
DelayedEffect.Design Sample Size and Power Calculations using the APPLE and SEPPLE Methods
Provides sample size and power calculations when the treatment time-lag effect is present and the lag duration is homogeneous across the individual subject. The methods used are described in Xu, Z., Zhen, B., Park, Y., & Zhu, B. (2017) <doi:10.1002/sim.7157>.
DeLorean Estimates Pseudotimes for Single Cell Expression Data
Implements the DeLorean model (Reid & Wernisch (2016) <doi:10.1093/bioinformatics/btw372>) to estimate pseudotimes for single cell expression data. The DeLorean model uses a Gaussian process latent variable model to model uncertainty in the capture time of cross-sectional data.
delt Estimation of Multivariate Densities Using Adaptive Partitions
We implement methods for estimating multivariate densities. We include a discretized kernel estimator, an adaptive histogram (a greedy histogram and a CART-histogram), stagewise minimization, and bootstrap aggregation.
deming Deming, Theil-Sen and Passing-Bablok Regression
Generalized Deming regression, Theil-Sen regression and Passing-Bablok regression functions.
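A short sketch of the three fitting functions, assuming the formula interfaces ‘deming()’, ‘theilsen()’ and ‘pbreg()’; the simulated measurement-error data are illustrative:

```r
library(deming)
set.seed(42)
# Both variables measured with error, so ordinary least squares is biased
truth <- rnorm(50, 10, 2)
xobs  <- truth + rnorm(50, sd = 0.5)
yobs  <- 2 * truth + rnorm(50, sd = 0.5)
fit   <- deming(yobs ~ xobs)    # generalized Deming regression
tsfit <- theilsen(yobs ~ xobs)  # Theil-Sen regression
pbfit <- pbreg(yobs ~ xobs)     # Passing-Bablok regression
```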
demu Optimal Design Emulators via Point Processes
Implements the Determinantal point process (DPP) based optimal design emulator described in Pratola, Lin and Craigmile (2018) <arXiv:1804.02089> for Gaussian process regression models. See <http://…/software> for more information and examples.
dendextend Extending R’s Dendrogram Functionality
Offers a set of functions for extending dendrogram objects in R, letting you visualize and compare trees of hierarchical clusterings. You can (1) adjust a tree’s graphical parameters – the color, size, type, etc. of its branches, nodes and labels; and (2) visually and statistically compare different dendrograms to one another.
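Both capabilities can be sketched as follows; the ‘USArrests’ data and parameter values are illustrative, while ‘color_branches()’, ‘set()’, ‘tanglegram()’ and ‘entanglement()’ are part of the package's documented interface:

```r
library(dendextend)
dend <- as.dendrogram(hclust(dist(USArrests)))
# (1) Adjust graphical parameters of branches and labels
dend <- color_branches(dend, k = 4)
dend <- set(dend, "labels_cex", 0.6)
plot(dend)
# (2) Compare two dendrograms of the same labels
dend2 <- as.dendrogram(hclust(dist(USArrests), method = "single"))
tanglegram(dend, dend2)
entanglement(dend, dend2)  # lower values mean better alignment
```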
denoiSeq Differential Expression Analysis Using a Bottom-Up Model
Given count data from two conditions, it determines which transcripts are differentially expressed across the two conditions using Bayesian inference of the parameters of a bottom-up model for PCR amplification. This model is developed in Ndifon Wilfred, Hilah Gal, Eric Shifrut, Rina Aharoni, Nissan Yissachar, Nir Waysbort, Shlomit Reich Zeliger, Ruth Arnon, and Nir Friedman (2012), <http://…/15865.full>, and results in a distribution for the counts that is a superposition of the binomial and negative binomial distribution.
denoiseR Regularized Low Rank Matrix Estimation
Estimation of low-rank matrices using regularized methods.
denseFLMM Functional Linear Mixed Models for Densely Sampled Data
Estimation of functional linear mixed models for densely sampled data based on functional principal component analysis.
densityClust Clustering by fast search and find of density peaks
An implementation of the clustering algorithm described by Alex Rodriguez and Alessandro Laio (Science, 2014 vol. 344), along with tools to inspect and visualize the results.
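A rough sketch of the two-step workflow, assuming the ‘densityClust()’/‘findClusters()’ interface and its ‘clusters’ field; the ‘iris’ data and the rho/delta thresholds are illustrative choices:

```r
library(densityClust)
d  <- dist(iris[, 1:4])
dc <- densityClust(d, gaussian = TRUE)
# Declare cluster centres by thresholding local density (rho)
# and distance to denser points (delta), as in the 2014 paper
dc <- findClusters(dc, rho = 2, delta = 2)
table(dc$clusters, iris$Species)
```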
DensParcorr Dens-Based Method for Partial Correlation Estimation in Large Scale Brain Networks
Provides a Dens-based method for estimating functional connections in large-scale brain networks using partial correlation.
densratio Density Ratio Estimation
Density ratio estimation. The estimated density ratio function can be used in many applications, such as inlier-based outlier detection and covariate shift adaptation.
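The fitted object exposes the estimated ratio as a function, which can be sketched as below; the two simulated samples are illustrative, while ‘densratio()’ and the ‘compute_density_ratio’ element follow the package's documented interface:

```r
library(densratio)
set.seed(3)
x1 <- rnorm(200, mean = 1, sd = 1/8)  # numerator sample
x2 <- rnorm(200, mean = 1, sd = 1/2)  # denominator sample
result <- densratio(x1, x2)
# Evaluate the estimated ratio w(x) = p1(x) / p2(x) on a grid
w <- result$compute_density_ratio(seq(0, 2, by = 0.5))
```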
DEoptim Global Optimization by Differential Evolution
Implements the differential evolution algorithm for global optimization of a real-valued function of a real-valued parameter vector.
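A minimal example of the main ‘DEoptim()’ interface; the Rastrigin test function and control settings are illustrative:

```r
library(DEoptim)
# Minimise the two-dimensional Rastrigin function,
# a standard multimodal test problem with global minimum at the origin
rastrigin <- function(x) 20 + sum(x^2 - 10 * cos(2 * pi * x))
out <- DEoptim(rastrigin, lower = c(-5, -5), upper = c(5, 5),
               control = DEoptim.control(NP = 40, itermax = 200,
                                         trace = FALSE))
out$optim$bestmem  # best parameter vector found, near (0, 0)
```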
Dependency Logo
Plots dependency logos from a set of input sequences.
depmixS4 Dependent Mixture Models – Hidden Markov Models of GLMs and Other Distributions in S4
Fit latent (hidden) Markov models on mixed categorical and continuous (time series) data, otherwise known as dependent mixture models.
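The model-specification-then-fit pattern can be sketched as follows, using the ‘speed’ reaction-time data shipped with the package; the choice of a two-state Gaussian model is illustrative:

```r
library(depmixS4)
data(speed)  # reaction-time data included in the package
set.seed(1)  # EM uses random starting values
# Two-state hidden Markov model with a Gaussian response
mod <- depmix(rt ~ 1, data = speed, nstates = 2)
fm  <- fit(mod)
summary(fm)     # estimated transition and response parameters
posterior(fm)   # most likely state and state probabilities per observation
```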
depth.plot Multivariate Analogy of Quantiles
Provides tools to obtain spatial depths, spatial ranks and outliers of multivariate random variables, and to visualize DD-plots (a multivariate generalization of QQ-plots).
dequer An R ‘Deque’ Container
Offers a special data structure called a ‘deque’ (pronounced like ‘deck’), which is a list-like structure. However, unlike R’s list structure, data put into a ‘deque’ is not necessarily stored contiguously, making insertions and deletions at the front/end of the structure much faster. The implementation here is new and uses a doubly linked list, and hence does not rely on R’s environments. To avoid unnecessary data copying, most ‘deque’ operations are perform