
AnalytiXon

~ Broaden your Horizon

Category Archives: R Packages

R Packages worth a look

Wednesday, 13 Nov 2019

Posted by Michael Laux in R Packages

Methods for Identification of Outliers in Environmental Data (envoutliers)
Three semi-parametric methods for detection of outliers in environmental data based on kernel regression and subsequent analysis of smoothing residuals. The first method (Campulova, Michalek, Mikuska and Bokal (2018) <DOI:10.1002/cem.2997>) analyzes the residuals using changepoint analysis, the second is based on control charts (Campulova, Veselik and Michalek (2017) <DOI:10.1016/j.apr.2017.01.004>) and the third (Holesovsky, Campulova and Michalek (2018) <DOI:10.1016/j.apr.2017.06.005>) analyzes the residuals using extreme value theory.

Balanced Factorial Designs (designr)
Generate balanced factorial designs with crossed and nested random and fixed effects <https://…/designr>.

Piecewise Smooth Regression by Bootstrapped Binary Segmentation (BinSegBstrap)
Provides methods for piecewise smooth regression. A piecewise smooth signal is estimated by applying a bootstrapped test recursively (binary segmentation approach). Each bootstrapped test decides whether the underlying signal is smooth on the currently considered subsegment or contains at least one further change-point.

Create Regular Expressions Easily (RVerbalExpressions)
Build regular expressions using grammar and functionality inspired by <https://…/VerbalExpressions>. Usage of the %>% operator is encouraged to build expressions in a chain-like fashion.
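As a minimal sketch of that chaining style, assuming the rx_*() builder functions the package exports (mirroring the VerbalExpressions grammar it is inspired by):

library(RVerbalExpressions)
library(magrittr)

# Build a pattern matching a simple URL by chaining builders.
pattern <- rx_start_of_line() %>%
  rx_find("http") %>%
  rx_maybe("s") %>%
  rx_find("://") %>%
  rx_maybe("www.") %>%
  rx_anything_but(" ") %>%
  rx_end_of_line()

grepl(pattern, "https://www.r-project.org")  # expected: TRUE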

R Packages worth a look

Monday, 11 Nov 2019

Posted by Michael Laux in R Packages

Access Google Sheets using the Sheets API V4 (googlesheets4)
Interact with Google Sheets through the Sheets API v4 <https://…/api>. ‘API’ is an acronym for ‘application programming interface’; the Sheets API allows users to interact with Google Sheets programmatically, instead of via a web browser. The ‘v4’ refers to the fact that the Sheets API is currently at version 4. This package helps the user to retrieve Sheet metadata and to read data out of specific worksheets or ranges into an R object, such as a data frame.
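A quick sketch of reading a public Sheet; function names below follow the current googlesheets4 API (early releases used a sheets_*() prefix for some of these):

library(googlesheets4)

# Public Sheets need no OAuth; opt out of the auth flow.
gs4_deauth()

# Read one worksheet into a data frame; 'sheet' and 'range' narrow the read.
df <- read_sheet(gs4_example("gapminder"), sheet = "Africa", range = "A1:F11")
head(df)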

Work with Custom Distribution Functions (pdqr)
Create, transform, and summarize custom random variables with distribution functions (analogues of the ‘p*()’, ‘d*()’, ‘q*()’, and ‘r*()’ functions from base R). Two types of distributions are supported: ‘discrete’ (the random variable has a finite number of output values) and ‘continuous’ (an infinite number of values, in the form of a continuous random variable). Functions for distribution transformations and summaries are available. The implemented approaches often emphasize approximate and numerical solutions: all distributions assume finite support and finite density values, and some methods are implemented with simulation techniques.
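A short sketch of the intended workflow, assuming the new_*(), as_*() and summ_*() families described in the package documentation:

library(pdqr)

# Turn a sample into a "d-function" (density analogue) with finite support.
d_mpg <- new_d(mtcars$mpg, type = "continuous")

# Convert between the p*/d*/q*/r* representations.
p_mpg <- as_p(d_mpg)   # CDF analogue
r_mpg <- as_r(d_mpg)   # random-generation analogue

p_mpg(20)              # P(X <= 20)
summ_mean(d_mpg)       # summary of the distribution
r_mpg(5)               # five simulated values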

Creating Contact and Social Networks (contact)
Process spatially and temporally discrete data into contact and social networks, and facilitate network analysis by randomizing individuals’ movement paths and/or related categorical variables. To use this package, users need only a dataset containing spatial data (i.e., latitude/longitude, or planar x & y coordinates), individual IDs relating spatial data to specific individuals, and date/time information relating spatial locations to temporal locations. The functionality of this package ranges from data ‘cleaning’ via multiple filtration functions, through spatial and temporal data interpolation, to network creation and analysis. Functions within this package are not limited to describing interpersonal contacts. Package functions can also identify and quantify ‘contacts’ between individuals and fixed areas (e.g., home ranges, water bodies, buildings, etc.). As such, this package is a useful resource for facilitating epidemiological, ecological, ethological and sociological research.

Higher Criticism Test of Two Frequency Counts Tables (TableHC)
Higher Criticism (HC) test between two frequency tables. The test is based on an adaptation of the Tukey-Donoho-Jin HC statistic to testing frequency tables, as described in Kipnis (2019) <arXiv:1911.01208>.

R Packages worth a look

Sunday, 10 Nov 2019

Posted by Michael Laux in R Packages

Robust Bootstrap Forecast Densities for GARCH Models (RobGARCHBoot)
Bootstrap forecast densities for GARCH (Generalized Autoregressive Conditional Heteroskedastic) returns and volatilities using the robust residual-based bootstrap procedure of Trucios, Hotta and Ruiz (2017) <DOI:10.1080/00949655.2017.1359601>.

Easy Manipulation of Out of Memory Data Sets (hdd)
Hard drive data: a class of data allowing easy importation and manipulation of out-of-memory data sets. The data sets are located on disk but behave as if they were in memory; the syntax for manipulation is similar to ‘data.table’. Operations are performed ‘chunk-wise’ behind the scenes.

Spatial Navigation Strategy Analysis (Rtrack)
A toolkit for the analysis of paths from spatial tracking experiments (such as the Morris water maze) and calculation of goal-finding strategies. This package is centered on an approach using machine learning for path classification.

Graphical Toolbox for Clustering and Classification of Data Frames (RclusTool)
Graphical toolbox for clustering and classification of data frames. It provides a graphical interface for applying clustering and classification methods to feature data frames, and for viewing the initial data as well as the resulting clusters or classes. Depending on the level of available labels, different approaches are proposed: unsupervised clustering, semi-supervised clustering and supervised classification. To assess the processed clusters or classes, the toolbox can import and display supplementary data formats, either profiles/time series or images. This added information can help the expert to label clusters (clustering) or to constrain data frame rows (semi-supervised clustering), using the constrained spectral embedding algorithm by Wacquet et al. (2013) <doi:10.1016/j.patrec.2013.02.003> and the methodology provided by Wacquet et al. (2013) <doi:10.1007/978-3-642-35638-4_21>.

R Packages worth a look

Saturday, 09 Nov 2019

Posted by Michael Laux in R Packages

Estimates of Standard Errors for Risk and Performance Measures (RPESE)
Estimates of standard errors of popular risk and performance measures for asset or portfolio returns using methods as described in Chen and Martin (2019) <https://ssrn.com/abstract=3085672>.

A Dipping Sauce for Data Analysis and Visualizations (dipsaus)
Works as an ‘add-on’ to packages like ‘shiny’, ‘future’, and ‘rlang’, providing utility functions. Just as dipping sauce adds flavor to potato chips or pita bread, ‘dipsaus’ adds handy functions and enhancements to popular data analysis and visualization packages. The goal is to provide simple solutions to questions frequently asked online, such as how to synchronize ‘shiny’ inputs without freezing the app, or how to get the memory size on a ‘Linux’ or ‘MacOS’ system. The enhancements roughly fall into four categories: 1. ‘shiny’ input widgets; 2. high-performance computing using the ‘RcppParallel’ and ‘future’ packages; 3. modifying R calls and converting among numbers, strings, and other objects; 4. utility functions to get system information, such as CPU chipset and memory limit.

Uncertainties of Climate Projections using Smoothing Splines (qualypsoss)
These functions use smoothing splines, data augmentation and Bayesian techniques for the assessment of single-member and incomplete ensembles of climate projections. References: Cheng, C.-I. and P. L. Speckman (2012) <doi:10.1016/j.csda.2012.05.020>; Evin, G., B. Hingray, J. Blanchet, N. Eckert, S. Morin, and D. Verfaillie (2019) <doi:10.1175/JCLI-D-18-0606.1>.

Identify Zero-Inflated Distributions (iZID)
Computes bootstrapped Monte Carlo estimates of the p-value of the Kolmogorov-Smirnov (KS) test and the likelihood ratio test for zero-inflated count data, based on the work of Aldirawi et al. (2019) <doi:10.1109/BHI.2019.8834661>. The package also provides tools to simulate random deviates from zero-inflated or hurdle models and to obtain maximum likelihood estimates of the unknown parameters in these models.

R Packages worth a look

Friday, 08 Nov 2019

Posted by Michael Laux in R Packages

Create 3D Barplots (barplot3d)
Creates 3D barplots. Includes a function for sequence context plots used in DNA sequencing analysis.

Distributional Differences Across Contingency Tables (DiffXTables)
Statistical methods for hypothesis testing of differences in the underlying distributions across two or more contingency tables. The package includes three tests: the comparative chi-squared test (Song et al, 2014) <doi:10.1093/nar/gku086> (Zhang et al, 2015) <doi:10.1093/nar/gkv358>, the Sharma-Song test, and the heterogeneity test. They all have an asymptotically chi-squared null distribution. These options allow one to test for patterns that differ in first and second orders, directly related to marginal and joint distributions of given contingency tables.
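A hedged sketch of how the tests might be called, assuming each exported test function accepts a list of contingency tables:

library(DiffXTables)

# Two 2x2 tables whose joint distributions clearly differ.
tables <- list(
  matrix(c(30, 10, 10, 30), nrow = 2),
  matrix(c(10, 30, 30, 10), nrow = 2)
)

# Comparative chi-squared test across the tables; sharma.song.test()
# and heterogeneity.test() follow the same calling pattern.
cp.chisq.test(tables)
sharma.song.test(tables)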

Probability Associator Time (PASS-T) (passt)
Simulates judgments of frequency and duration based on the Probability Associator Time (PASS-T) model. PASS-T is a memory model based on a simple competitive artificial neural network. It can imitate human judgments of frequency and duration, which have been extensively studied in cognitive psychology (e.g. Hintzman (1970) <doi:10.1037/h0028865>, Betsch et al. (2010) <https://…/2010-18204-003>). The PASS-T model is an extension of the PASS model (Sedlmeier, 2002, ISBN:0198508638). The package provides an easy way to run simulations, which can then be compared with empirical data on human judgments of frequency and duration.

Obtaining and Estimating Unidimensional IRT Dual Models (InDisc)
Performs a unified approach for obtaining and estimating unidimensional Item Response Theory (IRT) Dual Models (DMs), as proposed by Ferrando (2019) <doi:10.1177/0146621618817779>.

R Packages worth a look

Friday, 08 Nov 2019

Posted by Michael Laux in R Packages

Explore Probability Distributions for Bivariate Temporal Granularities (gravitas)
Provides tools for systematically exploring large quantities of temporal data across different temporal granularities (deconstructions of time) by visualizing probability distributions. ‘gravitas’ computes circular, aperiodic, single-order-up or multiple-order-up granularities and advises on which combinations of granularities to explore and through which distribution plots.

Bifactor Indices Calculator (BifactorIndicesCalculator)
The calculator computes bifactor indices such as explained common variance (ECV), hierarchical Omega (OmegaH), percentage of uncontaminated correlations (PUC), item explained common variance (I-ECV), and more. This package is an R version of the ‘Excel’ based ‘Bifactor Indices Calculator’ (Dueber, 2017) <doi:10.13023/edp.tool.01> with added convenience features for directly utilizing output from several programs that can fit confirmatory factor analysis or item response models.

Ridge-Type Penalized Estimation of a Potpourri of Models (porridge)
The name of the package is derived from the French ‘pour’ plus ridge, and it provides functionality for ridge-type estimation of a potpourri of models. Currently, this estimation concerns various Gaussian graphical models from different study designs. Among others, it considers the regular Gaussian graphical model and a mixture of such models. The porridge package implements the estimation of the former either i) from data with replicated observations, by penalized log-likelihood maximization using the regular ridge penalty on the parameters (van Wieringen, Chen, 2019), or ii) from non-replicated data, by means of the generalized ridge estimator that allows for the inclusion of both quantitative and qualitative prior information on the precision matrix via element-wise penalization and shrinkage (van Wieringen, 2019, <doi:10.1080/10618600.2019.1604374>). Additionally, the porridge package facilitates the ridge penalized estimation of a mixture of Gaussian graphical models (Aflakparast et al., 2018, <doi:10.1002/bimj.201700102>).

Simulated Maximum Likelihood Estimation of Mixed Logit Models for Large Datasets (mixl)
Specification and estimation of multinomial logit models. Large datasets and complex models are supported, with an intuitive syntax. Multinomial Logit Models, Mixed models, random coefficients and Hybrid Choice are all supported. For more information, see Molloy et al. (2019) <doi:10.3929/ethz-b-000334289>.

R Packages worth a look

Thursday, 07 Nov 2019

Posted by Michael Laux in R Packages

‘sf’-Based Interface to the ‘HERE’ REST APIs (hereR)
Interface to the ‘HERE’ REST APIs <https://…/rest-apis>: (1) geocode addresses using the ‘Geocoder’ API; (2) obtain routing directions, travel distance or time matrices and isolines using the ‘Routing’ API; (3) retrieve traffic flow and incident information from the ‘Traffic’ API; (4) obtain weather forecasts, reports on current weather conditions, astronomical information and alerts at a specific location from the ‘Destination Weather’ API. Locations, routes and isolines are returned as ‘sf’ objects.
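A minimal geocoding sketch; the key is a placeholder and the argument names are as recalled from the package documentation, so treat this as an assumption-laden outline:

library(hereR)

# Authenticate with a HERE developer API key (placeholder below).
set_key("<YOUR-HERE-API-KEY>")

# Geocode free-text addresses; the result is an 'sf' object with
# point geometries, ready for sf-based workflows.
locs <- geocode(address = c("Invalidenstrasse 116, Berlin",
                            "Bahnhofstrasse 1, Zurich"))
locs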

Random Projection Ensemble Clustering Algorithm (RPEClust)
Implements the methodology proposed by Anderlucci, Fortunato and Montanari (2019) <arXiv:1909.10832> for high-dimensional unsupervised classification. The random projection ensemble clustering algorithm applies a Gaussian Mixture Model to different random projections of the high-dimensional data and selects a subset of solutions according to the Bayesian Information Criterion, computed here as discussed in Raftery and Dean (2006) <doi:10.1198/016214506000000113>. The clustering results obtained on the selected projections are then aggregated via consensus to derive the final partition.

Analyze Experimental High-Throughput (Omics) Data (wrMisc)
This collection of diverse functions facilitates the efficient treatment and convenient analysis of experimental high-throughput (omics) data. Several functions address advanced object conversions, like manipulating lists of lists or lists of arrays, reorganizing lists to arrays or into separate vectors, merging multiple entries, etc. Another set of functions provides speed-optimized calculation of the standard deviation (sd), coefficient of variation (CV) or standard error of the mean (SEM) for data in matrices, or of means per line with respect to additional grouping (e.g. n groups of replicates). Other functions facilitate dealing with non-redundant information, by indexing unique entries, adding counters to redundant ones or eliminating lines with respect to redundancy in a given reference column, etc. Help is provided to identify very closely matching numeric values, to generate (partial) distance matrices for very big data in a memory-efficient manner, or to reduce the complexity of large data sets by combining very close values. Large experimental datasets often need additional filtering, and adequate functions are provided for this, too. Batch reading (or writing) of sets of files and combining data into arrays is supported as well. Convenient data normalization is supported in various modes, and parameter estimation via permutations or bootstrap, as well as flexible testing of multiple pairwise combinations using the framework of ‘limma’, is also provided.

Functional Programming (funprog)
Higher-order functions for data manipulation: sort or group data, given one or more auxiliary functions. The functions are inspired by other pure functional programming languages (‘Haskell’ mainly). The package also provides built-in function operators for creating compact anonymous functions, as well as the possibility to use the ‘purrr’ package syntax.

R Packages worth a look

Tuesday, 05 Nov 2019

Posted by Michael Laux in R Packages

Low-Rank Methods for MVN and MVT Probabilities (tlrmvnmvt)
Implementation of the classic Genz algorithm and a novel tile-low-rank algorithm for computing relatively high-dimensional multivariate normal (MVN) and Student-t (MVT) probabilities. References used for this package: Foley, James, Andries van Dam, Steven Feiner, and John Hughes, ‘Computer Graphics: Principles and Practice’, Addison-Wesley Publishing Company, Reading, Massachusetts (1987, ISBN:0-201-84840-6); Genz, A., ‘Numerical computation of multivariate normal probabilities’, Journal of Computational and Graphical Statistics, 1, 141-149 (1992) <doi:10.1080/10618600.1992.10477010>; Cao, J., Genton, M. G., Keyes, D. E., & Turkiyyah, G. M., ‘Exploiting Low Rank Covariance Structures for Computing High-Dimensional Normal and Student-t Probabilities’ (2019) <https://…/2019.CGKT.manuscript.pdf>.
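A small sketch of an MVN probability computation, assuming pmvn() (with pmvt() as the Student-t analogue) as the package's main entry point:

library(tlrmvnmvt)

# P(X in [-1, 1]^d) for a d-dimensional normal with compound-symmetry
# covariance (1 on the diagonal, 0.5 off-diagonal).
d <- 50
sigma <- 0.5 * diag(d) + 0.5
pmvn(lower = rep(-1, d), upper = rep(1, d), sigma = sigma)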

Cramer von Mises Tests for Discrete or Grouped Distributions (cvmdisc)
Implements Cramer-von Mises statistics for testing fit to (1) fully specified discrete distributions, as described in Choulakian, Lockhart and Stephens (1994) <doi:10.2307/3315828>; (2) discrete distributions with unknown parameters that must be estimated from the sample data, see Spinelli & Stephens (1997) <doi:10.2307/3315735> and Lockhart, Spinelli and Stephens (2007) <doi:10.1002/cjs.5550350111>; (3) grouped continuous distributions with unknown parameters, see Spinelli (2001) <doi:10.2307/3316040>. Maximum likelihood estimation (MLE) is used to estimate the parameters. The package computes the Cramer-von Mises statistics, Anderson-Darling statistics and Watson-Stephens statistics and their p-values.

Hilbert Similarity Index for High Dimensional Data (hilbertSimilarity)
Quantifying similarity between high-dimensional single cell samples is challenging, and usually requires some simplifying hypotheses to be made. By transforming the high-dimensional space into a high-dimensional grid, the number of cells in each sub-space of the grid is characteristic of a given sample. Using a Hilbert curve, each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.

Extra Binary Relational and Logical Operators (extraoperators)
Speed up common tasks, particularly logical or relational comparisons and routine follow-up tasks such as finding the indices and subsetting. Inspired by mathematics, where something like 3 < x < 6 is a standard, elegant and clear way to assert that x is both greater than 3 and less than 6 (see for example <https://…/Relational_operator>), a chaining operator is implemented. The chaining operator, %c%, allows multiple relational operations to be used in quotes on the right-hand side for the same object on the left-hand side. The %e% operator allows something like set-builder notation (see for example <https://…/Set-builder_notation>) to be used on the right-hand side. All operators have built-in prefixes defined for all, subset, and which, to reduce the amount of code needed for common tasks, such as returning the values that are true.
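A brief sketch of the operators described above; the %sgele% variant is inferred from the all/subset/which prefix scheme the description mentions, so take the exact names as assumptions:

library(extraoperators)

x <- c(1, 4, 5, 7, 9)

# Chain several relational tests against the same object;
# the right-hand side is quoted.
x %c% "> 3 & < 6"    # logical: TRUE where 3 < x < 6

# Set-builder style membership.
x %e% "(3, 6]"       # TRUE where x lies in the interval (3, 6]

# Prefixed variant: the 's' prefix subsets to the matching values.
x %sgele% c(3, 6)    # values with 3 <= x <= 6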

R Packages worth a look

Tuesday, 05 Nov 2019

Posted by Michael Laux in R Packages

Generalized Tensor Regression with Covariates on Multiple Modes (tensorregress)
Implements the generalized tensor regression of Xu, Hu and Wang (2019) <arXiv:1910.09499>. Solves tensor-response regression given covariates on multiple modes, using an alternating updating algorithm.

R Wrappers for EXPOKIT; Other Matrix Functions (rexpokit)
Wraps some of the matrix exponentiation utilities from EXPOKIT (<http://…/> ), a FORTRAN library that is widely recommended for matrix exponentiation (Sidje RB, 1998. ‘Expokit: A Software Package for Computing Matrix Exponentials.’ ACM Trans. Math. Softw. 24(1): 130-156). EXPOKIT includes functions for exponentiating both small, dense matrices, and large, sparse matrices (in sparse matrices, most of the cells have value 0). Rapid matrix exponentiation is useful in phylogenetics when we have a large number of states (as we do when we are inferring the history of transitions between the possible geographic ranges of a species), but is probably useful in other ways as well.
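As a sketch, exponentiating a small rate matrix via what I understand to be the package's dense-matrix wrapper around EXPOKIT's DGPADM routine:

library(rexpokit)

# Rate matrix Q of a two-state continuous-time Markov chain
# (rows sum to zero); P(t) = exp(Q * t) gives transition probabilities.
Q <- matrix(c(-0.10,  0.10,
               0.05, -0.05), nrow = 2, byrow = TRUE)

P <- expokit_dgpadm_Qmat(Qmat = Q, t = 10)
P           # each row is a probability distribution over the states
rowSums(P)  # approximately c(1, 1)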

Automatic Calculation of Effects for Piecewise Structural Equation Models (semEff)
Provides functionality to automatically calculate direct, indirect, and total effects from piecewise structural equation models, comprising lists of fitted models representing structured equations (Lefcheck 2016 <doi:10/f8s8rb>). Confidence intervals are provided via bootstrapping.
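A hedged outline of the intended two-step workflow (bootstrap, then effect assembly); 'models' is a hypothetical placeholder for the user's list of fitted models:

library(semEff)

# 'models' is hypothetical: a named list of fitted (mixed) models
# forming the structured equations, e.g. as used with 'piecewiseSEM'.
boot <- bootEff(models, R = 1000, seed = 1)

# Assemble direct, indirect, and total effects with bootstrapped CIs.
effects <- semEff(boot)
summary(effects)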

Cluster Circular Systematic Sampling (ccss)
Draws systematic samples from a population that follows a linear trend. The function returns a matrix comprising the required samples as its column vectors. The samples produced are highly efficient, and the inter-sample variance is minimal. The scheme will be useful in fields like bioinformatics, where samples are expensive and must reflect the population precisely by possessing the least sampling variance.

R Packages worth a look

Monday, 04 Nov 2019

Posted by Michael Laux in R Packages

Methods for Transition Probabilities (presmTP)
Provides a function for estimating the transition probabilities in an illness-death model. The transition probabilities can be estimated from the unsmoothed landmark estimators developed by de Una-Alvarez and Meira-Machado (2015) <doi:10.1111/biom.12288>. Presmoothed estimates can also be obtained through the use of a parametric family of binary regression curves, such as logit, probit or cauchit. The additive logistic regression model and nonparametric regression are also alternatives which have been implemented. The idea behind the presmoothed landmark estimators is to use the presmoothing techniques developed by Cao et al. (2005) <doi:10.1007/s00180-007-0076-6> in the landmark estimation of the transition probabilities.

Alternating Direction Method of Multipliers to Solve the Densest Submatrix Problem (admmDensestSubmatrix)
Solves the problem of identifying the densest submatrix in a given or sampled binary matrix; see Bombina et al. (2019) <arXiv:1904.03272>.

Interface to ‘TensorFlow Probability’ (tfprobability)
Interface to ‘TensorFlow Probability’, a ‘Python’ library built on ‘TensorFlow’ that makes it easy to combine probabilistic models and deep learning on modern hardware (‘TPU’, ‘GPU’). ‘TensorFlow Probability’ includes a wide selection of probability distributions and bijectors, probabilistic layers, variational inference, Markov chain Monte Carlo, and optimizers such as Nelder-Mead, BFGS, and SGLD.
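A minimal sketch with the tfd_* distribution constructors, which mirror TensorFlow Probability's tfp.distributions module; a working ‘TensorFlow’ installation is assumed:

library(tfprobability)

# Define a standard normal distribution.
d <- tfd_normal(loc = 0, scale = 1)

# Sampling and log-density evaluation run on TensorFlow tensors,
# so the same code targets CPU, GPU, or TPU.
x <- tfd_sample(d, 5L)
tfd_log_prob(d, x)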

Cohort Data Analyses (cohorttools)
Functions to make life tables and to calculate hazard function estimates using a Poisson regression model with splines. Includes a function to draw a simple flowchart of a cohort study. The function boxesLx() draws boxes of transition rates between states. It utilizes the ‘Lexis’ data structure from the ‘Epi’ package.

Easily and Rapidly Generate Raster Image Data with Support for ‘Plotly.js’ (rasterly)
Easily and rapidly generate raster data in R, even for very large datasets, with an aesthetics-based mapping syntax that should be familiar to users of the ‘ggplot2’ package. While ‘rasterly’ does not attempt to reproduce the full functionality of the ‘Datashader’ graphics pipeline system for Python, the ‘rasterly’ API has several core elements in common with that software package.
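A sketch of the aesthetics-based syntax on a large synthetic point cloud, assuming rasterly() and rasterly_points() as the core verbs (per the package's announcement material):

library(rasterly)
library(magrittr)

# A million points: far too many to draw individually, easy to rasterize.
n <- 1e6
df <- data.frame(x = rnorm(n), y = rnorm(n))

# ggplot2-like mapping: rasterly() sets up the canvas,
# rasterly_points() aggregates the points into pixels.
rasterly(df, mapping = aes(x = x, y = y)) %>%
  rasterly_points()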
