Tools for Preprocessing Combinations (preprocomb)
Preprocessing is often the most time-consuming phase in knowledge discovery, and preprocessing transformations can interact in unexpected ways. This package helps to make preprocessing faster and more effective. It provides an S4 framework for creating and testing preprocessing combinations for classification, clustering and outlier detection. The framework supports user-defined and domain-specific preprocessors and preprocessing phases. Default preprocessors are provided for low-variance removal, missing-value imputation, scaling, outlier removal, noise smoothing, feature selection and class-imbalance correction.
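The idea of a "preprocessing combination" is to pick one preprocessor per phase, apply the choices in phase order, and evaluate every resulting pipeline. The Python sketch below illustrates only that idea. It is not preprocomb's R/S4 API. The phase names, the preprocessor functions and the column-list data layout are all hypothetical. Scoring each pipeline, for example with a classifier, is left out.

```python
from itertools import product
from statistics import mean

# Hypothetical preprocessors. Data is a list of numeric columns and None marks a missing value.

def no_op(cols):
    return cols

def mean_impute(cols):
    # Replace each None with the mean of the observed values in its column
    out = []
    for c in cols:
        m = mean(v for v in c if v is not None)
        out.append([m if v is None else v for v in c])
    return out

def drop_missing_rows(cols):
    # Keep only the rows that have no missing value in any column
    keep = [i for i in range(len(cols[0])) if all(c[i] is not None for c in cols)]
    return [[c[i] for i in keep] for c in cols]

def min_max_scale(cols):
    # Rescale each column to [0, 1]; a constant column is left at 0
    out = []
    for c in cols:
        lo, hi = min(c), max(c)
        span = (hi - lo) or 1.0
        out.append([(v - lo) / span for v in c])
    return out

# Hypothetical phases, in application order; each lists alternative preprocessors
PHASES = {
    "missing": [mean_impute, drop_missing_rows],
    "scaling": [no_op, min_max_scale],
}

def combinations(phases):
    # Yield every choice of one preprocessor per phase, keeping phase order
    names = list(phases)
    for choice in product(*(phases[n] for n in names)):
        yield dict(zip(names, choice))

def run(combo, cols):
    # Apply the chosen preprocessors one phase after another
    for step in combo.values():
        cols = step(cols)
    return cols

# Usage: two columns with one missing value each
data = [[1.0, None, 3.0, 5.0], [2.0, 4.0, None, 8.0]]
results = [(combo, run(combo, data)) for combo in combinations(PHASES)]
```

The number of combinations is the product of the option counts per phase (here 2 × 2 = 4), so this kind of search grows quickly as phases are added.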
Sequential Input Selection Algorithm (sisal)
Implements the SISAL algorithm by Tikka and Hollmén. It is a sequential backward selection algorithm which uses a linear model in a cross-validation setting. Starting from the full model, one variable at a time is removed based on the regression coefficients. From this set of models, a parsimonious (sparse) model is found by choosing the model with the smallest number of variables among those models where the validation error is smaller than a threshold. Also implements extensions which explore larger parts of the search space and/or use ridge regression instead of ordinary least squares.
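The backward pass and the threshold rule described above can be sketched in plain Python. Several details are assumptions and may differ from the sisal package. The ranking rule drops the variable with the smallest mean absolute OLS coefficient across the cross-validation folds, which presumes comparably scaled inputs. The threshold is set at (1 + tol) times the best validation error, and the folds are interleaved. All function names are hypothetical.

```python
import random

def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting; solves a @ x = b
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y, cols):
    # Ordinary least squares (intercept + selected columns) via normal equations
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[k] for r in rows) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, x, cols):
    return beta[0] + sum(b * x[j] for b, j in zip(beta[1:], cols))

def cv_eval(X, y, cols, k=5):
    # Mean validation MSE over k interleaved folds, plus the mean |coefficient|
    # of each selected variable across the k training fits
    n = len(y)
    errs, mags = [], [0.0] * len(cols)
    for i in range(k):
        fold = list(range(i, n, k))
        held_out = set(fold)
        train = [t for t in range(n) if t not in held_out]
        beta = ols([X[t] for t in train], [y[t] for t in train], cols)
        errs.append(sum((y[t] - predict(beta, X[t], cols)) ** 2 for t in fold) / len(fold))
        for v, coef in enumerate(beta[1:]):
            mags[v] += abs(coef) / k
    return sum(errs) / k, mags

def backward_select(X, y, tol=0.1, k=5):
    # Backward pass: record (variables, CV error), then drop the weakest variable
    cols = list(range(len(X[0])))
    path = []
    while cols:
        err, mags = cv_eval(X, y, cols, k)
        path.append((list(cols), err))
        cols.pop(min(range(len(cols)), key=lambda v: mags[v]))
    # Parsimonious choice: smallest model whose CV error is under the threshold
    threshold = (1 + tol) * min(e for _, e in path)
    return min((c for c, e in path if e <= threshold), key=len)

# Usage on synthetic data where y depends only on variables 0 and 1
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(100)]
y = [3 * x[0] - 2 * x[1] + rng.gauss(0, 0.1) for x in X]
selected = backward_select(X, y)
```

On this synthetic data, the two irrelevant variables are removed first. Removing either relevant variable raises the validation error far above the threshold, so the two-variable model is chosen.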
Gaussian Mixture Models (GMM) (AdaptGauss)
Multimodal distributions can be modelled as a mixture of Gaussian components. The model is fitted using Pareto Density Estimation (PDE) as the estimate of the probability density function (pdf). PDE was designed in particular to identify groups/classes in a dataset. Precise limits between the classes can be calculated using Bayes' theorem. The model can be verified with a QQ plot and a chi-squared test.
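For two Gaussian components with weights w, means m and standard deviations s, the Bayes limit is the point where the weighted densities are equal: w1·N(x; m1, s1) = w2·N(x; m2, s2). Taking logarithms turns this into a quadratic equation a·x² + b·x + c = 0, which the sketch below solves. It illustrates the calculation only. It does not use AdaptGauss's R functions, and it assumes the component parameters are already known, so the PDE-based fitting is not shown. The function names are hypothetical.

```python
import math

def gauss_pdf(x, m, s):
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def bayes_boundaries(w1, m1, s1, w2, m2, s2):
    # Solve log(w1*N(x; m1, s1)) = log(w2*N(x; m2, s2)) as a*x^2 + b*x + c = 0
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) + math.log((w1 * s2) / (w2 * s1))
    if abs(a) < 1e-12:
        # Equal standard deviations: the equation is linear
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        # No crossing: one weighted component dominates everywhere
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

# Usage: equal weights and spreads put the single limit halfway between the means
limits = bayes_boundaries(0.5, 0.0, 1.0, 0.5, 4.0, 1.0)
```

When the standard deviations differ, there can be two limits. In that case the narrower component "wins" only between them.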
Sparse Learning Algorithms Using a LASSO-Type Penalty for Coefficient Estimation and Model Prediction (SparseLearner)
Implements LASSO-type sparse learning algorithms and improved variants such as Bolasso, bootstrap ranking LASSO and two-stage hybrid LASSO, among others, for coefficient estimation and model prediction. These estimation procedures are applied to variable selection, graphical modeling and model prediction.
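The sketch below shows the idea behind Bolasso in plain Python. It fits the LASSO on bootstrap resamples and keeps only the variables that receive a nonzero coefficient in every resample. It is not SparseLearner's R interface. The LASSO here is a basic cyclic coordinate descent for (1/2n)·‖y − Xb‖² + λ·‖b‖₁ on centered data. The penalty value, iteration count, number of resamples and all function names are assumptions. A final refit on the selected variables is omitted.

```python
import math
import random

def soft_threshold(z, g):
    return math.copysign(max(abs(z) - g, 0.0), z)

def lasso(X, y, lam, iters=50):
    # Cyclic coordinate descent; centering X and y stands in for an unpenalized intercept
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    ybar = sum(y) / n
    r = [v - ybar for v in y]  # residual of the all-zero model
    sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Correlation of column j with the partial residual (excluding j's own fit)
            rho = sum(row[j] * (ri + row[j] * b[j]) for row, ri in zip(Xc, r)) / n
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i, row in enumerate(Xc):
                    r[i] -= row[j] * delta
                b[j] = new
    return b

def bolasso(X, y, lam, n_boot=32, seed=0):
    # Intersect the LASSO supports obtained on bootstrap resamples
    rng = random.Random(seed)
    n = len(y)
    support = set(range(len(X[0])))
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b = lasso([X[i] for i in idx], [y[i] for i in idx], lam)
        support &= {j for j, v in enumerate(b) if abs(v) > 1e-8}
    return sorted(support)

# Usage on synthetic data where y depends only on variables 0 and 1
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(100)]
y = [3 * x[0] - 2 * x[1] + rng.gauss(0, 0.1) for x in X]
selected = bolasso(X, y, lam=0.05)
```

Intersecting the supports is what makes the selection stricter than a single LASSO fit. A variable that enters only in some resamples is dropped.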
Import Multiple File Types (DataLoader)
Functions to import multiple files of multiple data file types (‘.xlsx’, ‘.xls’, ‘.csv’, ‘.txt’) from a given directory into R data frames.
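The directory-import idea can be sketched with Python's standard library. Each file in a directory is dispatched by its extension to a reader, and the results are collected by file name. This is not DataLoader's R API. Treating '.txt' files as tab-delimited is an assumption. '.xlsx'/'.xls' are skipped because reading them needs a third-party library. The function and constant names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Assumed delimiters per extension; Excel formats are not handled in this sketch
DELIMITERS = {".csv": ",", ".txt": "\t"}

def load_directory(directory):
    # Read every supported file into a list of row dicts, keyed by file name without extension
    tables = {}
    for path in sorted(Path(directory).iterdir()):
        delimiter = DELIMITERS.get(path.suffix.lower())
        if delimiter is None or not path.is_file():
            continue
        with path.open(newline="") as f:
            tables[path.stem] = list(csv.DictReader(f, delimiter=delimiter))
    return tables

# Usage: a throwaway directory with two supported files and one unsupported file
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("x,y\n1,2\n")
    Path(tmp, "b.txt").write_text("x\ty\n3\t4\n")
    Path(tmp, "notes.md").write_text("ignored\n")
    tables = load_directory(tmp)
```

Keying by file stem is a simplification. Two files that differ only in extension (e.g. 'a.csv' and 'a.txt') would overwrite each other, so a real loader might key by full file name instead.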