Extensible Package for Cross-Validation-Based Integration of Base Learners (EnsembleCV)
This package extends the base classes and methods of EnsembleBase package for cross-validation-based integration of base learners. Default implementation calculates average of repeated CV errors, and selects the base learner / configuration with minimum average error. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. The package can be extended, e.g. by adding variants of the current implementation.
n-gram Text Regression, aka Concise Comparative Summarization (textreg)
Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.
Sample Size Determination (SSD) for Unordered Categorical Data (ssd)
ssd calculates the sample size needed to detect the differences between two sets of unordered categorical data.
Seeding the Default RNG with a Numeric Vector (rngSetSeed)
A function setVectorSeed() is provided. Its argument is a numeric vector of an arbitrary nonzero length, whose components have integer values from [0, 2^32-1]. The input vector is transformed using AES (Advanced Encryption Standard) algorithm into an initial state of Mersenne-Twister random number generator. The function provides a better alternative to the R base function set.seed(), if the input vector is a single integer. Initializing a stream of random numbers with a vector is a convenient way to obtain several streams, each of which is identified by several integer indices.
Expander Functions for Generating Full Gradient and Hessian from Single- and Multi-Slot Base Distributions (RegressionFactory)
The expander functions rely on the mathematics developed for the Hessian-definiteness invariance theorem for linear projection transformations of variables, described in authors’ paper, to generate the full, high-dimensional gradient and Hessian from the lower-dimensional derivative objects. This greatly relieves the computational burden of generating the regression-function derivatives, which in turn can be fed into any optimization routine that utilizes such derivatives. The theorem guarantees that Hessian definiteness is preserved, meaning that reasoning about this property can be performed in the low-dimensional space of the base distribution. This is often a much easier task than its equivalent in the full, high-dimensional space. Definiteness of Hessian can be useful in selecting optimization/sampling algorithms such as Newton-Raphson optimization or its sampling equivalent, the Stochastic Newton Sampler. Finally, in addition to being a computational tool, the regression expansion framework is of conceptual value by offering new opportunities to generate novel regression problems.