*Doubly Truncated Data Analysis, Non Iterative* (**DTDA.ni**)

Non-iterative estimator for the cumulative distribution of a doubly truncated variable. See de Uña-Álvarez J. (2018) <doi:10.1007/978-3-319-73848-2_37>.

*Optimal Stratification of Univariate Populations* (**stratifyR**)

Implements the stratification of univariate populations under stratified sampling designs using the method of Khan et al. (2002) <doi:10.1177/0008068320020518>, Khan et al. (2008) (<http://…/10761-eng.pdf>) and Khan et al. (2015) <doi:10.1080/02664763.2015.1018674>. It determines the Optimum Strata Boundaries (OSB) and Optimum Sample Sizes (OSS) for the study variable, y, using the best-fit frequency distribution of a survey variable (if data are available) or a hypothetical distribution (if data are not available). The method formulates the problem of determining the OSB as a mathematical programming problem, which is solved using a dynamic programming technique. If a dataset of the population is available to the surveyor, the method estimates its best-fit distribution and determines the OSB and OSS under Neyman allocation directly. When the dataset is not available, stratification is based on the assumption that the values of the study variable, y, are available as hypothetical realizations of proxy values of y from recent surveys. Thus, it requires certain distributional assumptions about the study variable. At present, it handles stratification for populations where the study variable follows a continuous distribution, namely the Pareto, Triangular, Right-triangular, Weibull, Gamma, Exponential, Uniform, Normal, Log-normal and Cauchy distributions.
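To make the Neyman allocation step concrete, here is a minimal Python sketch of the textbook formula, in which stratum h receives a share of the sample proportional to N_h × S_h (stratum size times stratum standard deviation). It is not the stratifyR interface: the function name `neyman_allocation` and its arguments are hypothetical, and the strata boundaries are assumed to be already fixed, whereas the package also optimizes them.

```python
def neyman_allocation(stratum_sizes, stratum_sds, total_sample):
    """Split total_sample across strata in proportion to N_h * S_h.

    stratum_sizes: population size N_h of each stratum
    stratum_sds:   standard deviation S_h of the study variable in each stratum
    Returns unrounded (fractional) sample sizes, one per stratum.
    """
    if len(stratum_sizes) != len(stratum_sds):
        raise ValueError("stratum_sizes and stratum_sds must have equal length")
    # Allocation weight of each stratum: N_h * S_h
    weights = [n_h * s_h for n_h, s_h in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all strata have zero size or zero variability")
    return [total_sample * w / total_weight for w in weights]


# Two strata: the larger, more variable stratum receives most of the sample.
allocation = neyman_allocation([100, 200], [1.0, 2.0], total_sample=50)
```

In practice the fractional sizes would still need rounding and capping at the stratum population sizes, which this sketch leaves out.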

*Vocabulary and Corpus Preprocessing for Natural Language Pipelines* (**mlvocab**)

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term and term co-occurrence matrices, among others. All functions allow for full or partial hashing of the terms in the vocabulary.
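As a rough illustration of what "partial hashing" can mean when turning tokens into integer sequences, the Python sketch below maps known vocabulary terms to their own index and sends unknown terms to a fixed number of hash buckets. This is an assumption about the general technique, not a description of mlvocab's implementation: the function `encode_tokens`, the bucket layout, and the use of CRC32 are all hypothetical choices.

```python
import zlib


def encode_tokens(tokens, vocab, n_buckets):
    """Convert tokens to integer ids.

    vocab:     dict mapping known terms to ids 0 .. len(vocab) - 1
    n_buckets: number of extra ids shared by all out-of-vocabulary terms
    """
    ids = []
    for token in tokens:
        if token in vocab:
            ids.append(vocab[token])
        else:
            # CRC32 is stable across runs, unlike Python's salted hash()
            bucket = zlib.crc32(token.encode("utf-8")) % n_buckets
            ids.append(len(vocab) + bucket)
    return ids


vocab = {"the": 0, "cat": 1}
sequence = encode_tokens(["the", "cat", "purrs"], vocab, n_buckets=4)
```

With an empty `vocab`, every term is hashed, which corresponds to full hashing in this sketch.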

*Binary Classification via GMDH-Type Neural Network Algorithm* (**GMDH2**)

Performs binary classification via a Group Method of Data Handling (GMDH)-type neural network algorithm. It also produces a well-formatted table of descriptive statistics for a binary response. In addition, it returns a confusion matrix with related statistics, and a scatter plot with classification labels of the binary classes, to assess prediction performance. All ‘GMDH2’ functions are designed for a binary response. See Dag O. and Yozgatligil C. (2016, ISSN:2073-4859) and Kondo T. and Ueno J. (2016, ISSN:1349-4198) for the details of GMDH algorithms.
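For readers unfamiliar with the confusion matrix and its related statistics, the Python sketch below computes the four cell counts for a binary response along with accuracy, sensitivity and specificity. It is a generic illustration, not the GMDH2 output: the function names and the choice of statistics are assumptions, and the package may report a different or larger set.

```python
import math


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary response."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def classification_stats(counts):
    """Derive accuracy, sensitivity and specificity from the cell counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else math.nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
stats = classification_stats(counts)
```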

*Automated Fitting of Moderated Nonlinear Factor Analysis Through the ‘Mplus’ Program* (**aMNLFA**)

Automated generation, running, and interpretation of moderated nonlinear factor analysis models for obtaining scores from observed variables. This package creates ‘Mplus’ input files which may be run iteratively to test two different types of covariate effects on items: (1) latent variable impact (both mean and variance); and (2) differential item functioning. After sequentially testing for all effects, it also creates a final model by including all significant effects after adjusting for multiple comparisons. Finally, the package creates a scoring model which uses the final values of parameter estimates to generate latent variable scores.