Find the Modes and Assess the Modality of Complex and Mixture Distributions, Especially with Big Datasets (modes)
Designed with the dual purpose of accurately estimating the mode (or modes) and characterizing the modality of data. The main application area is complex or mixture distributions, particularly in a big data environment. The heterogeneous nature of (big) data may require deep introspective statistical and machine learning techniques, but these tools often fail when applied without first understanding the data. In small datasets this is rarely a big issue, but in large-scale data analysis, thoroughly inspecting each dimension typically yields an O(n^(n-1)) problem. As such, dealing with big data requires an alternative toolkit. This package not only identifies the mode or modes for various data types, it also provides a programmatic way of understanding the modality (i.e., unimodal, bimodal, etc.) of a dataset, whether it is big data or not. See <http://…/modes_package> for examples and discussion.
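To make the two tasks concrete, here is a minimal Python sketch of mode finding and a crude modality check. It is not the package's implementation or API: the function names (`discrete_modes`, `histogram_peaks`, `modality_label`) and the fixed-width histogram heuristic are assumptions chosen purely for illustration.

```python
from collections import Counter


def discrete_modes(values):
    """Return every most-frequent value; ties yield several modes."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def histogram_peaks(values, bins=10):
    """Return centers of local maxima in a fixed-width histogram.

    This is a crude proxy for modality: results depend on the
    (hypothetical) `bins` setting and no smoothing is applied.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    padded = [0] + counts + [0]  # sentinels so edge bins can be peaks
    peaks = []
    for i in range(1, bins + 1):
        # strictly above the left neighbour, at least as high as the right one
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]:
            peaks.append(lo + (i - 0.5) * width)
    return peaks


def modality_label(n_peaks):
    """Map a peak count to a descriptive label."""
    return {1: "unimodal", 2: "bimodal"}.get(n_peaks, "multimodal")
```

For example, `modality_label(len(histogram_peaks(data, bins=5)))` gives a rough label for a numeric list `data`. A real analysis would normally smooth the data first (for instance with a kernel density estimate), because raw histogram peaks are sensitive to the bin width.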
Causal Inference in the Presence of Treatment Noncompliance Under the Binary Instrumental Variable Model (noncompliance)
A finite-population significance test of the ‘sharp’ causal null hypothesis that treatment exposure X has no effect on final outcome Y, within the principal stratum of Compliers. A generalized likelihood ratio test statistic is used, and the resulting p-value is exact. Currently, it is assumed that there are only Compliers and Never Takers in the population.
Programmatic Conversion of PDF Tables (pdftables)
Allows the user to convert PDF tables to formats more amenable to analysis (‘.csv’, ‘.xml’, or ‘.xlsx’) by wrapping the PDFTables API. In order to use the package, the user needs to sign up for an API account on the PDFTables website (<https://…/pdf-to-excel-api>). The package works by taking a PDF file as input, uploading it to PDFTables, and returning a file with the extracted data.