eXtraction of ENTity (x.ent)
Provides a tool for extracting information (entities and the relations between them) from text datasets. It also emphasizes exploration of the results through graphical displays. It is a rule-based system that works with hand-made dictionaries and local grammars defined by users. x.ent uses Perl functions for parsing, JavaScript for setting user preferences through a browser, and R for displaying and supporting analysis of the extracted results. Local grammars are defined and compiled with Unitex, a multilingual tool developed at the University Paris-Est. See ?xconfig for an introduction.
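To illustrate what "hand-made dictionaries and local grammars" means in a rule-based extractor, here is a minimal Python sketch. It is not x.ent's implementation and does not use Unitex: the dictionaries, the `works_in` rule, and the function names are all hypothetical, and a regular expression stands in for a compiled local grammar.

```python
import re

# Hypothetical user-made dictionaries: entity type -> surface forms.
DICTIONARIES = {
    "PERSON": ["Alice", "Bob"],
    "CITY": ["Paris", "Lyon"],
}

def alternation(entity_type):
    """Regex alternation matching any surface form of an entity type."""
    forms = sorted(DICTIONARIES[entity_type], key=len, reverse=True)
    return "(?:" + "|".join(re.escape(f) for f in forms) + ")"

# Hypothetical rule, loosely in the spirit of a local grammar:
#   <PERSON> "works in" <CITY>  ->  (person, "works_in", city)
WORKS_IN = re.compile(
    r"\b(?P<person>" + alternation("PERSON") + r")\s+works\s+in\s+"
    r"(?P<city>" + alternation("CITY") + r")\b"
)

def extract_entities(text):
    """Dictionary lookup: (form, type) pairs ordered by position in text."""
    hits = []
    for etype, forms in DICTIONARIES.items():
        for form in forms:
            for m in re.finditer(r"\b" + re.escape(form) + r"\b", text):
                hits.append((m.start(), form, etype))
    return [(form, etype) for _, form, etype in sorted(hits)]

def extract_relations(text):
    """Apply the rule and return (subject, relation, object) triples."""
    return [(m.group("person"), "works_in", m.group("city"))
            for m in WORKS_IN.finditer(text)]

text = "Alice works in Paris. Bob visited Lyon."
print(extract_entities(text))
print(extract_relations(text))
```

Both sentences yield entities, but only the first matches the relation rule: this is the typical trade-off of rule-based extraction, where precision comes from explicit patterns and recall depends on how many patterns the user writes.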
Protocol Inspection and State Machine Analysis (PRISMA)
The PRISMA package loads and processes huge text corpora that have been preprocessed with the sally toolbox (http://www.mlsec.org/sally/). sally acts as a very fast preprocessor that splits the text files into tokens or n-grams. PRISMA then reads these output files, applies testing-based token selection, and provides replicate-aware, highly tuned implementations of non-negative matrix factorization and principal component analysis, which allow very large data sets to be processed even on desktop machines.
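The pipeline above (tokenize into n-grams, build a token matrix, factorize it) can be sketched in pure Python under several assumptions: character 3-grams stand in for sally's output, documents become columns of a binary token-by-document matrix, and a plain NMF using Lee-Seung multiplicative updates replaces PRISMA's tuned implementation. The sketch omits testing-based token selection, replicate awareness, and PCA, and none of the function names are PRISMA's API.

```python
import random

def char_ngrams(text, n=3):
    """Set of overlapping character n-grams in a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def token_matrix(docs, n=3):
    """Binary token-by-document matrix and its sorted token vocabulary."""
    grams = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*grams))
    matrix = [[1.0 if tok in g else 0.0 for g in grams] for tok in vocab]
    return matrix, vocab

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V ~ W H with W, H >= 0 using Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(rows)]
    return w, h

def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

# Usage on a toy, protocol-like corpus (hypothetical data).
docs = ["GET /index.html", "GET /about.html", "POST /login", "POST /logout"]
v, vocab = token_matrix(docs)
w, h = nmf(v, rank=2)
print(len(vocab), "tokens; error:", round(reconstruction_error(v, w, h), 3))
```

Each column of `h` gives a document's weights on the two non-negative components, and each column of `w` shows which n-grams make up a component; with this toy corpus one would expect the GET and POST requests to load on different components. Multiplicative updates keep every entry non-negative without any projection step, which is why they are a common baseline for NMF.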
Regularized Categorical Effects/Categorical Effect Modifiers/Continuous/Smooth Effects in GLMs (gvcm.cat)
Generalized structured regression models with regularized categorical effects, categorical effect modifiers, continuous effects and smooth effects.
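One common way to regularize a categorical effect is an L1 penalty on differences between category coefficients, which can shrink some differences to exactly zero and thereby fuse categories with equal effects. The Python sketch below only evaluates such penalties and a penalized objective; the all-pairs form for nominal predictors, the adjacent-differences form for ordinal ones, and all function names are assumptions for illustration, not a description of how gvcm.cat weights or fits these terms.

```python
from itertools import combinations

def nominal_penalty(beta):
    """All-pairs L1 fusion penalty: sum over r < s of |beta_r - beta_s|.

    beta holds one coefficient per category; a reference category can be
    included with coefficient 0.
    """
    return sum(abs(beta[r] - beta[s])
               for r, s in combinations(range(len(beta)), 2))

def ordinal_penalty(beta):
    """L1 penalty on differences between adjacent ordered categories."""
    return sum(abs(b - a) for a, b in zip(beta, beta[1:]))

def penalized_objective(neg_log_lik, beta, lam, ordinal=False):
    """Negative log-likelihood plus lam times the chosen penalty."""
    pen = ordinal_penalty(beta) if ordinal else nominal_penalty(beta)
    return neg_log_lik + lam * pen

beta = [0.0, 1.0, 1.0, 3.0]   # categories 1 and 2 already fused
print(nominal_penalty(beta))                       # 9.0
print(ordinal_penalty(beta))                       # 3.0
print(penalized_objective(10.0, beta, lam=0.5))    # 14.5
```

Categories 1 and 2 share a coefficient, so their difference contributes nothing to either penalty; increasing `lam` in an actual fit pushes more pairs into that state.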