10 Common NLP Terms Explained for the Text Mining Novice
If you’re relatively new to the Natural Language Processing and Text Mining world, you’ve more than likely come across some pretty technical terms and acronyms that are challenging to get your head around, especially if you’re relying on scientific definitions for a plain and simple explanation.
Python NLTK Tools List for Natural Language Processing (NLP)
A collection of important Python tools for natural language processing
Quality and correctness of classification models. Part 3 – Confusion Matrix
A confusion matrix is an N x N matrix in which the rows correspond to the correct decision classes and the columns to the decisions made by the classifier. The number n_{i,j} at the intersection of the i-th row and the j-th column is equal to the number of cases from the i-th class that have been classified as belonging to the j-th class.
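To make the row/column convention concrete, here is a minimal sketch in plain Python. The function name `confusion_matrix`, the class labels, and the toy data are all made up for illustration; they are not taken from the linked article.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count cases per (true class, predicted class) pair.

    Rows are the correct classes, columns are the classifier's decisions,
    so matrix[i][j] is the number of class-i cases classified as class j.
    """
    index = {label: k for k, label in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


# Hypothetical toy data: 7 cases, 3 classes.
classes = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat", "bird"]

cm = confusion_matrix(y_true, y_pred, classes)
for label, row in zip(classes, cm):
    print(label, row)

# Correct decisions sit on the diagonal, so accuracy is trace / total.
correct = sum(cm[k][k] for k in range(len(classes)))
accuracy = correct / len(y_true)
```

Everything off the diagonal is a misclassification: here one "cat" was labelled "dog" (row 0, column 1) and one "bird" was labelled "cat" (row 2, column 0).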
About Feature Scaling and Normalization
and the effect of standardization for machine learning algorithms
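As a rough illustration of the two ideas in that title, the sketch below contrasts standardization (z-scores) with min-max normalization using only the standard library. The helper names and the sample values are assumptions chosen for the example, not code from the linked post.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Min-max normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column (heights in cm).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

z_scores = standardize(heights_cm)   # mean 0, standard deviation 1
scaled = min_max_scale(heights_cm)   # values confined to [0, 1]

print(z_scores)
print(scaled)
```

Both helpers assume the column is not constant; a zero standard deviation or zero range would need separate handling.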
10 Machine Learning Terms Explained in Simple English
If you’re relatively new to Machine Learning and its applications, you’ve more than likely come across some pretty technical terms that are often difficult for the novice mathematician/scientist to get their head around.
‘Variable Importance Plot’ and Variable Selection
Classification trees are nice. They provide an interesting alternative to a logistic regression. I started to include them in my courses maybe 7 or 8 years ago. The question is nice (how to get an optimal partition), the algorithmic procedure is nice (the trick of splitting according to one variable, and only one, at each node, and then moving forward, never backward), and the visual output is just perfect (with that tree structure). But the prediction can be rather poor. The performance of that algorithm can hardly compete with a (well specified) logistic regression.
Confidence Intervals for prediction in GLMMs
With LM and GLM, the predict function can return the standard error for the predicted values on either the observed data or on new data.