Qualitative Text Analysis in R with RQDA
Last Friday at the Davis R Users’ Group, Mallory Johnson gave a presentation on RQDA, an R-based GUI tool for coding documents in qualitative text analysis. Here’s the video, and you can view the slides here.
RevoScaleR’s Naive Bayes Classifier rxNaiveBayes()
Because of its simplicity and good performance over a wide spectrum of classification problems, the Naïve Bayes classifier ought to be on everyone’s short list of machine learning algorithms. Now, with version 7.4, we have a high-performance Naïve Bayes classifier in Revolution R Enterprise too. Like all Parallel External Memory Algorithms (PEMAs) in the RevoScaleR package, rxNaiveBayes is an inherently parallel algorithm that may be distributed across Microsoft HPC, Linux and Hadoop clusters and may be run on data in Teradata databases. The following example shows how to get started with rxNaiveBayes() on a moderately sized data set in your local environment. It uses the Mortgage data set, which may be downloaded from the Revolution Analytics data set repository. The first block of code imports the .csv files for the years 2000 through 2008 and concatenates them into a single training file in the .XDF binary format. Then, the data for the year 2009 is imported to a test file that will be used for making predictions.
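The R code from the post is not reproduced in this excerpt. To illustrate what a Naive Bayes classifier computes, here is a minimal sketch in plain Python. It is not rxNaiveBayes() and does not use the Mortgage data set: the function names, the toy rows, and the Laplace smoothing parameter `alpha` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    # feature_counts[(feature_index, label)][value] -> number of occurrences
    feature_counts = defaultdict(Counter)
    # Distinct values seen per feature, used in the smoothing denominator
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[(j, label)][value] += 1
            feature_values[j].add(value)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "feature_values": feature_values,
        "alpha": alpha,
        "total": len(labels),
    }


def predict(model, row):
    """Return the label with the highest posterior log-probability."""
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        # Start from the log prior of the class
        score = math.log(count / model["total"])
        for j, value in enumerate(row):
            # Smoothed estimate of P(value | label), features assumed independent
            seen = model["feature_counts"][(j, label)][value]
            n_values = len(model["feature_values"][j])
            score += math.log((seen + model["alpha"]) / (count + model["alpha"] * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (credit band, employment length) -> outcome
rows = [
    ("low", "short"), ("low", "short"), ("low", "long"),
    ("high", "long"), ("high", "long"), ("high", "short"),
]
labels = ["default", "default", "default", "ok", "ok", "ok"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("low", "short")))
print(predict(model, ("high", "long")))
```

Training is a single pass that only accumulates counts, so counts from separate chunks of data can simply be added together. That is one reason the method suits chunked, distributed processing of the kind the post describes.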
An R Enthusiast Goes Pythonic!
What follows below takes advantage of a neat dataset from the UCI Machine Learning Repository. The data contain Math test performance of 649 students in 2 Portuguese schools. What’s neat about this data set is that in addition to grades on the students’ 3 Math tests, they managed to collect a whole whack of demographic variables (and some behavioural) as well. That led me to the question of how well you can predict final math test performance based on demographics and behaviour alone. In other words, who is likely to do well, and who is likely to tank?
Live Earthquake Map with Shiny and Google Map API
This is a practical example of how that same system can be used to build a useful tool: it visualizes seismic events collected from the USGS with the Google Maps API, using R for some basic data preparation. The procedure is pretty much identical to what I presented in the post mentioned, so I will not bother you with additional details.
Static and moving circles
After the previous post on the packcircles package for R, someone suggested it would be useful to be able to fix the position of selected circles. As a first attempt, I’ve added an optional weights argument to the circleLayout function. Weights can be in the range 0-1 inclusive, where a weight of 0 prevents a circle from moving, while a weight of 1 allows full movement.
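To make the idea of movement weights concrete, here is a rough sketch in plain Python. It is not the packcircles implementation: the function `separate`, the pairwise push-apart step, and the choice to scale each circle's half of the overlap by its weight are assumptions, just one plausible reading of "0 prevents movement, 1 allows full movement".

```python
import math
from itertools import combinations


def separate(circles, weights, iterations=50):
    """Push overlapping circles apart; each circle is an (x, y, radius) tuple.

    A circle's displacement is scaled by its weight, so weight 0 pins it in
    place and weight 1 lets it take its full half of each correction.
    """
    circles = [list(c) for c in circles]
    for _ in range(iterations):
        for i, j in combinations(range(len(circles)), 2):
            x1, y1, r1 = circles[i]
            x2, y2, r2 = circles[j]
            dx, dy = x2 - x1, y2 - y1
            dist = math.hypot(dx, dy)
            overlap = r1 + r2 - dist
            if overlap <= 0 or dist == 0:
                continue  # not overlapping, or no direction to push along
            ux, uy = dx / dist, dy / dist  # unit vector from circle i to circle j
            half = overlap / 2
            # Move each circle away from the other, scaled by its own weight
            circles[i][0] -= ux * half * weights[i]
            circles[i][1] -= uy * half * weights[i]
            circles[j][0] += ux * half * weights[j]
            circles[j][1] += uy * half * weights[j]
    return [tuple(c) for c in circles]


# First circle is pinned (weight 0), second is free to move (weight 1)
result = separate([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], [0.0, 1.0])
print(result)
```

In this sketch a pinned circle never changes position, and a free neighbour closes only half of the remaining overlap on each pass, so the layout converges over several iterations rather than in one step.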
4 Techniques to Get Data Frame Column as Vector
The following are four different techniques for retrieving a data frame column as a vector.
Amazon Machine Learning: use cases and a real example in Python
Here I would like to share my personal experience with this amazing technology, introduce some of the most important – and sometimes misleading – concepts of machine learning, and give this new AWS service a try with an open dataset in order to train and use a real-world AWS Machine Learning model.
Applied Statistics Is A Way Of Thinking, Not Just A Toolbox
The choice of tools in applied statistics is driven by the objective, the structure of the data, and the nature of the uncertainty in the numbers, whereas in academic statistics it’s driven by publishing or teaching. Here we provide some of the common statistical tools and their overlapping genealogy.