Python: How to Write a Spelling Corrector
Two friends (Dean and Bill) independently told me they were amazed at how Google does spelling correction so well and quickly. Type in a search like and Google comes back in 0.1 seconds or so with Did you mean: spelling. (Yahoo and Microsoft are similar.) What surprised me is that I thought Dean and Bill, being highly accomplished engineers and mathematicians, would have good intuitions about statistical language processing problems such as spelling correction. But they didn’t, and come to think of it, there’s no reason they should: it was my expectations that were faulty, not their knowledge. I figured they and many others could benefit from an explanation. The full details of an industrial-strength spell corrector are quite complex (you can read a little about it here or here). What I wanted to do here is to develop, in less than a page of code, a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.

Comprehensive Guide to Data Visualization in R
With ever increasing volume of data in today’s world, it is impossible to tell stories without these visualizations. While there are dedicated tools like Tableau, QlikView and d3.js, nothing can replace a modeling / statistics tools with good visualization capability. It helps tremendously in doing any exploratory data analysis as well as feature engineering. This is where R offers incredible help. R Programming offers a satisfactory set of inbuilt function and libraries (such as ggplot2, leaflet, lattice) to build visualizations and present data. In this article, I have covered the steps to create the common as well as advanced visualizations in R Programming. But, before we come to them, let us quickly look at brief history of data visualization. If you are not interested in history, you can safely skip to the next section.

PageRank meets vectorial representations – “Ranking on Data Manifolds”
I came across this paper when following up on ideas I had when reading about TextRank for summarising documents. It is short, well written and very interesting, and was authored by Zhou, Weston, Gretton, Bousquet and Schölkopf (all then at the Max Planck Institute for Biological Cybernetics, Tübingen) in 2004. (PDF).

Introduction to Network Analysis and Representation
Networks and network analysis has grown more prominent in both humanities scholarship and public discourse. In this context, networks–also known as graphs or node-link diagrams–are ‘a set of vertices (also called points or nodes) which represent the entities of research interest, and a set of lines (or ties) between these vertices which represent their relationships.’ This interactive application is designed to provide an overview of various network analysis principles used for analysis and representation. It also provides a few examples of untraditional networks used in digital humanities scholarship. Finally, along with the various methods described interactively here are links to related scholarship. Each network type is listed in the Models section, and can be paired with an analysis or representation method by simply clicking on a network type to load a new network, and then clicking on an analysis or visualization method. Networks are represented using traditional force-directed techniques or by plotting along the xy axis based on numerical attributes of the nodes (longitude and latitude in the case of nodes that represent geographic entities). For the force directed layout, you can adjust the various force principles to see how this affects the representation of the network you’re working with. This implementation will likely remain a work-in-progress for some time, and if you notice any flaws or discrepencies, or have a suggestion, please contact Elijah Meeks.

Hotlist of Training resources for Predictive Analytics
A hotlist of training resources for Predictive Analytics, Machine Learning, Data Science, R, Python, SAS, and Excel, including MOOCs, blogs, meetups, courses, videos, and more.