The unreasonable effectiveness of Character-level Language Models
RNNs, LSTMs and Deep Learning are all the rage, and a recent blog post by Andrej Karpathy is doing a great job explaining what these models are and how to train them. It also provides some very impressive results of what they are capable of. This is a great post, and if you are interested in natural language, machine learning or neural networks you should definitely read it.
My favorite personal R bug
In this note am going to recount ‘my favorite R bug.’ It isn’t a bug in R. It is a bug in some code I wrote in R. I call it my favorite bug, as it is easy to commit and (thanks to R’s overly helpful nature) takes longer than it should to find.
The Structure of Apache Lucene
The inestimably noble Apache Software Foundation produces many of the blockbuster products (Ant, CouchDB, Hadoop, JMeter, Maven, OpenOffice, Subversion, etc.) that help build our digital universe. One perhaps less well-known gem is Lucene, which, ‘ … provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.’ Despite its shying from headlines, Lucene forms a quiet but integral component of many Apache (and third-party) projects. Let us take a look at the structure the underlies this wonderful and highly successful product.