Apache Mahout is a library for scalable machine learning. Originally a subproject of Apache Lucene (a high-performance text search engine library), Mahout has since become a top-level Apache project. Although Mahout has been around for only a few years, it has established itself as a frontrunner among machine learning technologies. This Refcard presents the basics of Mahout through two applications:
• Training and testing a Random Forest for handwriting recognition using Amazon Web Services EMR.
• Running a recommendation engine on a standalone Spark cluster.
Distributed Machine Learning with Apache Mahout