MapReduce as a programming paradigm provides a simple yet powerful abstraction built on two second-order functions: Map and Reduce. These functions let developers define individual, sequentially processed tasks while the framework hides most of the details of how those tasks are parallelized and scaled out. In this paper we discuss four processing patterns in the context of the distributed SAP HANA database that go beyond the classic MapReduce paradigm. We illustrate them using some typical Machine Learning algorithms and present experimental results that demonstrate how the data flows scale out with the number of parallel tasks.
Advanced Analytics with the SAP HANA Database
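To make the "two second-order functions" idea concrete, here is a minimal single-process sketch of the classic MapReduce contract, using word counting as the example. It is not taken from the paper and does not use any SAP HANA API; the names `map_reduce`, `count_words`, and `sum_counts` are hypothetical, and a real framework would distribute the map and reduce calls across workers instead of running them in a loop.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Apply map_fn to every record, group the emitted pairs by key,
    then apply reduce_fn to each group (hypothetical helper, runs sequentially)."""
    groups = defaultdict(list)
    for record in records:
        # map_fn is user code: it sees one record and emits (key, value) pairs.
        for key, value in map_fn(record):
            groups[key].append(value)
    # reduce_fn is user code: it sees one key and all values emitted for it.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words(line):
    """Map step: emit (word, 1) for every word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


def sum_counts(word, counts):
    """Reduce step: add up the counts collected for one word."""
    return sum(counts)


word_counts = map_reduce(
    ["map reduce map", "reduce scale out"],
    count_words,
    sum_counts,
)
print(word_counts)
```

The user supplies only `count_words` and `sum_counts`, each written as plain sequential code; everything about grouping (and, in a real system, partitioning and parallel execution) stays inside `map_reduce`.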
Document worth reading: “Advanced Analytics with the SAP HANA Database”
09 Monday Nov 2015
Posted in Documents