Just a few years ago it was common to develop a predictive analytic model using a single proprietary tool against a sample of structured data. The model would then be applied in batch, with the scores stored in a database or data warehouse for future use. Recently this approach has been disrupted. There is a move to real-time scoring: calculating the value of a predictive analytic model when it is needed rather than looking up a stored score in a database. At the same time the variety of model execution platforms has expanded, with in-database execution, columnar and in-memory databases, and MapReduce-based execution all becoming increasingly common. Modeling too has changed: the open source analytic modeling language R has become extremely popular, with up to 70% of analytic professionals using it at least occasionally. The range of data types used in models has expanded, along with the approaches used for storage. Modelers increasingly want to build models on all their data, not just a sample. This increasingly complex, multi-vendor environment has increased the value of standards, both published standards and open source standards. In this paper we will explore the growing role of standards for predictive analytics in expanding the analytic ecosystem, handling Big Data and supporting the move to real-time scoring.
Document worth reading: “Standards in Predictive Analytics: The role of R, Hadoop and PMML in the mainstreaming of predictive analytics.”
18 Wednesday Nov 2015