Munging in R with SQL and MongoDB for Financial Applications
Data Wrangling is a pain management program for financial data analysts. Data analysts spend 90 percent of their work time wrangling (munging) data into a usable format. A corollary of this rule of thumb is that gathering, cleaning, massaging, and managing data account for 80 percent of the costs of data warehousing projects. With the volume and complexity of financial data increasing exponentially, nothing cripples efficient data analysis more than inefficient data wrangling.

Yet practical instruction in handling the data management headaches that fill most of an analyst's working hours is scattered and disorganized. Nor is it taught specifically in programming courses, which focus on the language rather than on data management. Most students learn the programming language R, but not the concomitant data management skills that are essential to data analysis: in particular, how to read and write data from and between structured database technologies (notably SQL) and unstructured databases (notably MongoDB). As a result, graduates are ill-prepared for jobs in the real world, where financial companies seek analysts whose skill sets combine data management with data analysis.

Data Wrangling teaches practitioners and students of financial data analysis the SQL and MongoDB database management skills they need to succeed in their analytic work. The authors, who have deep experience both in the financial industry and in teaching quantitative finance, draw most of the book's operational and programming examples from finance, using both market data and text-based data. The concepts these examples illustrate nonetheless apply to a wide range of fields, so data analysts in any industry will benefit from this book.