11 questions to unlock the power of Big Data and analytics – enterprise-wide
1. What’s your definition of a “high-performing organization”? How do you know when you’ve achieved this goal?
2. Which tools does your organization use to understand Big Data and analytics without additional training?
3. What’s your approach to making sure everyone on the team can gain the right insight at the right time and take the most appropriate action?
4. How do you use Big Data and analytics to augment – or even replace – your responsibilities?
5. Do you acid-test all of your decisions? If so, how? Does it ever change the way you run your business area?
6. How does every team member access the information they need, when they need it? Does it ever backfire?
7. How does your management team use Big Data and analytics to answer your customers’ demand for widespread, real-time customization and personalization?
8. What’s your team’s approach for predicting new developments, capitalizing on future trends, and responding to challenges before they happen?
9. How does your organization know it’s picking the right person or offer to fulfill a particular requirement?
10. What’s your secret to accurate financial analysis and forecasting, as well as understanding the true cost of opportunities and risks?
11. How does your organization stay up to date with regulatory requirements and track its performance in complying with them?

R 3.1.3 is released (+ easy upgrading for Windows users with the installr package)
R 3.1.3 (codename “Smooth Sidewalk”) was released today. You can get the latest binaries from here (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Going deeper with dplyr: New features in 0.3 and 0.4
This new tutorial covers the most useful new features in 0.3 and 0.4, as well as some advanced functionality from previous versions that I didn’t cover last time. (If you haven’t watched the previous tutorial, I recommend doing so first, since it covers some dplyr basics that this tutorial doesn’t.)
http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/

New R Package – ipapi (IP/Domain Geolocation)
I noticed that the @rOpenSci folks had an interface to ip-api.com on their to-do list, so I whipped up a small R package to fill that gap.

How to factorize a 700 GB matrix with Apache Flink
This article is a follow-up to the earlier article about Computing recommendations at extreme scale with Apache Flink. We discuss how we implemented the alternating least squares (ALS) algorithm in Apache Flink, starting from a straightforward implementation of the algorithm and moving to a blocked ALS implementation, optimizing performance along the way. Others have made similar observations, and the final algorithm we arrive at is also the one implemented in Apache Spark’s MLlib. Furthermore, we describe the improvements contributed to Flink in the wake of implementing ALS.
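To make the core ALS idea concrete, here is a minimal single-machine sketch in plain Python of the straightforward (non-blocked) variant: factorize a sparse ratings matrix R ≈ U Vᵀ by alternately fixing V and solving a small regularized least-squares problem for every row of U, then swapping roles. This is not Flink's API or the blocked implementation the article describes. The function names (solve, solve_side, objective, als), the regularization weight lam, and the toy ratings are illustrative assumptions.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def solve_side(groups, fixed, k, lam):
    """One ALS half-step: recompute every row of one factor matrix.

    groups[row] lists the observed (other_index, rating) pairs for that row.
    With the other factor matrix `fixed` held constant, each row is an
    independent ridge regression: (F^T F + lam * I) x = F^T r.
    """
    new_rows = []
    for entries in groups:
        a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for other, rating in entries:
            f = fixed[other]
            for p in range(k):
                b[p] += f[p] * rating
                for q in range(k):
                    a[p][q] += f[p] * f[q]
        new_rows.append(solve(a, b))
    return new_rows


def objective(triples, users, items, lam):
    """Squared error on observed ratings plus L2 regularization of both factor matrices."""
    err = sum((r - sum(x * y for x, y in zip(users[u], items[i]))) ** 2 for u, i, r in triples)
    reg = sum(x * x for row in users + items for x in row)
    return err + lam * reg


def als(triples, n_users, n_items, k=2, lam=0.1, iterations=10, seed=0):
    """Factorize R ~ U V^T from (user, item, rating) triples; also return the objective history."""
    rng = random.Random(seed)
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in triples:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    users = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    items = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    history = [objective(triples, users, items, lam)]
    for _ in range(iterations):
        users = solve_side(by_user, items, k, lam)  # fix V, solve for U
        items = solve_side(by_item, users, k, lam)  # fix U, solve for V
        history.append(objective(triples, users, items, lam))
    return users, items, history


if __name__ == "__main__":
    # Hypothetical toy data: (user, item, rating) triples for 4 users and 3 items.
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
               (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
    U, V, history = als(ratings, n_users=4, n_items=3, k=2, lam=0.1)
    print([round(h, 4) for h in history])
```

Because each half-step exactly minimizes the regularized objective over one factor matrix, the recorded objective never increases. The per-row solves inside solve_side are also independent of one another, which is what lets a distributed engine parallelize them. Roughly speaking, a blocked implementation additionally groups rows into blocks so that each factor vector is shipped to a block once rather than once per rating.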