GDPR – stop delaying, it’s time to get to work

In the regulatory compliance world, a lot has been written over the past two years introducing organizations to the new European data privacy regulation and why they should be concerned. It got your attention when pundits warned that your organization is still affected even if it has no operations in the European Union. You took notice of the hefty fines, potentially ranging from 2 to 4% of your company’s annual revenue. And you’re very aware that the deadline for compliance is May 2018. You’ve watched the webinars, read the whitepapers, and attended the education events. You’ve done your homework. But now it’s the beginning of 2018. Enforcement of the General Data Protection Regulation is no longer “May 2018”. Now, it’s just “May”. And of some concern is Gartner’s prediction that by the end of 2018, over 50% of companies affected by the GDPR will not be in full compliance with its requirements (Gartner, Focus on Five High-Priority Changes to Tackle the EU GDPR).


Interpreting three-way interactions in R

A reader asked in a comment on my post on interpreting two-way interactions if I could also explain interactions between two categorical variables and one continuous variable. Rather than dwelling on this particular case alone, here is a full blog post covering all possible combinations of categorical and continuous variables and how to interpret the standard model outputs. Do note that three-way interactions can be (relatively) easy to understand conceptually, but interpreting the meaning of the individual coefficients gets pretty tricky.
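To make the setup concrete, here is a minimal sketch in R, with simulated data and made-up variable names, of a model with two categorical predictors and one continuous predictor:

```r
# Simulated example: two categorical predictors (f1, f2) and one
# continuous predictor (x). All names and effect sizes are invented
# purely for illustration.
set.seed(42)
n  <- 200
f1 <- factor(sample(c("A", "B"), n, replace = TRUE))
f2 <- factor(sample(c("low", "high"), n, replace = TRUE))
x  <- rnorm(n)
y  <- 1 + 0.5 * x + 2 * (f1 == "B") - 1 * (f2 == "low") +
      1.5 * (f1 == "B") * (f2 == "low") * x + rnorm(n)

# f1 * f2 * x expands to all main effects, all two-way interactions,
# and the three-way interaction f1:f2:x.
m <- lm(y ~ f1 * f2 * x)
summary(m)

# Each lower-order coefficient is conditional on the reference levels
# of the other variables: for example, the coefficient for x is the
# slope of x when f1 and f2 are both at their reference levels.
```

The comments in the summary hint at why interpretation gets tricky: every coefficient below the three-way term is a conditional effect, not an overall one.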


Mitigating known security risks in open source libraries

Finding out whether you’re using vulnerable packages is an important step, but it’s not the real goal. The real goal is to fix those issues! This chapter covers what you should know about fixing vulnerable packages, including remediation options, tooling, and various nuances. Note that SCA tools have traditionally focused on finding or preventing vulnerabilities, and most put little emphasis on fixing them beyond providing advisory information or logging an issue. You may therefore need to implement some of these remediations yourself, at least until more SCA solutions expand to include them. There are several ways to fix vulnerable packages, but upgrading is the best choice. If that is not possible, patching offers a good alternative. The following sections discuss each of these options, and we will later look at what you can do when neither solution is possible.
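As an R-flavored illustration of the “upgrade first” advice, base R’s package utilities can list and upgrade outdated dependencies. Note this flags stale packages generally, not vulnerabilities specifically; a real SCA tool would cross-reference installed versions against vulnerability advisories:

```r
# Sketch of the "upgrade first" option using base R utilities.
# old.packages() returns NULL if everything is current, otherwise a
# matrix of packages with newer versions available in the repos.
outdated <- old.packages()
if (!is.null(outdated)) {
  print(outdated[, c("Package", "Installed", "ReposVer")])

  # Once you've confirmed which release contains the fix, upgrade the
  # affected package (package name here is a hypothetical example):
  # install.packages("openssl")
}
```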


From big data to fast data

Enterprise data needs change constantly but at inconsistent rates, and in recent years change has come at an increasing clip. Tools once considered adequate for big data applications are no longer sufficient. When batch operations predominated, Hadoop could handle most of an organization’s needs. Developments in other IT areas (think IoT, geolocation, etc.) have changed the way data is collected, stored, distributed, processed, and analyzed. Real-time decision-making complicates this scenario, and new tools and architectures are needed to handle these challenges efficiently.


Time Series Analysis Using ARIMA Model In R

Time series data are data points collected sequentially over a period of time, typically at regular intervals. Time series analysis means analyzing the available data to find the pattern or trend in it and to predict future values, which in turn supports more effective and better-optimized business decisions.
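As a minimal sketch of this workflow in R, here is an ARIMA fit and forecast using the widely used forecast package and the built-in AirPassengers dataset (both the dataset and the 24-month horizon are chosen purely for illustration):

```r
library(forecast)

# AirPassengers: monthly international airline passenger counts,
# 1949-1960, shipped with base R.
fit <- auto.arima(AirPassengers)  # selects (p, d, q) automatically
summary(fit)

# Forecast the next 24 months and plot point forecasts together
# with their prediction intervals.
fc <- forecast(fit, h = 24)
plot(fc)
```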


Speed up R with Parallel Programming in the Cloud

This past weekend I attended the R User Day at Data Day Texas in Austin. It was a great event, mainly because so many awesome people from the R community came to give some really interesting talks. Lucy D’Agostino McGowan has kindly provided a list of the talks and links to slides, and I thoroughly recommend checking it out: you’re sure to find a talk (or two, or five, or ten) that interests you. My own talk was on Speeding up R with Parallel Programming in the Cloud, where I talked about using the doAzureParallel package to launch clusters for use with the foreach function, and using aztk to launch Spark clusters for use with the sparklyr package. I’ve embedded the slides below: it’s not quite the same without the demos (and sadly there was no video recording), but I’ve included screenshots in the slides to hopefully make things clear.
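For readers who haven’t used doAzureParallel, the basic workflow from the package’s documentation looks roughly like the sketch below; the JSON configuration file names are placeholders that you generate and fill in yourself:

```r
library(doAzureParallel)

# Credentials and cluster specs live in JSON files created with
# generateCredentialsConfig() / generateClusterConfig(); the file
# names here are placeholders.
setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")

# Register the Azure Batch cluster as the foreach parallel backend.
registerDoAzureParallel(cluster)
getDoParWorkers()

# Iterations of %dopar% now run on the cloud cluster.
results <- foreach(i = 1:100) %dopar% {
  sqrt(i)
}

stopCluster(cluster)
```

The appeal of this design is that foreach separates the loop from the backend: the same `%dopar%` loop that runs locally runs on Azure once the cluster is registered.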