RStudio v1.1 Released

We’re excited to announce the general availability of RStudio 1.1. Highlights include:
• A connections tab which makes it easy to connect to, explore, and view data in a variety of databases.
• A terminal tab which provides fluid shell integration with the IDE, xterm emulation, and even support for full-screen terminal applications.
• An object explorer which can navigate deeply nested R data structures and objects.
• A new, modern dark theme and Retina-quality icons throughout.
• Dozens of other small improvements and bugfixes.

Changes to Internet Connectivity in R on Windows

This week we released version 3.0 of the curl R package to CRAN. You may never have used this package directly, but curl provides the foundation for most HTTP infrastructure in R, including httr, rvest, and all the packages that build on them. If an R package needs to go online, chances are the traffic goes through curl.

Announcing dplyrXdf 1.0

I’m delighted to announce the release of version 1.0.0 of the dplyrXdf package. dplyrXdf began as a simple (relatively speaking) backend to dplyr for Microsoft Machine Learning Server/Microsoft R Server’s Xdf file format, but has now become a broader suite of tools to ease working with Xdf files.

Climate Change and Population Modeling in R

A recent paper in Nature Climate Change, "Less than 2°C warming by 2100 unlikely" (Raftery et al. 2017), concludes that the goal of the Paris Agreement is unlikely to be met. Although the conclusion is disheartening, the paper advances the science of climate modeling by developing a joint Bayesian hierarchical model for Gross Domestic Product per capita and carbon intensity. This ensemble of models, in turn, depends on the availability of probabilistic population projections developed by the BayesPop Project at the University of Washington and available on CRAN.

Serving PyTorch Models on AWS Lambda with Caffe2 & ONNX

Having worked with PyTorch, I love its flexibility and ease of development compared with other platforms. As PyTorch is still early in its development, I was unable to find good resources on serving trained PyTorch models, so I've written up a method here that uses ONNX, Caffe2, and AWS Lambda to serve predictions from a trained PyTorch model. I hope you find it useful.
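The serving pattern the post describes can be sketched in a few lines: load the model once at module import so warm Lambda invocations reuse it, then answer each request from a handler. This is a minimal sketch, not the post's actual code; the `StubPredictor` class and the `{"features": ...}` payload shape are illustrative assumptions standing in for the real ONNX/Caffe2 model.

```python
import json

# Hypothetical stand-in for a Caffe2 model loaded from an ONNX export.
# In the real setup this would wrap the Caffe2 runtime.
class StubPredictor:
    def predict(self, features):
        # Toy score (mean of the inputs), standing in for the network's output.
        return sum(features) / max(len(features), 1)

# Loaded once at module import, so warm Lambda invocations skip the cold start.
_predictor = StubPredictor()

def handler(event, context=None):
    """AWS Lambda entry point: expects a JSON body like {"features": [1.0, 2.0]}."""
    body = event.get("body") or "{}"
    payload = json.loads(body) if isinstance(body, str) else body
    features = payload.get("features", [])
    score = _predictor.predict(features)
    return {"statusCode": 200, "body": json.dumps({"prediction": score})}
```

The key design point is that model deserialization happens outside the handler, so its cost is paid once per container rather than once per request.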

The State of Data Innovation in the EU – 2

Data innovation – the innovative use of data to create social and economic benefits – is making a significant mark in Europe. In economic terms, data innovation contributed about €300 billion to Europe’s economy in 2016 (or approximately 2 percent of GDP), and its value will likely more than double by 2020. Across society, data innovation is creating more responsive governments, better health care, and safer cities. But EU nations differ in the degree to which they are harnessing the benefits of data. This report uses a variety of indicators to rank EU member states and discusses why some countries are ahead and what others can do to catch up.

Natural Language Understanding – Application Notes with Context Discriminant

Natural Language Understanding is the logical next step beyond word search. It offers a more meaningful way to apply AI in NLP to find relevant context and subjects. At SiteFocus, we have successfully applied this technique to automated context discovery and subject discovery on large textual data repositories with the CIF platform.

Feature Selection for Fantasy Premier League

My crush on data analysis started back in the days when I used to crunch a few numbers before setting up my team every game week, and then watch my team suck like a pro. Back then I didn't know how to write even a small piece of code, and all the math I did could have been done by a third grader. Many years have passed since then; I think my math and reasoning have improved, but my team still sucks. I'm writing this blog as I get ready for another great season of fantasy football, this time with a little more stats, math, and faith.

How I started with learning AI in the last 2 months

Everyone is very busy these days. There is just so much going on in our personal and professional lives. On top of that, lo and behold, something like artificial intelligence starts gathering steam, and you learn that your skill set will be terribly outdated within the next two years.

Deep matrix factorization using Apache MXNet

Recommendation engines are widely used models that attempt to identify items that a person will like based on that person’s past behavior. We’re all familiar with Amazon’s recommendations based on your past purchasing history, and Netflix recommending shows to you based on your history and the ratings you’ve given other shows. Naturally, deep learning is behind many of these systems. In this tutorial, we will delve into how to use deep learning to build these recommender systems, and specifically how to implement a technique called matrix factorization using Apache MXNet. It presumes basic familiarity with MXNet.
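Before reaching for MXNet, the core idea of matrix factorization can be seen in plain Python: learn a low-dimensional vector per user and per item so that their dot product approximates the observed rating. The sketch below is a minimal SGD version of the general technique under that assumption, not the tutorial's MXNet code; all names and hyperparameters are illustrative.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Learn user matrix U and item matrix V by SGD so that
    dot(U[u], V[i]) approximates each observed rating r.
    `ratings` is a list of (user_index, item_index, rating) triples."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                uf, vf = U[u][f], V[i][f]  # snapshot for a simultaneous update
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, u, i):
    """Predicted rating: dot product of the learned user and item vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))
```

The "deep" variants in the tutorial replace this fixed dot product with a neural network applied to the concatenated user and item embeddings, but the training loop has the same shape.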

Converging big data, AI, and business intelligence

Cognitive computing, which seeks to simulate human thought and reasoning in real time, could be considered the ultimate goal of business intelligence. With cognitive applications in health care, retail, financial services, manufacturing, and transportation, artificial intelligence is already transforming a variety of industries. Many of today’s applications in AI would not be practical—or even possible—were it not for the unprecedented price and performance afforded by the massively parallel processing power of the GPU. While steady advances in CPU, memory, storage, and networking have served as a foundation for high-performance data analytics, the increasing volume of data means that even CPUs containing as many as 32 cores are unable to deliver adequate performance for compute-intensive analytics. And scaling performance by creating large clusters of servers can make such sophisticated analytics unaffordable for many organizations.

Enterprises are the natural environment for AI deployments

One of the paradoxes of artificial intelligence is that the companies poised to make the most of it are those with the most data: enterprises. Yet they also have the most institutional and organizational difficulties to overcome in order to do so. In this episode of the O'Reilly podcast, I had a chance to discuss these challenges with someone whose job it is to tackle them: Ron Bodkin, VP and general manager of artificial intelligence at Teradata.