Rblpapi: Connecting R to Bloomberg

Whit, John and I are thrilled to announce Rblpapi, a new CRAN package which connects R to the Bloomberg backends. Rebuilt from scratch using only the Bloomberg C++ API and the Rcpp and BH packages, it offers efficient and direct access from R to a truly vast number of financial data series, pricing tools and more. The package has been tested on Windows, OS X and Linux. As is standard for CRAN packages, binaries for Windows and OS X are provided (or will be once the builders have caught up). Needless to say, a working Bloomberg installation is required to use the package.

R 3.2.2 released

This just in from the R Core Team: R 3.2.2 has been released. With this update, data accessed over the Web — including files downloaded from URLs using download.file, and packages installed via install.packages — will be transmitted using the secure HTTPS protocol. (This has always been an option with prior versions of R, but now it is the default configuration.) Also, when R presents a list of CRAN mirrors to choose from, HTTPS-enabled mirrors will be given precedence. This release also fixes a few bugs and improves accuracy in the extreme tails of the t and hypergeometric distributions. Source distributions of R 3.2.2 are available for download now from CRAN, and binaries for Windows, Mac and Linux will propagate through the CRAN mirror network in the next couple of days. Here at Revolution R Enterprise we’ve begun work on Revolution R Open 3.2.2, which will be available in September.

7 Types of Regression Techniques you should know!

1. Linear Regression
2. Logistic Regression
3. Polynomial Regression
4. Stepwise Regression
5. Ridge Regression
6. Lasso Regression
7. ElasticNet Regression
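Several of these have simple closed forms. As an illustration, here is a minimal NumPy sketch of ridge regression (item 5 above), where the penalty `lam` shrinks the coefficients and `lam = 0` reduces to ordinary least squares. The data are simulated purely for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    # closed-form ridge solution: (X'X + lam*I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# simulated data: 100 observations, 3 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=100)

beta_ols = ridge(X, y, 0.0)      # lam = 0: ordinary least squares
beta_ridge = ridge(X, y, 10.0)   # larger lam shrinks coefficients toward zero
```

Lasso and ElasticNet (items 6 and 7) replace or mix the squared penalty with an absolute-value one and have no closed form, which is why they are usually fitted with iterative solvers.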

Using Google Analytics with R

For the most part, SMBs tend to utilize free analytics solutions like Google Analytics for their web and digital strategy. A powerful platform in its own right, it can be combined with R to create custom visualizations, deep dives into data, and statistical inferences. This article will focus on the usage of R and the Google Analytics API. We will go over connecting to the API, querying data and making a quick time series graph of a metric.

Out-of-Core Dataframes in Python: Dask and OpenStreetMap

In recent months, a host of new tools and packages have been announced for working with data at scale in Python. For an excellent and entertaining summary of these, I’d suggest watching Rob Story’s Python Data Bikeshed talk from the 2015 PyData Seattle conference. Many of these new scalable data tools are relatively heavy-weight, involving brand new data structures or interfaces to other computing environments, but Dask stands out for its simplicity. Dask is a light-weight framework for working with chunked arrays or dataframes across a variety of computational backends. Under the hood, Dask simply uses standard Python, NumPy, and Pandas commands on each chunk, and transparently executes operations and aggregates results so that you can work with datasets that are larger than your machine’s memory.
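The chunk-and-aggregate idea is easy to mimic by hand. The sketch below uses plain NumPy, with in-memory chunks standing in for out-of-core ones, and computes a mean the way Dask would: an ordinary NumPy reduction per chunk, then a combination of the partial results.

```python
import numpy as np

def chunked_mean(chunks):
    # run a standard NumPy reduction on each chunk, then combine the
    # partial sums and counts -- the same pattern Dask applies lazily
    total, count = 0.0, 0
    for chunk in chunks:
        total += chunk.sum()
        count += chunk.size
    return total / count

data = np.arange(1_000_000, dtype=float)
chunks = np.array_split(data, 10)   # stand-in for chunks read from disk
result = chunked_mean(chunks)
```

In Dask itself the chunking and combination are handled for you; the analogous dataframe computation would look roughly like `dd.read_csv('data-*.csv').x.mean().compute()` (hypothetical file and column names).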

Diabetic Retinopathy Winners’ Interview: 4th place, Julian & Daniel

The Diabetic Retinopathy (DR) competition asked participants to identify different stages of the eye disease in color fundus photographs of the retina. The competition ran from February through July 2015 and the results were outstanding. By automating the early detection of DR, many more individuals will have access to diagnostic tools and treatment. Early detection of DR is key to slowing the disease’s progression to blindness. Fourth-place finishers, Julian De Wit and Daniel Hammack, share their approach here (including a simple recipe for using ConvNets on a noisy dataset).

Recycling Deep Learning Models with Transfer Learning

Deep learning exploits gigantic datasets to produce powerful models. But what can we do when our datasets are comparatively small? Transfer learning by fine-tuning deep nets offers a way to leverage existing datasets to perform well on new tasks.

Data Science, Analytics, & Data Mining Online Degrees and Certificates

We present a comprehensive list of online master’s degrees and graduate certificates in data science, data mining, analytics, and machine learning, along with their curricula and program costs.

Survival Analysis – 2

In my previous post, I went over the basics of survival analysis, including computing the Kaplan-Meier estimate for time-to-event data. In this post, I explore Cox’s proportional hazards model for survival data. The KM estimator helps in figuring out whether the survival function estimates for different groups are the same or different, while survival models like Cox’s proportional hazards model help in relating different covariates to the survival function.
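For concreteness, here is a minimal pure-Python sketch of the Kaplan-Meier estimator from the previous post (not the Cox model itself), using the standard product-limit formula S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i). Breaking ties by ordering events before censorings at equal times is one common convention, assumed here; the toy data are invented for illustration.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate: multiply in (1 - 1/n_at_risk) at each event."""
    # sort by time, with events ordered before censorings at tied times
    order = sorted(zip(durations, events), key=lambda p: (p[0], not p[1]))
    at_risk = len(order)
    s = 1.0
    curve = []                       # (event time, survival estimate) pairs
    for t, observed in order:
        if observed:                 # an event occurred at time t
            s *= 1 - 1 / at_risk
            curve.append((t, s))
        at_risk -= 1                 # subject leaves the risk set either way
    return curve

# toy data: three subjects, the second censored at time 2
km = kaplan_meier([1, 2, 3], [True, False, True])
```

Comparing the resulting step functions across groups is exactly the check the KM estimator supports; relating covariates to the hazard is where the Cox model takes over.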

R News From JSM 2015

We can declare 2015 the year that R went mainstream at the JSM. There is no doubt about it, the calculations, visualizations and deep thinking of a great many of the world’s statisticians are rendered or expressed in R and the JSM is with the program. In 2013 I was happy to have stumbled into a talk where an FDA statistician confirmed that R was indeed a much used and trusted tool. Last year, while preparing to attend the conference, I was delighted to find a substantial list of R and data science related talks. This year, talks not only mentioned R: they were about R.

Showing a distribution over time: how many summary stats?

I saw a nice graph by Thomas Forth on Twitter today, but the more I looked at it, the more I felt it was hard to understand the changes over time across the income distribution from the Gini coefficient and the median alone. People started asking online for other percentiles, so I thought I would smooth each of them from the source data and plot them side by side:
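Computing a handful of percentiles per period is straightforward once the data are grouped by year. A minimal NumPy sketch with simulated incomes (the real post works from the published source data, which is not reproduced here):

```python
import numpy as np

# hypothetical stand-in for the source data: household incomes per year
rng = np.random.default_rng(1)
incomes_by_year = {year: rng.lognormal(mean=10.0, sigma=0.5, size=5000)
                   for year in range(2000, 2005)}

percentiles = [10, 25, 50, 75, 90]
# one curve per percentile, tracked across years -- these are the series
# that would be smoothed and plotted side by side
curves = {p: [np.percentile(incomes_by_year[y], p)
              for y in sorted(incomes_by_year)]
          for p in percentiles}
```

Each `curves[p]` is one line on the chart, which conveys far more of the distribution’s shape over time than the median and Gini coefficient alone.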