RStudio v0.99 Preview: Graphviz and DiagrammeR
Soon after the announcement of htmlwidgets, Rich Iannone released the DiagrammeR package, which makes it easy to generate graph and flowchart diagrams from text in a Markdown-like syntax. The package is very flexible and powerful, and includes:

1. Rendering of Graphviz graph visualizations (via viz.js)
2. Creation of diagrams and flowcharts using mermaid.js
3. Facilities for mapping R objects into graphs, diagrams, and flowcharts

We're very excited about the prospect of creating sophisticated diagrams with an easy-to-author plain-text syntax, and have built special authoring support for DiagrammeR into RStudio v0.99 (a preview release of which you can download now).
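As a minimal sketch of the Graphviz rendering mentioned above, DiagrammeR's `grViz()` function accepts a Graphviz DOT specification and renders it as an htmlwidget (the node names below are made up for illustration):

```r
# Render a small flowchart from Graphviz DOT text with DiagrammeR.
# The diagram contents here are purely illustrative.
library(DiagrammeR)

grViz("
  digraph flowchart {
    node [shape = box]
    Ingest -> Clean -> Model -> Report
  }
")
```

In RStudio, the rendered diagram appears in the Viewer pane like any other htmlwidget.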

Hypothesis testing is only mostly useless

Text Analysis 101: Explicit Semantic Analysis (ESA) Explained
Explicit Semantic Analysis (ESA) works at the level of meaning rather than on the surface-form vocabulary of a word or document. ESA represents the meaning of a piece of text as a combination of the concepts found in that text, and is used in document classification, semantic relatedness calculation (i.e. how similar in meaning two words or pieces of text are to each other), and information retrieval.
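To make the relatedness calculation concrete, here is a toy sketch: ESA represents each text as a weighted vector over concepts, and relatedness is typically measured as the cosine of the angle between two such vectors. The concepts and weights below are invented purely for illustration:

```r
# Toy ESA-style relatedness sketch: texts as weighted concept vectors,
# compared by cosine similarity. All values here are made up.
concepts <- c("Finance", "Sports", "Politics")
text_a <- c(0.8, 0.1, 0.4)   # hypothetical concept weights for text A
text_b <- c(0.7, 0.0, 0.5)   # hypothetical concept weights for text B

cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cosine_sim(text_a, text_b)   # values near 1 indicate related meaning
```

In a real ESA system the concept space is much larger (classically, one dimension per Wikipedia article) and the weights come from the corpus, not by hand.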

Good, mediocre, and bad p-values

Revolution R Open 8.0.3 now available
Revolution R Open 8.0.3 is now available for download for Windows, OS X, Red Hat, Ubuntu and OpenSUSE. This release includes several new features: it upgrades RRO to the R 3.1.3 engine, which adds several new features to the R language, adds support for Ubuntu 15.04, and updates the checkpoint package for reproducibility.
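As a brief sketch of the reproducibility workflow, the checkpoint package installs and uses packages as they existed on CRAN at a given snapshot date (the date below is illustrative):

```r
# Pin this script's package library to a CRAN snapshot date so that
# rerunning it later uses the same package versions.
library(checkpoint)
checkpoint("2015-04-01")  # illustrative date; use the date your analysis ran

# Packages loaded after this point come from the dated snapshot.
library(ggplot2)
```

Rerunning the script on another machine with the same checkpoint date should reproduce the same package environment.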

Survival Analysis With Generalized Additive Models : Part I (background and rationale)
After a really long break, I will resume my blogging activity. It is actually a full circle for me, since one of the first posts that kick-started this blog matured enough to be published in a peer-reviewed journal last week. In the next few posts I will use the included R code to demonstrate the survival fitting capabilities of Generalized Additive Models (GAMs) on real-world datasets. The first post in this series will summarize the background, rationale, and expected benefits of adopting GAMs for survival analysis.
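As a hypothetical sketch of what GAM-based survival fitting can look like (this is not the post's code, and the dataset and smooth term are chosen only for illustration), the mgcv package offers a Cox proportional hazards family that allows smooth covariate effects:

```r
# Illustrative only: a Cox PH fit with a smooth effect of Karnofsky score,
# using mgcv's cox.ph() family on the veteran lung cancer dataset.
library(mgcv)
library(survival)  # provides the veteran dataset
data(veteran)

fit <- gam(time ~ s(karno) + celltype,
           family = cox.ph(),
           weights = status,   # for cox.ph(), weights is the event indicator
           data = veteran)
summary(fit)
```

The smooth term `s(karno)` lets the data determine the shape of the covariate's effect on the hazard, rather than forcing it to be linear.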

rstanmulticore: A cross-platform R package to automatically run RStan MCMC chains in parallel
Running chains in parallel is possible, but only with platform-dependent boilerplate code. For example, the RStan Quick Start Guide gives an mclapply example for Mac and Linux users and a parLapply example for Windows users. The boilerplate nature of the code makes it cumbersome to fit models several times, and the platform-dependent nature of the examples makes it difficult to share code between platforms. To address this issue, I have implemented the boilerplate code from the Quick Start Guide in a cross-platform R package: rstanmulticore.
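To illustrate the boilerplate being abstracted away, here is a rough sketch of the platform-dependent pattern (adapted from the general approach in the RStan Quick Start Guide; the wrapper function and its details are assumptions, not rstanmulticore's actual code):

```r
# Sketch of cross-platform parallel chains in RStan: fork-based mclapply
# on Mac/Linux, cluster-based parLapply on Windows. Illustrative only.
library(rstan)
library(parallel)

run_chains <- function(model_code, data, chains = 4, iter = 2000) {
  # Compile once without sampling, then reuse the compiled model per chain.
  compiled <- stan(model_code = model_code, data = data, chains = 0)
  one_chain <- function(i) {
    stan(fit = compiled, data = data, chains = 1,
         iter = iter, chain_id = i)
  }
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(chains)
    on.exit(stopCluster(cl))
    clusterEvalQ(cl, library(rstan))
    clusterExport(cl, c("compiled", "data", "iter"), envir = environment())
    fits <- parLapply(cl, seq_len(chains), one_chain)
  } else {
    fits <- mclapply(seq_len(chains), one_chain, mc.cores = chains)
  }
  sflist2stanfit(fits)  # merge the per-chain fits into one stanfit
}
```

The point of rstanmulticore is that users should not have to write (or debug) either branch of this themselves.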

Practical Text Analysis using Deep Learning
Deep Learning has become a household buzzword these days, and I have not stopped hearing about it. In the beginning, I thought it was another rebranding of neural network algorithms, or a fad that would fade away in a year. But then I read Piotr Teterwak's blog post on how Deep Learning can easily be applied to various image analysis tasks. A powerful algorithm that is easy to use? Sounds intriguing. So I decided to give it a closer look. Maybe it will be a new hammer in my toolbox that can later help me tackle new sets of interesting problems.

WebDataCommons – the Data and Framework for Web-scale Mining
The WebDataCommons project extracts the largest publicly available hyperlink graph, large product-, address-, recipe-, and review corpora, as well as millions of HTML tables from the Common Crawl web corpus and provides the extracted data for public download.

How Data Science makes Better Products
This video defines adaptive software, shows how data science realizes these applications, and discusses how these new tools are addressing real world challenges across all industries.

Reproducibility and KNIME
Obviously, advanced analytics starts with an intuitive yet powerful interface that allows data scientists to quickly explore different ways to blend and analyze their data. Even better if those analysis workflows can easily be handed to others as templates for their own analysis needs. However, when the analysis is deployed, or its results are used for business-critical purposes, it becomes essential that we can repeat the analytical process and guarantee that the results stay the same. In order to truly productionize advanced analytics, reproducibility is a key requirement.