RCall: Running an embedded R in Julia
I have used R (and S before it) for a couple of decades. In the last few years most of my coding has been in Julia, a language for technical computing that achieves remarkable performance for a dynamically typed language through Just-In-Time (JIT) compilation of functions and multiple dispatch. Nonetheless, there are facilities in R that I would like to have access to from Julia. I created the RCall package for Julia to do exactly that. This IJulia notebook provides an introduction to RCall.

Visualizing Clusters
Consider the following dataset, with (only) ten points …
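The post's dataset isn't reproduced in this excerpt, but as a rough sketch of the idea, here is how a small two-dimensional dataset might be clustered and plotted in R (the simulated points and the use of k-means are stand-ins for illustration, not the post's actual data or method):

    # Hypothetical stand-in for the ten-point dataset mentioned above
    set.seed(42)
    pts <- data.frame(x = c(rnorm(5, mean = 0), rnorm(5, mean = 4)),
                      y = c(rnorm(5, mean = 0), rnorm(5, mean = 4)))

    # Cluster the points (k = 2 is an arbitrary choice) and visualize the result
    km <- kmeans(pts, centers = 2)
    plot(pts, col = km$cluster, pch = 19,
         main = "Ten points, colored by k-means cluster")
    points(km$centers, pch = 4, cex = 2, lwd = 2)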

Tuatara GS1 Machine Learning Algorithm JSON API v0.1 Context-controllable Content Summarization
The Tuatara GS1 algorithm relies on the more advanced Tuatara GS2 algorithm, which generates relationships between objects based on principles in cognition related to the Computational Theory of the Mind (CTM) (Pinker, S. 1997), auto-association (Ge, X., Iwata, S. 2002), and reinforcement learning (Wenhuan, X., Nandi, A. K., Zhang, J., Evans, K. G. 2005), with exponential decays that follow the Golden Ratio φ (Dunlap, Richard A. 1997).

RStudio v0.99 Preview: Data Viewer Improvements
RStudio’s data viewer provides a quick way to look at the contents of data frames and other column-based data in your R environment. You invoke it by clicking on the grid icon in the Environment pane, or at the console by typing View(mydata).
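For instance, either of the following opens a data frame in the viewer (mtcars here is just a built-in dataset standing in for your own data):

    # Open a data frame in the data viewer
    View(mtcars)

    # Equivalent, with an explicit tab title
    View(mtcars, title = "Motor Trend cars")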

Monitoring Progress of a Foreach Parallel Job
R has strong support for parallel programming, both in base R and in additional CRAN packages. For example, we have previously written about foreach and parallel programming in the articles Tutorial: Parallel programming with foreach and Intro to Parallel Random Number Generation with RevoScaleR. The foreach package provides simple looping constructs in R, similar to lapply() and friends, and makes it easy to execute each iteration of the loop in parallel. You can find the packages at foreach: Foreach looping construct for R and doParallel.
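As a minimal sketch of the looping construct itself (not of the progress-monitoring techniques the post goes on to describe), a foreach loop can be run in parallel with the doParallel backend like this; the worker count and the toy computation are arbitrary:

    library(foreach)
    library(doParallel)

    # Register a small parallel backend (2 workers chosen arbitrarily)
    cl <- makeCluster(2)
    registerDoParallel(cl)

    # Run each iteration in parallel; .combine collects the results into a vector
    result <- foreach(i = 1:8, .combine = c) %dopar% {
      sqrt(i)
    }

    stopCluster(cl)
    print(result)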

Free eBook! Hadoop for Dummies
Download your FREE copy of “Hadoop For Dummies” today, compliments of IBM Platform Computing! This new learning resource can help enterprise thought leaders better understand the rising importance of big data, especially the Hadoop distributed computing platform. Plus, Hadoop for Dummies can help you kick-start your company’s big data initiative.

Can Advanced Analytics Help a Telecom Business Reduce Customer Churn?
If you’re a telecommunications (Telecom) service provider, customer churn isn’t new to you. It’s a plague that you’ve been dealing with for decades, and it can never be brought completely to zero either. Customer churn costs you money, so the questions become: “How much churn are you willing to live with, and what strategies can you deploy to keep churn manageable?” See more at: http://blogs.sap.com/analytics/2015/02/25/can-advanced-analytics-help-a-telecom-business-reduce-customer-churn/

Visualization of data science patterns
We’ve made a new interactive visualization of data science patterns.

Optimal Binning for Scoring Modeling (R Package)
The R package smbinning categorizes a numeric variable into bins (intervals) for subsequent use in scoring models. The theory behind it falls within a branch of Machine Learning called Supervised Discretization, a categorization technique that divides a continuous variable into a small number of intervals mapped to a discrete target variable. For example, time since an account was opened (integer, in months) against credit performance (Good/Bad), as shown in Table 1.
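As a hedged sketch of typical usage (the data frame and column names below are made up for illustration, not taken from the post):

    library(smbinning)

    # Hypothetical data: months since the account was opened and a
    # binary performance flag (1 = good, 0 = bad)
    set.seed(1)
    months_open <- sample(1:120, 1000, replace = TRUE)
    loans <- data.frame(
      months_open = months_open,
      fgood       = rbinom(1000, 1, plogis(-1 + 0.03 * months_open))
    )

    # Supervised (optimal) binning of the continuous variable against the target
    result <- smbinning(df = loans, y = "fgood", x = "months_open", p = 0.05)
    result$ivtable    # bins, counts, and information value
    smbinning.plot(result, option = "badrate", sub = "Months since account opened")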

Beagli: Finding value in your personal data
Personal analytics products can help users extract value from their data. This post describes our development of Beagli, a platform for mining and auctioning personal data.

Topic Modeling for the Uninitiated
As more and more data are stored digitally and people have better and better tools for publishing, we are witnessing more and more text data being collected and published in various media (e-books, blogs, newspaper websites, magazines, and mobile applications). The so-called big data era not only enables people to collect more and more data in different forms, but also provides a set of tools to analyze the data, infer various structures in it, and interpret that knowledge in various ways.
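The excerpt itself stays at the motivational level; as one common (though not necessarily the post's) way to fit a topic model in R, here is a minimal sketch using the tm and topicmodels packages on a toy corpus:

    library(tm)
    library(topicmodels)

    # A tiny toy corpus; real topic modeling needs far more text
    docs <- c("stock market trading prices shares",
              "football match goal team players",
              "market prices rise as shares rally",
              "team wins match in final minutes")
    corpus <- VCorpus(VectorSource(docs))
    dtm <- DocumentTermMatrix(corpus)

    # Fit a latent Dirichlet allocation (LDA) model with k = 2 topics (arbitrary)
    lda <- LDA(dtm, k = 2, control = list(seed = 1234))
    terms(lda, 3)    # top terms per topic
    topics(lda)      # most likely topic for each document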

The Believers
Magic has entered our world. In the pockets of many Americans today are thin black slabs that, somehow, understand and anticipate our desires. Linked to the digital cloud and satellites beyond, churning through personal data, these machines listen and assist, decoding our language, viewing and labeling reality with their cameras. This summer, as I walked to an appointment at the University of Toronto, stepping out of my downtown hotel into brisk hints of fall, my phone already had directions at hand. I asked where to find coffee on the way. It told me. What did the machine know? How did it learn? A gap broader than any we’ve known has opened between our use of technology and our understanding of it. How did the machine work? As I would discover, no one could say for certain. But as I walked with my coffee, I was on the way to meet the man most qualified to bridge the gap between what the machine knows and what you know.

Spark, Whistling Past the Data Platform Graveyard
We are in the Golden Age of Data. For those of us on the front lines, it doesn’t feel that way. For every step forward this technology takes, the need for deeper analytics takes two. We’re constantly catching up. Necessity is the mother of invention, and mum’s been very busy. In the past decade we have seen one ground-breaking advance in data processing technology after another, and in 2015 we have arrived at the age of Spark. Let’s take a look at the needs (and innovative responses) that brought us to where we are today.